Rational Rules: Towards a Theory of Moral Learning 2020941874, 9780198869153

Moral systems, like normative systems more broadly, involve complex mental representations. Rational Rules proposes that

English Pages [264] Year 2021

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 15/12/2020, SPi

Copyright © 2021. Oxford University Press USA - OSO. All rights reserved.

Rational Rules

Nichols, Shaun. Rational Rules : Towards a Theory of Moral Learning, Oxford University Press USA - OSO,

Rational Rules Towards a Theory of Moral Learning

SHAUN NICHOLS

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© Shaun Nichols 2021

The moral rights of the author have been asserted

First Edition published in 2021
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2020941874
ISBN 978–0–19–886915–3
DOI: 10.1093/oso/9780198869153.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

For Sarah and Julia

Contents

Preface
Acknowledgments
List of Figures

I. RATIONALITY AND RULES
1. Rationality and Morality: Setting the Stage
2. The Wrong and the Bad: On the Nature of Moral Representations

II. STATISTICAL LEARNING OF NORM SYSTEMS
3. Scope
4. Priors
5. Closure
6. Status

III. PHILOSOPHICAL IMPLICATIONS
7. Moral Empiricism
8. Rational Rules and Normative Propriety
9. Rationalism, Universalism, and Relativism
10. Is It Rational to Be Moral?

References
Index

Preface

My first book in moral psychology, Sentimental Rules, emphasized the role of emotions in moral judgment. But I never thought emotions exhausted moral judgment. There are numerous features of moral judgment that are hard to explain just by appealing to emotions. Why do we tend to think that it’s wrong to produce a bad consequence, but not wrong (or not as wrong) to tolerate such a consequence happening? How do we come to think that some evaluative claims are universally true but others only relatively true? What kinds of rules can be learned? How do we determine whether some novel act is permitted or prohibited? These are questions that arise for moral psychologists and experimental philosophers. Most work in these areas aims to uncover the processes and representations that guide judgments. This is the agenda in discussions about whether people are incompatibilists about free will (e.g., Murray & Nahmias 2014), whether moral judgment is driven by distorting emotions (e.g., Greene 2008), and whether judgments about knowledge are sensitive to irrelevant details (e.g., Swain et al. 2008). Much less attention has been paid to historical questions about how we ended up with the representations implicated in philosophically relevant thought. There are different kinds of answers to these historical questions. One might offer distal answers that appeal to the more remote history of the concept. For instance, an evolutionary psychologist might argue that some of our concepts are there because they are adaptations. Or a cultural theorist might argue that some of our concepts are there because they played an important role in facilitating social cohesion. On the more proximal end of things, we can attempt to determine how the concepts might have been acquired by a learner. Those proximal issues regarding acquisition will be the focus in this book.¹ I will argue that we can explain many of the features of moral systems in terms of

¹ Of course proximal and distal issues are not unrelated. For an evolutionary psychologist, the proposal that a concept is an adaptation will typically be accompanied by the expectation that the characteristic (proximal) development of the concept is not explicable in terms of domain-general learning mechanisms (see, e.g., Tooby & Cosmides 1992).

rational learning from the evidence. To locate this in contemporary moral psychology, a bit of background is in order. In naturalistic moral psychology, sentimentalism is the dominant view (e.g., Blair 1995; Greene 2008; Haidt 2001; Nichols 2004c; Prinz 2007), and there is considerable evidence that emotions have numerous influences on our moral psychology. Emotions seem to impact our judgments about moral dilemmas (e.g., Bartels & Pizarro 2011; Koenigs et al. 2007). Emotions seem to influence the resilience of certain moral rules (Nichols 2004c). Emotions seem to motivate prosocial behavior (Batson 1991). Emotions seem to motivate punishment for cheaters (Fehr & Gächter 2002). Sentimentalists have drawn on these results to argue for philosophical conclusions. To take what is perhaps the most prominent example, the impact of emotion on certain kinds of moral judgments has been used to challenge the rationality of those judgments (e.g., Greene 2008; Singer 2005). I have counted myself among the sentimentalists, but I’ve also argued that emotional reactions don’t provide a complete explanation of moral judgment. In particular, I’ve argued that rules play an essential role in our moral psychology (Nichols 2004c). However, I had no account of how we come to learn these rules. Many moral rules seem to trade on subtle distinctions. For instance, from a young age, children treat harmful actions as worse than equally harmful omissions. Children also judge that it’s wrong to harm one person to save five others from harm. Children are never explicitly taught the distinctions to which these judgments conform. The prevailing explanation for how we come to have such subtle distinctions is nativist. Contemporary moral nativists hold that the best explanation for the uniformity and complexity of moral systems is that moral judgments derive from an innate moral acquisition device (e.g., Harman 1999; Mikhail 2011). 
Such views hold that the moral systems we have are partly constrained by human nature. Just as linguistic nativism proposes constraints on possible human languages, moral nativism implies that there are constraints on possible human moralities (Dwyer et al. 2010). Although nativist accounts have been widely criticized (e.g., Nichols 2005; Prinz 2008; Sterelny 2010), there has been no systematic alternative explanation for how children acquire such apparently complex moral systems. My collaborators and I have been developing such an alternative explanation for the acquisition of moral systems. The inspiration comes from an unlikely source—statistical learning. Recent cognitive science has seen the ascendance of accounts which draw on statistical learning to explain how we end up with the representations we have (Perfors et al. 2011; Xu et al. 2012).

I’ve come to think that statistical learning provides a promising avenue for answering central questions about how we come to have the moral representations we do. I will argue that a rational learning approach can explain several aspects of moral systems, including (i) how people learn to draw the act/allow distinction given limited evidence, (ii) how people come to have a bias in favor of act-based rules, and (iii) how people use consensus information as evidence on whether a moral claim is universally true. The picture that emerges reveals a starkly different side of moral systems than traditional sentimentalism. The learning processes invoked are, by standard accounts, rational. This insulates moral judgment from important charges of irrationality. For instance, if our deontic judgments depend on rules, and these rules are acquired via rational inference, then we cannot fault the process by which the judgment is made. This doesn’t insulate the judgments from every critique. For instance, the rules themselves might be defective. But that challenge requires a deeper inquiry into the epistemic credentials of the rules. The resulting account also contrasts sharply with nativism. The learning processes that I will draw on are not specific to the moral domain. Indeed, statistical learning affords the moral psychologist a diverse empiricist toolkit. Moreover, the rational learning account suggests that humans are flexible moral learners, with no innate constraints on the kinds of rules that humans can learn. The view that I defend is obviously rationalist in important ways. But that doesn’t entail a rejection of the significance of emotions for moral judgment. Indeed, I continue to think that much of the sentimentalist picture is correct. Emotions play a critical role in amplifying the rules of morality. This plausibly holds for online decision-making—rules that resonate with strong emotions will end up having a greater influence in our decision-making. 
The emotional amplification of rules also likely explains the cultural resilience of certain moral rules. To ignore these influences of emotions is to ignore fundamental aspects of human morality. A persistent commitment of sentimentalists down the ages is that without the emotions, we would have radically different normative systems than we do. I certainly don’t mean to retreat from that sentimentalist commitment. However, the fact that emotions are critical to our moral systems doesn’t mean that the role of rationality is negligible. On the contrary, I’ll argue, rational learning provides a much better explanation than emotions for how we acquire normative systems in all their complexity. The ultimate view, I think, must be some

form of rational sentimentalism, where rational learning and emotions both contribute in key ways to our moral judgments. But in this volume, I want to emphasize the rational side of things. Although moral judgment and decision might be distorted in many ways, there’s reason to be optimistic that the fundamental capacity for acquiring moral rules is rational and flexible. The way we learn rules is plausibly responsive to the evidence in appropriate ways, and, at least at some developmental stages, supple enough to adjust to new rules in the face of new evidence.

Acknowledgments

First, I’d like to thank my collaborators on the empirical studies that form the center of this book: Alisabeth Ayars, Hoi-Yee Chan, Jerry Gaus, Shikhar Kumar, Theresa Lopez, and Tyler Millhouse. I owe special debts to Theresa, Alisabeth, and Jerry. Theresa’s dissertation (Lopez 2013) is the first work that maintained that Bayesian approaches to cognition might provide an alternative to Chomskyan accounts of moral cognition. If it hadn’t been for Theresa’s insightful dissertation, I never would have started a project on moral learning. Alisabeth worked extensively on the project when she was a graduate student in psychology at Arizona. She had several key experimental ideas; she was also incisive on the theoretical issues (as evidenced in Ayars 2016). This project would have been much worse without her contributions. Finally, Jerry was an ideal collaborator on the empirical work that we did together. More generally, Jerry has been an intellectually invigorating colleague and friend. It was my good fortune to be in the same department with him. Many friends and colleagues have influenced my thinking on these matters through conversations, discussions in Q&A, and comments on some of the chapters.
In particular, I’d like to thank Mark Alfano, Ritwik Agrawal, Cristina Bicchieri, Thomas Blanchard, Selmer Bringsjord, Mike Bruno, Stew Cohen, Juan Comesaña, Fiery Cushman, Justin D’Arms, Colin Dawson, Caleb Dewey, John Doris, LouAnn Gerken, Josh Greene, Steven Gross, Heidi Harley, Toby Handfield, Dan Jacobson, Jeanette Kennett, Max Kleiman-Weiner, Josh Knobe, Max Kramer, Tamar Kushnir, Sydney Levine, Jonathan Livengood, Don Loeb, Tania Lombrozo, Edouard Machery, Bertram Malle, Ron Mallon, Eric Mandelbaum, John Mikhail, Adam Morris, Ryan Muldoon, Scott Partington, Ángel Pinillos, Dave Pizarro, Jesse Prinz, Hannes Rakoczy, Peter Railton, Sarah Raskoff, Chris Robertson, Connie Rosati, David Rose, Adina Roskies, Richard Samuels, Hagop Sarkissian, Sukhvinder Shahi, Dave Shoemaker, David Sobel, Tamler Sommers, Kim Sterelny, Justin Sytsma, Josh Tenenbaum, John Thrasher, Hannah Tierney, Mark Timmons, Bas Van Der Vossen, Steve Wall, Jen Wright, Jonathan Weinberg, David Wong, and Fei Xu. All of the empirical studies for this project were funded in part by the U.S. Office of Naval Research under award number 11492159. I’m grateful

to Paul Bello, who was the ONR program officer, for supporting the work, as well as for numerous helpful discussions about it. Chapter 6 draws substantially on material from Ayars & Nichols (2020), Rational learners and metaethics, Mind & Language, 35(1), 67–89. I thank the journal for permission to reprint that material here. I spent academic year 2017–18 on fellowship at the Center for Human Values at Princeton. I’m grateful to the Center and to the University of Arizona for affording me the opportunity to focus on writing the book. In addition to freeing up time to write, I got excellent feedback from many people at the Center, including Stephanie Beardman, Mitch Berman, Liz Harman, Dylan Murray, Drew Schroeder, Amy Sepinwall, Peter Singer, Michael Smith, Monique Wonderly, and especially Mark van Roojen. Mark read and commented on much of the book while I was there, and he’s been a tireless and wonderful correspondent about these issues ever since. Walter Sinnott-Armstrong arranged to have his research group, Madlab, read the first draft of the manuscript. This was incredibly helpful. I’m grateful to all the lab members for taking the time to read and think about the manuscript. I’d like to single out several people in the group whose comments led to changes in the manuscript: Aaron Ancell, Jana Schaich Borg, Clara Colombatto, Paul Henne, J. J. Moncus, Sam Murray, Thomas Nadelhoffer, Gus Skorburg, Rita Svetlova, and Konstantinos Tziafetas. Mike Tomasello also participated, which was a delight. And of course I’m especially indebted to Walter, both for organizing the event and for being characteristically constructive and indefatigable. Dan Kelly also read and gave terrific comments on the entire manuscript at a later stage. His careful attention led to numerous improvements in the book. I had the benefit of three excellent referees for OUP, one of whom was Hanno Sauer (the other two remained anonymous).
Thanks to all of them, and to Peter Momtchiloff for his characteristically excellent stewardship at OUP. Victor Kumar first encouraged me to write this book. The book turned out to be a lot more work than I expected, but I still thank Vic for prompting me to write it, and for excellent comments along the way. Michael Gill and I have been discussing issues at the intersection of moral philosophy and cognitive science for twenty years, and his influence and encouragement have been central to this work. Finally, I’m lucky to have been able to talk with Rachana Kamtekar about every sticky philosophical problem in the book, and everything else besides.

List of Figures

1.1 Stimuli for probabilistic inference task
3.1 Potential scopes of rules
3.2 Hierarchical representation of hypothesis space, based on similarity judgments
3.3 Representing the sizes of hypotheses
3.4 Extension of words represented as subset structure
3.5 Sample violations of novel rules
3.6 Hypothesis space for Principle of Double Effect
3.7 Results on intentional/foreseen study
3.8 Alternative hypothesis space for Principle of Double Effect
3.9 Set of potential patients for a rule
3.10 Schematic depiction of display for parochial norms study
3.11 Schematic depiction of display for parochial norms study, 20 percent condition
4.1 Complete list of violations for overhypothesis study
5.1 The rectangle game
6.1 Universalist and relativist models for different patterns of consensus
6.2 Correlation between perceived consensus and judgments of universalism
6.3 Results on universalism/relativism for abstract cases, by domain
6.4 Different relativist models for split consensus
7.1 Empiricist model of learning
7.2 Hypothesis space for scope of rules
9.1 The Stag Hunt
9.2 Choosing sides

PART I

RATIONALITY AND RULES

1 Rationality and Morality

Setting the Stage

“Moral distinctions are not derived from reason.” Thus does Hume begin his discussion of morality in the Treatise. Rather, Hume says, moral distinctions come from the sentiments. Contemporary work in moral psychology has largely followed Hume in promoting emotions rather than reason as the basis for moral judgment (e.g., Blair 1995; Greene 2008; Haidt 2001; Nichols 2004c; Prinz 2007; cf. May 2018; Sauer 2017). While I think moral judgment is tied to emotions in multiple ways, in this book I want to explore the rational side of moral judgment. I’ll argue that rational processes play a critical and underappreciated role in how we come to make the moral judgments we do. In this chapter, I’ll describe the basic phenomena that I want to illuminate with a rational learning account, and I will explicate the primary notion of rationality that will be in play.

1. The Phenomena

Don’t lie. Don’t steal. Keep your promises. These injunctions are familiar and central features of human moral life. They form part of the core phenomena to be explained by an adequate psychological account of moral judgment. Why do we make the judgment that it’s wrong to lie or steal? In addition to these specific judgments, an adequate moral psychology must also explain important distinctions that seem to be registered in lay moral judgment. For example, people tend to think that producing a bad consequence is worse than allowing the consequence to occur. Much of the work attempting to tease out an implicit understanding of these distinctions is done using trolley cases (Foot 1967; Greene et al. 2001; Harman 1999; Mikhail 2011; Thomson 1976, 1985). For instance, people tend to say that in the following case, what the agent does is not permissible.

Footbridge: Is it permissible for Frank to push a man off a footbridge and in front of a moving boxcar in order to cause the man to fall and be hit by the boxcar, thereby slowing it and saving five people ahead on the tracks?

By contrast, people tend to say that what the agent does (or rather fails to do) is permissible:

Footbridge-Allow: Is it permissible for Jeff not to pull a lever that would prevent a man from dropping off a footbridge and in front of a moving boxcar in order to allow the man to fall and be hit by the boxcar, thereby slowing it and saving five people ahead on the tracks?

People also tend to say that what the agent does in the following case is permissible:

Bystander: Is it permissible for Dennis to pull a lever that redirects a moving boxcar onto a side track in order to save five people ahead on the main track if, as a side-effect, pulling the lever drops a man off a footbridge and in front of the boxcar on the side track, where he will be hit? (Cushman et al. 2006: 1083–4)

These cases have been taken to suggest that people are sensitive to surprisingly subtle distinctions in their normative evaluations. If people really are sensitive to these distinctions in their moral judgments, these are relatively high-level psychological phenomena. At an even higher level, we find that people seem to have systematic judgments about the nature of morality itself. For instance, people tend to think that moral claims have a different status than conventional claims. This has been explored extensively with questions like the following:

Authority dependence: If the teacher didn’t have a rule against hitting, would it be okay to hit other students?

For actions like hitting, people, including pre-school children, tend to say that it’s wrong to hit even if the teacher doesn’t have a rule. But for actions like talking during story-time, people are more likely to say that if the teacher doesn’t have a rule on the matter, it’s okay to talk during story time (e.g., Nucci 2001; Turiel 1983). More recently, people have explored the extent to which people think moral claims are universally true, using questions like the following:

Disagreement: If John and Mark make different judgments about whether it’s okay to rob a bank, does one of them have to be wrong?

For actions like bank robbery and assault, people tend to say that if two people make different judgments, one of them has to be wrong, but they do not tend to say this when it comes to aesthetic claims or matters of taste (Goodwin & Darley 2008; Nichols 2004a; Wright et al. 2013). These are the phenomena that I want to investigate. Note that much of our moral lives is not included here. I won’t try to explain our aversion to suffering in others, our propensity to guilt and shame, or our use of empathy and perspective taking in moral assessment. Nor will I try to characterize the ethical abilities enjoyed by non-human animals. The moral capacities that I’m targeting are, as far as we can tell, uniquely human. How do we arrive at these sophisticated judgments, distinctions, and meta-evaluations?

2. A Challenge to Common-Sense Morality

Before setting out my positive story, I want to address briefly the prevailing skepticism about moral judgment. Moral psychologists often cast lay morality as critically flawed. There is evidence that moral judgment is compromised by incidental emotions, misleading heuristics, and confabulation. Philosophers have used such evidence to develop debunking arguments according to which key areas of common-sense ethical judgment are epistemically rotten—they are based on epistemically defective processes (see Sauer 2017 for discussion of debunking arguments). Debunking arguments have been developed for both common-sense normative ethics and common-sense metaethics.

2.1 Debunking Normative Ethics

Perhaps the most familiar debunking accounts draw on dual process theories, according to which there are two broad classes of psychological processes. System 1 processes tend to be fast, effortless, domain specific, inflexible, insensitive to new information, and generally ill-suited to effective long-term cost–benefit reasoning. System 2 processes are flexible, domain general, sensitive to new information, and better suited to long-term cost–benefit analysis, but they are also slow and effortful. On Greene’s dual process account of moral judgment, when we are presented with the option of pushing one innocent person off of a Footbridge to save five other innocent people, there is competition between a System 1 emotional process (screaming “don’t!”) and a System 2 process that calculates the best consequence (saying “5 > 1, dummy”). The proposal is that cases like Footbridge trigger System 1 emotions that subvert System 2 utilitarian cost–benefit analysis.

A closely related dual process model comes from Jonathan Haidt (2001). On Haidt’s social intuitionist account, our moral reactions tend to be driven by System 1 affectively valenced intuitions. System 2 plays a subsidiary role—it primarily generates post hoc justifications for our affective intuitions (2001: 815). One of the key studies that motivates Haidt’s view suggests that people will hold on to their moral views even when they are unable to provide a justification for them. For instance, participants were presented with a vignette in which siblings Julie and Mark have a consensual and satisfying sexual encounter, using multiple forms of birth control:

Julie and Mark: Julie and Mark are brother and sister. They are traveling together in France on summer vacation from college. One night they are staying alone in a cabin near the beach. They decide that it would be interesting and fun if they tried making love. At the very least it would be a new experience for each of them. Julie was already taking birth control pills, but Mark uses a condom too, just to be safe. They both enjoy making love, but they decide not to do it again. They keep that night as a special secret, which makes them feel even closer to each other. What do you think about that? Was it OK for them to make love?
When presented with this vignette, most participants said that it was not okay for Julie and Mark to make love. When asked to defend their answers, participants often appealed to the risks of the encounter, but the experimenter effectively rebutted the justifications (e.g., by noting the use of contraceptives). Nonetheless, the participants continued to think that the act was wrong, even when they couldn’t provide any undefeated justifications. A typical response was: “I don’t know, I can’t explain it, I just know it’s wrong” (Haidt 2001: 814). Haidt interprets this pattern as a manifestation of

two processes: the moral condemnation is driven by an affective intuition (rather than reasoning) and the proffered justification comes from post hoc rationalizing—confabulation. As Greene and Haidt characterize System 1 processes, the judgments that issue from those processes are unlikely to be responsive to evidence. Greene argues that if System 1 is indeed what leads people to judge that it’s wrong to push in cases like Footbridge, this provides the foundation for an argument that challenges the rational propriety of non-utilitarian judgment. Greene suggests that deontological judgments, like “it’s wrong to push the guy in front of the train,” are defective because they are insensitive to rational considerations, in sharp contrast with consequentialist evaluations:

[T]he consequentialist weighing of harms and benefits is a weighing process and not an ‘alarm’ process. The sorts of emotions hypothesized to be involved here say, ‘Such-and-such matters this much. Factor it in.’ In contrast, the emotions hypothesized to drive deontological judgment are . . . alarm signals that issue simple commands: ‘Don’t do it!’ or ‘Must do it!’ While such commands can be overridden, they are designed to dominate the decision rather than merely influence it. (Greene 2008: 64–5)

Greene maintains that since our deontological judgments derive from emotional reactions that are not responsive to rational considerations, we should ignore them in normative theorizing (2008; see also Singer 2005: 347).¹ Although there is a diverse array of evidence supporting the view that emotions play a role in judgments about Footbridge (e.g., Amit & Greene 2012; Bartels & Pizarro 2011; Koenigs et al. 2007), emotions cannot provide a complete explanation for the basic phenomenon of non-utilitarian moral judgment. Many dilemmas that people rate as generating very little emotional arousal—e.g., those involving lying, stealing, and cheating—elicit non-utilitarian responses (see, e.g., Dean 2010). Consider, for instance, cases of promise breaking. People don’t get emotionally worked up by vignettes that involve promise breaking, but they still make non-utilitarian judgments about promise breaking. For instance, in one study, participants were asked whether it was okay for one person to break a promise in order to prevent two other people from breaking promises; in this case people maintained that it was wrong for the first person to break a promise even
though it would minimize promise breaking overall (Lopez et al. 2009: 310). So emotion doesn’t seem to be required to make non-utilitarian judgments. Indeed, the asymmetry between Footbridge and Bystander is found even when the potential human victims are replaced by teacups (Nichols & Mallon 2006).² The fact that people make non-utilitarian judgments in the absence of significant affect indicates that there must be some further explanation for these responses. This undercuts debunking arguments that depend on the view that non-utilitarian judgments are primarily produced by arational emotional reactions. The fact that we find non-utilitarian judgments without concomitant affect also exposes the need for a different explanation for the pattern of non-utilitarian judgment that people exhibit.

¹ For direct responses to this argument, see Berker (2009) and Timmons (2008).

² In addition, recent work indicates that Bystander is just as emotionally arousing as Footbridge (Horne & Powell 2016).

2.2 Debunking Metaethics

As noted above, people tend to think that at least some moral claims are universally true, and they treat aesthetic claims as only true relative to an individual or group (Goodwin & Darley 2008, 2012). Why is this? Why do people believe of some moral claims that they are universally true? Philosophers have offered several explanations for the belief in universalism, and the best-known proposals serve as debunking explanations. In his influential treatment, Mackie (1977) proposes a number of non-rational explanations for the belief in universalism. One idea is that motivational factors, like the desire to punish or compete, play a distorting role in generating metaethical judgments (see, e.g., Mackie 1977: 43; see also Fisher et al. 2017; Rose & Nichols 2019). Another of Mackie’s suggestions is that the belief in universalism derives from the tendency to project our moral attitudes onto the world. Relatedly, our emotional reactions toward ethical violations may persuade us that moral wrongs are universally wrong.

The most direct attack on the propriety of metaethical judgments comes from a study by Daryl Cameron and colleagues (2013). They presented subjects with brief descriptions of practices in other cultures (e.g., “Marriages are arranged by the children’s parents”). In some cases, these descriptions were presented on a background displaying a disgusting picture (unrelated to the content of the description); in other cases, the background was emotionally neutral. Cameron and colleagues found that when the description was accompanied by a disgusting picture, participants were more likely to give universalist responses.³ Such an influence is plausibly epistemically defective. Cameron and colleagues make this clear by drawing on the distinction between incidental and integral effects of emotions:

Integral emotions may contain information that should appropriately influence moral judgments: guilt may signal that you have behaved badly towards others, and anger may signal that others have behaved badly towards you (Frank 1988). In contrast, incidental emotions are conceptually unrelated to subsequent judgments, and so are ethically irrelevant (Doris & Stich 2005). Whereas incidental emotions may influence moral judgments, they are not appropriately cited as evidence in the justification of these judgments (719).

If you are more universalist about arranged marriages because you are seeing a revolting picture of worms, then you’re being swayed by an epistemically defective process. I focus on the study by Cameron and colleagues because it has a clean experimental design, and it provides some of the most direct evidence for the role of an epistemically defective affective process in judgments of universalism. However, there is a pressing limitation of the study. Although the results indicate that there is some influence of epistemically defective processes, the extent of influence is, for debunking purposes, trivial. The mean difference in universalist judgments produced by inducing disgust was only 0.1 on a 5-point scale.⁴ Thus, the strongest debunking conclusion this study can fund is: “To some slight extent, people are not justified in their belief that a claim is universally true.” Clearly, we cannot take these results to show that people’s belief in moral universalism is largely based on a defective process. The results simply don’t explain much of why people think moral claims are universally true. As a result, they don’t do much by way of debunking the belief.

³ Cameron and colleagues used a slightly different universalism measure than the standard disagreement measure (Section 1). They asked participants to evaluate whether an activity practiced in other cultures (e.g., “Marriages are arranged by the children’s parents”) is wrong regardless of the culture in which it is practiced.

⁴ More generally, it turns out that the impact of occurrent emotion on moral judgment is quite weak (e.g., Landy & Goodwin 2015; May 2014, 2018).

I’ve argued that some of the most prominent debunking arguments are inadequate. Obviously this is a limited selection of the debunking arguments that have been made. There is a broader lesson here, though. The most prominent kinds of arguments that purport to debunk lay ethical judgments appeal to the distorting effects of occurrent emotions. But many of the ethical judgments that we want to understand do not seem to be explained by occurrent emotional processes (see also Landy & Goodwin 2015; May 2014, 2018). So I think there is good reason to be skeptical of the attacks on lay moral judgment. However, skepticism about these accounts hardly constitutes a positive defense. Even if the extant debunking arguments fail, that doesn’t mean lay moral judgment is in good repair. The main work of this book is to promote a detailed positive defense of the rationality of lay moral judgment.

3. Rationality

3.1 The Many Rationalisms

“Rationalism” is used in several strikingly different ways in philosophy. For much of this book, the notion of rationality in play will be an evidentialist one on which a person’s belief is rational or justified just in case it is supported by their evidence. I will argue that several key distinctions in common-sense morality are acquired through a process of rational inference based on the evidence that the learner receives. I will set out the evidentialist notion of rationality in a bit more detail below, but first, I want to chart several different notions of rationalism which contrast with evidentialism in important ways.

In metaethics, “rationalism” is often used to refer to a view about the relation between moral requirements and reasons for action. This action-focused version of rationalism (sometimes called “moral/reasons existence internalism”) holds that it is a necessary truth that if it is morally right for a person to Φ then there is a reason for that person to Φ (Smith 1994; van Roojen 2015).⁵ This view of rationality, unlike a pure evidentialist view, ties rationality directly to action. Evidentialist approaches to rationality can, of course, have rationality inform action, since one’s actions will and should be guided by one’s rationally acquired beliefs. But evidential rationality only applies to beliefs, not directly to desires or actions.

⁵ Michael Smith distinguishes two versions of this rationalist thesis. The conceptual rationalist thesis holds that “our concept of a moral requirement is the concept of a reason for action; a requirement of rationality or reason.” The substantive rationalist thesis holds that this conceptual claim bears out in the world. That is, “there are requirements of rationality or reason corresponding to the various moral requirements” (1994: 64–5).

In cognitive science, “rationalism” is often used to pick out nativist views in contrast to empiricist views (e.g., Chomsky 2009). These views are emphatically not rationalist in the evidentialist sense (see, e.g., Mikhail 2011: 32). A leading idea of Chomskyan rationalism is precisely that there are capacities the acquisition of which cannot be explained in terms of inference over the evidence. For instance, Chomskyans maintain that children’s acquisition of grammar cannot be explained in terms of children drawing apt inferences from the available linguistic data. As we will see in Chapter 7, moral Chomskyans hold that there is a moral grammar the acquisition of which can’t be explained in terms of evidential reasoning. This will be at odds with much of the story that I develop.

A third notion of rationalism, which prevailed in moral philosophy in the Early Modern period, is an a priori notion of rationalism. Mathematics served as the leading example here. The early moral rationalists maintained that mathematical truths are a priori, and that many of these truths are self-evident and accessible to us without relying on any kind of experiential evidence. Similarly, the early moral rationalists (e.g., Clarke, Locke, Balguy) held there are a priori moral principles that are self-evident, and these self-evident principles provide the basis for deductive inferences to further moral claims (see Gill 2007 for discussion). In Chapter 9, I will take up the relation between this traditional moral rationalism and the more modest moral rationalism that I’ll promote.
The evidentialist notion of rationality contrasts with all of the above, but it is the dominant framework in analytic epistemology. According to the kind of evidentialism that I’ll be using, S’s belief that P is rational or justified to the extent that S’s belief that P is responsive to her total relevant evidence. As I want to use the notion, a person’s belief can be responsive to the evidence even if she lacks conscious access to the reasoning process. Of course, when subjects do report their reasoning process, if the process they report is a process that is responsive to the evidence, this gives us good reason to think that they are in fact making judgments in ways that are—to some extent—evidentially rational. Still, such conscious access is not
necessary for responding to the evidence.⁶ Algorithms might be responsive to the evidence even when unconscious. In addition to its prominence in analytic epistemology, evidentialism also coheres reasonably well with the notion of rationality that anchors discussions in naturalized epistemology.⁷ Perhaps the best-known statement of rationality in that literature comes from Ed Stein:

[T]o be rational is to reason in accordance with principles of reasoning that are based on rules of logic, probability theory and so forth. If the standard picture of reasoning is right, principles of reasoning that are based on such rules are normative principles of reasoning, namely they are the principles we ought to reason in accordance with. (Stein 1996: 4)

On this familiar description, one must reason “in accordance” with the principles of probability theory to be rational. What exactly is it to reason “in accordance” with the rules of logic and probability theory? Much of the literature on reasoning is conspicuously vague about this (for discussion, see Nichols & Samuels 2017). This much is obvious, though: reasoning is a process. To clarify the nature of processes, and the nature of reasoning processes in particular, we can draw on David Marr’s influential account of levels of analysis in cognitive science.

⁶ Insofar as evidentialism requires that beliefs are responsive to evidence, the view is typically taken to be at odds with simple forms of reliabilism (for discussion, see Goldman & Beddor 2016). I won’t try to engage this issue in the book, but reliabilists might maintain that the inferences that I promote are justified insofar as they are based on reliable processes.

⁷ Research on heuristics and biases has been used to challenge the idea that people are rational in this evidentialist way (see, e.g., Stich 1990).

3.2 Processes

Marr explains levels of analysis in terms of their guiding questions. The first level is the computational level, and its basic questions are: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?” The second level, representation and algorithm, asks: “What is the representation for the input and output, and what is the algorithm for the transformation?” The third level, that of hardware implementation, won’t occupy us here, but it asks, “How can the representation and algorithm be realized physically?” (Marr 1982: 25).

Marr illustrates different levels of analysis with the example of a lowly cash register. At the computational level, the process we find in a cash register is addition, which is a function that takes two numbers as input and yields one number, in ways specified by an arithmetic theory (22). Addition is what the cash register does. The computational level also concerns the question why addition is used for this device, rather than, say, division or multiplication? Here, Marr says, we can draw on our intuitions about what is appropriate for the context. In the context of exchanging groceries for money, it seems most appropriate to sum the costs of the items purchased (22–3). Intuitions are just one resource for gleaning the why of psychological processes. Evolutionary psychologists emphasize adaptationist considerations to explain why a certain process is appropriate for a context (Cosmides & Tooby 1995: 1202).⁸ Marr’s second level of analysis involves the actual representations and algorithms of the process (Marr 1982: 23). The computation of addition can be carried out in different ways. One dimension of flexibility is the representational system itself. One might use different kinds of symbols to represent numbers, e.g., binary, Arabic, or hash marks. The other dimension of flexibility is the algorithms that are deployed. Importantly, the kinds of algorithms that are appropriate will be constrained by the kinds of representations in play. With hash marks, one can use concatenation for addition (|||| + ||| = |||||||), but obviously this would be a disaster for Arabic numerals (4 + 3 = 43?). Even if the representations are fixed as, say, Arabic numerals, different algorithms can be used to carry out addition. One common algorithm for addition mirrors the “carrying” algorithm people learn in grade school—add the least significant digits, carry if necessary, move left, and repeat. 
Another algorithm for addition uses “partial sums,” separately adding the 1 place-value column, the 10 place-value column, the 100 place-value column, and then summing these partial sums. These are different processes at the algorithmic level but not at the computational level.
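The contrast between one computation and several algorithms can be sketched in code. The function names below are hypothetical, and representing Arabic numerals as strings of decimal digits is an assumption made for illustration; the three functions loosely follow the hash-mark, carrying, and partial-sums procedures described above.

```python
def add_hash_marks(a: str, b: str) -> str:
    """Hash-mark representation: addition is plain concatenation."""
    return a + b


def add_carrying(a: str, b: str) -> str:
    """Grade-school carrying over decimal digit strings."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits, carry = [], 0
    # Start at the least significant digit, carry if necessary, move left.
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


def add_partial_sums(a: str, b: str) -> str:
    """Add each place-value column separately, then sum the partial sums."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    partial_sums = [
        (int(da) + int(db)) * 10 ** place
        for place, (da, db) in enumerate(zip(reversed(a), reversed(b)))
    ]
    # For brevity the final summation uses Python's built-in sum.
    return str(sum(partial_sums))
```

On any pair of digit strings, `add_carrying` and `add_partial_sums` return the same result, which is the sense in which they are one process at the computational level and two at the algorithmic level. Calling `add_hash_marks("4", "3")` returns `"43"`, the mismatch between algorithm and representation noted above.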

⁸ Although Marr doesn’t mention it, sometimes we might identify what the process is, even if the purpose is unclear or somehow inapt. Imagine that we observe the cash register receive two inputs—$2 and $3—and it generates $6 as output; then, when another $2 item is entered, the output is $12. Eventually it becomes clear that what the machine is doing is multiplication. Why it’s doing multiplication is not because that’s the right process in this context (based on our intuitions or evolution). Perhaps the machine was hacked or perhaps there’s a short circuit that remapped + to *.

3.3 Rational Processes

We can now recruit Marr’s distinctions to impose greater specificity in characterizing rational processes.⁹ On one construal, for a system to be “in accordance” with the rules of logic and probability theory is for the process to execute algorithms that encode the rules of logic and probability theory. So, for instance, if modus tollens is a proper rule of logic, and a given process deploys a modus tollens algorithm to transition from ((P → Q) and ~Q) to ~P, then that process is rational on the algorithmic construal of accordance. In that case, we can say the transition from input to output is algorithmically rational.¹⁰

Another way to understand “accordance” with the rules of logic and probability theory is in terms of the computational-level description. On that approach, a process accords with logic and probability theory when what the process computes—the input–output profile of the process— corresponds to the function of the relevant logico-probabilistic rules. In that case, we can say that the process is computationally rational. As we saw above, the computational-level description of a process is neutral about the actual algorithms involved in the transition from input to output.¹¹

Often in cognitive science, what we really want to capture is the algorithmic level process. But it’s also the case that often we settle for less. We can start by trying to show that a process is rational at the computational level, with the hope that this can eventually be filled out with an algorithmic
analysis (see, e.g., Xu & Tenenbaum 2007b: 270).¹² Making a case that moral learning is characterized by a process that is rational at the computational level is a step toward an algorithmic analysis, and often the most we can hope for at this point is a computational analysis.¹³ However, for some of the inferences that I will promote in this book, I will suggest that we do have the beginnings of an algorithmic account, reflected, for instance, in the explanations offered by the experimental subjects.

⁹ This discussion is based on joint work with Richard Samuels (Nichols & Samuels 2017).

¹⁰ Richard Samuels and I distinguish strong and weak versions of the thesis that some cognitive process is in algorithmic accordance with logic and probability theory (Nichols & Samuels 2017). Strong algorithmic accordance requires that there is an isomorphism between analytic probability theory and the inferential process being evaluated. The idea is that an algorithm is only rational if it proceeds as prescribed by the mathematics of probability theory. Samuels and I suggest that this looks to be an excessively demanding way to characterize rational psychological processes. Instead, we argue for weak algorithmic accordance, which allows that a process can be rational when the algorithm implements a good Bayesian approximation method (2017: 24–5). What makes for a good approximation method might vary by context. Although I won’t discuss weak algorithmic accordance further in this book, the notion of weak algorithmic accordance makes it rather easier for a psychological process to count as rational. By contrast, strong algorithmic rationality imposes severe demands on the requirement for rationality (2017: 22).

¹¹ Not all processes characterized at the computational level need conform to logic and probability theory. For instance, part of the visual system of the housefly is described at the computational level as having the goal of landing (Marr 1982: 32–3); but it needn’t be the case that the transition from perceptual stimulus to landing behavior in the housefly conforms to the laws of probability.

4. Statistical Learning

4.1 Statistical Learning

Some kinds of statistical learning involve extremely sophisticated techniques which require enormous computational resources, but other kinds of statistical learning are humble and familiar forms of inference. Imagine you’re on a road trip with a friend and you have been sleeping while he drives. You wake up wondering what state you’re in. You notice that most of the license plates are Kansas plates. You can use this information to conclude that you are in Kansas. This is a simple form of statistical learning. You are consulting samples of license plates (the ones you see), and using a principle on which samples reflect populations (in this case, the population of license plates). This, together with the belief that Kansas is the only state with a preponderance of Kansas plates, warrants your new belief that you are in Kansas.

Early work on statistical reasoning in adults indicated that people are generally bad at statistical inference (e.g., Kahneman & Tversky 1973). For instance, people seem to neglect prior probabilities when making judgments about likely outcomes. In a striking experiment, Kahneman and Tversky (1973) presented one group of participants with the following scenario:

A panel of psychologists have interviewed and administered personality tests to 30 engineers and 70 lawyers, all successful in their respective fields. On the basis of this information, thumbnail descriptions of the 30 engineers and 70 lawyers have been written. You will find on your forms five descriptions, chosen at random from the 100 available descriptions. For each description, please indicate your probability that the person described is an engineer, on a scale from 0 to 100. (241)

¹² Griffiths and colleagues (2015) suggest that the transition from the computational to the algorithmic level can be facilitated by an intermediate level which adverts to resource constraints.

¹³ Note that we don’t need to qualify that a Marrian rational process is merely pro tanto rational. This is because computational and algorithmic rationality are defined narrowly in terms of the function of the process. For example, the algorithm for addition is defined in terms of a restricted class of inputs and outputs and the dedicated transitions between them.

Another group of participants got the same scenario, but with the base rates reversed—in this condition there were said to be 30 lawyers and 70 engineers. Participants were then given the five descriptions (allegedly chosen at random) mentioned in the instructions. One of the descriptions fits with a stereotype of engineers:

Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles. (241)

With this description in hand, subjects are supposed to indicate how likely it is (from 0 to 100) that Jack is an engineer. Kahneman and Tversky found that participants in both conditions gave the same, high, probability estimates that Jack is an engineer. The fact that there were 70 engineers in one condition and 30 in the other had no discernible effect on subjects’ responses (1973: 241). Another description was designed to be completely neutral between the lawyer and engineer stereotype:

Dick is a 30-year-old man. He is married with no children. A man of high ability and high motivation, he promises to be quite successful in his field. He is well liked by his colleagues. (242)
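The normative role of the base rates in this task can be sketched with Bayes’ rule in odds form. This is a minimal illustration, not part of the original study: the function name is hypothetical, and the likelihood ratio of 4 for a stereotype-fitting description is an assumed value chosen only to show the pattern.

```python
def posterior_engineer(prior_engineer: float, likelihood_ratio: float) -> float:
    """P(engineer | description) from Bayes' rule in odds form.

    likelihood_ratio is P(description | engineer) / P(description | lawyer).
    A value of 1.0 means the description is neutral between the two groups.
    """
    prior_odds = prior_engineer / (1.0 - prior_engineer)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# The two base-rate conditions: 30 engineers of 100, or 70 engineers of 100.
for prior in (0.30, 0.70):
    neutral = posterior_engineer(prior, likelihood_ratio=1.0)
    stereotyped = posterior_engineer(prior, likelihood_ratio=4.0)  # assumed ratio
    print(f"base rate {prior:.2f}: neutral {neutral:.2f}, stereotyped {stereotyped:.2f}")
```

With a neutral description the posterior simply equals the base rate (0.30 or 0.70). With the assumed ratio of 4 the posterior is roughly 0.63 in one condition and 0.90 in the other, so even a diagnostic description should yield different answers across the two conditions.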

When given the description that was neutral with respect to the stereotypes, obviously participants should go with the base rates; instead, in both conditions, they tended to say that there was a 50 percent chance that the person was an engineer (1973: 242). This study is representative of the broad pattern of results in the Heuristics & Biases tradition, which has exposed numerous ways in which people make mistakes in statistical reasoning. In the wake of this pessimistic line of research, a new wave of cognitive psychology celebrates people’s basic abilities in statistical inference. A wide range of cognitive phenomena have been
promoted as instances of rational inference, including categorization (Kemp et al. 2007; Smith et al. 2002), word learning (Xu & Tenenbaum 2007b), and parsing (Gibson et al. 2013). In addition, since the late 2000s, work in developmental and cognitive psychology suggests that children actually have an early facility with statistical reasoning. I’ll present two sets of findings from this emerging research.

It is normatively appropriate to draw inferences from samples to populations when samples are randomly drawn from that population, but typically not otherwise. To see whether children appreciated this aspect of statistical inference, Xu and Garcia (2008) had infants watch as an experimenter reached into an opaque box (without looking in the box) and pulled out four red ping-pong balls and one white one. In that case, it’s statistically appropriate to infer that the box has mostly red balls. In keeping with this, when infants were then shown the contents of the box, they looked longer when the box contained mostly white balls than when the box contained mostly red balls. Xu and Denison (2009) then investigated whether the nature of the sampling made a difference. At the beginning of the task, the experimenter showed a preference for selecting red balls over white ones. Then, the experimenter selected balls from an opaque container as in the experiment reported above. But in this study, for one condition the experimenter was blindfolded while in the other she had visual access to the contents of the box. Xu and Denison found that babies were more inclined to expect the population of balls to resemble the sample in the blindfolded condition as compared to when the experimenter could see the balls she was choosing. It seems that the babies were sensitive to whether or not the sampling was random. Building on these findings, Kushnir and colleagues found that children use sampling considerations to draw inferences about preferences.
When a puppet took five toy frogs from a population with few frogs, the children tended to think the puppet preferred toy frogs; but children tended not to make this inference when a puppet took five toy frogs from a population that consisted entirely of frogs (Kushnir et al. 2010).

For a second example, consider another feature of good probabilistic reasoning: If you have priors (e.g., you know the percentage of the population afflicted with a certain disease) then you should use those priors (e.g., in making inferences from a person’s symptoms to whether they have the disease); furthermore, if you get new information, you should update the priors. Girotto and Gonzalez (2008) explored such reasoning in children using a task with chips of different shapes and colors.

Figure 1.1 Stimuli for probabilistic inference task (adapted from Girotto & Gonzalez 2008: 328)

Figure 1.1 depicts a
stylized version of the materials—four black circles, 1 black square and 3 white squares. In the “prior” task, kids were shown just the square chips and told “I’m going to put all the chips in this bag . . . . I will shake the bag and I will take a chip from it without looking.” Then they were told that if the chip was black they would give a chocolate to Mr. Black (a puppet) and if the chip was white it would go to Mr. White (another puppet). Then they were asked which puppet they choose to be. Of the 4th grade school children, 91 percent correctly chose the puppet most likely to win the chocolate (332). Girotto and Gonzalez explored whether the child can also update in this kind of task. The child was shown all eight chips and asked which is more likely to win (with black advantage 5:3). Children tend to say correctly that black is more likely to win. Then all eight chips are put in the bag and the experimenter reaches in. He says, “Ah, listen. I’m touching the chip that I have drawn and now I know something that might help you to win the game. I’m touching the chip that I have in my hand and I feel that it is [a square]” (331). The kids are allowed to revise their judgment, and they tend to answer correctly that now it’s more likely that white will win (334). Note that children succeed in these tasks with no training—they produce the correct response immediately. Subsequent work found that children and adults in two pre-literate Mayan groups also succeed in these tasks (Fontanari et al. 2014). These are just two examples of ways in which even young children seem to make statistically appropriate inferences. We will see several further examples in the course of the book. The lesson of this work is that despite the foibles that have been revealed by the Heuristics and Biases tradition, people, including very young children, possess a substantial competence at probabilistic reasoning.
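The arithmetic behind the updating task can be sketched by counting chips. This is an illustration of the normatively correct answers under uniform random drawing, not a model of how the children reason; the names `CHIPS` and `prob_color` are hypothetical.

```python
from fractions import Fraction

# The eight chips described in the text: (color, shape).
CHIPS = (
    [("black", "circle")] * 4
    + [("black", "square")]
    + [("white", "square")] * 3
)


def prob_color(color, chips, shape=None):
    """P(color) for one random draw, or P(color | shape) if a shape is given."""
    pool = [chip for chip in chips if shape is None or chip[1] == shape]
    matches = [chip for chip in pool if chip[0] == color]
    return Fraction(len(matches), len(pool))


print(prob_color("black", CHIPS))                   # before the hint
print(prob_color("black", CHIPS, shape="square"))   # after learning it is a square
print(prob_color("white", CHIPS, shape="square"))
```

Before the hint, black is favored with probability 5/8. Once the chip is known to be a square, only four chips remain possible, and white is favored with probability 3/4 against 1/4 for black, which matches the revision the children tend to make.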

4.2 Statistical Learning and Rationality

The dominant paradigm for explaining these results is Bayesian learning (see, e.g., Perfors et al. 2011). The advocates of this view stress the rational


nature of Bayesian inference. For example, Amy Perfors and colleagues write:


Bayesian probability theory is not simply a set of ad hoc rules useful for manipulating and evaluating statistical information: it is also the set of unique, consistent rules for conducting plausible inference . . . . Just as formal logic describes a deductively correct way of thinking, Bayesian probability theory describes an inductively correct way of thinking. (Perfors et al. 2011: 313)

For many experiments, it’s not obvious that the data support the view that children are engaged in a form of Bayesian updating (see, e.g., Nichols & Samuels 2017). But there is little doubt that the inferences in the tasks reviewed in Section 4.1 are plausible candidates for meeting familiar notions of evidential rationality. Critically, in the above tasks, the child makes inferences that are appropriate given the evidence to the agent. For instance, in the ping-pong ball task, the infant is right to infer from a random sample of mostly red balls that the population is mostly red. All of the evidence she has supports this conclusion.¹⁴ In addition, these tasks take exactly zero training. The normatively appropriate pattern appears on the first (and only) trial. Much prominent work in Bayesian psychology claims only to be giving an analysis of people’s judgments at the computational level (e.g., Xu & Tenenbaum 2007b: 270). But at least in some cases, an algorithmic analysis is also plausible, and one can get evidence for this from people’s reports of their reasoning process (see, e.g., De Neys & Glumicic 2008; Ericsson & Simon 1984). It’s likely that for tasks like the chips task from Girotto and Gonzalez, adults would be quite capable of articulating the reasoning process they actually go through, which might well provide evidence that their reasoning process is algorithmically rational.

¹⁴ One potential worry about the rationality of everyday inferences from samples is that the samples might be unrepresentative. It is plausible that when people have evidence that the sample is unrepresentative, if they ignore this in their statistical inferences, their inferences are rationally compromised. However, when a person has no evidence that a sample is unrepresentative, it seems uncharitable to declare their inferences from the sample to be rationally corrupt. That is, when there is no evidence that a sample is unrepresentative, it’s reasonable for a learner to make inferences as if it is representative. Indeed, if we were so cautious as to withhold inferences on the bare possibility that a sample is unrepresentative, we would rarely make inferences. To suggest such inferential caution borders on recommending skepticism.


4.3 Statistical Learning and Empiricism

I’ll be promoting a kind of empiricist account of moral learning, in terms of statistical inference. I’ll discuss this at some length in Chapter 7, but for now I want to provide a bit of the intellectual background that informs the discussion. In contemporary cognitive science, the debate between empiricists and nativists is all about acquisition (see, e.g., Cowie 2008; Laurence & Margolis 2001). Take some capacity like the knowledge of grammar. How is that knowledge acquired? Empiricists about language acquisition typically maintain that grammatical knowledge is acquired from general purpose learning mechanisms (e.g., statistical learning) operating over the available evidence.¹⁵ Nativists about language acquisition maintain instead that there is some domain-specific mechanism (e.g., a specialized mechanism for acquiring grammar) that plays an essential role in the acquisition of language. In the case of grammatical knowledge, debate rages on (e.g., Perfors et al. 2011; Yang et al. 2017). But it’s critical to appreciate that there is some consensus that for certain capacities, an empiricist account is most plausible while for other capacities, a nativist account is most plausible.

On the empiricist end, research shows that infants can use statistical evidence to segment sequences of sounds into words. The speech stream is largely continuous, as is apparent when you hear a foreign language as spoken by native speakers. How can a continuous stream of sounds be broken up into the relevant units? In theory, one way that this might be done is by detecting “transitional probabilities”: how likely it is for one sound (e.g., a syllable) to follow another. In general, the transitional probabilities between words will be lower than the transitional probabilities within words. Take a sequence like this:

happy robin

As an English speaker, you will have heard “PEE” following “HAP” more frequently than “ROB” following “PEE”.
This is because “HAPPEE” is a word in English but “PEEROB” isn’t. This sort of frequency information is ubiquitous in speech. And it could, in principle, be used to help segment a stream into words. When the transitional probability between one sound and the next is very low, this is evidence that there is a word boundary between the sounds.

¹⁵ Note that empiricists allow that these general-purpose learning mechanisms themselves might be innate; after all, we are much better at learning than rocks, trees, and dust mites.

In a groundbreaking study, Jenny Saffran and colleagues (1996) used an artificial language experiment to see whether babies could use transitional probabilities to segment a stream. They created four nonsense “words”:

pabiku tibudo golatu daropi

These artificial words were strung together into a single sound stream, varying the order between the words (the three orders are depicted on separate lines below, but they are seamlessly strung together in the audio):

pabikutibudogolatudaropi
golatutibudodaropipabiku


daropigolatupabikutibudo

By varying the order of the words, the transitional probabilities are varied too. Transitional probabilities between syllable pairs within a word (e.g., bi-ku) were higher than between words (e.g., pi-go) (p = 1.0 vs. p = 0.33). After hearing two continuous minutes of this sound stream, infants were played either a word (e.g., pabiku) or part word (e.g., pigola). Infants listened longer (i.e., showed more interest) when hearing the part word, which indicates that they were tracking the transitional probabilities.¹⁶ This ability to use statistical learning to segment sequences isn’t specific to the linguistic domain. It extends to segmenting non-linguistic tones (Saffran et al. 1999) and even to the visual domain (Kirkham et al. 2002). Perhaps humans have additional ways to segment words, but at a minimum, there is a proven empiricist account of one way that we can segment streams of continuous information into parts using statistical learning.

Nativists can claim victories too, however. Birdsong provides a compelling case. For many songbirds, like the song sparrow and the swamp sparrow, the song they sing is species specific. It’s not that the bird is born with the exact song it will produce as an adult, but birds are born with a “template song” which has important elements of what will emerge as the adult song. One line of evidence for this comes from studies in which birds are reared in isolation from other birds. When the song sparrow is raised in

¹⁶ Listening time was measured by how long babies looked towards the source of the sound, which was either on the left or the right side of the room.


isolation, it will produce a song that shares elements of the normal adult song sparrow song; the same is true for the swamp sparrow. And critically, the song produced by the isolate song sparrow differs from the song produced by the isolate swamp sparrow. This provides a nice illustration of a nativist capacity. It’s not that experience plays no role whatsoever—the specific song that the bird produces does depend on the experience. But there is also an innate contribution that is revealed by the song produced by isolate birds. The template gives the bird a head start in acquiring the appropriate song (see, e.g., Marler 2004). The examples of birdsong and segmentation of acoustic strings show that it’s misguided to think that there is a general answer to the nativist/empiricist debate. There are numerous nativist/empiricist debates, and it’s important to evaluate the disputes on a case by case basis. The cases I offer in Part II of this book are all empiricist learning stories, based on principles of statistical inference. Importantly, however, the arguments in Part II make no claim to a thoroughgoing empiricism. The work starts with learners who already have facility with concepts like agent, intention, and cause. It also starts with the presumption that learners have the capacity for acquiring rules. I argue that, given those resources and the evidence available to children, their inferences are rational. This is all consistent with the nativist claim that the acquisition of the concept of agent (for example) depends on innate domain-specific contributions and that there is an innate capacity to learn rules (Nichols 2006: 355–8).
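The transitional-probability idea from the segmentation study discussed earlier in this section can be sketched in a few lines. This is a toy illustration, not a reconstruction of the experimental procedure: it assumes that each syllable is exactly two letters, it uses only the three word orders printed above as its stream, and the function names and the 0.9 threshold are hypothetical. Because the stream is so short, the between-word probabilities it yields will not match the 0.33 reported for the full two-minute stream, though the within-word probabilities do come out at 1.0.

```python
from collections import Counter

# The three word orders printed in the text, joined into one stream.
orders = [
    "pabikutibudogolatudaropi",
    "golatutibudodaropipabiku",
    "daropigolatupabikutibudo",
]
stream = "".join(orders)

# Assumption: every syllable is a two-letter consonant-vowel pair.
syllables = [stream[i:i + 2] for i in range(0, len(stream), 2)]

def transitional_probabilities(syllables):
    """Map each adjacent pair (a, b) to the estimate of P(b follows a)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, tp, threshold=0.9):
    """Insert a word boundary wherever the transitional probability is low."""
    words, current = [], [syllables[0]]
    for prev, cur in zip(syllables, syllables[1:]):
        if tp[(prev, cur)] < threshold:
            words.append("".join(current))
            current = []
        current.append(cur)
    words.append("".join(current))
    return words

tp = transitional_probabilities(syllables)
words = segment(syllables, tp)

print(tp[("bi", "ku")], tp[("pi", "go")])
print(sorted(set(words)))
```

Even on this tiny stream, placing boundaries at the low-probability transitions recovers the four nonsense words, which is the sense in which frequency information alone could support segmentation.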

4.4 A Schema for Statistical Learning Accounts

Let’s now turn to the characteristics of such learning accounts. Suppose we want to argue that some concept (for example) was acquired via statistical inference over the available evidence. Several things are needed.

(1) One needs a description of the concept (belief, distinction, etc.) the acquisition of which is to be explained. We can call the target concept the acquirendum (Pullum & Scholz 2002). Part of the work here is to argue that we do in fact have the concept or distinction or belief that is proposed as the acquirendum (A).
(2) Insofar as statistical learning is a form of hypothesis selection, one needs to specify the set of hypotheses (S) that the learner considers in


acquiring A. This set of hypotheses will presumably include A as well as competing hypotheses.
(3) One will also need an empirical assessment of the evidence (E) that is available to the learner. For this, one might consult, inter alia, corpus evidence on child-directed speech.
(4) The relevant statistical principle(s) (P) need to be articulated. These principles should make it appropriate for a learner with the evidence E and the set of hypotheses S to infer A.
(5) Finally, a complete theory of acquisition would tie this all together by showing that the learner in fact does use the postulated statistical principle P and the evidence E to select A among hypotheses S.

Few empiricist theories of acquisition manage to provide convincing evidence for all of these components. The last item in particular is well beyond what most learning theorists hope to achieve. For instance, (5) would require extremely fine-grained longitudinal analyses of the evidence available to individual children and their use of that evidence in learning. In place of this daunting demand, a learning theorist might aim for a weaker goal—to show that learners are capable of using the relevant kind of evidence to make the inferences that would be appropriate given the postulated statistical principles. That is, instead of trying to capture the learner’s actual acquisition of the concept, one might settle for something a good deal weaker:

(5*) Show that when the learner is given evidence like E, she makes inferences that would be appropriate if she were deploying the postulated statistical principles P.

This requirement is of course weaker than (5) in that it doesn’t try to show the actual transition that occurs when a child acquires the concept. Rather, the goal is to show that learners are appropriately sensitive to the evidence.¹⁷ In addition, (5*) is intended to make a weak claim about the precision and accuracy of the probabilistic representations. I don’t argue (I don’t even believe) that people make precisely accurate probabilistic inferences from the evidence. Rather, the goal of this book is to argue that, for a range of important elements in our moral psychology, when people learn those elements, they make inferences that are roughly the kinds of inferences that they should make, given the evidence.¹⁸ Our next task is to determine the kinds of representations that are implicated in moral judgment. That will be crucial to characterizing the acquirenda, and it will be the focus of the next chapter.

¹⁷ The term “sensitive” has a technical meaning in analytic epistemology (e.g., Nozick 1981), but I intend the ordinary notion on which being sensitive roughly means responding appropriately under different conditions.


¹⁸ This modest defense of human rationality is reflected in some of Tania Lombrozo’s work. For instance, in her lovely work on simplicity, she shows that people are responsive to evidence in a Bayesian fashion but they overweight the importance of simplicity (2007).


2 The Wrong and the Bad On the Nature of Moral Representations


1. Introduction

It is bad when a puppy falls off of a cliff. It is wrong when a person throws a puppy off a cliff. There are some obvious differences between these unfortunate events. For instance, the latter involves a person and an action. Any account of the mental representation of these events would have to register these differences. But there might also be a more subtle difference in the corresponding moral representations. The representation of the badness of the puppy falling is naturally taken to be a representation of the value of the event. It’s a bad value. The representation in the second case will also involve a value representation since the puppy is injured in that scenario too. But it’s possible that the characteristic representation for the second scenario involves something more than registering a bad value. It might involve a structured representation of a rule against injuring innocents, composed of abstract concepts like impermissible, harm, and knowledge.

The idea that moral judgments of wrongness implicate structured rules is hardly new.¹ But many theories of moral judgment try to make do with a much more austere set of resources. It is a familiar pattern in cognitive science to seek low-level explanations for apparently high-level cognitive phenomena—witness classic disputes about symbolic processing. Some influential connectionist approaches attempt to explain cognition with no recourse to symbols (McClelland et al. 1986). We find a related trend in accounts of moral judgment that exclude rules in favor of lower level factors. In low-level accounts of moral judgment, the primitive ingredient is typically some kind of simple value representation. Blair’s account of the moral/conventional distinction is based on the distress associated with seeing others in distress (Blair 1995). Greene’s account of responses to dilemma cases is based on alarm-like reactions of ancient emotions systems (Greene 2008). Cushman (2013) and Crockett (2013) seek to explain judgments in dilemmas by appeal to the kinds of value representations implicated in habits. Railton (2014) draws on more sophisticated value representations, but still stops well short of anything like rules framed over abstract categories.

Low-level accounts are often attractive because they build on processes that are uncontroversially present in the organism. In the case of moral judgment, few dispute that humans find it aversive to witness suffering; similarly, it’s widely acknowledged that humans come to find certain kinds of actions aversive through reinforcement learning. Thus, if we can explain moral judgment in terms of some such widely accepted low-level processes, then we have no need to appeal to such cognitive extravagances as richly structured rules constructed from abstract concepts like impermissible and harm.

Despite its tough-minded appeal, the race to lower levels can neglect the very phenomena we want to understand. Trying to explain human cognition without adverting to symbolic processing makes it difficult to capture core phenomena like the systematicity and inferential potential of thought (Fodor & Pylyshyn 1988). Similarly, I argue, it is difficult to capture the distinctive nature and specificity of wrongness judgments without adverting to structured rules. Before we get to the arguments, I want to explicate the operative notions of value representation and rule representation.

¹ In philosophy, particularism is set in opposition to rule-based approaches to morality (e.g., Dancy 2009). It’s often unclear the extent to which particularism is supposed to make descriptive claims about human psychology. That is, it’s unclear whether particularists mean to deny that rules play a causal role in ordinary moral judgment. But if particularists do mean to deny this, then their view runs into significant difficulties (see Zamzow 2015).

2. Value Representations

One ubiquitous feature of navigating the world involves representing certain actions as good and others as bad. A hungry rat will regard an outcome of getting a cracker as having a certain amount of positive value, and it will regard an outcome of getting a grape as having even greater positive value.² By contrast, rats regard an outcome of being immersed in a pool as having a certain amount of negative value. Generally speaking, if an organism regards a certain outcome as having a positive value, then it will have some motivation to bring that outcome about, and if it regards a certain outcome

² The value associated with the particular outcome (e.g., of getting a grape) is sometimes called the “instantaneous utility function” in expected utility theory and the “reward” in reinforcement learning.


as having a negative value, then it will have some motivation to avoid that outcome. Value representations fit neatly into venerable work on reinforcement learning. Early work on maze learning revealed that rats have an impressive ability to build models. For instance, in one task, satiated rats were put in a Y maze that had food at the end of the left branch and water at the end of the right branch. Although the rats weren’t hungry or thirsty, they were led to run down each branch twice a day for seven days. After this, the rats were made either hungry or thirsty and then put into the maze. The hungry rats tended to go to the left branch (where the food had been on the previous days) and the thirsty rats tended to go to the right branch (where the water had been on the previous days). This shows that simply by running the branches the rats built a model of the maze, including what was in the maze (food and water) and where it was (left/right). This is just one simple example, but it’s representative of the rat’s talent at learning models of its environment (see, e.g., Tolman 1948). Value representations also fit neatly into an expected utility framework. On that familiar framework, an agent considers both the value attached to a particular outcome and the probability that a certain action will lead to that outcome. In the simple case, the system just multiplies the value associated with an outcome by the probability that the action will lead to that outcome. The system computes this expected utility for each action under consideration, and then selects the action that has the highest expected utility. For instance, a rat might come to expect that pressing a lever will lead to an outcome in which the rat gets a grape, and this will motivate the rat to press the lever. A rat might also take into account the relative probabilities of certain outcomes given certain actions. 
Let’s say the rat values the outcome of getting a grape at 0.7, and values the outcome of getting a cracker at 0.5. The rat’s past familiarity with a maze has led the rat to have a model of the maze specifying that if it goes left there is a 0.1 chance of a grape and a 0.9 chance of nothing; if it goes right there is a 0.6 chance of a cracker and a 0.4 chance of nothing. In that case, even though the rat prefers the grape to the cracker, it assigns a higher expected value to the action of going right. In all of these cases, we think of the rat as being guided by a value assigned to an outcome. Thus, these are dubbed outcome-based value representations (e.g., Cushman 2013). Outcome-based value representations are the kinds of value representations that underlie goal-directed behavior. However, sometimes organisms represent an action itself as having a positive (or negative) value, without


representing some goal or good consequence that will result from the action. The basic idea is familiar from habitual actions, like absentmindedly putting your pen to your mouth. When you do this, you usually don’t have a goal in mind that is served by putting your pen to your mouth. Rather, the behavior is produced by a system that places values on actions themselves. Such representations are action-based value representations. While outcome-based value representations draw on the organism’s knowledge of models, action-based value representations are not integrated with models. They are simply values assigned to particular kinds of actions. This means that the kinds of procedures by which action-based value representations are acquired don’t depend on the agent learning a model of its environment. Accordingly, this kind of learning is sometimes called “model-free” reinforcement learning. Instead of constructing a model, the agent simply develops a value for particular actions given a situation. For instance, after getting food from pushing a lever several times, a rat might come to assign a positive value to lever-pushing itself. Such action-based value representations drive habitual behavior, and this behavior can persist even when the original goal of the behavior is undermined. For a simple action like pushing a lever to get food, it can be hard to determine whether the act is driven by an outcome-based value representation (obtain food) or an action-based value representation (hit the lever). We can distinguish these explanations when the goal is “devalued.” In a characteristic devaluation experiment, a rat first learns that pushing the lever is the way to get food. The rat is then removed from the cage, fed until it is completely satiated, and put back into the cage. In some conditions, rats will immediately start pushing the lever even though they don’t eat the food that tumbles out. 
This habit-like behavior can also be observed if a hungry rat is led to the clear knowledge that there is no food available, such that pressing the lever will not lead to food. Nonetheless, under certain conditions, the rat will still press the bar, out of habit. As we will see, several theorists have tried to explain moral judgment in terms of simple value representations and reinforcement learning (Sections 4 and 5 below). These accounts are intended to explain moral judgment with minimal resources. In keeping with the examples in this section, the kinds of processes and representations invoked in these theories are primarily rooted in research on rat learning. Rats are great and all, but I will argue that to explain moral judgment, we need to go beyond rodent psychology. Moral judgment involves more than just outcome-based and action-based value representations; moral judgment also involves complex representations of


rules. Of course we can use the label “value representation” in a way that would allow us to say that complex rules are associated with values. An expected utility model can assign a subjective cost for breaking a prohibition rule. In such a case, we might say that there is a representation that assigns negative value to actions that violate the rule, and we can call this a “value representation.” But the interest of the recent work on value representations and moral judgment has been the attempt to explain moral judgment with more austere resources that do not incorporate complex rule representations, and I will accordingly use “value representation” to pick out representations of actions or outcomes that are not specified in terms of complex rule representations. Now I will turn to the task of saying more fully what a rule representation is.
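The expected-utility calculation in the maze example earlier in this section can be written out explicitly. The sketch below uses the values and probabilities given there (grape 0.7, cracker 0.5; left yields a grape with probability 0.1, right yields a cracker with probability 0.6). It adds one assumption the text does not state, namely that the outcome of getting nothing has a value of 0. The names `values`, `model`, and `expected_utility` are hypothetical.

```python
# Values the rat assigns to outcomes. The value of "nothing" is an
# assumption made for this sketch; the text does not specify it.
values = {"grape": 0.7, "cracker": 0.5, "nothing": 0.0}

# The rat's model of the maze: for each action, the probability of each outcome.
model = {
    "left": {"grape": 0.1, "nothing": 0.9},
    "right": {"cracker": 0.6, "nothing": 0.4},
}

def expected_utility(action):
    """Sum of outcome values weighted by their probability under the action."""
    return sum(prob * values[outcome] for outcome, prob in model[action].items())

# Select the action with the highest expected utility.
best_action = max(model, key=expected_utility)

for action in model:
    print(action, round(expected_utility(action), 2))
print("chosen:", best_action)
```

On these numbers going left is worth about 0.07 and going right about 0.30, so the preferred outcome (the grape) and the preferred action (going right) come apart, as the example is meant to show.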


3. Rule Representations

Which kinds of representations are implicated in moral judgment? This question is naturally pitched at Marr’s level 2, where the goal is to characterize the representations and algorithms that an organism actually uses to solve problems. Of course, these representations must be neurally implemented in some way, but that is not a question that will occupy us here. I will argue that in order to explain moral judgment, we need to appeal to rule representations as characterized by the following three claims:

(1) Rule representations are partly composed of concepts, including abstract concepts like impermissible, harm, and knowledge.
(2) Rule representations are structure dependent. That is, the same set of concepts will constitute different rules depending on how the concepts are structured in the rule.
(3) Rule representations are (at least typically) not hypothetical.

The structure dependence of rules means that if Xs are different from Ys, then the rule that it’s impermissible to put Xs on Ys is different from the rule that it’s impermissible to put Ys on Xs. The claim that the rules are not hypothetical draws on Kant’s distinction between hypothetical and nonhypothetical imperatives. Hypothetical imperatives are rules that serve one’s interests like “Put oil in your car.” This imperative applies to us because we desire to prevent our engine from seizing up. If for some reason we want our engine to seize up (say, because we’re conducting an engine test) then the


imperative no longer applies. Some imperatives, however, apply to us even if they don’t serve our interests. Kant’s examples here were moral imperatives, like “Don’t lie”; this moral imperative applies to us even when lying seems to be in our best interests. In a widely influential essay, Philippa Foot argues that moral imperatives aren’t the only cases of non-hypothetical imperatives. Foot begins by noting that on Kant’s characterization, hypothetical imperatives are “those telling a man what he ought to do because . . . he wants something and those telling him what he ought to do on grounds of self-interest” (Foot 1972: 306). She then proceeds to give examples of nonmoral norms that are not hypothetical in this self-interested sense. One of her examples is the rule of etiquette that invitations addressed in the third person should be answered in the third person; Foot claims that “the rule does not fail to apply to someone who has his own good reasons for ignoring this piece of nonsense, or who simply does not care about what, from the point of view of etiquette, he should do” (Foot 1972: 308). Even though I may have no interest in following the rule of etiquette, it still applies to me.³ A large swathe of normative judgment, I’ll suggest, depends on rule representations as characterized by these three features. Although I will argue that such rule representations are essential to moral judgment, there are prima facie reasons to favor value representation approaches instead. As noted, value representations don’t involve the same kind of conceptual and structural complexity as rule representations. And as we’ll see, moral rules are alleged to have significant structural and conceptual complexity (see, e.g., Mikhail 2011: 150–2). Insofar as simple value representations can explain the same phenomena as complex rule representations, this is a reason to favor the value representation account. 
A very different prima facie reason to favor a value representational account of moral judgment comes from the relationship between moral judgment and motivation. Moral judgment seems to be directly motivational. When we regard something as morally wrong, that typically provides at least some motivation not to do it. Even if this isn’t a conceptual truth, as some philosophers hold (see, e.g., Smith 1994; van Roojen 2014), it’s empirically plausible that, at least for most people, coming to think something is morally wrong brings with it some motivation not to do that thing.

³ Foot’s other example invokes a club rule: “The club secretary who has told a member that he should not bring ladies into the smoking-room does not say, ‘Sorry, I was mistaken’ when informed that this member is resigning tomorrow and cares nothing about his reputation in the club” (Foot 1972: 308–9). Here again, it is not in the member’s interests to obey the rule, but it is still the case that he is breaking the rule—he is doing something that he is not supposed to do.


A value-representational approach to moral judgment is well positioned to capture this close link between judgment and motivation, since value representations are intrinsically tied to motivation. To have a positive value representation for lever-pressing entails having a motivation to press the lever. So if we think of moral judgments simply as expressions of value representations, it follows that moral judgment will be intrinsically tied to motivation. In contrast to value representations, rules seem only indirectly connected with motivation. It seems that we often acquire knowledge of rules without being motivated to follow them. I can come to learn that it’s impermissible to end a sentence with a preposition without being motivated whatsoever to conform to that rule. Thus a value representation approach to moral judgment seems designed to explain the relation between moral judgment and motivation, whereas a rule representation approach seems to have no such natural explanation.⁴ The above considerations about complexity and motivation seem to favor the view that moral judgments are best explained by value representations. There is also an apparent normative advantage of value representations over rule representations. Notoriously, rules can institutionalize bad practices. And rules that initially had good effects can persist even after they cease to contribute to our interests. (This feature of rules is implicated in the familiar problem of rule worship levied against rule-consequentialists (e.g., Railton 1984: 156).) The fact that a rule fails to serve our interests isn’t enough to make the rule go away. The situation is quite different for value representations guided by reinforcement learning. Reinforcement learning is poised for constant updating of value representations. If pressing a lever consistently produces food, a rat will assign positive value to pressing the lever. 
But if the food reward stops, the rat will update the value representation and lose the motivation to hit the lever. As the environment changes, the costs and benefits change, and the values accorded to various behaviors get updated. By contrast, a rule might persist long after it becomes harmful. The sensitivity of value representations to changes in the environment is at least often an advantage of value representations over rule representations. That is, value representations will often be better at tracking our current interests than the comparatively less sensitive rule representations. Insofar as sensitivity constitutes an adaptive advantage for value representations, we might

⁴ In Chapter 10, I revisit this issue and defend the idea that rule representations can be intrinsically tied to motivation.


expect that this adaptive advantage would favor value representations over rule representations in the moral domain as in other domains. Thus value representations have several prima facie advantages over rule representations. This provides ample reason to attend carefully to proposals that attempt to explain moral judgment in terms of value representations.
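The lever-pressing example can be made concrete with a minimal sketch. It assumes a simple delta-rule update, in which a cached action value moves a fixed fraction of the way toward each observed reward; this is one standard textbook form of model-free learning, not a model taken from the text. The function names, the learning rate, and the reward values are hypothetical choices made for illustration.

```python
def update_value(value, reward, learning_rate=0.2):
    """One delta-rule step: move the cached value toward the observed reward."""
    return value + learning_rate * (reward - value)


def run_trials(value, rewards, learning_rate=0.2):
    """Apply the update once per trial and return the final cached value."""
    for reward in rewards:
        value = update_value(value, reward, learning_rate)
    return value


# Phase 1 (assumed numbers): lever pressing reliably yields food (reward = 1.0).
lever_value = run_trials(0.0, [1.0] * 30)

# Phase 2: the food stops (reward = 0.0), so the cached value decays toward zero.
lever_value_after_extinction = run_trials(lever_value, [0.0] * 30)

print(round(lever_value, 3), round(lever_value_after_extinction, 3))
```

On these assumptions the cached value for lever pressing climbs toward 1 while food is delivered and falls back toward 0 once it stops, which is the sensitivity to changing payoffs that a stored rule, as described above, need not have.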


4. Action-Based Value Representations and Moral Judgment

In a striking convergence, Molly Crockett (2013) and Fiery Cushman (2013) independently proposed that we can explain intuitive judgments about moral dilemmas like Footbridge and Bystander by appealing to different kinds of value representations. Outcome-based value representations focus on the consequences (like number of lives saved), whereas action-based value representations focus on the kind of action (like pushing a person) (Crockett 2013: 363; Cushman 2013: 279). Accordingly, outcome-based value representations are said to facilitate utilitarian verdicts whereas action-based value representations are said to facilitate deontological verdicts (Crockett 2013: 364; Cushman 2013: 282). As we’ll see, this proposal might challenge the rational credentials of familiar moral judgments. To evaluate this proposal, we need to put in place a bit more of the theory.

4.1 Action-Based Value Representations and Rationality

As we saw in Section 2, organisms acquire action-based value representations via model-free learning. The fact that these representations do not incorporate a model can explain why animals perform habitual actions that seem to be irrational, like pushing a lever despite the lack of interest in the food. Here’s Cushman:

This apparently irrational action is easily explained by a model-free mechanism. The rat has a positive value representation associated with the action of pressing the lever in the “state” of being in the apparatus. This value representation is tied directly to the performance of the action, without any model linking it to a particular outcome. The rat does not press the lever expecting food; rather, it simply rates lever pressing as the behavioral choice with the highest value. (Cushman 2013: 279)


Model-free learning of action-based value representations can generate apparently irrational behavior in us too. Say you need to set the table for dinner, and the clean plates are in the dishwasher. You go to the dishwasher and absent-mindedly put the plates in the cupboard instead of on the table. This is because of the established habit of moving plates from dishwasher to cupboard. Your habit here leads to behavior that subverts your goal. This all coheres with the kinds of rational shortcomings highlighted by Greene’s account of System 1 processes—the model-free system is insensitive to background information and long-term goals, and so can easily produce behavior that would conflict with cost–benefit reasoning.

Although model-free learning generates value representations in ways that are arational (in the sense that they are insensitive to the available evidence), action-based value representations can still contribute to an agent’s decisions in rationally appropriate ways. To see this, it will be helpful to consider a new example. Take the instinctive aversion to breathing underwater. This aversion has a good goal-based origin, since typically trying to breathe underwater will have a bad consequence. But our aversion to breathing underwater has also acquired an action-based value representation, as revealed by the fact that many people learning to scuba dive have difficulty breathing underwater, even though they know that there is oxygen available through the mouthpiece. This aversion to breathing underwater actually poses a hazard to the novice diver because the habitual tendency to hold one’s breath can lead to a wide range of problems while diving. Divers learn to overcome this aversion.

To link this up with rational choice, imagine three people, each of whom has acquired (through model-free learning) a very strong aversion to the action, breathing underwater. This aversion can be extinguished provided the learner gets enough practice. 
Two of these people, the resolute diver and the weak-willed diver, each has a strong desire to scuba dive, such that he believes it would greatly enhance his life. The resolute diver decides to work to extinguish the aversion to breathing underwater, which makes good rational sense given the value he places on diving. The weak-willed diver forgoes diving because of the action-based aversion, and this does not look rational since he is giving up something he regards as highly valuable. The third person, the indifferent diver, has only a minimal desire to scuba dive, and he decides not to work to extinguish the aversion to breathing underwater. This makes rational sense for him—the rewards of diving aren’t worth the aversive experiences that would be incurred in extinguishing the aversion.


Thus, while an action-based value representation is oblivious to background knowledge and goals, such value representations can serve as inputs to an agent’s rationally apt decision-making (as in the resolute and indifferent divers) or rationally defective decision-making (as in the weak-willed diver).
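The three divers can be laid out as a small worked comparison. This is only an illustrative sketch: the numeric values for how much each diver values diving and for the cost of extinguishing the aversion are invented, and the function names are hypothetical. The one assumption carried over from the text is that a choice is rationally apt when it matches what the agent's own valuation of diving recommends.

```python
def worth_extinguishing(value_of_diving, cost_of_extinction):
    """Working through the aversion is worthwhile when diving is valued above its cost."""
    return value_of_diving > cost_of_extinction


def is_rationally_apt(value_of_diving, cost_of_extinction, works_to_extinguish):
    """A choice is apt when what the diver does matches what the comparison recommends."""
    return works_to_extinguish == worth_extinguishing(value_of_diving, cost_of_extinction)


# Assumed, purely illustrative numbers: every diver faces the same aversion cost.
EXTINCTION_COST = 5.0

# name -> (how much the diver values diving, whether the diver works to extinguish)
divers = {
    "resolute": (10.0, True),
    "weak_willed": (10.0, False),
    "indifferent": (1.0, False),
}

verdicts = {
    name: is_rationally_apt(value, EXTINCTION_COST, acts)
    for name, (value, acts) in divers.items()
}
print(verdicts)
```

The aversion itself is the same input in all three cases; only the weak-willed diver's choice comes apart from his own valuation.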

4.2 Action-Based Value Representations and Moral Judgment


As noted above, Cushman and Crockett aim to explain judgments about moral dilemmas by drawing on two kinds of value representations, action-based and outcome-based. Since Cushman provides a more extensive defense of the view, I’ll focus on his presentation. He characterizes the distinction between value representations as follows:

the functional role of value representation in a model-free system is to select actions without any knowledge of their actual consequences, whereas the functional role of value representation in a model-based system is to select actions precisely in virtue of their expected consequences. This is the sense in which modern theories of learning and decision making rest on a distinction between action- and outcome-based value representations. (Cushman 2013: 279; see also Railton 2017: 182)

Cushman then suggests that this distinction can explain responses to familiar kinds of moral dilemmas. When presented with the possibility of pushing a man in front of a train to save five people, we resist the pushing because our model-free system has assigned a negative value to the action-type pushing, and we have this negative value-representation because pushing typically led to negative outcomes (e.g., harm to victim) (Cushman 2013: 282).⁵ Cushman defends his account by drawing on experimental work that shows that participants are in fact averse to performing actions that are typically harmful but happen to be harmless in the experiment. For instance, subjects were asked to use a rock to hit either a manifestly fake hand or a nut.

⁵ Cushman invokes more complex representations in his overall model of moral psychology (see, e.g., Cushman 2013: 281, 285–6). My focus here is just on the question of whether the austere model-free account can explain the distinction people make in paradigmatic cases like Footbridge.


The former action resembles actions that are typically harmful, and the latter resembles actions that are typically not harmful. Subjects who hit the fake hand showed significantly stronger physiological responses than those who hit the nut (Cushman et al. 2012). This indicates that we do indeed have an action-based aversion to action types that are typically associated with causing harm (Cushman 2013: 286).

Let’s return to the issue of the rationality of moral judgment. We saw above with the divers that action-based value representations can serve as inputs to rational choice. So even if the decision not to push in Footbridge is driven by an action-based aversion, that alone would not entail that the decision is irrational. However, the action-based explanation can be incorporated pretty naturally into a broader argument for the irrationality of the decision not to push, in keeping with Greene’s (2008) original debunking argument. Greene argued that our judgments in Footbridge are generated by an alarm-like emotion that screams don’t and ignores morally critical information like the known benefits of pushing (in this case, a net savings of four lives). Similarly in Cushman’s account the action-based representation is not sensitive to this morally critical information. And if an agent’s judgment ignores such weighty factors in deference to an aversion to pushing, their judgment is rationally suspect. If the reason we resist pushing is simply because of an aversive feeling, this might indeed seem poor grounds for moral judgment.

4.3 Action-Based Value Representations and Moral Judgment: Descriptive Adequacy

If responses to dilemmas like Footbridge were driven by action-based aversion, this would provide a basis for challenging the rational basis of those judgments. However, there is reason to doubt that action-based aversion can explain moral judgment. Finding something aversive is not the same as judging it wrong. The novice diver finds it aversive to breathe underwater without judging that it is wrong for him to do so, much less judging the act immoral. Thus, to understand our judgments of wrongness (e.g., that it is wrong to push the man in front of the train), we apparently need something more than aversion. Indeed, this point applies to the very experiments that Cushman and colleagues report. Subjects are averse to pretending to smash a person’s hand with a rock; but it’s unlikely that they judge this pretense morally wrong.


More generally, natural aversions tend to be triggered by concrete cues—we find a crying face aversive, but not a simple statement that someone, somewhere, is crying. Normative judgments, by contrast, typically involve abstract categories like harm. So, while our feelings track specific cues, our moral judgments track the abstract category. Consider the famous line attributed to Stalin, “A single death is a tragedy; a million deaths is a statistic.” We might well find it more aversive to imagine a single person being murdered than to acknowledge the murder of a million. But we would certainly not make the moral judgment that the murder of one is more wrong than the murder of a million. Our judgments about the wrongness of the action are defined over the abstract category murder, not the aversion.

An additional problem with the model-free account is that it predicts that very atypical actions that cause harm would not have acquired the model-free negative value and so wouldn’t be regarded as morally wrong in the same way as typically harmful actions (Cushman 2013: 282; see also Ayars 2016). Yet it is quite likely that people would regard it as similarly wrong to push the guy off the Footbridge with a giant zucchini despite no learning history with such zucchinis. Indeed, children will condemn an action that is harmful even if that type of action is usually harmless. For instance, even though petting an animal is typically harmless or pleasurable, when children are told that petting hurts an animal, the children judge it wrong to pet the animal (Zelazo et al. 1996).

There is an easy way to address these deficiencies—by appealing to rules as a critical component of moral judgment (e.g., Mallon & Nichols 2010, 2011; Nichols & Mallon 2006; Nichols et al. 2016). 
Action-based aversion is insufficient for moral judgment since moral judgment is generated not merely by registering aversive feelings but by categorizing an act as a violation of a represented prohibition.⁶ And atypical actions can be registered as violations so long as the atypical act falls under the category of action prohibited by the rule. If we grant that rules play an essential role in moral judgment, then showing that we have arational aversions doesn’t suffice to show that our moral judgments are arational.⁷ When we judge that it is wrong to push someone off of a bridge, the bare aversion to pushing is not the only thing that leads us to judge that it’s wrong to push. We also have an internalized rule that prohibits the action. As a result, we can’t debunk the judgment that it is wrong to push unless we are given some reason why this internalized rule, or its role in the judgment, is rationally problematic.

⁶ In a subsequent paper, Cushman also adverts to rules as a way to solve this problem (2015: 60–1).

⁷ It is, of course, consistent with a rule-based account that action-based aversions play an important role in moral judgments of wrongness. Moral rules that forbid actions that are intrinsically aversive might be weighted more heavily and be more likely to persist (Nichols 2004c). However, if indeed rules play a critical role in moral judgment, then there is no direct argument from the arational etiology of the aversion to the conclusion that people’s moral judgments can be dismissed as rationally defective. We might ask the extent to which the aversion compromises overall moral judgments and decisions, but we must also reckon with the contribution of the rule itself, which is not reducible to the aversion. The situation thus looks to be disanalogous to the irrationality of the weak-willed diver.

5. Emotion Learning and Moral Representations

If moral judgments were simply a product of action-based value representations issued by model-free learning, this would threaten the rational credentials of much everyday moral judgment. However, value representations that are linked to models of the world seem better suited to rationally vindicating moral judgment. Peter Railton has recently promoted the rational basis of moral judgment by drawing on such resources (Railton 2014: 837–8). Thus, although Railton doesn’t appeal to rule representations, his account draws on significantly more sophisticated resources than the action-based value representations we’ve been focusing on so far.

5.1 The Broad Affective System and Rationality

As noted in Chapter 1, dual process theories often characterize System 1 as rationally defective—inflexible, domain specific, insensitive to new information, and ill-suited to effective long-term cost–benefit reasoning. Railton maintains that recent work paints a very different picture. We do have a set of resources for unconscious decision-making, which Railton calls the “broad affective system.” Affect is central to this system (Railton 2014: 827), but far from being an inflexible alarm-like response, the broad affective system is a flexible learning system (813), that can incorporate information from multiple domains (817, 823), and is capable of “guiding behavioral selection via the balancing of costs, benefits, and risks” (833). It is this system, Railton suggests, that generates our intuitions that an action is risky or promising or that an excuse smells fishy (823).


How does the broad affective system fare epistemically? To be sure, the broad affective system is sensitive to a broader range of evidence than action-based value representations. However, the process by which we come to attune our emotions to risks and benefits is still critically less flexible and less sensitive to new information than general cognition. For instance, if I tell you that the yellow pill will make you ill, you will refrain from taking it, but not because my testimony generated an attuned fear or disgust response to the pill. We can immediately incorporate such testimonial evidence into our decision-making without the attunement of the broad affective system.⁸ Nonetheless, Railton maintains that the broad affective system is rational in an important way: “the overall picture of the broad affective system in animals and humans is remarkably congruent with our philosophical understanding of the operation of rational procedures for learning and decision making” (835). As Railton notes, this system “is a learned information structure rather than a set of stimulus-response connections (for example, it separately encodes and updates value, risk, expected value, and relational and absolute space)” and thus, “it can properly be spoken of as more or less accurate, complete, reliable, grounded, or experience-tested.” As a result, Railton says, the broad affective system “has the necessary features to constitute a proto-form of implicit practical knowledge” (838).⁹

⁸ Of course, this testimonial evidence (“The yellow pill will make you ill”) and the subsequent belief (the yellow pill will make me ill) can itself contribute to later processing by the broad affective system. We might acquire an aversion to the yellow pill. However, the key point is that the incorporation of testimony looks very different from the kind of reinforcement learning found in the broad affective system. We often move directly from testimony to belief in a kind of one-shot learning. This interpretation is bolstered by the fact that changing the words will change the effect of the testimony. Replace “yellow” with “red,” “pill” with “candy,” or “ill” with “well,” and the behavior shifts accordingly.

⁹ Although the broad affective system is not nearly so limited as the model-free system, it remains the case that agents are plausibly characterized as irrational when they are driven by this system to act in ways they acknowledge to be imprudent or suboptimal upon reflection. For example, many people have an attuned aversion to exercise because of the discomfort they experience when beginning an exercise regimen. This attuned aversion can lead agents to avoid exercising even when they know that moderate exercise would alleviate various ailments (e.g., back pain). Such an agent is arguably being irrational in allowing her broad affective system to carry the decisions she would otherwise make differently.

5.2 The Broad Affective System and Moral Judgment

The broad affective system plays a key role in how we update our values. This is obviously true for non-moral values. Rats acquire taste aversions when they come to associate tastes with subsequent nausea. The rat learns to assign a negative affective value to the taste, and this value might be incorporated into a model of a maze with different food options. Similarly, values with apparent moral import can also be shaped by the broad affective system. Consider, for instance, the natural aversion rats and monkeys have to distress signals of their conspecifics (Greene 1969; Masserman et al. 1964). In a somewhat disturbing experiment, a monkey learned that it needed to pull a chain to get food; subsequently the experimenter made it such that pulling the chain would yield food but it would also trigger a shock to a conspecific in an adjoining cage. In this task, several of the monkeys stopped pulling the chain. Their experience of witnessing the distress cues of a conspecific leads them to behave in a way that has a good moral outcome. One explanation for this behavior is that experiences of witnessing the distress cues of conspecifics generate an affectively attuned appreciation that pulling the chain causes an outcome to which they are independently averse.

The broad affective system presumably plays a role in determining what we find good and bad, and it does this by laying down value representations. But what about moral judgments of wrongness, the kinds of examples with which we started? Railton suggests that the broad affective system can explain these judgments as well. Recall Haidt’s case of siblings Julie and Mark having consensual sex (Chapter 1, Section 1.2). Haidt maintains that when people defend their condemnation of Julie and Mark’s behavior by adverting to the riskiness of the encounter, this is nothing more than post hoc confabulation. Railton suggests otherwise, and illustrates the point with a different sibling case, Jane and Matthew, who decide that it would be interesting and fun if they tried playing Russian roulette with the revolver they are carrying with them for protection from bears. 
At very least it would be a new experience for each of them. As it happens, the gun does not go off, and neither suffers any lasting trauma from the experience. They both enjoyed the game, but decide not to do it again. They keep that night as a special secret, which makes them feel even closer to each other. What do you think about that, was it OK for them to play Russian roulette with a loaded revolver? (Railton 2014: 849)

Most people think it obvious that it was not okay for the siblings to play Russian roulette. Railton goes on to draw the parallel to Haidt’s Julie and Mark: “Julie and Mark played Russian roulette with their psyches, arguably with more than a one-in-six chance of serious harm. The fact that


experimental subjects had such harms uppermost in their minds when queried about their disapproval need not show mere confabulation, since running the risk of these harms is relevant to the question whether their conduct was ‘OK’ ” (849; see Jacobson 2012 for a similar example). Railton’s proposal seems to be that participants’ responses to Haidt’s vignette reflect the kinds of risks that were aptly registered by the broad affective system. This account promises to give a kind of vindicatory explanation for people’s judgments about Julie and Mark having sex. The judgments themselves derive from our becoming emotionally attuned to the costs, benefits, and risks associated with such behavior.


5.3 The Broad Affective System and Moral Judgment: Descriptive Adequacy

Is the typical subject’s judgment about Julie and Mark generated by the attunement of the broad affective system to the harms, benefits and risks of incest? There’s reason to be skeptical. Like Haidt’s subjects, my immediate judgment about the case was that it was wrong for Julie and Mark to have sex. Why? Well, I can assure you that it wasn’t from experiences making out with my sister. Most of us don’t learn to condemn incest via our experiences with incestuous activities or by learning about the bad consequences from others who have had the requisite experiences.

What about the psychic costs that are risked by incest? First of all, psychic costs of sexual intercourse often aren’t sufficient to generate condemnation. If two friends have sex despite knowing that there is a high risk of psychic harm, we might say that they exhibited bad judgment, but this isn’t the same as what we find in the Haidt study, where participants say of Mark and Julie, “I can’t explain it, I just know it’s wrong.” This contrasts sharply with the case of friends who have ill-advised sex; in that case we know exactly why we regard it as bad—because of the risks. Second, when presented with the case of Julie and Mark, a key part of the condemnation plausibly comes from the fact that it’s categorized as incest. We learn to condemn incest not through a process of personal discovery but because we are told that it’s wrong. And the degree of psychic risk we associate with sibling sex plausibly depends on the fact that we think incest is wrong (as opposed to just registering the naturally emerging costs and benefits of sibling sex). In a group where there is no stigma against sibling


sex (e.g., ancient Egyptians (Hopkins 1980)), there would be significantly less cost to the practice.¹⁰

The importance of categorizing an act as a violation is also evident from people’s concern about whether an act falls under a proscribed category.¹¹ For instance, people care about whether a sexual encounter counts as incest. This is apparent from a casual web search for “is it incest,” which returns thousands of hits. Here are some representative queries:

“I stayed at my cousins house a few nights ago and hooked up with her step brother who is a year older than me . . . I’m not sure how to feel about it, is it incest because he’s my step cousin or just kind of weird haha.”¹²

“Is it incest if i have sexual relations with my cousin?”¹³

“Ugh. Is it incest if you have sex with your adopted brother?” (Asking for a friend.)¹⁴

There is a further reason to think that categories play an essential role in the condemnation of incest. Otherwise we can’t explain the variation in incest condemnation across cultures. In some cultures (parts of Korea and India), first-cousin marriage is absolutely forbidden; in other cultures (e.g., in Saudi Arabia), it is permitted; in other cultures, it is wrong to marry one’s parallel cousin (i.e., the child of a parent’s same-sex sibling), but not a cross-cousin (i.e., the child of a parent’s opposite sex sibling). These different norms, and the different practices that flow from these norms, are the product of cultural norms being passed down from generation to generation.¹⁵

¹⁰ In a recent paper, Stanley and colleagues (2019) gave the Julie and Mark case to participants and asked about both risk of harm and wrongfulness. They found a correlation between judgments of risk and judgments of wrongfulness. But this doesn’t support the idea that people come to think that incest is wrong by gauging the harm it causes. Rather, in this study it’s likely that the differential risk assessment was driven by a prior view that incest is wrong. For instance, one of the potential harms suggested by the experimenters was that “At some later time, their friends and family could have found out that they had sex.” But presumably the distinctive risk here is that friends and family think incest is wrong.

¹¹ I’m indebted to Alisabeth Ayars for this observation.

¹² https://glowing.com/community/topic/72057594038730600/is-this-incest-or-just-weird.

¹³ https://answers.yahoo.com/question/index?qid=20090109153158AAecIl6.

¹⁴ https://answers.yahoo.com/question/index?qid=20111005141957AAzJozL.

¹⁵ There is obviously a question about why siblings tend to lack sexual interest in each other. One prominent explanation is we have a mechanism with the evolved function of generating sexual disinterest between co-reared children (e.g., Lieberman et al. 2003). But that isn’t the same question as why we condemn the act. For many actions that I find unappealing (e.g., eating vegemite), I certainly don’t morally condemn those who do it. It is true, of course, that incest prohibitions are culturally widespread, and it’s possible that the prevalence of such norms depends on an evolved mechanism that makes sibling sex


This is just one kind of rule, but norm systems in general have determinative proscriptions surrounding marriage, sex, and insults. This also holds for harm-based norm systems. Norm systems determine what can be harmed (e.g., cattle, outsiders, children), how they can be harmed (e.g., slaughtering, swindling, spanking), and when they can be harmed (e.g., for food, for advantage, for punishment). It is very important for members of each community to learn the local system. To get it wrong can mean punishment, ostracism, even death. And people do generally get these things right. A rule-based account is obviously well suited to explain why people can get it right, because such an account draws on concepts that offer the greatest precision available. If people systematically judge that it is wrong to marry parallel cousins, then this is because they encode a rule defined over the concept “parallel-cousin.” If people systematically judge that it is wrong to slaughter cattle, then this is because they encode a rule defined over the concept “cattle.” Accounts of moral judgment based solely on aversion thus have difficulties with both the specificity of moral judgments and the fact that the judgments are of impermissibility. By contrast, a rule-based system easily accommodates both of these core phenomena of moral judgment. At a minimum, it is hard to see how anything but a rule-based system can accommodate cases like the norm systems surrounding cousin marriage. And a rule-based system can easily extend to cases like prohibitions on murder, theft, and so on. That is, once we grant that judgments about wrongful marriage are guided by rules defined over abstract categories like parallel cousin, it is natural to grant that judgments about wrongful harm are guided by rules defined over abstract categories like harm, knowledge, and intention. 
Indeed, against this background, it seems unparsimonious to hold that wrongness judgments in the moral domain constitute a special island of wrongness judgments that does not involve rules. I’ve suggested that the condemnation of incest does not emerge through learning the natural rewards and punishments of engaging in the behavior. We don’t practice the behavior and thereby develop the recognition that the act is wrong. This is true for much of the moral domain. Consider cheating on tests. Most people judge that this is wrong before they ever cheat on a test. Why? At least in part because they are told that it’s wrong to cheat on tests.

unappealing. For instance, we might expect norms that prohibit unpleasant acts to have a cultural advantage over other norms (cf. Nichols 2004c). But even in this case, the norms are not the same as the affective response.


Or consider theft. Children typically don’t try out stealing and have a gradual affective attunement to the costs of stealing that inclines them against theft. Again, a critical factor in children coming to think stealing is wrong is that we tell them that it is. In none of these cases do we find the appreciation of wrongness to emerge from a calculation of the costs, benefits, and risks. Rather, we learn rules that proscribe these various behaviors. It bears emphasis here that rules can be learned very quickly. By contrast reinforcement learning is often slow, since the organism needs to determine which aspect of the environment is relevant to getting the right outcome.¹⁶ Imagine trying to train your dog not to bring her ball into the kitchen. Assuming your dog doesn’t have any words, the training will require lots of punishment to get the dog to appreciate that it’s the ball (and not the doll) that she isn’t supposed to bring. And it’s not clear whether the dog will ever learn that it’s only the kitchen that is off limits. Now imagine you want to teach your 4-year-old child not to bring her ball into the kitchen. Since the child does have language, including words for “ball” and “kitchen,” it would be a sadistic parent who opted to use reinforcement learning on their child rather than simply telling them the rule, “Don’t bring your ball in the kitchen.” The child can learn this rule in one trial. Or take the instruction we offer our children regarding serious moral issues like sexual harassment, racial discrimination, and invasion of privacy. Few would suggest that to discourage such bad behavior we can simply rely on a conceptually austere regimen of rewards and punishments. It’s not just that it’s unlikely that children would arrive at the right views through such reinforcement learning, it’s also that there would be many more violations along the way.

¹⁶ In some domains, like taste, we do find one-trial learning (see, e.g., Rozin 1986; Welzl et al. 2001). But there is little reason to think that in the broad domain of norms, one-trial conditioned learning is common.

6. Rule Representations Redux

I’ve argued that the value-representation accounts cannot explain moral judgment, and that we must advert to rule representations. But in Section 3, we saw several obstacles for explaining moral judgment in terms of rule representations, and I’d like to briefly revisit these issues.

First, we face the problem of rule representations and moral motivation. Value representations are intrinsically motivational and so they naturally explain why updating one’s moral representation would also update one’s motivation. It seems that rule representations don’t afford this natural connection between moral representations and motivation.

Rule-based accounts of moral judgment do have some resources for explaining moral motivation. Moral rules often prohibit behavior that is antecedently likely to trigger negative emotions. We have a rule that prohibits causing others to suffer, and we (like many other animals) find others’ suffering aversive. Indeed, the aversiveness of the prohibited consequence plausibly played a role in the cultural success of many rules (Nichols 2004c). Nonetheless, this associated aversiveness isn’t adequate to the problem of moral motivation, for reasons related to one of the objections to aversion-based models of moral judgment: specificity (Section 5.3). Judgments of wrongness track the specificity of the category (e.g., incest, theft, cheating) and not basic aversions. Similarly, moral motivation tracks the specificity of judgment and not the independent aversiveness of the consequences. It’s because I judge it wrong to steal that I’m inclined to refrain from stealing (and to disapprove of those who steal). My motivation is against stealing; it isn’t simply against the unpleasant consequences often associated with stealing. The content of moral motivation is isomorphic with the content of the moral rules. In Chapter 10, I’ll offer an empirical argument that rule-representations can be directly motivating. But for now, I’ll just note that the problem of motivational specificity also afflicts the value representation accounts. Just as value representation accounts have difficulty accommodating the specificity of the judgment that it is wrong to marry parallel cousins, these accounts will have difficulty accommodating the specificity of the concomitant motivation to avoid marrying parallel cousins.
So it’s far from clear that value representations, as I’ve been exploring them in this chapter, have an advantage over rule representations when it comes to moral motivation. Another apparent advantage of value representations noted in Section 3 is that value representations can be updated continuously by reinforcement learning, whereas rules are more likely to persist even when the rules don’t serve our interests. Value representations are more sensitive and adaptable than rule representations. Perhaps this is one reason that value representations are favored by Railton (e.g., 1986, 2014). On Railton’s view it seems that each child traverses her own path through costs, benefits, and risks to a set of value representations. It is a vision of moral development as ruggedly individualist learning. However it’s not so obvious that this best serves the demands of the social world. Constant updating is great for navigating mazes. But given the vicissitudes of the world and the differences between


people, social coordination often depends more on stability and publicity than on vigilant updating. When I go out on the road, I don’t want people to be updating the relative merits of driving on the right the way a rat evaluates which path will yield the most cheese. I just want my fellow motorists to follow the local norm.¹⁷ Rugged individualism might be optimal for rats in mazes, but not for commuters on roads. When we need to coordinate with each other, it’s often important to have rules that remain stable across intertemporal and inter-subjective differences in value representations. More broadly, having stable and public rules is important for (1) knowing what one should do, (2) assessing whether someone has violated the rule, and (3) knowing when to expect to be punished (see, e.g., Gaus 2011: 113). Thus, even though rule representations are less sensitive to changes in the environment than value representations, this insensitivity might be an advantage insofar as social coordination is critical.

There remains of course the problem of the complexity of rule representations. And it will be a major occupation of this book to try to explain how we acquire the kinds of complex rule representations that seem to be implicated in moral judgment. But the complexity is plausibly limited in some important ways. There is a rule against lying. But people also think that sometimes it’s okay to lie. How do we explain this apparent inconsistency? One option is to maintain that the rule against lying specifies a range of possible exceptions. However, this promises to render the rule much more detailed and hence harder to learn, harder to store, and harder to coordinate on. An alternative, which I prefer, is that the rule against lying (for example) doesn’t specify exceptions, but it can be overridden by other considerations.¹⁸ Rules, on this view, are just one factor feeding into global decision-making.
Thus, one might recognize that the rule against lying has been broken, but maintain that it’s all things considered okay to break the rule given competing considerations (Nichols & Mallon 2006). I’ve argued that value representations can’t possibly explain the character of our moral judgments. But this is not to deny the manifest significance of value representations to moral psychology and decision-making generally. For instance, we learn to assign negative value to actions that cause immediate

¹⁷ In this book, I take such norms to be acquired as rule representations.
¹⁸ These alternatives partly track the distinction between “specificationist” and prima facie (or pro tanto) theories of rights in normative ethics (see, e.g., Wellman 1995).


distress to our friends and family; these value representations make us less likely to perform such harmful actions. But to explain moral judgment, we also need something more representationally complex, namely, systems of structured rules. The next several chapters attempt to explain the acquisition of such complex representations.

PART II

STATISTICAL LEARNING OF NORM SYSTEMS

3

Scope

The previous chapter provides reason to believe that structured rules play an essential role in moral judgment. However, a major limitation of rule-based theories is that it has been unclear how the rules get acquired. This problem is especially pressing given the apparent complexity of the rules. For instance, harking back to the moral dilemmas in Chapter 1, people judge that it is permissible to throw the switch but not to push the man. A rule-based explanation of this might hold that the rule against harm applies to intentional harms (as in Footbridge) but not necessarily to harms that are foreseen but not intentionally produced (as in Bystander). People also judge that it is worse to actively cause a harm than to allow a similar harm to occur. Here, a rule-based explanation might say that the rules apply to actions, but not to allowings. Even children reveal these patterns in reasoning about dilemmas (see, e.g., Levine & Leslie forthcoming; Mikhail 2011; Pellizzoni et al. 2010). These are subtle distinctions, and it’s hard to see how kids could learn rules that regiment these distinctions. As a result, the most prominent account of the acquisition of these rules appeals to an innate moral grammar (e.g., Dwyer et al. 2010; Mikhail 2011).¹ A key part of the nativist argument is that children acquire these rules early despite scant evidence. Susan Dwyer and colleagues put the point well:

[A]lthough children do receive some moral instruction, it is not clear how this instruction could allow them to recover moral rules . . . [W]hen children are corrected, it is typically by way of post hoc evaluations . . . and such remarks are likely to be too specific and too context dependent to provide a foundation for the sophisticated moral rules that we find in children’s judgments about right and wrong. (Dwyer et al. 2010: 491)

¹ Of course, the kinds of low-level views examined in Chapter 2 try to explain the responses people give to moral questions. It’s just that they try to do so without appealing to complexly structured rules.


Nativists use these points to argue that our capacity for moral judgment, and specifically our acquisition of moral rules, requires innate morality-specific constraints. The nativists are right that children don’t get a lot of explicit training on rules. They are certainly not told things like: “this rule applies to what agents do but not to what agents allow to happen.” Jen Wright and Karen Bartsch conducted a detailed analysis of a portion of CHILDES, a corpus of natural language conversations with several children (MacWhinney 2000). They coded child-directed speech for two children (ages 2 to 5) for moral content. Wright and Bartsch (2008: 70) found that only a small fraction of moral conversation adverted to rules or principles (~5 percent). By contrast, disapproval, welfare, and punishment were frequently implicated in moral conversation. The lack of explicit tutelage on rules is compounded by the fact, stressed by nativists, that any particular instance of disapproval will involve specific features (e.g., of the immediate situation), and the child has to learn to abstract away from those features to glean the general rule. Although there is very little reference to rules in child-directed speech, there is a lot of no! don’t! and stop! But it seems as if these injunctions won’t provide enough information to fix on the content of the rule, and this promises to be a pervasive problem for the young learner. To repeat a key point from Dwyer and colleagues, “such remarks are likely to be too specific and too context dependent to provide a foundation for the sophisticated moral rules that we find in children’s judgments about right and wrong” (Dwyer et al. 2010: 491). Any particular case of training will typically be open to too many different interpretations to allow for the child to draw the appropriate inferences about the relevant distinctions. The nativists are right that the evidence available to the child seems to underdetermine the content.
But in this chapter I’ll argue that recent work in statistical learning can provide an alternative account of how these distinctions may be acquired.

1. A Learning Theoretic Account of the Act/Allow Distinction

1.1 Acquirendum

The rules that guide moral judgment have an apparent complexity that demands explanation. For instance, children judge that actions that produce harms are worse than allowings that produce equal harm; children also


distinguish intentional harms from unintended but foreseen harms (see, e.g., Levine & Leslie forthcoming; Mikhail 2011; Pellizzoni et al. 2010). If we are to explain people’s facility with moral distinctions in terms of complex rules, we need some account of how they arrive at such complex rules despite scant explicit instruction. Although there are several distinctions that might be explored, for the acquirendum I will initially focus on the fact that the scope of rules tends to range over actions rather than consequences. Children arrive at rules that apply to what an agent does, but not to what that agent allows. Given that children don’t receive any explicit instructions on the matter (Dwyer et al. 2010: 492), how do they acquire rules whose scope ranges over actions rather than consequences? I want to explain how children come to make judgments that respect the act/allow distinction. But first, I need to make a different distinction that bears on rule learning. In learning a rule, often one will quickly adopt the rule in both judgment and action. For instance, if you’re approaching a temple for the first time, and at the entrance someone says, “You can’t wear shoes in the temple,” you might immediately come to judge that it’s wrong to wear shoes in the temple and proceed to take your shoes off. But sometimes in learning a rule, you will simply discern that there is such a rule (or that you are being taught such a rule) without adopting it. For instance, if someone tells you, “You can’t wear white after Labor Day” you might immediately discern that this must be a rule without ever adopting the rule in either judgment or action. For the purposes of giving a rational learning theory of rule acquisition, the first order of business is simply explaining how people come to discern that the rules respect certain distinctions. Thus, in this chapter, the goal is to provide an account of how the learner might figure out what the scope of the rule is. 
The learner might figure out what the scope of the rule is without adopting it in judgment or action. (As a matter of fact, I think that when children learn a new rule, they often immediately adopt the rule in both judgment and action once they discern what the rule is. I will take this up in Chapters 8 and 10.) But for now, the key question is how a child could come to even discern that the rule they’re learning respects the act/allow distinction.

1.2 Hypotheses

The question is, why does the child infer that a given rule applies to actions rather than consequences? Let’s start with the hypothesis space. For a prohibition rule, we might characterize the competing hypotheses as follows:

(HAct) Act-based: the rule being taught prohibits agents from producing a certain consequence.

(HConsequence) Consequence-based: the rule being taught prohibits agents from producing a certain consequence and also from allowing such a consequence to come about or persist.

Act-based rules are obviously narrower in scope than consequence-based rules since all of the violations that fall under HAct also fall under HConsequence, but there will be many instances that are violations under HConsequence but not HAct.² It will be helpful to have some more general terminology to capture this relationship. When the set picked out by one rule is a subset of the set picked out by another rule, I will say that the first rule has narrow scope and the second rule has wide scope. (Obviously, these notions will be relative to the particular rules under consideration; but this will typically be clear from context.) Thus, we will count an act-based rule as having narrow scope and a consequence-based rule as having wide scope. Figure 3.1 represents the subset relation that obtains between an act-based rule and the corresponding consequence-based rule. Imagine that the consequences we’re interested in are car-scratchings. An act-based version of this rule might say that for any given agent, it’s wrong for that agent to scratch a car. A consequence-based version might say that for any given agent, it’s wrong for that agent to scratch a car or allow a car to be scratched. In effect, the consequence-based rule mandates that agents minimize scratchings. There are a few things to note about this hypothesis space. First, on both hypotheses, violations of the rule involve cases in which the consequence occurs; it’s not a violation merely to intend to produce the consequence.

² There are subtle issues about exactly how to characterize the distinction that I’m trying to capture with the two hypotheses. For instance, one might wonder how the hypotheses would bear on a scenario in which a person has to break one promise in order to avoid breaking two promises.
Does the rule against promise-breaking operate as a strict side constraint even on minimizing one’s own violations? There is some evidence that people think that the right thing to do is to minimize one’s own promise-breaking, but not to break a promise in order to minimize someone else’s promise-breaking (Lopez et al. 2009). But we don’t need to settle these issues in order to appreciate the point that faces the learner. For however precisely the “actbased” set is characterized, it will obviously be much smaller than the consequence-based set. And that is what will matter for the learner trying to decide between the hypotheses.

[Figure 3.1 image not reproduced: a smaller box labeled “Consequences brought about by agent” inside a larger box labeled “Consequences.”]

Figure 3.1 Potential scopes of rules. A rule restricted to the smaller box (e.g., it is wrong for an agent to scratch a car) has narrow scope; a rule that ranges over the larger box (e.g., it is wrong for an agent to scratch a car or allow a car to be scratched) has wide scope.

But that seems right. If Matt does his best to murder someone but fails, he has probably violated many moral rules, but not the one that says “it’s wrong to murder.” He hasn’t violated that rule because he hasn’t produced the consequence of killing someone.³

³ There might be a kind of meta-rule though, that it is wrong to intend to break the rule. Thanks to Brandon Ashby for this point.

Second, this hypothesis space seems plausible to us as theorists: we assimilate consequence-based rules to agent-neutral consequentialism and act-based rules to deontology. But it’s a further question whether people think of the hypothesis space in this way. That is, do people naturally think of these hypotheses: that a rule might be act-based or it might be consequence-based? My colleagues and I conducted rule-learning experiments that suggest that people do naturally think of the hypothesis space in this way. We had participants engage in a task where they learned novel rules. We first presented participants with sample violations of a novel rule. Some of these sample violations involved a person producing some consequence (e.g., “Mike puts a block onto the shelf”) and some of the sample violations involved a person allowing some consequence (e.g., “Dan doesn’t pick up a jump-rope that he notices on the shelf”). We then examined how participants generalized to new cases. The details of that will be given below in Section 1.5. For present purposes, what matters is the subsequent task. After participants had gone through all of the different sample violations and made their judgments about which other cases were violations, we had them rate how similar the candidate violations were to each other. We presented participants with pairs of scenarios assembled from the sample violations, and asked them to rate how similar they regarded the two scenarios in each pair to be. So, for instance, they might be presented with the following pair:

“Mike puts a block onto the shelf.”

“Dan doesn’t pick up a jump-rope he notices on the shelf.”

Then they would simply indicate the extent to which those two scenarios are similar. The idea was to see whether participants regarded actions as more similar to each other and distinct from allowings. Of course, similarity judgments depend on the background task. If asked to group things by size, then a dachshund is more similar to a ferret than it is to a Rottweiler. To ensure that participants were focusing on the relevant similarity metric, they were instructed to make these similarity judgments based on the same aspects of the scenarios that were important in making their earlier decisions about the meaning of the rules (cf. Xu & Tenenbaum 2007b: 254). From this we were able to generate a hierarchical representation of the hypothesis space, and, as predicted, there were clearly distinct clusters for actions and allowings (Nichols et al. 2016: 540, fig. 4). An abbreviated depiction of this hierarchy appears in Figure 3.2.

[Figure 3.2 image not reproduced: a tree with top node H, sub-clusters C and D, and leaves cc1, cc4, cc2, cc3, ci1, ci4, ci2, ci5, ci3, drawn against a vertical scale running from 0.2 to 1.8.]

Figure 3.2 Hierarchical representation of hypothesis space, based on similarity judgments. Letter H represents the “domain” (in this case, violations involving something written on the chalkboard). C and D represent unique clusters. All the cases in D are cases in which the agent produced the consequence, and all of the cases in C are cases in which the agent allowed the consequence (abbreviated from Nichols et al. 2016).


In the results represented in Figure 3.2, all of the sample violations involved something written on a chalkboard. People tended to think that cases in which an individual produced the consequence (i.e., writing something on the chalkboard) were very similar to each other (this is cluster D in the figure), and they tended to think that cases in which an individual allowed the consequence (i.e., didn’t erase a drawing on the chalkboard) were very similar to each other (this is cluster C in the figure); in addition, participants regarded elements in one cluster as quite different from elements in the other cluster (this is reflected by the distance between the lower level nodes, C and D, and the higher level node H). Again, participants’ judgments were about how similar the two scenarios were for the purposes of trying to determine the content of a novel rule. Given that they treated the actions as one cluster that is distinct from the allowings cluster, this naturally generates competing hypotheses about the content of a novel rule. One hypothesis is that the rule applies only to the action-cluster (corresponding to node D in the figure). Another hypothesis is that the rule applies to both the action-cluster and the allowing-cluster (corresponding to node H in the figure). And of course, these are exactly the two hypotheses suggested in the hypothesis space above. It remains to be seen whether this is something common across many populations, or whether it is culturally specific. But at least for some cultures, the hypothesis space for rule learning plausibly includes HAct and HConsequence. Our studies indicate that the two hypotheses of interest, HAct and HConsequence, do in fact seem to be natural for people learning novel rules. However, there is an obvious third hypothesis available:

(HAllow) Allow-based: the rule being taught prohibits agents from allowing certain consequences to come about or persist.
This hypothesis is especially apparent from Figure 3.2, where it corresponds to node C. For present purposes, I want to focus on HAct and HConsequence because the key question of interest is how children end up having rules that are act-based when the evidence that they get is consistent with the rules being consequence-based. There is an important question about whether allow-based rules are learnable. In Chapter 7, I’ll present some evidence that they are indeed learnable, and this bears on certain debates about the innateness of moral grammar. Nonetheless, children will have been exposed to few if any allow-based rules, and so there would be no


strong expectation for a novel rule to be allow-based. As we’ll see in Chapter 4, such expectations matter for learning novel rules. Moreover, the evidence children get about rules tends to be inconsistent with the hypothesis that the rule is allow-based, since most of the violations that are brought to children’s attention are cases in which an agent produces an action (see Section 1.4). That means that the evidence concerning most rules is not consistent with the allow-based interpretation. Following this pattern in the actual evidence children get, we arranged our experiments to focus on distinguishing act-based and consequence-based rules (see Section 1.5). One final point about the hypothesis space under consideration is that it only applies to prohibition rules, but children also learn prescriptive rules. For instance, kids learn the rule that they must brush their teeth every night. Prescriptive rules involve directing an agent to do something, and so what counts as narrow vs. wide scope cannot be simply act-based versus consequence-based. Nonetheless, I think we can still usefully characterize prescriptive rules in terms of narrow and wide scopes. This can be illustrated by an example adapted from Parfit about caring for children. A rule that requires that people ensure that children are cared for would be wide scope, and a rule that requires that parents ensure that they care for their children would be narrow scope (see Parfit 1984: 104; McNaughton & Rawling 1991: 173–4). What makes the latter rule narrow scope is that in some intuitive way the scope of the rule is constrained by the sphere of the agent. To take the less lofty example of dental hygiene, a narrow scope rule would require that one brush one’s teeth,⁴ and a wide scope one would require that one maximize teeth brushing. Obviously the former is in the sphere of the agent—they are the agent’s teeth after all. The latter rule extends much more broadly. 
Our work has focused entirely on proscriptive rules, but the same methods and arguments would plausibly apply to making inferences about whether a prescriptive rule is narrow scope (i.e., within the sphere of the agent) or wide scope.

⁴ Sinnott-Armstrong (in conversation) suggests that the narrow scope rule is better cast as: don’t leave one’s teeth unbrushed. After all, to use his example, when you come home from the dentist, your teeth have already been brushed and you don’t think that you need to brush your teeth again. Again, the exact details of how to characterize the narrow scope rule don’t need to be settled for us to recognize that the narrow scope rule picks out a much smaller set than the consequence-based rule.


1.3 Statistical Principle: The Size Principle

The statistical principle I will invoke to explain why children select the act-based hypothesis is the “size principle” (Tenenbaum & Griffiths 2001; Xu & Tenenbaum 2007b). To get an intuitive sense of the principle, imagine that a friend has a box of four fair dice, each with a different denomination: 4, 6, 8, and 10. He pulls out one die at random and rolls it ten times, reporting that the outcomes were 3, 2, 2, 3, 4, 2, 3, 4, 2, 2. Is it likely that he’s rolling the ten-sided die? Of course not. Why? Because you would have expected some numbers over 4 if it were the ten. If it were the ten, it would be a suspicious coincidence that all the observations were ≤4. The size principle offers a systematic way to capture this intuitive fact. Let’s call the hypothesis that the die is four-sided h₄, the hypothesis that the die is six-sided h₆, and so on. We can represent the size of the hypotheses by a nested structure (Figure 3.3). One way to think about this figure is that the four-sided die hypothesis generates a proper subset of the predictions generated by the ten-sided die hypothesis. The size principle implies that in such a nested structure, if the prior probabilities of the hypotheses are the same, the smallest hypothesis consistent with the evidence will have the highest posterior probability (in our example, this is hypothesis h₄).

The principle is intuitive. It’s also mathematically extremely simple. Again, suppose that your friend pulls out a die at random, so the prior probability is the same for h₄, h₆, h₈, and h₁₀. Suppose again the first roll comes up 3. That result is consistent with both h₄ and h₁₀, but the likelihood of 3 under h₄ is 0.25, and the likelihood of 3 under h₁₀ is 0.1. The second roll is 2.
That result, too, has likelihood of 0.25 under h₄ and 0.1 under h₁₀; since we now have two rolls that are consistent with both h₄ and h₁₀, we square those terms for the joint probability for the likelihood, yielding 0.0625 for h₄ and 0.01 for h₁₀. With three consistent rolls (3, 2, 2), we cube the probabilities to yield 0.0156 as the likelihood given h₄, and 0.001 for h₁₀. This process illustrates the fact that smaller hypotheses that are consistent with the data

Figure 3.3 Representing the sizes of hypotheses. The numbers represent the highest denomination of the die; the rectangles represent the relative sizes of the hypotheses. [Figure: nested rectangles labeled 10, 8, 6, and 4, from outermost to innermost.]


(e.g., h₄) will yield greater likelihoods than larger hypotheses (e.g., h₁₀), and this advantage increases exponentially with each new data point.⁵

Xu and Tenenbaum use the size principle to explain how the absence of evidence might play a role in word learning. When learning the word “dog,” children need only a few positive examples in which different dogs are called “dog” to infer that the extension of the term is ⟦dog⟧ rather than ⟦animal⟧. Pointing to a Poodle, a Labrador, and a Chihuahua suffices. You don’t also need to point to a blue jay or a halibut and say “that’s not a dog.” Xu and Tenenbaum explain this in terms of the size principle. The extension of Dalmatian is a subset of the extension of dog, which is a subset of the extension of animal (Figure 3.4). As a result, the likelihood of getting these three examples (a Poodle, a Labrador, and a Chihuahua) is higher if the extension of the word is ⟦dog⟧ as compared with ⟦animal⟧.

Xu and Tenenbaum report experiments that suggest that participants conform to this kind of inference. In a word learning task, participants (both adults and children) were presented with a nonsense syllable, e.g., “Here is a fep,” accompanied by a pictured object; the task was to generalize the application of that word to other depicted objects. In some trials, participants saw one sample application of the word. For example, they might be told “Here is a fep” and shown a picture of a Dalmatian. In other

⁵ The basic idea of the size principle does not presuppose Bayesianism. However, it’s useful to explain how the size principle fits into a Bayesian framework. Bayes’s rule, in its proportional form, can be stated as follows:

$$p(h \mid d) \propto p(d \mid h)\,p(h)$$

The term p(h) expresses the prior probability one assigns to the hypothesis; p(d|h) expresses the “likelihood,” that is, the probability of getting the data one observes given the hypothesis under evaluation; p(h|d) is the probability that the hypothesis is true given the data; this is the “posterior” probability which one arrives at after multiplying the prior times the likelihood. So what this equation says is that the posterior probability of the hypothesis given the data [p(h|d)] is proportional to the likelihood of the data given the hypothesis [p(d|h)] times the prior probability of the hypothesis [p(h)]. (To determine the exact probability of the hypothesis, one would also need to calculate the prior probability of the data occurring. This calculation is sometimes cumbersome. Fortunately, when comparing hypotheses that are supposed to explain the very same data, one need not calculate the prior probability of the data to determine how much more probable one hypothesis is than the other. This is what the proportional form of the rule allows us to calculate.) The size principle applies to the likelihood term in Bayes’s rule. In its general form, the size principle can be expressed as follows, where n is the number of data points, all of them consistent with h:

$$p(d \mid h) = \left[\frac{1}{\mathrm{size}(h)}\right]^{n}$$

For the case of the dice, the size principle tells us the probability of getting these data (3, 2, 2) for the various hypotheses under consideration. To repeat from above, with these data (3, 2, 2), $p(d \mid h_4) = 0.0156$ and $p(d \mid h_{10}) = 0.001$. Since the competing hypotheses have the same prior probability, these differences in the likelihoods will be directly mirrored in the posterior probabilities of the hypotheses.
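The dice arithmetic can be checked with a short sketch. It is only an illustration, not anything from the studies discussed here: the function names `likelihood` and `posterior` are made up for this example, and the equal priors reflect the stipulation that the die is pulled out at random.

```python
from fractions import Fraction


def likelihood(sides, rolls):
    """p(d|h) for a fair die with `sides` faces.

    Size principle: (1/sides)^n when every roll is possible on this die,
    and 0 when some roll is impossible (e.g., a 5 on the four-sided die).
    """
    if any(r < 1 or r > sides for r in rolls):
        return Fraction(0)
    return Fraction(1, sides) ** len(rolls)


def posterior(rolls, hypotheses=(4, 6, 8, 10)):
    """Posterior over the die hypotheses, assuming equal priors."""
    prior = Fraction(1, len(hypotheses))
    unnormalized = {h: prior * likelihood(h, rolls) for h in hypotheses}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the rolls")
    return {h: u / total for h, u in unnormalized.items()}


rolls = [3, 2, 2]
post = posterior(rolls)
for h in (4, 6, 8, 10):
    print(f"h{h}: likelihood = {float(likelihood(h, rolls)):.4f}, "
          f"posterior = {float(post[h]):.3f}")
```

Because the priors are equal, the ordering of the posteriors simply follows the ordering of the likelihoods, so the smallest consistent hypothesis comes out on top.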


[Figure: nested sets. “animal” contains “dog” (Dalmatian, Chihuahua, Labrador, Poodle) and “bird” (blue jay, wren, crow).]

Figure 3.4 Extension of words represented as subset structure (adapted from Xu & Tenenbaum 2007b: 248)

trials, they were shown three sample applications. For instance, they might be told “Here are three feps” and shown pictures of three Dalmatians. When shown three examples of Dalmatians, participants were more likely to generalize only to Dalmatians than when given a single example of a Dalmatian, suggesting that they are sensitive to the number of samples: as they get more examples consistent with the narrowest hypothesis, they are more likely to restrict their generalizations. In addition, when the three samples were different kinds of dogs (e.g., a Dalmatian, a terrier, and a mutt), participants generalized to all dogs, but not to items that were not dogs (Xu & Tenenbaum 2007b: 253).⁶

Just as the hypotheses concerning the dice and the animals form subset structures (Figures 3.3 & 3.4), a subset structure characterizes distinctions of interest in the normative domain, including the act-based vs. consequence-based distinction that forms our hypothesis space (Figure 3.1). Given this subset structure, the size principle has the potential to explain critical features of rule learning. Imagine trying to learn a rule of conduct for a

⁶ After the word-learning portion of the task, the adult participants were presented with pairs from the learning phase (e.g., Dalmatian and terrier) and asked to indicate, for each pair, how similar they are. They were explicitly told to base their similarity ratings on the features of the objects that were important to their judgments in the word-learning phase. The similarity ratings provide natural clustering (e.g., Dalmatians cluster more with other dogs than with birds) and this is used to generate a hierarchical representation of the hypothesis space guiding subjects’ word learning. This is the paradigm that we used to determine the hypothesis space for rule learning (see Figure 3.2).


different culture. The available hypotheses are: hn—the rule prohibits putting things on the sand, and hw—the rule prohibits tolerating things being on the sand. Hypothesis hn has narrow scope, applying to an agent’s action; hw has wide scope, applying to what the agent does or allows. Now imagine that you learn several randomly sampled violations of the rule, with the opportunity of seeing violations that are allowings as well as violations that are actions, yet all the observed violations are cases in which a person has put something on the sand. If the prior probabilities are the same for hn and hw, then, following the size principle, you should think that the narrow scope hypothesis is more probable than the wide scope hypothesis. As with the dice, it would be a statistically suspicious coincidence if hw were the right hypothesis, given that all the evidence is consistent with hn. Thus, even though children don’t get explicit instruction on the act/allow distinction, it’s possible that they can infer that a rule is act-based from the kinds of violations they learn about. If all of the violations they learn about are cases in which the violator produced the consequence, this might count as evidence that the operative rule applies narrowly to what an agent does rather than more widely to include what people allow. This is just the basic insight of the size principle. In the case of the dice, the fact that none of the rolls were over 4 would be a suspicious coincidence if the die were ten-sided. The data about the die-rolls provided an opportunity to get evidence favoring h₁₀. The absence of such evidence counts against h₁₀. In the case of rule learning, the child has ample opportunity to get evidence favoring consequence-based rules. As a result, if none of the violations children have observed are “allowings” this would be a suspicious coincidence if the rule were a consequence-based rule. The absence of evidence here is itself telling evidence.
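The inference about the sand rule can be sketched in the same way. Everything numerical below is a hypothetical assumption made for illustration: the text does not say how many possible violations each hypothesis covers, so the sketch simply supposes ten action-type and ten allowing-type violations, and it assumes that sample violations are drawn at random from the violations of the true rule. The function name `posterior_narrow` is likewise invented for this example.

```python
from fractions import Fraction


def posterior_narrow(observations, n_acts=10, n_allowings=10,
                     prior_narrow=Fraction(1, 2)):
    """Posterior probability of the narrow-scope hypothesis (hn).

    observations: sequence of "act" or "allow", one per sampled violation.
    n_acts, n_allowings: HYPOTHETICAL counts of possible violations;
        hn covers only the acts, hw covers acts plus allowings.
    """
    size_narrow = n_acts
    size_wide = n_acts + n_allowings
    like_narrow = Fraction(1)
    like_wide = Fraction(1)
    for obs in observations:
        if obs not in ("act", "allow"):
            raise ValueError(f"unknown observation: {obs!r}")
        # An allowing can never be a violation of the narrow rule.
        like_narrow *= Fraction(1, size_narrow) if obs == "act" else 0
        like_wide *= Fraction(1, size_wide)
    numerator = prior_narrow * like_narrow
    denominator = numerator + (1 - prior_narrow) * like_wide
    return numerator / denominator


for sample in ([], ["act"], ["act"] * 3, ["act", "allow", "allow"]):
    print(sample, float(posterior_narrow(sample)))
```

Under these assumed sizes, each additional action-only violation shifts more probability toward hn, while a single observed allowing rules hn out entirely.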

1.4 Evidence Available to the Learner

The foregoing gives us an analysis of the problem. A rational learner trying to decide between the hypotheses would infer that a rule is act-based when all the sample violations of the rule are actions. The next question is, what does the child’s evidence look like? To answer this, we had naïve individuals code a portion of a standard database for child-directed speech (CHILDES). The first stage of coding identified the cases in which an adult communicated something relevant to rules of conduct in general (not just moral conduct). Those statements were then coded again for consistency with


narrow-scope interpretations of the rules. Coders were trained on the distinction between narrow scope and wide scope. They were told that narrow-scope rules have the form “agent(s) shouldn’t cause outcome S,” whereas wide-scope rules have the form “agent(s) should not cause outcome S nor allow such a state of affairs to persist.” The results were overwhelmingly consistent. Over 99 percent of observed violations were coded as consistent with a narrow-scope interpretation of the rule: they were instances in which an agent did something as opposed to allowing something. Typical examples include “don’t hit anybody with that Adam,” “don’t throw paper on the floor,” and “don’t write on that.” Of course, there were also many cases of parents just saying “no!”, “don’t”, or “stop it” to express disapproval of a child’s action. But there were virtually no cases of “don’t let that happen.”⁷ This corpus evidence suggests that for the vast majority of rules the child learns, there is a conspicuous lack of evidence in favor of HConsequence. It’s not that the child lacked the opportunity to get evidence in favor of HConsequence. Presumably there were many occasions on which a person allowed an outcome to occur without being called out for a violation. This lack of evidence for HConsequence counts as evidence in favor of HAct.

1.5 Sensitivity to the Evidence

The last part of the learning theoretic project here is to see whether people are actually sensitive to the kind of evidence that would be relevant for the foregoing rational learning story. Do people use evidence of violations to make inferences about the scope of rules? As noted above, we conducted learning studies on novel rules to examine this. Participants were given sample violations of a novel rule and asked to generalize to new cases. We used unfamiliar and unemotional rules so that participants’ judgments wouldn’t be biased by prior moral beliefs or affective responses. The violations involved either a shelf, a chalkboard, or a park. Some of the sample violations were actions. For example, participants would be told that John is violating the rule in this scenario: “John puts a ball on the shelf.” Other violations were allowings. For example, participants would be told that Mary

⁷ There were only two examples coded as inconsistent with a narrow scope interpretation: cases in which allowings were violations. Interestingly, one of these involved a child being told not to let his younger brother fall. This might reflect a case in which we really do have rules that are more consequence oriented, based on obligations to vulnerable populations.


[Figure: three panels, each showing the region “Consequences intentionally brought about by agent” nested inside the region “Consequences”.]

Figure 3.5 Sample violations of novel rules

is violating the rule in this scenario: “Mary saw a puzzle on the shelf and left it there.”⁸ For some rules, participants were given a single sample violation that was an action; for other rules, participants were given three sample violations, all actions; and for other rules, participants were given one violation that was an action and two that were allowings (see Figure 3.5). For each rule, after being given the sample violations, participants were asked whether the rule applied to new cases. Some of the new cases were actions (e.g., “Chris places a toy truck on the shelf”) and some were allowings (e.g., “Emily sees a marble on the shelf and walks past it”).

In general, we found that people’s inferences about the scope of new, unfamiliar rules were sensitive to the evidence. When the sample violations people received were all consistent with the narrow-scope hypothesis (i.e., involved people bringing about the consequence), people only generalized to other cases in which the agent produced the consequence. This held for both the one-sample trials and the three-sample trials.⁹ Thus, when given examples consistent with HAct, people seemed to infer HAct for the rule. However, if the sample violations were inconsistent with the narrow-scope hypothesis (because some of the sample violations were allowings), then participants overwhelmingly generalized to both actions and allowings. That is, they generalized in ways that suggested that they endorse HConsequence for the rule. This suggests that people are sensitive to evidence that bears on whether the rule applies only to what an agent does, or also to what an agent allows.

Overall, the evidence indicates that the scope of moral rules can be acquired (1) with only a few examples and (2) without explicit instruction about what the rule does not prohibit. The fact that people are sensitive to the scope of the samples suggests that if we were presented with “allowings” as violations, we would have learned the broader rule. However, as we’ve seen, children are typically not exposed to such evidence.

To summarize the learning theory, we can characterize the rational analysis part of the theory as follows. If a rule is act-based, the set of possible violations will be a subset of the possible violations of the corresponding consequence-based rule. If the prior probabilities for these two possibilities are the same, then, given the size principle, if all the evidence is consistent with the hypothesis that the rule is act-based, it follows that the hypothesis that it’s act-based is more probable than the hypothesis that it’s consequence-based, and this advantage increases dramatically with each new data point that is consistent with the act-based hypothesis. The corpus data suggests that virtually all of the evidence that children get about rules is consistent with the hypothesis that the rule under consideration is act-based. Thus, if children are rational learners, they should infer that these rules are act-based (assuming that they have no prior bias about whether a new rule is act-based or consequence-based). In addition, we found that in learning novel rules, adults are sensitive to the kind of evidence that is relevant for determining whether a rule is act-based or consequence-based.

⁸ The learner actually has to discern multiple things about the rule. For instance, she has to determine whether the rule involves chalkboards or shelves. If she figures out that the rule involves shelves, she might then need to determine whether the rule prohibits approaching the shelf, touching the shelf, or placing something on the shelf. In our experiments the learner had to figure out these features of the rules. But our interest, of course, was in another feature of the rule, its scope. We wanted to explore the learner’s ability to determine the scope of the rule: whether the rule prohibits an agent producing the consequence or requires that the agent minimizes all consequences of that kind.

⁹ The fact that we found no difference between the one-sample and three-sample trials suggests that there is a bias in favor of act-based rules. I’ll take up this issue in the next chapter. However, as is clear from the dice case, under certain conditions the size principle can be leveraged to draw stronger inferences from three samples than from a single sample. (With a single roll of 3, the likelihood of H₄ is 2.5 times higher than that of H₁₀, but with three rolls that are consistent with H₄, the likelihood of H₄ is 15.625 times higher than that of H₁₀.) In subsequent studies (Nichols et al. 2016, studies 2a and 2b), we were able to show that people were sensitive to whether there were one or three samples. These studies involved several modifications of the original study. First, we used a restaurant scenario where wide-scope rules seemed more natural. Second, we focused on a single kind of consequence: a napkin on the windowsill. Third, following Xu and Tenenbaum (2007a), we presented participants with a list of eleven candidate violations from which either one or three were randomly sampled examples of violations, and participants had to determine whether two of the remaining items on this list (one an action and another an allowing) were also violations. Finally, we used a scaled response rather than the dichotomous response in the earlier study. Under those conditions, we did find (in both between- and within-subject studies) that participants were more inclined to think the rule was narrow-scope when given three sample violations rather than a single sample.
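The contrast between one sample and three samples in the dice case comes down to a likelihood ratio that grows as a power of the number of consistent observations. A minimal sketch follows; the helper name `likelihood_ratio` is an assumption of this example, not something from the studies.

```python
from fractions import Fraction


def likelihood_ratio(size_small, size_large, n):
    """How many times likelier n consistent observations are under the
    smaller hypothesis than under the larger one:
    (1/size_small)^n / (1/size_large)^n = (size_large/size_small)^n.
    """
    return Fraction(size_large, size_small) ** n


# Four-sided versus ten-sided die, after one roll and after three rolls.
print(float(likelihood_ratio(4, 10, 1)))  # 2.5
print(float(likelihood_ratio(4, 10, 3)))  # 15.625
```

With equal priors this ratio is also the posterior odds, which is why a rational learner could in principle draw a stronger conclusion from three samples than from one.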


Finally, a quick note about the level of explanation on offer. The rational analysis of these inferences counts as a computational-level explanation. Given the input children get, a rational learner should infer act-based rules. Our studies didn’t explicitly investigate whether people are using rationally appropriate algorithms in drawing inferences about the scope of rules. But there is some reason to think that they are. For instance, when people are given a mix of violations, some of which are actions and some of which are allowings, they overwhelmingly infer that the rule is consequence-based; it’s extremely plausible that people are correctly using the samples to exclude the hypothesis that the rule is act-based. Indeed, people probably have conscious awareness of this; that is, they know that they’re judging that the rule is consequence-based because some of the violations are allow-based rather than act-based. So this much of the story is probably an algorithmic one. It’s also at least somewhat plausible that people are algorithmically sensitive to suspicious coincidences. For instance, in the case of the dice, even if people can’t articulate the reasons, it’s likely that they register that if all of the rolls are four or under, there would be something weird about the outcomes if it were a ten-sided die. Now, even if adults solve these tasks using rational algorithms, it’s not clear to what extent these processes are available in children. But there is good reason to be optimistic that there is an algorithmically rational process. Furthermore, given the simplicity of the tasks, it is a perfectly feasible research project to try to determine whether there is an algorithmically rational process and, if so, what it is.

2. Principle of Double Effect

The act/allow distinction is pervasive. There is also a more subtle principle that seems to characterize much moral judgment: the Principle of Double Effect (PDE). The Principle gets defined in different ways, but a key part of the idea is that if you’re trying to do something good (e.g., stop the spread of a disease), it can be permissible to knowingly cause some harm (e.g., quarantine people against their will) that would otherwise be impermissible. Thus, according to the PDE, it can be permissible to bring about a bad consequence when that consequence is foreseen but not intentionally produced. This is supposed to explain particular judgments in trolley cases, like the judgment that it is permissible to pull the switch in Bystander (Chapter 1). Diverting the train will save five innocent people on the main track, and kill one innocent person on the side track. This is permissible


because although the bad effect (killing the one) is foreseen, it was not done intentionally. The classic philosophical literature on trolley cases attempts to articulate what the correct normative principles are. In that literature, the issue about the PDE is whether it should be recognized as part of a correct normative theory (Foot 1967; Thomson 1985). More recently, several philosophers, including especially moral Chomskyans, have promoted a psychological thesis regarding the PDE (e.g., Harman 1999; Mikhail 2011). They maintain that lay judgments about trolley cases like Bystander indicate that the PDE characterizes rules found in the moral system itself (Harman 1999: 113–14; Mikhail 2011: 360). The trolley cases are outlandish. But something like the PDE also seems to be reflected in the criminal law in an intuitively recognizable way (see, e.g., Mikhail 2007: 145). Imagine a physician who happens to be at a department store trying on a shirt when he gets a call that he is needed immediately. The physician runs out, knowingly wearing the shirt he hasn’t paid for. That is plausibly not a case of stealing because there is no mens rea—the physician did not intentionally steal the shirt. Contrast this with a physician who happens to be in a pharmacy and learns that a patient urgently needs a clamp that is only available in this pharmacy. The physician grabs the clamp and rushes to save the patient. Here there is mens rea—he fully intended to take the clamp. That does look to be an instance of stealing, albeit a permissible one. Our easy intuitive grasp of these legal distinctions calls out for explanation (see also Mikhail 2011: 173).

2.1 Learning Theoretic Account of PDE

I’ll promote a learning story for something like the PDE, though as we will see, the details are not quite as simple as in act/allow. Let’s start with an early discussion of the PDE from Harman:

According to this principle, there is an important distinction between what you aim at, either as one of your ends or as a means to one of your ends, and what you merely foresee happening as a consequence of your action. It is much worse, for example, to aim at injury to someone, either as an end or as a means, than to aim at something that you know will lead to someone’s injury. Doing something that will cause injury to someone is bad enough; but according to the principle of Double Effect, it is even worse to aim at such injury. (Harman 1977: 58)

The key distinction that Harman invokes here is between what you “aim at” or do intentionally and what you foresee happening as a result of what you do. The view that the PDE is somehow part of a basic normative competence is presumably not that people have an explicit articulation of the PDE. Rather, the idea is that at least some of our moral rules restrict the scope of the rule to consequences that are produced intentionally. Thus, the acquirendum of interest is a rule representation whose scope is restricted to intentionally produced consequences. To frame this in terms of a learning theory, let’s start by supposing that the learner is trying to decide between these two hypotheses:

(HIntentional) The rule being taught prohibits agents from intentionally producing a certain consequence.

(HForeseen) The rule being taught prohibits agents from performing an action they know (foresee) will produce a certain consequence.

If consequences produced intentionally constitute a proper subset of consequences produced with foresight, the size principle will be available for making inferences from sample violations to scope (see Figure 3.6). The basic move is familiar: if the hypothesis space can be regimented as in Figure 3.6 and the prior probabilities of the hypotheses are the same, then if all of the sample violations are cases in which the agent produced the consequence intentionally (i.e., if all the violations fit into the smallest box), the size principle implies that HIntentional is more probable than HForeseen.

To see whether people are sensitive to evidence regarding these different scopes, we conducted another novel rule-learning study. The rule involved paper on a desk. There were three conditions. In the intentional samples condition the samples of violations were all cases in which the agent intentionally removed paper from the desk (e.g., “John moved a piece of construction paper from the desk to the shelf.”). In the accidental samples condition, the cases consisted of one example in which a person intentionally removed paper from the desk and two examples in which the consequence was accidental (e.g., “Bill put down the garage door and the vibration caused a letter to fall off of the desk.”). In the foreseen samples condition, the


[Figure: nested regions. “Consequences intentionally brought about by agent” lies inside “Consequences foreseen by agent”, which lies inside “Consequences”.]

Figure 3.6 Hypothesis space for Principle of Double Effect

cases consisted of one example in which the agent intentionally removed paper from the desk and two examples involving a foreseen consequence that was not intentionally produced (e.g., “Bill used a book to scoot a bug off the desk even though, as he expected, this also caused a paper to slip from the desk onto a table.”). Within each of these conditions, half of the participants were given a test case in which the consequence was intentionally produced and the other half were given a case in which the consequence was foreseen but not intentionally produced. Both test cases begin with the following scenario:

Ed notices that wind is coming up through a vent under the desk. It’s clear to him that the wind is about to blow all the papers off the desk that aren’t secured.

The intentional case continues as follows:

There are six pieces of construction paper on the desk. To stop the five pieces from blowing off the desk, Ed has only one option, and he takes that option: Ed grabs one of them and puts it over the vent to stop the wind from blowing off the other pieces of paper. Once the piece of paper is put over the vent, the wind is stopped and the other papers remain on the desk.

The foreseen but not intentional case continues like this:

One piece of blue construction paper happens to be under a large box; five pieces of construction paper are scattered next to the box. To stop five pieces from blowing off the desk, Ed has only one option, and he takes that option: Ed picks up the box and sets it down over the five pieces of paper, knowing that the blue piece will be blown off the desk. The blue piece is blown off the desk, but the other papers remain on the desk.

Our prediction was that, regardless of condition, participants should maintain that Ed is violating the rule in the intentional case. But for the foreseen but not intentional case, we predicted that participants in the intentional sample condition would be less likely to think that Ed is violating the rule. The idea was that if all of the sample violations are cases in which the agent intentionally produces the consequence, the size principle generates a higher likelihood for the smaller hypothesis. Our prediction was borne out. Participants were much less likely to say that Ed violated the rule when the sample violations were all intentional (see Figure 3.7). This suggests that people are sensitive to evidence in a way that would facilitate learning that the scope of the new rule ranges over intentional actions (HIntentional), and not more broadly over all foreseen actions (HForeseen).


2.2 The Shape of the Hypothesis Space

In the simple application of the size principle above, I have assumed that the set of intentional actions is a proper subset of the set of foreseen actions. However, there is reason to doubt that this captures the way ordinary people think about intentional action.¹⁰ As we’ll see, this means we probably need to redraw the hypothesis space. Consider the following example from Gil Harman (1976: 433): “A sniper shoots at a soldier from a distance, trying to kill him, knowing that the chances of success are slim . . . . If he succeeds, despite the odds, the sniper kills the soldier intentionally.” In this case the sniper does not expect to kill the soldier and yet if he does kill the soldier, it seems like he intentionally killed him. Joshua Knobe (2003: 313–14) showed that ordinary people do in fact claim that in a similar case the shooter intentionally killed the person. Thus it seems like intentionally produced consequences do not comprise a subset of foreseen consequences.

¹⁰ In our original paper on learning these distinctions, we only considered this simple hypothesis space (Nichols et al. 2016). But I’ve now come to think that the situation is likely more complicated than that.


[Figure 3.7 plot: vertical axis from 1 to 7; test cases: intentional test case, foreseen test case; training conditions: trained on accidental consequences, trained on foreseen consequences, trained on intentional consequences.]


Figure 3.7 Results on intentional/foreseen study for judgments of how likely the agent was violating the rule (1 = definitely not breaking the rule; 7 = definitely breaking the rule) with standard error bars

One way to amend the account is to redefine the hypotheses. Instead of defining the alternative to HIntentional in terms of whether the consequence was foreseen simpliciter, we could define a new hypothesis: (HForeseen+) The rule being taught prohibits agents from performing an action they know (foresee) will produce a certain consequence if they achieve their end. This revised hypothesis would preserve the nested structure of the hypothesis space. The smaller hypothesis is that the rule applies to consequences the agent intentionally produces. And the encompassing hypothesis is that the rule applies to foreseen consequences of getting what one is aiming for. With that rendering, the sniper’s killing of the soldier does fall within both HIntentional and HForeseen+. If we retain that nested structure then the size principle can operate to drive the inference to HIntentional over HForeseen+. Although this move is technically available and would enable a simple application of the size principle, it depends on defining a new category, and it’s not at all obvious that this is how ordinary people would represent the hypothesis space. A more natural way to capture cases like the sniper is to keep the ordinary notions of intentional and foreseen, and redraw the


boundaries of the hypotheses, such that the set of intentionally produced consequences falls partly outside of the set of foreseen consequences of one’s action (Figure 3.8). Cases like the sniper fall in the space that lies within intentional but outside of foreseen. This alternative approach to the hypothesis space no longer places intentional as a subset of foreseen. That doesn’t exclude using the size principle. Although I’ve presented the size principle in terms of nested hypotheses, the size principle doesn’t actually depend on there being strict subsets. What it does require is differently sized hypotheses. To return to dice, imagine your friend has a ten-sided die with labels 1–10 and a four-sided die with labels 1, 2, 3, and #. In that case, the possible outcomes of the four-sided die do not nest entirely within the possible outcomes of the ten-sided die. For “#” is not on any side of the ten-sided die. Now imagine that your friend rolls one of the dice and you have to guess whether it’s h₁₀ (the ten-sided die) or h# (the four-sided die). The rolls come up 3, 2, 2. This puts us in the same situation as before with the size principle: the three rolls are consistent with both hypotheses. Thus we cube the probability of a single roll under each hypothesis (1/4 and 1/10), which yields 0.0156 as the likelihood given h#, and 0.001 for h₁₀. As in the other cases, we would have strong evidence here that it’s the four-sided die. Similarly then, it might be legitimate to use the size principle to infer that the rule applies narrowly to HIntentional and not to the larger HForeseen. I say that it might be legitimate to use the size principle here but it all depends on the relative size of the hypotheses. The boxes in Figure 3.8 are obviously not to scale. But the relative sizes of HIntentional and HForeseen might be quite wrong as well. And the size principle will only work if HIntentional is actually smaller than HForeseen (i.e., HIntentional − HForeseen < HForeseen − HIntentional).
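The dice arithmetic above can be checked with a short sketch. It assumes, as the example does, that every face of a die is equally likely and that each observed roll appears on both dice; the function name `size_principle_likelihood` is a hypothetical label chosen for illustration, not terminology from the text.

```python
from fractions import Fraction


def size_principle_likelihood(n_outcomes, n_observations):
    """Likelihood of n_observations consistent samples under a hypothesis
    that contains n_outcomes equally likely outcomes (illustrative helper)."""
    return Fraction(1, n_outcomes) ** n_observations


rolls = ["3", "2", "2"]  # each of these faces appears on both dice

like_four = size_principle_likelihood(4, len(rolls))   # four-sided die: (1/4)^3
like_ten = size_principle_likelihood(10, len(rolls))   # ten-sided die: (1/10)^3
ratio = like_four / like_ten                           # how strongly the data favor h#

print(float(like_four), float(like_ten), float(ratio))
```

Nothing in the calculation uses the fact that one die's faces are a subset of the other's. Only the number of faces matters, which is why the same reasoning can carry over to hypotheses that merely overlap.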
[Figure 3.8 diagram: within the space of consequences, two overlapping boxes, one labeled “Consequences foreseen by agent” and the other “Consequences intentionally brought about by agent”.]

Figure 3.8 Alternative hypothesis space for Principle of Double Effect

There are long-standing discussions about which of the foreseen consequences of one’s actions count as intentional. Harman, for instance, considers a sniper who knows that shooting his gun will have two effects: alerting the enemy and heating up the barrel of his gun. Harman maintains that the sniper intentionally produced the consequence of alerting the enemy, but not the consequence of heating up the barrel of the gun (Harman 1976: 434). These kinds of considerations might be marshaled to argue that HIntentional is smaller than HForeseen. But that kind of counting game is not one I’m inclined to play. Nonetheless, once we redraw the boundaries of the hypotheses as in Figure 3.8, we see that there might be a different source of data available to the learner. First, let’s return to our dice. Obviously if one of the rolls is 7 you have very good evidence for h₁₀ over h#, and if one of the rolls is # you have very good evidence for h# over h₁₀. Now, the same goes for HIntentional and HForeseen. If the learner gets evidence of a violation that occurs in the HIntentional − HForeseen margin, that is strong evidence for HIntentional. That is, if you learn that it is a violation to intentionally produce a consequence that you didn’t expect to achieve, this is good evidence that the rule applies to intentionally produced consequences and not to foreseen but unintentional consequences. We have not done any analyses to see whether children ever get evidence that it’s a violation when an agent intentionally produces a consequence despite the odds, but it seems plausible that they do. A child with a BB gun who manages to shoot his brother from a great distance is quite likely to be in a lot of trouble. Thus, even if the child can’t use the size principle to infer HIntentional, she might be able to use evidence in the margin to make the inference that a new rule is restricted to intentionally produced consequences (i.e., HIntentional).
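The point about evidence in the margin can be illustrated with the dice. The sketch below is a minimal illustration, assuming equal prior probability for the two dice and uniformly likely faces; the helper names `likelihood` and `posterior_four_sided` are hypothetical. A roll that appears on only one die plays the role of a violation that falls in only one hypothesis.

```python
from fractions import Fraction

TEN_SIDED = {str(n) for n in range(1, 11)}   # faces 1-10
FOUR_SIDED = {"1", "2", "3", "#"}            # faces 1, 2, 3, #


def likelihood(rolls, faces):
    """Probability of the rolls if each is drawn uniformly from `faces`;
    a roll that is not on the die makes the whole sequence impossible."""
    p = Fraction(1)
    for roll in rolls:
        p *= Fraction(1, len(faces)) if roll in faces else Fraction(0)
    return p


def posterior_four_sided(rolls, prior_four=Fraction(1, 2)):
    """Posterior probability of the four-sided die (equal priors assumed by default)."""
    weight_four = prior_four * likelihood(rolls, FOUR_SIDED)
    weight_ten = (1 - prior_four) * likelihood(rolls, TEN_SIDED)
    if weight_four + weight_ten == 0:
        raise ValueError("rolls are impossible under both hypotheses")
    return weight_four / (weight_four + weight_ten)


shared_only = posterior_four_sided(["3", "2", "2"])   # size principle alone
with_hash = posterior_four_sided(["3", "2", "#"])     # one roll in the h# margin
with_seven = posterior_four_sided(["3", "2", "7"])    # one roll in the h10 margin

print(float(shared_only), float(with_hash), float(with_seven))
```

On these assumptions a single observation in the margin settles the question outright, whereas shared observations only shift the balance gradually. Real learners would presumably allow for noisy or mislabeled samples, so the all-or-nothing result here is an idealization.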

2.3 Unintentional Violations

A necessary part of all of the above learning stories for the PDE is that children are rarely exposed to sample violations in which the agent foresaw a consequence that he didn’t intentionally produce. That is, the explanation only works if children rarely see cases in the HForeseen − HIntentional margin. If they do see such cases, then they should infer HForeseen or perhaps the union of HForeseen and HIntentional. Are children exposed to sample violations in which an agent foresees the consequence but doesn’t intentionally produce it? At first blush, it might


seem that they are.¹¹ Imagine Billy hears his mom say that there are cookies in the kitchen. Billy’s only interest is to get to the cookie as fast as possible. As he’s running to the cookies he knowingly steps on his sister’s drawing, ruining it. He probably counts as having broken a rule against wrecking his sister’s stuff, and he is likely to be scolded. The fact that he didn’t have an intention to step on her drawing doesn’t mean that he didn’t break a rule. So is this a case of a violation in which the agent foresees a consequence that he didn’t intentionally produce? This, too, brings us to longstanding discussions about the notion of intentionality (Bratman 1984; Harman 1976). Pioneering experimental work by Knobe (2003) suggests that on the everyday understanding of “intentional,” Billy did intentionally step on his sister’s drawing. Children (including Billy’s sister presumably) would express this by saying that Billy stepped on her drawing on purpose (Leslie et al. 2006). There are a variety of different explanations for how such intentionality judgments come about (e.g., Adams & Steadman 2004; Machery 2008; Nadelhoffer 2006). I’ll work with Knobe’s (2010) own explanation, which is given in terms of whether the agent’s attitude towards the consequence is more positive than the attitude of most people (see also Proft et al. 2019). In Knobe’s famous example, a CEO decides to implement a program solely because it’s profitable, even though he knows it will harm the environment. Most people say that the CEO intentionally harmed the environment (Knobe 2003). Knobe’s explanation for this is that even though the CEO didn’t really want to harm the environment, the fact that he knowingly did so reflects a more positive attitude towards harming the environment than is normal—that’s why people say he intentionally harmed the environment.¹² Similarly, then, in the case of Billy, Billy has a more positive attitude about stepping on his sister’s drawing than is the norm. 
That’s what makes it intentional. On people’s ordinary notion of intentional, Billy counts as intentionally ruining his sister’s picture. This suggests that the fact that Billy broke the rule doesn’t constitute an exception to the claim that the sample violations that children get are almost entirely composed of intentionally produced

¹¹ Thanks to Fiery Cushman and Josh Greene for (independently) pressing this issue.

¹² These results are part of a pattern that Knobe finds, in which people will say that the CEO intentionally harmed the environment, but not that he intentionally helped the environment in a closely parallel case. There are reasons to think that when people are being reflective, they are less likely to give these asymmetric responses (e.g., Nichols & Ulatowski 2007; Pinillos et al. 2011). But what matters for present purposes is the automatic reaction to a case like the CEO, not the reflective one.


consequences. To be sure, the set of consequences that count as intentional is now significantly larger on this notion of intentionality. But the learning issues are still in place. First, children have to learn which consequences are implicated in a rule. Obviously some intentional acts aren’t prohibited, like crossing your legs. In addition, children still have to learn whether the rule is restricted to intentional consequences or also applies to unintentional but foreseen consequences. As we saw above, when participants are given evidence that accidental consequences are violations, they think that unintentional but foreseen consequences are also violations. In addition, as we’ll see in more detail later (Chapter 7, Section 5), when people are given only examples in which allowing something to happen is a violation, people tend to infer that intentionally producing the consequence isn’t a violation. So there is still a lot for a learner to learn. Finally, if the notion of intentional that children operate with is as Knobe’s work suggests, then of course that category will be the basis for learning rules against intentionally producing some consequence.¹³


¹³ The way that I’m explaining how people make judgments that conform to PDE does suggest that the underlying representations are somewhat different from the philosophical depictions of the PDE. Consider three representative presentations of PDE. First, John Mikhail:

the principle holds that an otherwise prohibited action, such as battery or homicide, which has both good and bad effects may be permissible if the prohibited act itself is not directly intended, the good but not the bad effects are directly intended, the good effects outweigh the bad effects, and no morally preferable alternative is available. (Mikhail 2011: 149)

Here is Warren Quinn’s characterization:

The doctrine . . . is typically put as a set of necessary conditions on morally permissible agency in which a morally questionable bad upshot is foreseen: (a) the intended final end must be good, (b) the intended means to it must be morally acceptable, (c) the foreseen bad upshot must not itself be willed (that is, must not be, in some sense, intended), and (d) the good end must be proportionate to the bad upshot (that is, must be important enough to justify the bad upshot). (Quinn 1993: 175)

And here is Suzanne Uniacke’s:

The principle holds that under strict conditions it is permissible foreseeably to bring about an effect of a type that it is never permissible to intend. These conditions are: that the act itself . . . be morally good or indifferent; that the bad effect . . . be an unavoidable, unintended effect of the act which also achieves the good effect . . . ; and that the good effect be sufficiently weighty to warrant causing the bad effect. (Uniacke 1998: 120)

In all of these characterizations, there are quite distinct roles for intention and proportionality. The bad effect cannot be intended and the good effect must outweigh the bad effect. Switching the trolley to save five people at the cost of one can be permissible because the agent doesn’t intend to kill the one and the proportionality constraint is met. Consider instead someone who switched the trolley to save a flower on the main track, knowing that the trolley will kill a person on the side track. That is obviously impermissible, and the PDE can explain this by appeal to the proportionality clause. Even though the agent doesn’t intend to kill the person on the side track, the bad effect of a person’s death outweighs the good effect of saving a flower. Thus, the agent who switches the trolley to save a flower violates the proportionality clause, making the action impermissible. If we adopt a Knobean rendering of intentional, the psychological processes involved in generating PDE-like responses are rather different. For proportionality considerations are at least partly fused into the very notion of intentional. People tend to think that in the standard switch case, the agent doesn’t intentionally harm the person on the side track (see, e.g., Sinnott-Armstrong et al. 2008). However, in the flower-saving scenario, it seems that the agent did intentionally harm the person on the side track. The judgment is not that the flower-saving agent doesn’t intentionally harm the person but there is an independent proportionality clause which renders the action impermissible. Instead, the wild imbalance of values is reflected in the judgment that the flower-saving agent intentionally harmed the person on the side track. Indeed, in a legal context, barring insanity, such a case would be treated as involving mens rea. It shouldn’t be surprising that the philosophical characterization of PDE does not map directly onto the representations and algorithms of commonsense morality. But the basic phenomenon that has been of interest here, namely that consequences that are wrong to aim at aren’t always wrong to produce knowingly, is reflected in commonsense judgments. And a learning theory might explain this aspect of commonsense morality.

Thus the PDE presents a rather complicated situation for the learning theorist. I’ve outlined several possible ways in which a learner might acquire rules that conform to that Principle. It’s not entirely clear which if any of these learning theoretical proposals is correct. But I hope at least to have shown that there are considerable resources available to the learning theorist confronting this challenging case.

3. Parochial Morality

It’s a familiar point from the anthropology of ethics that many moral rules are parochial (see, e.g., Snare 1980: 364). The rules apply to people in certain groups and not to others. Consider rules in which there are victims or “patients.” A standard pattern is that the set of possible moral patients is composed of the ingroup. For instance, Turnbull (1972) reports that the Mbuti regarded it as wrong to steal from each other, but perfectly fine to steal from people in villages. Another familiar anthropological claim is that in many small-scale societies, even the term for people is parochial, including only members of the society itself (e.g., “Yanomami,” “Numunu” (Comanche), “Shuar”), apparently excluding outsiders from the domain of people. The phenomenon of parochial morality is not restricted to foreign societies. Many animal welfare advocates would maintain that U.S. practices of meat consumption reflect the parochial moral view that cows and pigs are excluded from the set of individuals that it’s wrong to kill. If we accept this as a case of parochiality, it seems like parochial morality is the norm. Across cultures and history most rules apply to less than everyone. One way to explain the pervasiveness of parochiality is an innate tendency for tribalism. There is surely a disturbing, and early emerging, element of ingroup moral thinking. Infants are sensitive to group markers like language (see, e.g., Spelke & Kinzler 2007). And children give preferential treatment to those who are perceived to be ingroup members (see, e.g., Sparks et al. 2017). But I will argue that part of the phenomenon of parochial morality might be explained in terms of rational learning over the evidence.

Learning Theoretic Account of Parochialism

The broader phenomenon that we want to explain is the acquisition of parochial norms. To keep it simple, let’s just assume that the child is deciding between these two hypotheses about a rule he is learning:

(HCommunity) The set of potential patients for this rule is the people in my community.

(HEveryone) The set of potential patients for this rule is all people [where the class of people is Homo sapiens].

For this set of hypotheses, HCommunity counts as a parochial norm and HEveryone as an inclusive norm. Here we once again have a clear subset structure (Figure 3.9). As a result, once again, the size principle might be brought to bear on determining the set of potential patients for a rule. In deciding between the hypotheses above, the learner would be deciding between a narrow and a wide range of potential patients for the rule. Now, if all of the sample violations fall in the narrow range (and the prior probabilities are the same), the size principle indicates that the narrow-range hypothesis is more probable than the wide-range hypothesis. Of course what constitutes a community will vary. And the set of patients in the wide-range hypothesis might be smaller than all people. So for instance, in some cases, the question might be whether the range of patients is the people in my church or the people in my town. The wide range might also be larger, as when I’m trying to figure out whether the set of potential patients includes all sentient beings. Also, in some cases of rule learning,


[Figure 3.9 diagram: a region labeled “People in my community” nested inside a larger region labeled “All people”.]

Figure 3.9 Set of potential patients for a rule


I might try to figure out rules for communities that I’m not part of: I might be trying to determine whether the range of patients is people in their community or includes people in other communities as well. In any of these cases, though, so long as one range is smaller than the other, the size principle can be an appropriate means of making inferences. In addition, there is a question about the range of the agents in rules. Often both ranges will need to be determined, and the size principle can help with both. Imagine a child showing up at day care, which is full of strangers. She is trying to determine the proper form of address. The distinction between teachers and students is plausibly quite salient to her for learning rules. But this still leaves a wide array of options, including:

Students can’t call teachers by their first name.
Students can’t call students by their first name.
Students can’t call anyone (at the day care) by their first name.
No one (at the day care) can call teachers by their first name.
No one (at the day care) can call students by their first name.
No one (at the day care) can call anyone (at the day care) by their first name.

The new student notices that one boy called the teacher “Sally,” and another child says, “No. Her name is Ms. Sally.” She later sees a different student refer to the teacher as “Sally” and this student is told “You can’t call her that.” However, when students talk about other students, they often use just the first name, and are never corrected for this behavior. This pattern of corrections (both the corrections that the child does see and those that are not observed) suggests that the range of possible patients is restricted to teachers. And, as is typical with the size principle, the more data consistent with this interpretation, the stronger the evidence in its favor. At the same time that this evidence about patients is rolling in, the student is also getting


evidence about which agents fall under the prohibition. She notices that teachers call students by their first name without ever being corrected, and teachers call other teachers by their first name without correction. Thus, again, the absence of corrections can be recruited as evidence that the range of agents to whom the rule applies does not extend to everyone at the day care but is restricted to students. Once again, the size principle can be used to shore up this inference. In short, the pattern of corrections suggests that the rule is students can’t call teachers by their first name. This yields a rational analysis for how parochial norms can be learned.¹⁴ If the child is deciding between a parochial and a more inclusive hypothesis, and if all the evidence is consistent with the parochial hypothesis, this provides some reason to think that the parochial hypothesis is more probable than the inclusive one. The inference here partly depends on the child having sufficient opportunities to observe evidence against the parochial hypothesis. In recent work, we have been exploring whether people make these kinds of inferences in ways that are sensitive to the evidence (Partington et al. 2020). The basic structure of the task had participants observe an animated clip of creatures on a distant island, some yellow and some blue. Participants were told of one color that these creatures are called Glerks and the others were called Hibbles. We expected that this group differentiation would suffice to enable participants to create a hypothesis space in which the rules applied either to one group (the parochial hypothesis) or to both groups (the inclusive hypothesis). In the display, several of the creatures in both groups were wearing ribbons. Participants were told that they were to try to learn a rule on the island on the basis of several sample violations. 
At this point four of the individuals were identified as violating the rule, and each of them was wearing a ribbon (see Figure 3.10). When all four of these violators were from the same group, participants were less likely to think that the rule also applied to members of the other group. That is, participants tended to infer that the rule is parochial. Thus, participants’ responses conformed to the prediction generated by the size principle. When given

¹⁴ The size principle might also be able to explain the acquisition of what seem to be parochial concepts. How would it come to be that the notion Yanomami means people but is restricted to what we think of as a proper subset of people? Historically, for a relatively isolated group like the Yanomami, it might be that for generations of learners, all the observed samples labelled “Yanomami” were in fact Yanomami. In that case, it might be reasonable to be more confident that Yanomami were people than that anyone else was.


[Figure 3.10 display: a scatter of squares and triangles, some marked with * to indicate a ribbon.]


Figure 3.10 Schematic depiction of display for parochial norms study. Squares and triangles correspond to yellow and blue creatures; * depicts ribbon

evidence consistent with the narrower hypothesis of parochialism, people were more inclined to make that inference. Although this result fits with our prediction, it’s possible that participants were not really being sensitive to statistical information. They might simply have been engaged in a kind of perceptual matching to arrive at their generalization. So we included a stronger test of statistical sensitivity by varying the proportions of the groups in the population. In the previous condition, 50 percent of the individuals were Glerks and 50 percent Hibbles. In another condition we varied this so that 20 percent of the individuals were Glerks and 80 percent were Hibbles (modeled on Kushnir et al. 2010). In this case, if all of the violations involved Glerks (as in Figure 3.11), this would provide stronger evidence in favor of a parochial norm. It is a somewhat suspicious coincidence if all the sampled violations involve a group that comprises 50 percent of the population, but it is an even more suspicious coincidence if all the sampled violations involve a group that comprises only 20 percent of the population. We found that our participants were sensitive to this difference. Compared to the 50 percent condition, they were even more likely to infer a parochial norm when the violations all involved the minority group. Thus, participants are appropriately sensitive to statistical evidence in making inferences about whether a norm is parochial or inclusive. With this




[Figure 3.11 display: a scatter of creatures from the two groups in unequal proportions, some marked with * to indicate a ribbon.]


Figure 3.11 Schematic depiction of display for parochial norms study, 20 percent condition

framework in place, there is a natural story to be told about learning parochial morality. If in the sample violations I receive, the patients are always members of my community, then I have lots of evidence that members of my group are in the range of patients. To the extent that I also had an opportunity to see cases of violations involving patients outside of my community, the lack of such evidence provides evidence in favor of parochialism, i.e., that members outside my community are not in the range of possible patients. This is the core idea of how sampling might naturally lead to parochial inferences.¹⁵ There are, however, some important complicating factors. One prominent issue is the constitution of groups. This bears directly on the construction of the hypothesis space. What matters is which groups will count as significant for formulating hypotheses about the ranges of agents and patients. Our experiments are artificial, allowing us to make the groups perceptually obvious and focal. However, in the real world, for purposes of formulating hypotheses about the range of agents and patients in rules, some groupings are likely quite natural, including adults, children, teachers,

¹⁵ Indeed, in our initial learning study on ribbon wearing, we found a bias in favor of expecting the rules to be inclusive rather than parochial (Partington et al. 2020).


students, and experts. These groups have such manifestly distinct social roles that it would make sense that learners would have these groups readily available for hypothesis formation. In addition, where there are clearly distinct communities, this would be a further natural grouping for purposes of learning the range of agents and patients. If I learn a set of rules in church, I should be alive to the possibility that the range of agents or patients is restricted to others in the church. If I live in a city with segregated cultural communities, I might wonder whether the rules I learned in my community also range over people in the other communities. This tentativeness about generalizing outside of my community would presumably be amplified if my community is in competition with other communities. A second factor in making inferences about range concerns the potential for observing evidence that would count in favor of more inclusive hypotheses. For instance, if I am able to observe teachers talking about each other, then I am in a position to get evidence of whether it’s a violation for them to use first names. By contrast, if I never have the opportunity to observe my elders interact with outsiders, then it will be harder for me to make inferences. Thus, we have the beginnings of a rational learning story for parochialism. Insofar as all the evidence I have that it’s wrong to steal concerns people in my community, when I consider outsiders, I should be less sure that it’s wrong to steal from them. This explanation of parochial morality is not just a simple appeal to some innate ingroupism. Ingroupism might play a role in how we carve up social categories, but there is also a key role for statistical learning. Parochial morality emerges in part, I suggest, because of characteristics of the sample to which moral learners are exposed. That suggests that parochialism about norms is not an inevitable result of an innate tribalism. 
Indeed, a statistical learning account of parochial morality is consistent with thinking that it’s possible to move people to more inclusive moralities by giving them different evidence.

Conclusion

In this chapter, I’ve argued that statistical learning might explain the acquisition of subtle features of rules. In particular, an application of the size principle to sample violations can explain how children might learn the act/allow distinction for rules. Moreover, the learning that would be involved here is plausibly rational in a robust sense. If the child is deciding whether a rule prohibits acting or both acting and allowing, and all of the evidence is
consistent with the rule being an act-based rule, then the size principle implies that it’s rational to infer that the rule is solely act-based. A rational agent in such a situation should infer that the rule only prohibits acting in the way proscribed. The size principle also promises to explain the acquisition of other elements of the scope of rules. For instance, rules are typically parochial, applying to some subset of the population. Although this kind of parochialism is sometimes morally disturbing, if all the evidence the child gets is consistent with a parochial interpretation, that is, one in which the only moral patients are those in one’s community, the size principle might imply that the rational inference to make is indeed that the rule is parochial.


4 Priors


In the previous chapter, we saw that the size principle can be used to drive the inference that a new rule is act-based. However, we also saw that even with very little evidence, people tend to suppose that a new rule is act-based. Indeed, as I will argue presently, people are inclined to infer that a novel rule is act-based even when this goes beyond what their first-order evidence supports. This might suggest that there is an innate presumption that rules are act-based. Here I propose a more empiricist-friendly alternative. Although people do expect a new rule to be act-based, this expectation might be explained as a prior that is itself the result of earlier learning. In particular, I’ll argue that the expectation for act-based rules can be explained in terms of statistical learning through the process of overhypothesis construction. This is a powerful tool for inference in many domains, and it can plausibly be extended to the domain of moral learning.

1. Learning Theoretic Account of Prior for Act-Based Rules

1.1 Acquirendum

As we saw in Chapter 3, people can learn either act-based or consequence-based rules. When given three sample violations in which an agent produced a consequence, people infer that the rule is restricted to actions. When given sample violations in which an agent allowed a consequence, participants infer that the rule is not restricted to actions but applies to consequences more generally. So we know that people can learn either kind of rule. However, when given a single sample violation in which an agent produced the consequence, people think the rule is act-based. For example, if told that the following is a violation: “Mike puts a block onto the shelf,” people tend to think that it’s a violation when another agent puts something on the shelf, but not when another agent leaves something on the shelf. It seems like with
a single sample, one should be unsure about whether the rule is act- or consequence-based. But participants jump to the conclusion that the rule is act-based, suggesting that there is some preference for act-based rules that isn’t a mere function of the data on sample violations of that rule. When learning a new rule, participants tend to expect that the rule will be act-based rather than consequence-based. There seems to be a tendency to infer that a rule is act-based that goes beyond the first-order evidence. We can call that tendency a bias, while remaining neutral about the particular explanation for the bias. It might be that the bias is an innate presupposition. Or it might be that the bias is a prior that is (1) learned from previous experiences and (2) subject to updating based on new evidence.¹ Although the fact that people infer an act-based rule after only a single sample violation suggests a bias for act-based rules, this evidence isn’t decisive. Whether one can make a reasonable inference from a single sample depends on the likelihoods of the alternative hypotheses. To see this, imagine two dice, a four-sided one and a 1000-sided one. As before, your friend picks one of them at random and rolls it. If the die comes up 3, that is strong evidence for H₄. Indeed, the likelihood of a 3 is 250 times greater for H₄ than for H₁₀₀₀. Similarly, then, it might be that the likelihood of getting an act-based example is actually very low if the rule is consequence-based. This would be the case if allowings are much more common than acts. Just as a 3 is much more likely on H₄ than H₁₀₀₀, it might be that an act-based violation is much more likely on HAct than on HConsequence. In that case, getting a single instance of an act-based violation might be enough to warrant the inference that the rule is act-based. To see whether there really is a preference for act-based rules, we tried another tack (Millhouse et al. 2018). 
We told participants that they were trying to guess the content of a rule in a school in a foreign culture. They were told that at least one of the following is a violation of the rule: Nick puts a ball on the shelf. Claude sees a ball on the shelf and leaves it there.

¹ Of course, this way of putting it is natural for a Bayesian approach. On that kind of an approach to probabilistic inference, one uses the prior probability of the hypothesis and the likelihood of the data given the hypothesis to calculate the posterior probability of the hypothesis, and that posterior probability then becomes the updated prior probability for future calculations.


Then they had to guess which of these is more likely to be a violation of the rule. This design doesn’t depend on sampling, so it doesn’t fall prey to the limitation in the study discussed above. If there is a preference for act-based rules, then participants should be more likely to think that the action (Nick puts a ball on the shelf) is a violation than the allowing (Claude sees a ball on the shelf and leaves it there). This is exactly what we found. In addition, when asked to give separate ratings for how likely it is that Nick violated the rule (by acting) and how likely it is that Claude violated the rule (by allowing), people rated it more likely that Nick violated the rule (unpublished data). This suggests that people do indeed have an expectation that a novel rule is more likely to be an act-based rule than a consequence-based rule. Our acquirendum, then, is the bias for act-based rules.
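The dice comparison from earlier in this section can be checked with a few lines of arithmetic. What follows is a minimal sketch, assuming fair dice and an equal prior probability for H₄ and H₁₀₀₀ (the friend picks a die at random); the function names are hypothetical labels introduced only for illustration.

```python
from fractions import Fraction


def likelihood(roll, sides):
    """Probability of a given roll on a fair die with `sides` faces."""
    return Fraction(1, sides) if 1 <= roll <= sides else Fraction(0)


def posterior_h4(roll, prior_h4=Fraction(1, 2)):
    """Posterior probability that the four-sided die was rolled (Bayes' rule).

    Assumes the roll is possible on at least one of the two dice.
    """
    joint_h4 = prior_h4 * likelihood(roll, 4)
    joint_h1000 = (1 - prior_h4) * likelihood(roll, 1000)
    return joint_h4 / (joint_h4 + joint_h1000)


# Likelihood ratio for a roll of 3: (1/4) / (1/1000)
ratio = likelihood(3, 4) / likelihood(3, 1000)

# Posterior for H4 after a single roll of 3, starting from equal priors
post = posterior_h4(3)
```

On these assumptions, a single roll of 3 already shifts almost all of the probability to H₄, which is the sense in which one sample can be decisive when the likelihoods differ sharply.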


1.2 Hypotheses

Where does this bias for act-based rules come from? Alisabeth Ayars and I suggested that people learn a second-order generalization about rules, an “overhypothesis,” according to which rules tend to be act-based (Ayars & Nichols 2017). Such an overhypothesis would then lead people to expect that a new rule will also be act-based. To explicate the learning theoretic account, let’s suppose that the candidate hypotheses are as follows:

(HAct-Overhyp) Overhypothesis that rules tend to be act-based.
(HConsequence-Overhyp) Overhypothesis that rules tend to be consequence-based.
(HNo-Overhyp) No overhypothesis about whether rules tend to be act- or consequence-based.

1.3 Statistical Principle: Overhypothesis Formation

To see how overhypothesis formation works, imagine you are presented with a bag of marbles and you are trying to predict the color of the next marble that will be randomly drawn from the bag. You first draw a green marble. What will the color of the next marble be? Without any additional information about the color distribution of the marbles in the bag, there is little way to predict the color of the next marble. But now suppose that
before drawing from the bag, you randomly draw three marbles from each of three other bags. In the first bag, all three marbles are blue. In the second, all three marbles are red. In the third, all three marbles are yellow. Now you draw a single marble from a fourth bag and find that it is green. What will the color of the next marble from that fourth bag be? In this case you can reasonably infer that the next marble will also be green. The fact that the samples drawn from each bag were uniform in color suggests a higher-order hypothesis about the contents of the bags, viz., within bags, marbles are uniform in color. This illustration is based on an example from Nelson Goodman (1955), who coined the term “overhypothesis” for these kinds of higher-level hypotheses. Overhypothesis formation is a statistically appropriate procedure if one has reason to believe that the samples used for the higher-level category are representative of the relevant population. The idea here parallels the situation for drawing inferences from samples to populations in simple cases. If you randomly sample three balls from an urn, then you have reason to think that those samples are representative; if all three of the samples are white, this licenses the inference that most of the balls in the urn are white. Now imagine the case of the bags of marbles. If you’ve randomly sampled three bags of marbles from a population of bags of marbles, then you have reason to think that those samples are representative; if each of the three sampled bags has uniformly colored marbles, this licenses the overhypothesis that most of the bags in the population have uniformly colored marbles. Thus, if you see a new bag and draw a single marble that is green, you can rely on the overhypothesis to infer that the other marbles in the bag are probably green as well. Adults do seem to form overhypotheses about stable regularities, which then guide their inferences from small samples. 
In an early study, Nisbett and colleagues presented undergraduate participants with the following vignette: Imagine that you are an explorer who has landed on a little known island in the Southeastern Pacific. You encounter several new animals, people, and objects. You observe the properties of your “samples” and you need to make guesses about how common these properties would be in other animals, people or objects of the same type . . .

Participants were then asked several questions including these two: “Suppose you encounter a native, who is a member of a tribe he calls the
Barratos. He is obese. What percent of the male Barratos do you expect to be obese?” and “Suppose the Barratos man is brown in color. What percent of male Barratos do you expect to be brown (as opposed to red, yellow, black or white)?” (Nisbett et al. 1983: 348). Participants tended to think that a much higher percentage of Barratos would have brown skin color than would be obese. This is presumably because people have formed an overhypothesis that skin color tends to be the same within tribes, but no correspondingly strong overhypothesis about obesity within tribes. The work from Nisbett and colleagues suggests that adults deploy overhypotheses regarding the distributions of traits in subpopulations. Perhaps more importantly, there is evidence that children learn overhypotheses. Early research in word-learning demonstrated a “shape-bias”—children tend to think nouns will refer to classes of similarly shaped (rather than similarly colored) objects (Heibeck & Markman 1987). This bias is in place by the age of 3 (see, e.g., Bloom 2000). One explanation for the shape bias is that children have an innate disposition to expect nouns to refer to objects of the same shape. Another possibility, though, is that children form an overhypothesis. For most of the nouns children learn, the extension of the noun includes objects of the same shape, and so, the child might infer that nouns tend to refer to objects that have the same shape (Smith et al. 2002). In an important study, Linda Smith and colleagues brought 17-month-old children into the lab weekly for seven weeks. During these sessions, they taught the children several novel words, each of which was associated with objects that had the same shape (but differed in size, texture, and color). After this exposure, the children tended to expect a new noun to refer to objects with the same shape, and this was not the case for a control group of children (Smith et al. 2002: 16). 
It seems that the children exposed to the association between nouns and shape-similarity formed an overhypothesis that nouns tend to pick out objects with the same shape.²

² Kemp and colleagues (2007) showed that such an overhypothesis can be learned by a hierarchical Bayesian model.

More recently, Dewar and Xu (2010) adapted Goodman’s marbles example for an infancy study. The infant was seated before four opaque boxes. The experimenter pulled four objects out of each of the first three boxes. In one condition, all of the items in each box were uniform in shape (four balls; four cubes; four pyramids) but different in color. The experimenter then took one item from the fourth box, a star. The second item was either another star or a ball. Infants showed longer looking times when the
two objects were different shapes (a star and a ball) than when they were the same shape, suggesting that the infant expected the objects in each box to be uniform in shape.³ Dewar and Xu then did the same task, but flipping color and shape, so that the objects in each of the first three boxes were the same color but different shape. In this case, the infants expected the objects in the fourth box to be uniform in color. This indicates that babies can learn an overhypothesis regarding shape or an overhypothesis regarding color, and they can do so very quickly. We can extend the idea of overhypotheses to the normative domain (Ayars & Nichols 2017). One can think of overhypotheses about rules as analogous to the overhypothesis about the uniformity of marble color within bags. In the case of the marbles, the overhypothesis is that bags of marbles (in the relevant population of bags) tend to have marbles of the same color. In the case of rules, the overhypothesis might be that rules (in the relevant population of rules) tend to be act-based. This overhypothesis about rules could constrain inferences about a new rule (from the same population of rules) just as the overhypothesis about bags of marbles constrains inferences about a new bag of marbles (from the same population of bags). Moreover, just as we can learn the overhypothesis about bags of marbles based on the bags of marbles we’ve seen so far, we can form an overhypothesis about rules based on the rules we’ve seen so far. If most rules that we’ve been exposed to in our community are act-based, this provides grounds for the overhypothesis that most rules in our community are act-based.

³ In a control condition, the infants saw the same display (a star, then a ball), but the ball was taken from the first box. In that condition, infants showed no difference between seeing two stars and seeing a star and a ball.

1.4 Evidence Available to the Learner

There hasn’t been a direct measure of the relative proportion of act-based rules in the population of rules to which children are exposed. Nonetheless, it seems likely, just from reflecting on our lives, that most of the rules we have are act-based. Our rules against littering, for example, prohibit littering but do not require that we pick up others’ litter. Our rules against stealing don’t require us to take measures to minimize stealing. Our rules against promise breaking don’t require us to ensure that others are able to keep their promises. In addition to this armchair reflection, the evidence mentioned in Chapter 3 provides some reason to think that most rules children learn are act-based. For, as noted there, in the CHILDES corpus there was almost no evidence that would lead children to infer a consequence-based rule. Most of the rules that they are taught are act-based rules. This is still a very limited empirical basis for the claim that most of the rules that people know are act-based. But there is certainly no evidence to suggest instead that there are lots of consequence-based rules.
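The marble example from Section 1.3 can be given a rough computational gloss. The sketch below is only an illustration under stated assumptions: four possible colors, a small hypothetical grid of values for a concentration parameter `alpha` (small values mean that bags tend to be uniform in color), and a flat prior over that grid. It is not the hierarchical model of Kemp and colleagues (2007), nor an analysis reported by Ayars and Nichols.

```python
# Hypothetical grid of concentration values (an assumption for illustration).
# Small alpha: each bag tends to hold one color. Large alpha: colors are mixed.
ALPHAS = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
K = 4  # assumed number of possible marble colors


def p_same_color_run(n, alpha, k=K):
    """Probability that n draws from one bag are all one specific color,
    under a symmetric Dirichlet-multinomial model with concentration alpha."""
    p = 1.0
    for i in range(n):
        p *= (i + alpha / k) / (i + alpha)
    return p


def posterior_over_alpha(uniform_bag_sizes):
    """Posterior over the alpha grid after sampling bags whose draws
    were all one color; each entry is the number of marbles drawn."""
    weights = []
    for alpha in ALPHAS:
        w = 1.0 / len(ALPHAS)  # flat prior over the grid (an assumption)
        for n in uniform_bag_sizes:
            w *= p_same_color_run(n, alpha)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def p_next_matches(posterior):
    """Probability that the second marble from a new bag matches the first.
    The first draw has probability 1/K whatever alpha is, so it does not
    change the posterior over alpha."""
    return sum(p * (1 + a / K) / (1 + a) for p, a in zip(posterior, ALPHAS))


# No earlier bags versus three earlier bags of three same-colored marbles each
no_experience = p_next_matches(posterior_over_alpha([]))
after_three_bags = p_next_matches(posterior_over_alpha([3, 3, 3]))
```

On this toy model, seeing three uniformly colored bags raises the probability that the second marble from a new bag matches the first. The same structure would apply if bags are replaced by rules and marble colors by act-based or allow-based violations.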


1.5 Sensitivity to the Evidence

We now have an analysis of how it would be rational for a learner to infer the overhypothesis that most rules are act-based rules. The analysis is that people should acquire an overhypothesis that rules tend to be act-based by registering the fact that most rules they know are act-based. This overhypothesis should then operate as a prior on inferences about new rules, leading people to expect that a new rule will be act-based. But do people really form these kinds of overhypotheses about rules, based on previous exposure to rules? Alisabeth Ayars and I conducted several experiments that indicate that people do form these kinds of overhypotheses (Ayars & Nichols 2017). The overhypothesis studies built on the studies reported in Chapter 3, Section 1.5. As before, the participant’s task was to learn what a foreign rule prohibited, given sample rule violations. We based our first study on Goodman’s marble example. In Goodman’s example, the learner gets three samples of marbles from three bags and then makes an inference about a fourth bag. We gave the learner three samples of violations from three rules and then asked the learner to make an inference about a fourth rule. The details of the study are a bit tedious, but the upshot is that when the learners were given three act-based rules, they expected the fourth rule to be act-based as well, but when they were given three consequence-based rules, they were less likely to think that the next rule would be act-based.⁴ This suggests that people can use evidence to learn overhypotheses about rules since the participants seemed to have moved to the overhypothesis that rules (in the context of the experiment) tend to be consequence-based. In addition, the results suggest that the prior for act-based rules can be updated in light of evidence.

⁴ Here are some of the tedious details: The rules concerned either a playground, a chalkboard, or a park. In the narrow scope condition, for each rule, participants were given samples consistent with a narrow scope (act-based) interpretation (e.g., “Ryan takes fingerpaints onto the playground”). In the wide scope condition, for each rule participants were given one example consistent with the narrow scope and two examples inconsistent with the narrow scope (e.g., “Megan sees a pencil on the slide in the playground and doesn’t take it inside”). Based on our earlier experiments on rule learning (see Chapter 3, Section 1.5), we knew that participants in the narrow scope condition would infer act-based rules and participants in the wide scope condition would infer consequence-based rules. The key question was how they would generalize to a new rule. For the new rule they were given two sample violations, both of which were consistent with narrow scope (e.g., “Mike puts a block on the shelf”). They were then asked to indicate which other items counted as violations. We were interested in whether they would treat the new rule as a consequence-based rule, as indicated by generalizing to the items that were allowings (e.g., “David enters the room, sees a puzzle on the shelf and leaves it there”). Our earlier study showed that participants infer an act-based rule when given a single sample violation that was an action, so we knew that participants in the narrow scope condition would be very unlikely to regard the new rule as consequence-based. Our overhypothesis-driven prediction was that participants in the wide scope condition, who had learned three consequence-based rules, would be more likely to think that this new rule is consequence-based and hence generalize to allowings. This is exactly what we found. After learning three consequence-based rules, participants were more likely to think that a new rule was also consequence-based; that is, they were more likely to think it was a violation for David to leave the puzzle on the shelf.

In a second study, we developed a stronger test for the hypothesis. If people are using evidence concerning known rules to form overhypotheses about rules, then the number of rules that they see should affect their inferences. Consider again Goodman’s bags of marbles. Suppose that you’re trying to determine whether bags tend to have marbles of different sizes. In one case you get to draw nine marbles from a single bag, and the draw results in three large marbles and six small marbles. In the other case, you get to draw three marbles each from three different bags, and in each case the draw consists of one large marble and two small marbles. In the three-bag case, you have stronger evidence in favor of the overhypothesis that bags contain marbles of different sizes. Similarly, if you are given three rules, each with three mixed violations (some act, some allow), that gives more evidence in favor of an overhypothesis favoring consequence-based rules than if you are given one rule with nine mixed (some act, some allow) violations. We investigated this by giving participants the exact same sample violations, but in one condition they were given all nine violations as evidence for a single rule, and in the other condition they were given three violations for each of three different rules. In both cases, the rules that were learned were consequence-based since the samples were always mixed. In the three-rule condition, for one of these rules, all the violations involved art-supplies on the playground; for another rule, all the violations
involved sporting equipment on the playground, and for the third rule, all of the violations involved food on the playground (Figure 4.1).

Ryan takes finger-paints onto the playground.
Ashley sees some markers left on the playground and leaves them there.
Nicholas sees some chalk next to the tree in the playground and leaves it there.

Brian takes a soccer ball onto the playground.
Shannon sees a baseball mitt next to the slide in the playground and leaves it there.
Maria sees a jump-rope in the playground and does not take it inside.

Brian takes a ham sandwich onto the playground.
Maria sees an apple on the ground in the playground and does not take it inside.
Carmen observes a bag of raisins in the sandbox in the playground and leaves it there.

Figure 4.1 Complete list of violations for overhypothesis study. In the one-rule condition, all were presented as samples of violations of a rule; in the three-rule condition, each group of three was presented as samples of violations of a rule.

After being presented with each set of sample violations, participants were asked to articulate what the rule said. The answers indicated that participants did learn the rules we expected. For instance, in the sporting equipment case, a characteristic articulation was that the rule said one should not “take or leave exercise equipment on to the playground.” In the one-rule condition, we just combined all of the violations, which are naturally clustered under playground. As expected, participants again articulated the rule appropriately. A characteristic articulation was that the rule said one should not “take anything to the playground, or leave anything you find on the playground there.” The question of interest is how they generalize to a new rule. Thus, after participants articulated the rule(s), we presented them with a new rule and gave them two sample violations; both of the sample violations were consistent with an act-based interpretation of the rule (e.g., “Sarah picks up a stapler and puts it onto the shelf”). We then asked the participants to judge whether the person is violating this rule in an act-based example (“Amy moves a pencil case to the shelf”) and in an allow-based example (“David enters the room, sees a tape dispenser on the shelf and leaves it there”). Not surprisingly, when it came to the act-based example, it made no difference whether the participants had learned one or three consequence-based rules. However, for the allow-based example, participants who had learned three consequence-based rules were more likely to think the person who allowed the consequence violated the rule. That is, those who learned three consequence-based rules were more likely to think the new rule was also
consequence-based, even though they saw exactly the same examples as those in the one-rule condition (Ayars & Nichols 2017: 19).

In a final study, we took a more indirect approach. We first had participants read four laws, with the ostensible purpose of getting their opinions about the laws. But the real point was to vary the kinds of laws they saw. In one condition, the laws were all strict liability laws, that is, laws that assert legal responsibility for a consequence even if the agent’s role in producing the consequence was entirely unintentional. For instance, here is one strict liability law, for timber trespass, that we used in the study:

If a person takes wood from someone else’s property, this is a violation of the law against timber trespass. It doesn’t matter whether the person knows that they were on someone else’s property. For example, if person A thinks he is taking wood from his own property but is actually on the property of person B, person A has broken the law and is legally responsible.


In the other condition, the laws were closely matched except that they did not include any statement of strict liability. The matched case for the timber trespass law was:

If a person takes wood from someone else’s property, this is a violation of the law against timber trespass. For example, if person A knows that he is on person B’s property and takes wood from that property, person A has broken the law and is legally responsible.

After participants responded to some filler questions about the laws, they were given a novel rule-learning task. They received two sample violations for this new rule:

Nick tosses an empty gum wrapper on the ground when he is walking in a park.
While she is in a park, Pam puts an empty can of soda on the ground and leaves it there.

They were then asked to generalize to other cases. The key case of interest was an instance of unintentionally dropping trash:

Mark is taking a walk with his friend in a park. There is a small hole in Mark’s backpack. A bottle cap falls out without Mark noticing.

We found that people who were exposed to strict liability laws were more likely to say that Mark violated the rule in this case. This again suggests that people abstract general features of rules to which they’re exposed and then rely on that information in their inferences about new rules. This completes the learning-theoretic story for people’s bias in favor of act-based rules. We began with the observation that people have such a bias. That is, when presented with a new rule, people tend to expect that it will be act-based rather than consequence-based. The rational analysis of this inference holds that if a learner is exposed primarily to act-based rules, then she should form an overhypothesis to the effect that rules tend to be act-based. The rationality of this inference follows from general considerations about sampling. As we saw, the inferential process involved in Goodman’s marble example is evidentially rational on the assumption that the bags that were sampled are representative of the population of bags under consideration; similarly for overhypothesis in the normative domain. If a learner receives a representative sample of rules, then if most of those rules are act-based, this is prima facie good evidence that the other rules in the population of rules are also act-based; just as it’s rational to infer the structure of a particular rule from sample violations, so too it’s rational to infer priors about the structure of rules in general from samples of rules. The next question concerns the evidence that learners actually get about rules, and here, it seems very likely that most of the rules that people (at least in our culture) learn are in fact act-based rules. The last piece of the account is experimental. Are people sensitive to evidence in the relevant way? Our study suggests that they are. 
In our experiments, participants abstracted from the characteristics of the rules to which they had been exposed to draw inferences about characteristics of rules more generally. In particular, when participants are exposed to rules with a certain kind of scope (e.g., whether the rule applies just to actions or to consequences more broadly), they seem to extract this regularity about scope and apply it to subsequent learning problems. Thus, we have a plausible explanation for the acquisition of the inclination to think new rules will be act-based. It is the result of learning a prior through overhypothesis formation. The story I’ve suggested here is a rational learning story at the computational level. But again, the statistical principle that is operative here is quite simple. It’s an application of a familiar principle of transitioning from characteristics of a representative sample to characteristics of the population. Although our experiments don’t expressly examine the algorithm that people are using, it is hardly
outrageous to think that people are in fact using an algorithm from samples to populations that enjoys at least some degree of rational propriety.
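The step from a representative sample of rules to a prior about rules in general can be made concrete with a small sketch. The Python below is illustrative only. It assumes a uniform Beta(1, 1) prior and a Beta-Bernoulli update, which the experiments do not test and the text does not claim; the function name and the counts are hypothetical.

```python
from fractions import Fraction


def act_based_prior(n_act_based: int, n_rules: int) -> Fraction:
    """Posterior mean probability that a new rule is act-based.

    Assumption (not from the text): a uniform Beta(1, 1) prior over the
    proportion of act-based rules, updated on a representative sample.
    The result is Laplace's rule of succession: (k + 1) / (n + 2).
    """
    if not 0 <= n_act_based <= n_rules:
        raise ValueError("n_act_based must be between 0 and n_rules")
    return Fraction(n_act_based + 1, n_rules + 2)


# Hypothetical learner who has encountered 18 act-based rules out of 20.
print(act_based_prior(18, 20))  # 19/22
```

On this sketch, a learner with no exposure to rules starts at 1/2, and the expectation shifts toward act-based rules as such rules accumulate in the sample. A sample dominated by consequence-based rules would shift it the other way, which is the flexibility the overhypothesis account requires.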

2. Overhypotheses and Moral Psychology

I’ve drawn on the statistical procedure of overhypothesis formation to argue that the bias in favor of act-based rules can be explained as a prior that is learned from previous experience with rules. It is, of course, important for an empiricist to have an explanation for the acquisition of the bias for act-based rules. But it’s worth noting that the explanatory potential of overhypothesis formation for moral psychology is much greater. I focused on one aspect of the structure of rules, that they are act-based. But in Chapter 3, we saw other aspects of the structure of rules. And the basic overhypothesis formation schema naturally extends to these other aspects of rule structure. For instance, the same kind of process might generate an expectation that rules will conform to the Principle of Double Effect. This is a sensible inference if most rules one learns target what an agent intentionally produces, as opposed to what the agent foresees as a consequence of his action. One might also generate overhypotheses about moral patiency. If most rules I learn only count members of my community as potential moral patients, then I might infer that this is a typical feature of rules. That is, I might infer that moral rules are characteristically parochial about patients, and when I am presented with a new rule, my inferences will be shaped by this prior.

In the next two chapters, I will look at quite different aspects of our moral system. In Chapter 5, I consider the acquisition of “closure rules,” which address actions for which there has been no explicit teaching. One closure rule holds that whatever isn’t expressly prohibited is permitted; another closure rule holds the opposite: whatever isn’t expressly permitted is prohibited. As we’ll see, people can easily learn such closure rules. This is obviously a very different feature of our moral systems than the structure of rules, but overhypotheses might again play a role.
If most normative systems I learn are characterized by a certain closure rule (e.g., whatever isn’t expressly prohibited is permitted), then I might form a prior guiding subsequent inference about closure in a new normative system. I might expect that the same closure principle applies to the new normative system. In Chapter 6, I will turn to questions in folk metaethics. There, we’ll see that people make inferences about whether an evaluative statement is universally

true or only true relative to some population. Once again, overhypotheses can play a key role. If most claims of taste that I encounter are relativized to a subpopulation, then I might form a prior that informs my thinking about the meta-evaluative status of new claims of taste I encounter. And if most rules about harm that I encounter are not relativized to a subpopulation, then I might form a prior that harm-based rules hold universally, which can guide my expectations about new rules concerning potentially harmful actions.

Conclusion

The formation of overhypotheses is poised to explain the acquisition of priors in normative systems. Our studies were focused on learning an overhypothesis about the scope of rules—whether the rule is act- or consequence-based. If the act-bias is acquired as an overhypothesis, this means of course that it is a flexible prior rather than an innate constraint. Even if the act-bias is innate, the experiments indicate that the bias is easy enough to overturn. But given the availability of an overhypothesis-based explanation for the act bias, we have less reason to think that the act bias is the product of an innate constraint. Although our studies were focused on the prior for act-based rules, this is just one application of the idea of how overhypotheses might shape our moral minds. I expect that studies on overhypotheses concerning other features of moral systems would yield similar results. Overhypotheses are a strikingly powerful tool for forming priors across domains from word learning to the distribution of traits to rule learning. It’s likely that moral cognition is suffused with priors, and a moral empiricist can invoke overhypothesis formation as a way to explain how these priors might be learned and changed.

5 Closure

When children learn norms they do so in pieces. They learn that it’s wrong to hit the dog, it’s wrong to track mud onto the carpet, it’s wrong to lick your plate. But what is the child to think about all the kinds of possible activities that have never been discussed? Is it okay to ask the dog questions? Is it okay to wiggle your toes in your shoes? Is it okay to eat your peas before your potatoes? On these, and countless other matters, the child has likely never been given specific instruction. Given the lack of instruction, what should the child think about these other matters? Should she assume that these unmentioned kinds of actions are also prohibited? Or that they are permitted? Should she reserve judgment entirely? Actions that have never been mentioned seem to pose a quandary for the learner, who is trying to figure out how to have fun without getting in trouble. There will be many kinds of actions that have never been mentioned in the training set for a learner. When opportunities arise for those kinds of unmentioned actions, what will she think? It’s possible that she will remain neutral. But this neutrality is costly (see, e.g., Bicchieri 2006: 42–54; Gaus 2011: ch. 3). If she performs such an unmentioned action and it is prohibited, then she might be subject to punishment; however, if she refrains from unmentioned actions, this might be to forego a wide range of valuable opportunities. Thus, it is in the interests of the learner to make reasonable inferences about whether a new action is prohibited or permitted. The status of unmentioned actions also occupies an important place in moral and political theorizing. According to the principle of liberty, a person is free to do anything that hasn’t been expressly forbidden. This liberal view is shared by a diverse range of political thinkers including Mill (1989), Rawls (2001), and Nozick (1974). 
Such a principle of liberty is morally substantive, as reflected by the fact that some moral and political philosophers challenge the principle (e.g., Forst 2005; Hillinger & Lapham 1971). The principle of liberty is a kind of a “closure principle,” a principle that proposes to specify the permissibility of the kinds of actions that haven’t been

explicitly addressed in the theory (e.g., Raz 1970; Stone 1964). Mikhail sets out two options for closure principles:

(i) a Residual Prohibition Principle, which assumes that all permissible acts and omissions are defined and states that “whatever is not legally permitted is prohibited,” and (ii) a Residual Permission Principle, which assumes that all forbidden acts and omissions are defined and states that “whatever is not legally prohibited is permitted.” (Mikhail 2011: 132)

By adopting something along the lines of one of these principles, one might be in a position to infer whether some new action is permitted or prohibited, which would make for a tidy solution to the learner’s quandary. Do people exploit a closure principle? Mikhail maintains that they do, and that the closure principle that people exploit is one that resonates with the liberal tradition in moral and political philosophy. Mikhail labels the Residual Permission Principle a Principle of Natural Liberty and he proposes that such a principle is part of our ordinary moral competence. But how would children acquire such a principle? Presumably few children are told that whatever isn’t expressly prohibited is permitted. Mikhail suggests that the principle might be part of an innate moral grammar (Mikhail 2011: 124, 133). For those who would resist such a nativist proposal, one might simply deny that our ordinary moral competence includes a Principle of Natural Liberty or any other closure principle. But in that case we need some other explanation for how people deal with new action types.

One possible explanation for how people deal with new action types is a similarity-based account. That is, when faced with some new action type, perhaps people simply reflect on related action types for which they do have explicit evidence. So, for instance, if I am wondering whether it’s okay to clip my toenails at the kitchen table, I might recall being forbidden from clipping my fingernails at the kitchen table and infer that clipping my toenails is relevantly similar. Appealing to relevantly similar actions might inform predictions about the permissibility of new actions, but it’s far from clear that this is a sufficiently general solution to the problem.

Jerry Gaus and I argue that people likely do exploit closure principles, and that such principles are eminently learnable (Gaus & Nichols 2017; Nichols & Gaus 2018).
The Principle of Natural Liberty is associated with a specifically moral closure principle. But it’s possible that closure principles are operative even outside

of the moral domain. Consider a game like chess. There is a set of explicit rules about how pieces might be moved. But nothing is said about whether a rook can be turned into a wizard by uttering “abracadabra.” Similarly, no rule explicitly prohibits adding a golf ball mid-way through the game. Despite the absence of such explicit rules, obviously these are not allowable actions in chess. The fact that such actions aren’t explicitly allowed seems to mean that they are forbidden. Contrast this with the game children play with a balloon where the object is not to let the balloon hit the ground. The only rule is not to hold the balloon. There is no rule that says that using the knee is okay, or that using the elbow is okay, or that climbing on the couch is okay. But all of these are allowed in the game. This game seems to be characterized by Residual Permission.

There seem to be closure principles outside of the domain of games too. Consider the norms of appropriate behavior in a Catholic Mass. One may listen to the service, sing with the congregation, and participate in the litany. What about other things? Is it okay to listen to the radio on headphones? No. Is it okay to sing a different song than others? Or at a different time than others? No. Is it okay to hop up and down? Absolutely not! Here it seems that whatever isn’t expressly permitted is forbidden. Now let’s turn to the norms of appropriate behavior in a public park. Some parks will list rules like the following: No motorized vehicles. No fires. No consumption of alcoholic beverages. What about listening to the radio on headphones? Of course. Singing? No problem. Hopping up and down? Why not? In the park what matters are the explicit prohibitions. So for chess and the Catholic Mass, it’s plausible that people operate with a Residual Prohibition Principle and for the balloon game and public parks, people seem to operate with a Residual Permission Principle.
Games and parks aren’t specifically moral domains, but the above cases suggest that the application of closure principles might be domain specific. While the examples are suggestive, the prospect of closure principles in normative systems raises several empirical questions: When presented with a small set of rules, do ordinary people infer a closure principle that generates a determination about whether unmentioned actions are allowed or forbidden? If so, is the learning process rational? Do people respond flexibly to data such that they can learn either principle? Are people biased toward learning one of the closure principles? As we’ll see, the evidence points toward a positive answer to each of these questions.

1. A Learning Theoretic Account of Closure Principles

1.1 Acquirendum

Ordinary experience suggests that in the moral domain, we tend to operate with an assumption of liberty. Many of the possible actions that are available to an agent have never been explicitly permitted or prohibited. But people tend to think that most actions that have never been explicitly permitted or prohibited are morally permissible. The idea that we operate with a Liberty Principle for closure would make sense of this. Thus, the acquirendum is the knowledge that moral systems are characterized by a Liberty Principle, that is, the principle that whatever is not expressly prohibited is permitted.

1.2 Hypotheses

As we’ve seen, there are two natural closure principles: the Residual Prohibition Principle and the Residual Permission Principle. In the moral domain, the latter principle is dubbed the Liberty Principle. Thus, let’s take the relevant hypotheses for the moral domain to be as follows:

(HLibM) The closure principle for morality is one of Liberty: if an action-type is not expressly forbidden, then acts of that type are permitted.

(HResProhM) The closure principle for morality is one of Residual Prohibition: if an action-type is not expressly permitted, then acts of that type are prohibited.

(HNoClosureM) There is no closure principle for morality.¹

1.3 Statistical Principle: Pedagogical Sampling

Gaus and I argue that closure principles might be learned through pedagogical sampling (Nichols & Gaus 2018). The core idea of pedagogical sampling is quite intuitive, and it’s reflected in the following normative claim:

¹ I am framing the hypotheses here expressly for the moral domain. But, as we’ll see below, the experiments that we conducted targeted closure principles outside of the moral domain.

If you know that a teacher is trying to teach you a set of rules (say), and you know that the teacher assumes that you are rational, then you should expect the teacher to provide examples that are maximally useful for a rational learner trying to identify the set.

There is empirical evidence that learners make inferences that conform to this normative claim (Shafto et al. 2014). That is, learners make inferences that suggest that they expect teachers to provide helpful examples.²

One study that illustrates the use of pedagogical sampling has participants play the rectangle game. In this game, a teacher is supposed to help a learner select a rectangle with certain dimensions (e.g., “a” in Figure 5.1). The teacher observes the particular rectangle on a screen and picks two points on the screen, one that falls within the rectangle, and one that falls outside the rectangle. The inside point is indicated by a green circle and the outside point is indicated by a red X. These points constitute the evidence for the learner, and the learner is to infer the dimensions of the rectangle from the two points provided by the teacher (Shafto et al. 2014: 64–5). The learner is told that the teacher was given these instructions. As a teacher, knowing that the learner will use these points to infer the dimension, the best strategy is to pick an inside point right next to one of the inside corners of the rectangle (say, the bottom left) and pick an outside point that falls just outside of the diagonal corner (the top right) (as in “b” in Figure 5.1). Learners should be able to figure out this pedagogical strategy and thus expect teachers to provide these kinds of examples, and, in fact the experiments show learners who are given points like 5.1c do indeed tend to select rectangles like 5.1b.

Figure 5.1 The rectangle game (panels a, b, and c)
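The teacher's strategy and the learner's matching inference in the rectangle game can be sketched in Python. This is a toy illustration, not the model in Shafto et al. (2014): the names `Rect`, `teach`, and `learn` and the offset `eps` are hypothetical. The learner is simply assumed to know that the teacher marks the bottom-left corner from inside and the top-right corner from outside.

```python
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class Rect(NamedTuple):
    left: float
    bottom: float
    right: float
    top: float


def teach(rect: Rect, eps: float = 0.01) -> Tuple[Point, Point]:
    """Pick the two most informative points for a rational learner.

    The inside point hugs the bottom-left corner; the outside point sits
    just beyond the diagonally opposite (top-right) corner.
    """
    inside = (rect.left + eps, rect.bottom + eps)
    outside = (rect.right + eps, rect.top + eps)
    return inside, outside


def learn(inside: Point, outside: Point) -> Rect:
    """Infer the rectangle, assuming the teacher used the strategy above.

    The two points are read as approximate markers of opposite corners.
    """
    return Rect(inside[0], inside[1], outside[0], outside[1])


target = Rect(2.0, 1.0, 6.0, 4.0)
guess = learn(*teach(target))
print(guess)  # each edge is recovered to within eps of the target
```

A learner who instead assumed the two points were drawn at random could not narrow the rectangle down in this way, since many rectangles contain one arbitrary point and exclude another.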

² The idea of pedagogical sampling is related to broader issues in pragmatics, epistemology, and philosophy of language (see, e.g., Goldberg 2010; Sperber & Wilson 1995). But for this volume, I will stick with the characterization drawn from recent work in learning theory (Shafto et al. 2014).

Turning to rules, suppose a teacher has a limited amount of time to teach someone a rule system. There are, of course, many factors to consider. For instance, if it is much worse to commit a violation than to forgo an option, the teacher might focus on the violations. But in general, one prevailing factor in good teaching will be efficiency. If a teacher expects a learner to apply a closure principle, then she might just supply a set of permission rules, anticipating that the learner will infer that the remainder is prohibited; alternatively if she supplies a set of prohibitory rules she might anticipate that the learner will infer that the remainder is permitted. In addition, if the teacher knows that for the rule system she’s trying to teach, fewer act-types are permitted than prohibited, it would be most efficient to specify permission rules for that smaller set of act-types rather than prohibitory rules for the larger set. When we turn to the learner’s perspective, if I think that the teacher aims to be efficient and expects me to infer a closure principle, then I should infer a Residual Permission Principle when the teacher gives only prohibition rules and a Residual Prohibition Principle when the teacher provides only permission rules. Thus, the pedagogical sampling account predicts that when people are given exclusively permission rules in a domain, they will infer Residual Prohibition and judge other act-types in the domain as prohibited; by contrast, when given exclusively prohibition rules in a domain, pedagogical sampling predicts that people will infer Residual Permission and regard all unmentioned act-types in the domain as permissible. This proposal allows for flexible learning based on the evidence, and it might vary by domain. For instance, for a domain that is taught via permissions, we should expect the learner to infer a residual Prohibition Principle for that domain. 
If the training set for the moral domain (or some important domain within morality) is primarily prohibitory, then the pedagogical sampling account predicts that the learner should infer that morality is characterized by Residual Permission, that is, the Liberty Principle.
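The learner's side of this prediction can be sketched in a few lines of Python. This illustrates the account under simplifying assumptions and is not a model from Nichols & Gaus (2018). Rules are encoded as a hypothetical mapping from act-type to status, the learner is assumed to treat the teacher as efficient, and a mixed or empty rule set is assumed to license no closure inference. The names `infer_closure` and `judge` are invented for the example; the park and Mass rules are the ones described earlier in the chapter.

```python
from typing import Dict, Optional

PERMIT, PROHIBIT = "permitted", "prohibited"


def infer_closure(rules: Dict[str, str]) -> Optional[str]:
    """Return the status assigned to unmentioned act-types.

    Only prohibitions taught -> Residual Permission (remainder permitted).
    Only permissions taught  -> Residual Prohibition (remainder prohibited).
    Mixed or empty sample    -> None (assumption: no closure inference).
    """
    statuses = set(rules.values())
    if statuses == {PROHIBIT}:
        return PERMIT
    if statuses == {PERMIT}:
        return PROHIBIT
    return None


def judge(act: str, rules: Dict[str, str]) -> Optional[str]:
    """Use the explicit rule if there is one, else the inferred closure."""
    if act in rules:
        return rules[act]
    return infer_closure(rules)


park = {"motorized vehicles": PROHIBIT, "fires": PROHIBIT, "alcohol": PROHIBIT}
mass = {
    "listening to the service": PERMIT,
    "singing with the congregation": PERMIT,
    "participating in the litany": PERMIT,
}

print(judge("hopping up and down", park))  # permitted
print(judge("hopping up and down", mass))  # prohibited
```

The same unmentioned act receives opposite verdicts depending only on how the domain was taught, which is the domain-specific flexibility the account predicts.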

1.4 Evidence Available to the Learner

The above analysis suggests that if learners rely on pedagogical sampling and are primarily taught via moral prohibitions, then they should infer that morality is characterized by the closure principle of Liberty. Does the evidence available to learners in the moral domain consist mostly of prohibitions (rather than permissions)? There is little direct evidence on this. As far as I know, no

quantitative measures of frequency have been conducted. Nonetheless, in the moral domain regarding actions it is very easy to call to mind explicit prohibition rules (e.g., don’t hit, don’t steal, don’t cheat), and it is comparatively harder to call to mind explicit rules of permission. It would be extremely valuable to have actual quantitative data on the relative frequency of prohibition rules in moral instruction. But for present purposes I will assume that armchair reflection is right here—the training set for the moral learner will primarily be rules of prohibition rather than rules of permission.

1.5 Sensitivity to the Evidence

If (1) people entertain the hypothesis that the moral domain is characterized by the Liberty Principle, and (2) they use pedagogical sampling, and (3) the evidence is mostly rules of prohibition, then people should infer that the Liberty Principle is operative in the moral domain. But do people actually make the sorts of inferences about rules suggested by the pedagogical sampling account? Gaus and I (Nichols & Gaus 2018) conducted several experiments that show that people do infer closure principles from the kinds of evidence they get. Our experiments all use artificial, non-moral rules. Thus, we are not directly investigating the acquisition of the Liberty Principle for the moral domain. If people already have the Liberty Principle for the moral domain, then we can’t really have an experiment where they acquire it. Instead, Gaus and I use unfamiliar artificial rule systems to see whether people will infer closure principles given novel materials. For instance, participants were presented with the following vignette:

There is a Farm with lots of mice, and all the mice are supposed to follow The Rules of Mice. The rules are taught to them by one of the older mice. The Farm has four barns: Red, Blue, Yellow, and Green.

Following the vignette, half of the participants were given two permission rules as evidence:

The Rules of Mice book says:
1. Mice are allowed to be in the Red Barn.
2. Mice are allowed to be in the Yellow Barn.

The other half were given two prohibition rules:³

1. Mice are not allowed to be in the Red Barn.
2. Mice are not allowed to be in the Yellow Barn.

Then all participants were told:

Marky is one of the mice on the farm, and he knows the Rules of Mice.

They were then asked to indicate the extent to which it’s permissible for Marky to be in the Green Barn. The pedagogical sampling account predicts that participants’ responses should conform to the following rules:

If for domain X, I’m taught via prohibitions, that’s evidence that the domain is characterized by Residual Permission (HPerm);

If for domain Y, I’m taught via permissions, that’s evidence that the domain is characterized by Residual Prohibition (HProh).

There is a competing prediction based on response bias. If all of the examples are permissions, that might lead participants to think the next one will also be a permission. This response-bias prediction coheres with the idea that people generalize to novel cases simply by considering similarity with past cases. In this first experiment we got overwhelming support for the pedagogical sampling account (and against the response-bias prediction). Participants given permission rules maintained that it’s not permissible for Marky to be in the Green Barn and participants who were given prohibition rules maintained that it is permissible for Marky to be in the Green Barn. This study indicates that in simple cases, people do learn closure principles. In addition, people learn the rules flexibly: they easily learn HPerm in one condition and HProh in the other. This first experiment explicitly specified that there were only the two rules. We also ran studies to see

³ Our experiments only used prohibitions of the form “Ss are not allowed to do A.” One might classify obligations (e.g., “Ss must do B”) as a kind of prohibition too, though (see, e.g., Mikhail 2011). It will be important to see whether people also treat obligations the way they do prohibitions.

whether the pattern holds for a less determinate system. For instance, in one study, we presented participants with the following variant:

There is a Farm with lots of mice, and all the mice are supposed to follow The Rules of Mice. The rules are taught to them by one of the older mice. The teacher is in a hurry, but is able to tell the young mice two of the rules. Those two rules are:
(1) Mice are allowed to be in the Red Barn.
(2) Mice are allowed to be in the Yellow Barn.
Marky is one of the mice on the farm.

The prohibition rule condition was the same except “not” was inserted into the two rules. Again, participants were asked to indicate the extent to which it’s permissible for Marky to be in the Green Barn. Once again the results were very clear. Participants given permission rules inferred Residual Prohibition; participants given prohibition rules inferred Residual Permission. Participants’ explanations for their answers in these studies indicated that they were at least implicitly thinking in terms of closure principles. A characteristic response in the Prohibition-rule condition was “There isn’t a rule that states mice are not allowed in the Green Barn”; and a characteristic response in the Permission-rule condition was: “It said mice are allowed in either red or yellow barns. It said nothing about green. It would have said green also so obviously it’s not allowed in green.” In some cases, participants’ explanations came close to an explicit invocation of a closure principle, e.g.:

“The Rules state that Mice are only allowed in the Red and Yellow Barn. Of course, one could interpret that as, ‘Oh, the rules don’t state that we aren’t allowed in the other barns, so we’re good.’ But I’d rather think it was implied that you weren’t allowed in the other barns besides the ones mentioned.” (Permission-rule condition)

Furthermore, in keeping with the pedagogical sampling account, participants often explicitly referenced the teacher in their explanations:

“The teacher was rushed but he specifically singled out two barns that they are allowed. If they were allowed in Green, he would have said so.” (Permission-rule condition)

“I cannot be 100% certain but there is a very low chance that the teacher would leave out a barn that they are not allowed to be in (even if he was in a hurry). So I’m going to assume that he told the young mice all of the barns they are NOT allowed in (only red/yellow). Which makes it likely that they are allowed in the Green and Blue barn.” (Prohibition-rule condition) These explanations are suggestive that people are indeed using pedagogical sampling, but to confirm this, we ran further experiments. In one of these studies, we contrasted a pedagogical frame with a framing on which the samples are merely randomly drawn. The studies were much like the previous ones. In the pedagogical frame, we introduced the two rules by saying: An older mouse teaches the rules from The Rules of Mice book. He is in a hurry but is able to tell the young mice two of the rules. In the random frame, we introduced the two rules with this:

A young mouse opens The Rules of Mice book to learn the rules. He is a slow reader but is able to read two of the rules before he gets called away. Responses in the pedagogical frame were, as in the other studies, in keeping with the pedagogical sampling account—people in the permission condition said it wasn’t permissible for Marky to go in the Green Barn and those in the prohibition condition said that it was permissible. In the random frame, by contrast, no such difference emerged (Nichols & Gaus 2018: 2745–6). This suggests that the pedagogical frame is critical for inferring a closure principle. In another study, we flipped the roles. If the pedagogical sampling account is right, participants should be able to switch positions and give the answer a teacher would give, for it is part of the account that the learner has an expectation about the samples that an efficient teacher would provide. We explored this empirically by having participants take on the role of teacher. Participants were given the following instructions: Imagine a playground with several little barns. There is a code of conduct for the barns. Namely, children are allowed into some of the barns and not others, as follows:

Red Barn: Not allowed
Blue Barn: Allowed
Yellow Barn: Not allowed
Brown Barn: Not allowed
Orange Barn: Allowed
Gray Barn: Allowed
Green Barn: Allowed

Your task is to figure out the most efficient way to teach the code of conduct ...

You need to pick the smallest set of rules that will enable people to learn the entire code of conduct. We varied whether participants saw an array with four Allow rules (as above) or four Not-Allow rules. Participants were told that based on previous studies, it is known what the smallest set of rules is that will work for participants to learn the entire code of conduct. Participants were told that they would get a bonus payment for selecting this set. Participants did very well on this task. Sixty-five percent correctly picked the smallest set (Not Allowed in Red, Yellow, or Brown, in the above example). Given that there are forty-nine candidate sets, this is obviously much better than chance. In addition, participants tended to give appropriate explanations for why this is the efficient selection. Here are some representative examples: “It’s easier to teach the students what they aren’t allowed in since there are fewer of them.” “I checked the ones they are allowed in and figured they can guess the unchecked ones are barns they aren’t allowed in.” “This lets people know all the barns that are not allowed, so the rest should be understood to be allowed. Since there are fewer not allowed, I chose these to have the fewest rules possible.” Thus, the experiments show that people are sensitive to evidence in the way that the pedagogical sampling account requires and predicts. Participants infer Residual Permission when trained on prohibitions and Residual Prohibition when trained on permissions, and they are tuned to the pedagogical features of the evidence. Indeed, the explanations provided by the

participants provide good reason to think that the algorithm that people run in solving this task is at least partly characterized by pedagogical sampling. That suggests that when people are learning closure principles, the process that they use is algorithmically rational. The pedagogical learning account suggests an empiricist explanation for why we find a Liberty Principle for (at least part of) the moral domain. Rather than think that the Liberty Principle is part of an innate moral grammar, it seems to be eminently learnable from evidence. If moral systems are primarily taught by prohibitions, the Liberty Principle is the expected inference from pedagogical sampling.⁴ Furthermore, our experiments indicate that people actually use pedagogical sampling in trying to determine which closure principle applies to a novel rule system.
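The barn teaching task described earlier has a simple structure that can be made concrete. What follows is a minimal sketch, not the procedure used in the study. It assumes that the learner applies a closure principle (every barn that goes unmentioned receives the opposite status), so that the teacher need only state the smaller class of rules. The function name `smallest_teaching_set` and the rule for breaking ties are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def smallest_teaching_set(code: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return the rule type to state explicitly and the barns it covers.

    `code` maps each barn to True (allowed) or False (not allowed).
    Assumption: the learner fills in every unstated barn with the
    opposite status, so only the smaller class needs to be taught.
    """
    allowed = [barn for barn, ok in code.items() if ok]
    forbidden = [barn for barn, ok in code.items() if not ok]
    # Ties are broken in favor of prohibitions; this is an arbitrary choice.
    if len(forbidden) <= len(allowed):
        return "Not allowed", forbidden
    return "Allowed", allowed


# The code of conduct from the example array (four allowed, three not).
code_of_conduct = {
    "Red": False, "Blue": True, "Yellow": False, "Brown": False,
    "Orange": True, "Gray": True, "Green": True,
}

rule_type, barns = smallest_teaching_set(code_of_conduct)
print(rule_type, barns)
```

For the example array, the sketch selects the three prohibitions (Red, Yellow, Brown), which matches the set that most participants chose.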


⁴ Following Mikhail, I have been treating the moral domain as unified with respect to a closure principle. But it’s possible that the principle of liberty governs only part of the moral domain. I leave this as an open question.

2. Closure Principles and Moral Psychology

We saw at the beginning of this chapter that people seem to assume a Residual Permission Principle for some domains (e.g., public parks) and Residual Prohibition for other domains (e.g., chess). Our studies confirm that people can easily learn either kind of closure principle, depending on the kinds of rules they’re exposed to. In addition to confirming that people deploy closure principles and adopt them in a domain-specific way, the experiments also indicate that the process is evidentially rational given pedagogical sampling (Shafto et al. 2014). Our pedagogical sampling account predicted that participants would infer Residual Permission when trained on prohibitions and Residual Prohibition when trained on permissions, and this is exactly what we found. The pedagogical sampling interpretation of these findings is strengthened by the fact that participants did not learn a closure principle when presented with a case in which the rules were in a context of random sampling rather than pedagogy. Furthermore, participants were adept at figuring out the most efficient set of rules that a teacher can provide to enable a learner to infer a complete rule system. We found that, at least for the novel rule systems we used in our experiments, closure principles are eminently learnable. But is this qualified


by other normative considerations? In particular, do considerations of harm affect the kinds of inferences people make in contexts where they naturally learn closure principles? To investigate this, Gaus and I (Nichols & Gaus 2018) conducted a further experiment much like the other experiments, but we varied whether the novel action (e.g., content going into the Green Barn) jeopardized others. When the action jeopardized the welfare of others, participants were less likely to think that an action is permissible, even when the pedagogical sampling considerations suggested a closure principle on which it would be permissible. More specifically, when the novel action doesn’t jeopardize anyone’s welfare, participants infer Residual Permission from prohibition rules, but this is sharply moderated when the novel action does jeopardize the welfare of others. That is, participants are much less likely to think that a novel action is permissible when the novel action is likely to cause harm. This suggests that, in a rather Millian vein (Mill 1859), people have a tempered view of the reach of the kind of liberty reflected by the Residual Permission Principle: it seems to be constrained by a kind of Harm Principle.

Although we found that Residual Permission is qualified by considerations of harm, we also found evidence of a bias in favor of a Residual Permission Principle. In the experiments reported above, the conditions either included only permission rules or only prohibition rules. But we also ran studies using a mixed condition, in which participants received one permission rule and one prohibition rule (e.g., “Mice are allowed to be in the Red Barn” and “Mice are not allowed to be in the Yellow Barn”). In that case, people showed a preference for a Residual Permission Principle, maintaining that the novel action is permissible (Nichols & Gaus 2018: 2741–2).
Indeed, when given a mixture of prohibition and permission rules, people explained their responses in ways that sounded like an affirmation of a Residual Permission Principle: “Do as you please if no rule against.” “There is no rule that prevents mice from going in the Green Barn, so it is permissible, though it may not be encouraged.” “Doesn’t say they’re not allowed explicitly, so . . .” Why do we find this bias for Residual Permission? Perhaps there is some innate inclination to Natural Liberty. Note, though, that the evidence doesn’t provide reason to think that the bias is specific to the moral domain—the


scenarios were all novel rules about mice and barns. So what demands an explanation is this broader tendency to the Residual Permission Principle. Another explanation for the bias is that people have formed overhypotheses about closure principles, perhaps including an overhypothesis that in certain social contexts, norm systems tend to be characterized by a Residual Permission Principle. If the bias for Residual Permission really is from overhypothesis formation, it should be malleable, perhaps under experimental conditions (see Chapter 4). In addition, if the bias for Residual Permission is based on an overhypothesis, we might expect cultural variation. More authoritarian cultures might have more norm systems that are governed by Residual Prohibition. In that case, perhaps the bias for Residual Permission is weaker or absent. Thus, there are some fairly clear research questions that follow from these issues about closure principles. For present purposes, what matters is that we have in place a rational learning account of the acquisition of the Liberty Principle for morality. Considerations of pedagogical sampling suggest that if the instruction that learners get for a norm system is primarily in terms of prohibitions, it’s rational for them to infer that the norm system is characterized by a principle of Residual Permission. Again, we don’t have good quantitative evidence on the proportion of prohibition rules to which people are exposed in the moral domain. But if it turns out that most of the rules that are taught in the moral domain are prohibition rules, then it’s rationally appropriate for people to infer that the moral domain is characterized by a principle of Natural Liberty. 
On the experimental side of things, our studies show that people do make inferences that conform to the pattern suggested by pedagogical sampling; in fact, people show a flexible pattern of inference to arrive at closure principles in ways that are appropriately sensitive to the nature of the instruction. Indeed, people’s self-reports of the process they used to arrive at these closure principles suggest that they are actually using pedagogical sampling in rationally appropriate ways in their algorithms. That is, their inferences are algorithmically rational.


6 Status


The truth of some sentences varies depending on the context. A sentence might be true relative to one context but not true relative to another.¹ The sentence “It is 4PM” is relative in this way: that sentence might be true when uttered in New York and false when uttered at the same time in San Francisco. Other sentences are not qualified in this way; rather, they are universally true or false. For instance, the sentence, “A chlorine atom has 17 protons” is universally true, and the sentence “A chlorine atom has 20 protons” is universally false. Similarly, the sentence “7 * 2 = 14” is universally true. The truth of these sentences doesn’t vary by culture, geography, or historical period.²

¹ I’m using “sentence” here to pick out the surface sentence, which might take different interpretations when uttered (or written) in different contexts.

² One natural way to articulate this kind of relativism is that for sentences that are only relativistically true, the content of the sentence is relative to a context, and this is why the same sentence can be true in one context and false in another (see, e.g., Dreier 1990: 7). In much contemporary literature, “contextualism” is used for the kinds of relativist views described here, according to which the content of a sentence is partly determined by a contextual parameter. “Relativism” is often reserved for a different view on which the relativizing parameter doesn’t apply at the level of content but at the level of the assessor (e.g., MacFarlane 2014; see also Murray 2020 for empirical work on this topic). In this book, I use “relativism” rather than “contextualism” simply because that is the prevailing term in the literature in moral psychology. Nothing turns on this.

Universalism and relativism can also be articulated for judgments: a judgment of the form it’s illegal to sell alcohol to 19-year-olds is true when relativized to the context of the United States but not when relativized to the context of Mexico. So when a shopkeeper in the U.S. makes the judgment that it’s illegal to sell alcohol to 19-year-olds, his judgment is true; but if a shopkeeper in Mexico makes a judgment of the same form, his judgment will be false. By contrast, a judgment of the form 7 * 2 = 14 is true wherever and whenever it is made. Given the above characterization of universalism, if one person judges that P is true and another judges that P is not true, then if the truth of P is a universal matter, it follows that at least one of these individuals must be wrong. Thus, if one person makes the judgment 7 * 2 = 14 and another


person makes the judgment It is not the case that 7 * 2 = 14, then at least one of them must be wrong. The situation is quite different for contextually relativized sentences. If one person makes the judgment It’s 4PM and another makes the judgment It’s not 4PM, they might both be right because time of day is relative to geography. Moral universalism is the view that there is a single true morality.³ On this view, when someone makes a moral judgment, whether that judgment is true or false holds independently of the context in which it is made. In philosophy, the thesis of moral universalism has been developed in different ways, drawing on different accounts of the “truth-makers”—what makes the universally true moral sentences true (for discussion see, e.g., Finlay 2007; Shafer-Landau 2003). On some views, true moral sentences are made true by what an ideal agent would say (e.g., Smith 1994); on other views, the moral truths are made true by attitude-independent natural properties akin to health (e.g., Bloomfield 2001); on other views, the moral truths are made true by irreducible moral properties (e.g., Shafer-Landau 2003). I won’t try to explore these distinctions in folk morality. Indeed, it’s unclear whether most people have sufficiently fine-grained views on these matters. But all of these forms of universalism are at odds with moral relativism, according to which there is no single true morality (e.g., Harman 1985; Joyce 2001; Prinz 2007). Many philosophers, including both universalists and relativists, have suggested that most people are moral universalists (see, e.g., Blackburn 1984; Brink 1989; Jackson 1998; Joyce 2001; Mackie 1977; Shafer-Landau 2003). Psychologists have investigated the issue by drawing on the above idea that if it’s a universal matter whether P is true, then if two people make different judgments about the truth of P, at least one of them must be wrong.⁴ And several studies confirm that experimental participants think

³ The terminology in the empirical literature on metaethics is not yet systematic. Here, “moral universalism” is the view that there is a single true morality, in contrast with “moral relativism” which denies this. I follow Wong (2006) in taking “moral absolutism” to be a stronger claim. He writes: “Moral absolutism is universalism plus the view that the core of the single true morality is a set of general principles or rules, all of which hold true without exception. Often the further claim is made that these rules hold no matter what the consequences” (2006: xii). “Objectivism” is another term used as contrast with relativism (including in my earlier work, e.g., Nichols 2004a), but “objectivism” is more naturally treated as the contrary for “subjectivism” (see Section 4). ⁴ This is far from a perfect test of the kind of universalism defined above. Even for some cases of conventions, like which side of the road to drive on, it would be natural to say that if two people disagree, one of them must be wrong. Nonetheless, the disagreement measure has turned up some interesting differences in people’s meta-evaluative judgments, as we will see below.


that if people make different judgments about a moral matter, at least one of them must be wrong (Goodwin & Darley 2008, 2012; Heiphetz & Young 2016; Wright et al. 2013, 2014; but see Sarkissian et al. 2011). In a critical early study, Goodwin and Darley (2008) presented participants with sentences concerning facts (e.g., “The earth is not at the center of the known universe”), conventions (e.g., “Calling teachers by their first name, without being given permission to do so, in a school that calls them ‘Mr.’ or ‘Mrs.’ is wrong behavior”), taste (e.g., “Frank Sinatra was a better singer than is Michael Bolton”), and ethics (e.g., “Robbing a bank in order to pay for an expensive holiday is a morally bad action”). Participants were asked, for each sentence, whether a person who disagreed with them “is surely mistaken” or whether “it is possible that neither you nor the other person is mistaken.” Selecting the latter option indicates a relativist response. Goodwin and Darley (2008) found that people were more likely to give relativist responses for sentences regarding taste and social convention than for sentences about ethical matters (1352–3). Indeed, the rejection of relativism for sentences about ethical matters was almost as high as for sentences about scientific facts (1354).

1. A Learning Theoretic Account of Universalist and Relativist Judgment

1.1 Acquirendum

People seem to regard judgments about morality as universally true and judgments about taste as true in some relativistic fashion. Why is this? Why do people believe of some moral judgments that they are universally true? More broadly, we want to explain what leads to the belief that a sentence or judgment is relatively or universally true. As a result, we have two acquirenda: (i) why do people think of certain judgments that they are universally true and (ii) why do people think of other judgments that they are only relatively true? To anticipate, the core idea is that the amount of consensus regarding the judgment counts as evidence when deciding whether a judgment is universally or only relatively true (Ayars & Nichols 2020).


1.2 Hypotheses

Glossing over complexities, a learner trying to determine the status of a sentence will consider two hypotheses about the status of a judgment with respect to universalism. For a given judgment that is true in at least some contexts, the hypotheses are:

(HUniversalism) The judgment is universally true; i.e., the judgment is true in all contexts.

(HRelativism) The judgment is only relatively true; i.e., the judgment is true relative to some contexts and not true relative to others.


1.3 Statistical Principle: Tradeoff between Fit and Flexibility

The statistical principle that I appeal to here is more global than in the preceding cases, and it will take rather more exposition. I want to suggest that it can be rationally appropriate to treat consensus information as evidence for (or against) relativism regarding a moral judgment. Indeed, consensus information can count as evidence for (or against) relativism for non-moral judgments too, and Ayars and I examined the impact of consensus information in both moral and non-moral domains (Ayars & Nichols 2020).

Before articulating how consensus bears on universalism and relativism, it’s worth noting that consensus can clearly provide evidence bearing on first-order judgments. The appeal to consensus as evidence is an ancient idea, prominently used in the consensus gentium argument for the existence of God (Kelly 2011: 142; Zagzebski 2011: 34), which is found as early as Plato’s Laws. Cleinias adverts to “the fact that all Hellenes and barbarians believe in [gods]” as an argument for the existence of the gods (The Laws, book X). The core thought behind the argument, as Thomas Kelly notes, is that “the fact that theistic belief is widespread among the human population is itself a significant piece of evidence that God exists” (Kelly 2011: 136). The common consent argument moves from a descriptive fact about theistic belief (its ubiquity) to a metaphysical conclusion (God’s existence). There are reasons to worry about these arguments, of course, but the idea that consensus often provides evidence is clear from simple examples. When a schoolchild at recess sees lots of kids funneling back into the school, the


child can rightly take his peers’ behavior as evidence that recess is ending.⁵ Information about consensus also bears on people’s normative beliefs. For instance, social psychologists have found that if a subject thinks that others have donated time to a charity, he is more likely to donate his own time (see, e.g., Cialdini et al. 1990). The foregoing are just a couple of examples of the impact of consensus on first-order beliefs, but there is a considerable amount of work showing that it’s often a good bet to rely on the wisdom of the crowd (see, e.g., Surowiecki 2004). What I want to argue is that we can also extract meta-wisdom from the crowd.

⁵ The fact that consensus is evidence in such cases is related to Condorcet’s Jury Theorem, which shows, under some strong assumptions, that a majority opinion will tend to be correct. The Jury Theorem traditionally focuses on first-order issues. This chapter argues that consensus also provides evidence for second-order matters concerning universalism and relativism.

To see how consensus applies to second-order theses about whether a true judgment is universally or relativistically true, let’s turn to rational probabilistic inference (see, e.g., Griffiths et al. 2008). In deciding between hypotheses, there is often a key tradeoff. Obviously the extent to which a hypothesis fits the data counts in favor of the hypothesis; but the extent to which a hypothesis is flexible in its ability to fit data reduces the extent to which fit counts in favor of the hypothesis. To illustrate how these different factors play into hypothesis selection, imagine you’re a physician trying to determine which disease is present in your community. You’re deciding between two hypotheses, each with a prior probability of 0.5:

HM: The disease present in the community is M, which typically produces high fever and doesn’t produce any other symptoms.

HD: The disease present in the community is D, which typically produces either high fever or a sore throat, but never both. There is no prior reason to think that a person with D will be more likely to have the fever or the sore throat.

Now imagine that twenty patients have come to see you. Eleven patients have a high fever and nine have a sore throat. HM does a poor job of fitting the data: it has to dismiss almost half of the data points as noise. By contrast, HD can fit the data perfectly since each patient has one of the possible symptoms produced by disease D. In this case, it seems quite plausible that the disease in the community is D. Now imagine instead that among the twenty patients who come to see you, nineteen patients have a high fever and one patient has a sore throat. HD can again fit the data


perfectly, for one can simply say that one patient has the sore throat symptom of D and all the rest have the fever symptom of D. But at this point, it should seem like things are a bit too easy for HD. HD also fits the data if many more or different patients have sore throats. Any pattern of symptoms of fevers or sore throats can be fitted by HD. Because each of these options has an equal chance of occurring on HD, the total probability that HD can assign (i.e., 1) must be spread over all these different options. With twenty patients, there are 2²⁰ ways the symptoms can be distributed, and HD can fit all of them. HD thus assigns a small probability—1/2²⁰—to each of these possibilities.⁶ By contrast, HM makes a much less flexible prediction: it predicts lots of high fevers and nothing else. Since HM is inflexible in this way, it doesn’t have to split up the probability mass among different distributions of symptoms. The great flexibility of HD can make it a worse explanation of the 19:1 data than HM, despite the fact that HD fits the data better.⁷ HM does have to attribute some of the data, namely patients with sore throats, to noise (e.g., perhaps the sore throats were misreported or were caused by some environmental irritant). But a highly flexible hypothesis that can fit the data perfectly can be less probable than an inflexible hypothesis that has to attribute some data to noise. In hypothesis selection, we want a hypothesis that fits the data well without being overly flexible (see, e.g., Blanchard et al. 2018: 1345–7; MacKay 2003: 346–50; Perfors et al. 2011: 310–12). 
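The comparison between HM and HD can be worked through numerically. The sketch below is an illustration under stated assumptions rather than a calculation from the text: HD assigns probability 1/2²⁰ to each specific symptom pattern, as described above, while HM is given an assumed noise rate of 0.05, meaning that each patient with disease M has a 5 percent chance of presenting with a sore throat instead of a fever. That noise rate and the function names are hypothetical.

```python
def likelihood_hd(n_patients: int) -> float:
    """P(one specific symptom pattern | HD).

    HD treats fever and sore throat as equally likely for each patient,
    so every one of the 2**n patterns gets the same probability.
    """
    return 0.5 ** n_patients


def likelihood_hm(n_fever: int, n_sore: int, noise: float) -> float:
    """P(one specific symptom pattern | HM).

    HM predicts fever; each sore throat must be written off as noise.
    `noise` is an assumed per-patient rate, not a figure from the text.
    """
    return (1.0 - noise) ** n_fever * noise ** n_sore


def posterior_hm(n_fever: int, n_sore: int,
                 noise: float = 0.05, prior_hm: float = 0.5) -> float:
    """Posterior probability of HM by Bayes' rule over the two hypotheses."""
    weight_hm = likelihood_hm(n_fever, n_sore, noise) * prior_hm
    weight_hd = likelihood_hd(n_fever + n_sore) * (1.0 - prior_hm)
    return weight_hm / (weight_hm + weight_hd)


print("19 fevers, 1 sore throat :", posterior_hm(19, 1))
print("11 fevers, 9 sore throats:", posterior_hm(11, 9))
```

With the assumed 5 percent noise rate, the 19:1 data strongly favor HM even though HD fits them perfectly, and the 11:9 data strongly favor HD. A different noise rate would move the crossover point but not the qualitative pattern.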
⁶ This can be captured in the likelihood term, i.e., the probability of a certain symptom pattern given HD. The probability that the pattern will be patient #1 having a sore throat and all others having high fever is 1/2²⁰, and this is also true for each of the 2²⁰ possible symptom patterns. When you calculate the likelihood of the actual symptom pattern you encounter, all of these alternative possibilities need to be registered. So, if you find that only patient 1 has a sore throat, the likelihood of that outcome on HD is:

P(#1 has sore throat, #2–20 have fever | HD) = (1 * 1/2²⁰ + 0 * 1/2²⁰ + 0 * 1/2²⁰ + 0 * 1/2²⁰ . . . )

The first addend here represents the actualized possibility that #1 has sore throat and #2–20 have fever; the second addend represents the unactualized possibility that #1–2 have sore throats and #3–20 have fever, and so on for all 2²⁰ possibilities. Since only one of these possibilities actually occurs, in the simple case under consideration the likelihood of that outcome (in this case, #1 has sore throat and #2–20 have fevers) will always be 1/2²⁰.

⁷ Of course, this conclusion will be affected by the likelihood of finding a sore throat among individuals without disease D. If it turns out that sore throats are extremely rare except among patients with disease D, then this will offset the penalty to some extent.

⁸ Whether it is rational to use consensus as evidence in the suggested way depends on several assumptions, discussed below.

Using consensus to decide between universalist and relativist hypotheses can make rational sense based on the foregoing considerations about hypothesis selection.⁸ Imagine two hypotheses, each with a prior probability of 0.5, that purport to explain people’s judgments about whether P holds:


HU: There is a single fact about whether P, and this fact (partly) explains the pattern of people’s judgments.

HR: There is no single fact about whether P; rather, whether P holds is relative to context or culture, and this relativity (partly) explains the pattern of people’s judgments.

HU would correspond to a model that has a single parent node connected to several child nodes (e.g., Figure 6.1a and d). On this model, the parent node represents the single fact about P, and that fact purports to explain the pattern of child nodes. Roughly speaking, this model says that those who affirm P (represented by the shaded child nodes) do so because P is true, without any restriction to context. In one of these cases (Figure 6.1a), this single fact P only explains about half of the child nodes; in the other case (Figure 6.1d), this single fact P explains almost all of the child nodes. HR, by contrast, would correspond to a relativist “multiple fact” model (e.g., Figure 6.1b and c). On the multiple fact model, the striped parent node might represent that P is a fact (in some contexts) and the dotted parent node represent that ~P is a fact (in some other contexts). Roughly speaking, on the relativist model, those who affirm P (represented by the shaded child nodes) do so in part because P is a fact in their context, and those who affirm ~P (represented by the unshaded child nodes) do so in part because ~P is a fact in their (different) context. Now, if in the population as a whole there is low consensus in judgments about whether P holds, the fit between the data and a single fact is poor (Figure 6.1a); in that case, appeal to a single fact is likely a worse explanation of the data since the appeal to multiple facts provides a much better fit (Figure 6.1b). When there is high consensus across the population as a whole in judging that P, a relativist account can also fit these data (Figure 6.1c).
But the relativist model is flexible enough to accommodate any pattern of consensus, and, as noted above, such flexibility is a theoretical cost.⁹ As a result, when there is high consensus, appealing to a single fact (with a bit of noise) can provide a more probable explanation of the data (Figure 6.1d).

⁹ The relativist can reduce the flexibility of the theory by imposing constraints on how groups can be formed. If the relativist hypothesis groups people purely based on whether they make the same judgments, then the hypothesis will be completely flexible (and accordingly penalized very heavily). But a relativist might avoid total flexibility by allowing only natural groupings, e.g., based on community or geography.


[Figure 6.1: four panels labeled a, b, c, and d, with “noise” annotations; graphic not reproduced]

Figure 6.1 Universalist and relativist models for different patterns of consensus

To make this all a bit more concrete, imagine a child learning about months and seasons. She thinks that the sentence “July is a summer month” is true, and she’s trying to figure out whether it’s universally true or only relatively true. She learns that while 55 percent of people around the world think, as she does, that July is a summer month, 45 percent think that July is not a summer month. Given this broad diversity, it would be reasonable for her to conclude that the sentence “July is a summer month” is only relatively true (cf. Figure 6.1b).¹⁰ The hypothesis that it is a universal truth fits the consensus data too poorly (cf. Figure 6.1a). On the flip side, if there is widespread consensus regarding some sentence or judgment, that might count as evidence that the sentence or judgment is universally true. Imagine the same child learning that 95 percent of people around the world think that “Summer is the hottest season” is true. The consensus surrounding this

¹⁰ The distribution of opinion here is evidence for two different claims. It is evidence for the second-order thesis that there is no single fact about whether July is a summer month. This parallels the metaethical relativist claim that there is no single fact of the matter. Such a relativist view can be neutral about what facts there are. But the distribution of responses is also evidence for a first-order thesis that in some contexts, July is a summer month. This parallels a normative relativist claim that, e.g., polygamy is wrong in some contexts. The first- and second-order claims are related in at least the following way. If in some contexts it’s a first-order fact that July is a summer month is true and in other contexts, it’s a first-order fact that July is a summer month is false, the second-order thesis follows: there is no single fact about whether July is a summer month.


judgment provides reason to think that it is a universal truth (Figure 6.1d). A relativist account can, of course, fit the responses—one could say that the judgment is true relative to one group and false relative to another, much smaller group, but just as in the case of the diseases, the massive flexibility of the relativist hypothesis counts against it. It will often be more plausible to count a small minority as mistaken about a universal truth rather than correct about a relative one. Thus, if an agent is deciding between a universalist and a relativist hypothesis, consensus can provide evidence that bears on the decision. If there is high consensus, this counts in favor of the universalist hypothesis since the relativist hypothesis needs to be penalized for its flexibility. But if there is sufficiently low consensus, this favors the relativist model since the universalist model needs to posit a lot of arbitrary noise.¹¹ These same principles might apply in the normative domain. If people think that there is widespread divergence on some judgment in aesthetics, that can be regarded as evidence that the judgment is only relatively true. And if people think that almost everyone makes the same moral judgment about some case, that can be evidence that the judgment is universally true. We now have in place a rational analysis of how consensus information might provide evidence for a rational learner to infer that moral judgments are universal whereas judgments of times of day are relative to context. If a learner is trying to decide between whether some judgment is universally true or only relatively true, then consensus can provide relevant evidence. If there is low consensus, such that many people judge that P is true, and many judge that P is false, this can count as evidence that the claim is only relatively true; this is because universalism would fit the data very poorly. 
On the other hand, if there is high consensus, this can count as evidence that the judgment is universally true; this is because universalism makes a much less flexible prediction than relativism, and this means that if the universalist account fits the data reasonably well, it should be preferred to a relativist account. All of this is, of course, dependent on a variety of other factors about the data and the hypotheses. And there are some quite substantive assumptions that need to be satisfied for these inferences to be solid (see Section 3). But the core idea here about the relevance of consensus for assessing universalism and relativism is a natural implication of probabilistic reasoning.

¹¹ Given relevant assumptions (see, e.g., Section 3), it’s rational for a learner to infer relativism even if each of the subgroups themselves hold universalist views. In such a case, a plausible analysis is that the subgroups fail to recognize that their own opinion reflects some contextually relativized fact, rather than some universal truth. Indeed, this is plausibly how it works with children thinking about the months. Children in the US tend to think that July is a summer month the world over, and children in the southern hemisphere think that July is not a summer month the world over. An adult observer of this diversity can easily see the childhood mistakes.
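The trade-off between fit and flexibility described above can be made concrete with a small numerical sketch. This is an illustration under stated assumptions, not the model used in any of the studies discussed: the function names, the fixed noise rate of 0.05, and the uniform prior over the relativist proportion are all hypothetical choices made for the example.

```python
from math import comb


def likelihood_universalist(k: int, n: int, noise: float = 0.05) -> float:
    """P(k of n judges affirm the claim | universalism).

    Assumption: the claim is universally true or universally false
    (equal odds), and each judge independently errs with rate `noise`.
    """
    if_true = comb(n, k) * (1 - noise) ** k * noise ** (n - k)
    if_false = comb(n, k) * noise ** k * (1 - noise) ** (n - k)
    return 0.5 * if_true + 0.5 * if_false


def likelihood_relativist(k: int, n: int) -> float:
    """P(k of n judges affirm the claim | relativism).

    Assumption: the share of people for whom the claim is true could be
    anything from 0 to 1 (uniform prior). Averaging the binomial over that
    range gives 1 / (n + 1) for every k: the flexible hypothesis spreads
    its bets evenly over all possible outcomes.
    """
    return 1.0 / (n + 1)


def posterior_universalist(k: int, n: int, prior: float = 0.5) -> float:
    """Posterior probability of universalism by Bayes' rule."""
    u = prior * likelihood_universalist(k, n)
    r = (1 - prior) * likelihood_relativist(k, n)
    return u / (u + r)


if __name__ == "__main__":
    # Hypothetical data: 100 judges, high versus low consensus.
    for k in (96, 54):
        print(k, round(posterior_universalist(k, 100), 3))
```

With equal priors, the high-consensus count leaves universalism well above one half, while the near-even split drives it close to zero, because the universalist hypothesis would have to treat almost half the judges as noise. Different noise rates or priors would change the exact numbers but not the direction of the effect.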

1.4 Evidence Available to the Learner

If a learner is trying to decide between whether a judgment is universally or only relatively true, then if there is high consensus surrounding the judgment, this provides evidence in favor of universalism. This might then provide the beginnings of an account of how people come to regard certain moral claims as universally true. But it depends crucially on the kind of evidence that people get about morality. It certainly seems that there is near universal consensus about the wrongness of hitting innocent people, stealing from others, and cheating. As in other cases, there is no direct measure of whether children get evidence of this consensus, but it seems very likely that children are exposed to relevant evidence concerning consensus. For example, children almost certainly get evidence supporting the view that most people think, e.g., it is wrong to steal and is right to keep your promises. They also presumably get evidence that fits with relativism rather than universalism, e.g., that many people think radishes are good and many think radishes aren’t good.

1.5 Sensitivity to the Evidence

My proposal has been that people believe that true moral judgments hold universally because that provides the best explanation of the evidence of consensus in moral judgment. The final piece of the account is empirical. Are people actually sensitive to consensus evidence when determining whether a judgment is universally or only relatively true? Indeed they are.

1.5.1 Previous Work on Consensus and Universalism

One surprising result in the work on folk metaethics is that there is variation in relativist responses within the ethical domain (Goodwin & Darley 2008). People are more likely to give relativist responses concerning abortion than they are for bank robbery. Wright and colleagues (2013) corroborated this finding even after controlling for whether the subject herself regarded the
item as “moral.” That is, even when restricted to items that participants expressly identify as moral, Wright and colleagues still found metaethical pluralism—participants were more likely to endorse relativism for some moral sentences than for others. A second important result is that the intra-moral variation in relativist responses correlates with variation in perceived consensus (Goodwin & Darley 2008; Wright et al. 2014: 48). When people think there is high consensus regarding a moral sentence (either in the form of high consensus that the sentence is true or high consensus that the sentence is false), people are more likely to give universalist responses concerning the status of that sentence; when there is low consensus, people are more likely to give relativist responses. Thus, there is a correlation between perceived consensus and metaethical judgments (Figure 6.2). What explains the correlation between metaethical judgment and perceived consensus? In particular, what is the causal relationship between such assessments? One simple possibility is that when people think that some sentence expresses a universal moral truth, this will causally facilitate the thought that the sentence will elicit wide agreement. Something like this is clearly plausible for many obvious facts. I think it’s a manifest universal truth that the diagonal of a square is longer than the sides, and as a result, I think that most people also think that the diagonal of a square is longer than the sides. Similarly, then, one might think it’s obviously a universal truth that bank robbery is wrong, and conclude from this that there will be high consensus around this obvious truth. Indeed, some work suggests that the direction of causation does go in this direction: judgments of universalism drive estimates of consensus (e.g., Wright et al. 2014). 
Figure 6.2 Correlation between perceived consensus and judgments of universalism (labels: Metaethical judgment, Perceived consensus)

Jennifer Wright and her colleagues presented participants with thirty issues and included questions about both universalism/relativism and consensus. As in previous studies, they found a correlation between judgments of universalism and judgments of consensus. But they also conducted mediation analyses on judgments of universalism and perceived consensus, and the results indicated that the judgments about universalism were in the driver’s seat. They
write, “rather than participants using perceived consensus as a proxy for objectivity [universalism], it is more likely that objectivity [universalism] serves as an indicator of how much consensus they could reasonably expect” (Wright et al. 2014: 49). The results from Wright and colleagues corroborate the plausible view that universalist judgments will often drive judgments of consensus. But it’s not clear that this tells the whole story about the relationship between perceived consensus and metaethical judgments. Importantly, in the study by Wright and colleagues, they collapsed all of the cases in their analysis, including conventional, personal, and scientific issues, in addition to the moral issues. So it’s possible that even though judgments of universalism drive judgments of consensus in many cases, in the domain of metaethics, consensus judgments might play an important role in guiding judgments of universalism and relativism. Indeed, there is a bit of evidence in favor of this direction of causation in the moral domain. Goodwin and Darley (2012) manipulated perceived consensus by giving participants fabricated reports about consensus regarding various moral issues. They found that when given reports of high consensus, participants gave increased judgments that the action was universally wrong (Goodwin & Darley 2012: 254). This provides some grounds for thinking that perceived consensus is treated as evidence of universalism and was the inspiration for a set of new experiments Ayars and I conducted (Ayars & Nichols 2020).

1.5.2 New Experimental Work on Consensus and Universalism

As noted above, Goodwin and Darley (2012: 254) did find that manipulating consensus affected judgments of universalism. However, it’s not clear whether consensus influences judgments of universalism because it is treated as evidence. It could be that perceived low consensus decreases one’s willingness to assert the universality of one’s view because one expects more resistance from unpopular views. Or, perceived high consensus could bolster one’s emotional attachment to one’s own view, and this might be what generates the increased universalist responses. To reduce the possibility of such emotional or motivational effects of consensus, in our experiments, we used either boring non-moral examples or abstractly characterized moral cases. For many domains, people plausibly have strong prior beliefs about whether judgments in that domain are universal or not. For instance, “4 * 2 = 8” clearly expresses a universal truth, and diversity of opinion would be unlikely to lead people to think that it is only relatively true. On the other end, “It’s 9AM” is so patently relativistic that no
amount of perceived consensus will shift judgment to the view that such a sentence is universally true. Nevertheless, in some non-moral domains, there may be no strong prior beliefs about the meta-evaluative status of judgments in that domain. Color is a promising example (see, e.g., Maund 2012). On some philosophical views, color is simply an objective property (e.g., the reflected wavelengths of light from a surface). On these views, our visual system happens to register and track an objective property of the world (e.g., Byrne & Hilbert 2003). In that case, claims about color would be universal. Other philosophical views of color reject universalism and maintain that color is partly constituted by the experience of the observer, which can of course vary between individuals. Jonathan Cohen and I (Cohen & Nichols 2010) examined meta-evaluative attitudes about color and found that participants were divided about whether they regarded color as a non-relational property of objects or as a relational property that depends both on the object and the experience of an observer. In that study, about half of the participants affirmed that two people who disagreed about the color of an object could both be correct, and half denied this. In light of this variability in response, color judgments seemed like promising candidates for testing whether consensus information can be used to infer universalism. Ayars and I (Ayars & Nichols 2020) randomly assigned participants to evaluate judgments about universalism regarding the color of a paint sample. In a between-subjects design, subjects were presented with the following text: The general public was asked to make judgments about the color of 3 samples of paint (A, B, & C). For each sample, the participants were asked whether the sample of paint is blue or green. They saw each sample and then just selected either “is blue” or “is not blue.” There were 230 judges.

In the high consensus condition participants were told that 221 people (96 percent) said that the color of Sample B was blue, whereas nine people (4 percent) said that the color of Sample B was not blue. In the low consensus condition the ratio was 124 (54 percent) vs. 106 (46 percent). Next, participants received the standard disagreement question for universalism:

Imagine that Bob is one of the 221 [124] people who said that the color of Sample B is blue and Alex is one of the 9 [106] people who said that the color of Sample B is not blue. Given that these individuals have different judgments about this case, we would like to know whether you think at least one
of them must be wrong, or whether you think both of them could actually be correct. In other words, to what extent would you agree or disagree with the following statement concerning such a case? Given that Bob and Alex have different judgments about the case, at least one must be wrong.

Our hypothesis was that people would use consensus information evidentially to make inferences about universalism and relativism for color. If consensus is used as evidence in deciding between a universalist and a relativist assessment, those assigned to the High Consensus condition should be more likely to make a universalist response regarding the judgment that the paint is blue. That’s exactly what we found. Universalist responses in the high consensus condition were significantly higher than in the low consensus condition. Thus, we found a clear effect of consensus on universalism judgments even for non-emotional and non-moral cases.¹²

The foregoing study shows that people will use consensus to make inferences about universalism for some claims about non-moral properties. However, it’s likely that there are certain kinds of claims for which consensus will not be treated as evidence, simply because one already has prior commitments on whether the claim is universally or only relatively true. For many scientific claims, we likely have a strong prior for universalism. Either the earth is flat or it isn’t. The shape of the earth is not contingent on who is thinking the thought or where they are. On the other hand, for many aesthetic claims, we likely have a strong prior for relativism. Some people think Charlie Parker’s music is beautiful and some don’t, and this plausibly depends on different musical sensibilities. Indeed, as noted above, Goodwin and Darley (2012) find that people tend to be universalists about scientific claims and relativists about taste claims. If people have strong priors for particular kinds of claims, these priors should offset the evidentiary influence of consensus information. Thus, in another study (Ayars & Nichols 2020), we explored whether consensus information would affect judgments regarding aesthetic claims, scientific claims, and moral claims. 
¹² We corroborated the finding with a different measure of universalism, which was used in Cohen and Nichols (2010). Rather than being asked whether two people who disagreed could both be right, participants were asked to pick the best of three interpretations of a case of disagreement where the relativist option was more explicit, saying that the color claim is “not absolutely true or false.” Once again, we found that people in the high consensus condition were more likely to give universalist responses.

We used an abstract characterization of each claim. Our prediction was that consensus information would affect universalist judgments about the moral claim, but not the other claims. In particular, we predicted that responses would be uniformly relativist about aesthetic claims, uniformly universalist about scientific claims, and sensitive to consensus for moral claims. In a between-subjects design, subjects were presented with one of three cases, either in a high-consensus or a low-consensus condition. For the beauty case, they were told:

The general public was shown a painting and asked to make judgments about whether the painting was beautiful. They looked at the painting and then just selected either “is beautiful” or “is not beautiful.”

For the moral case, they were told:

The general public was asked to make judgments about whether a certain action was morally wrong. They read the description of the action and then just selected either “morally wrong” or “morally okay.”

For the scientific case, they were told:

The general public was asked to make judgments about whether a certain molecule contains hydrogen. They saw a photo of the molecule (from a powerful microscope) and then just selected either “contains hydrogen” or “does not contain hydrogen.”

Following the vignette, participants were given the disagreement question to measure universalism. As predicted, we found that consensus influenced universalist judgment for the moral items but not for the others (Figure 6.3). When presented with an abstractly described moral case, participants did use consensus information to make inferences about universalism. Since no concrete details were given about what the moral claim was, the fact that consensus affected judgments suggests that people were using consensus as evidence for their metaethical verdicts. In cases of science and aesthetics, earlier results indicate that people tend to be universalists about scientific claims and relativists about aesthetic claims (e.g., Goodwin & Darley 2008; see also Wright et al. 2013); as a result, we antecedently had reason to expect that people have strong priors regarding universalism and relativism regarding these cases.

Figure 6.3 Results on universalism/relativism for abstract cases, by domain (vertical axis: Universalism, 1–6; domains: Beauty, Moral, Science; series: High consensus, Low consensus). Bars represent the standard error

And for those cases, we found that consensus made no difference. People were robustly universalist about the scientific claim and robustly relativist about the aesthetic one. It’s time to pull together the different strands. The rational learning analysis can be put as follows: If (1) learners get evidence of near universal consensus regarding some normative judgment, and (2) they have equal priors for whether the judgment is universally or only relatively true, and (3) they are rational learners, then they should infer that the judgment is universally true. There are important qualifications on this analysis (see Section 3). But the experimental evidence indicates that people do in fact treat perceived consensus as evidence for whether a judgment is universally or only relatively true. And it’s likely that children are exposed to evidence indicating that for most serious moral issues, there is widespread agreement about them. The overall picture gives us at least a provisional account of how people acquire the view that many moral judgments are universally true.
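Condition (2) in the analysis above, equal priors, matters because a strong prior can swamp the same consensus evidence. The sketch below illustrates this with the odds form of Bayes' rule. Every number in it is hypothetical and chosen only for illustration: the priors are not estimates from the studies, and the Bayes factors of 9 and 1/9 simply stand in for "high" and "low" consensus.

```python
def update(prior: float, bayes_factor: float) -> float:
    """Posterior probability of universalism given a prior and a Bayes factor.

    bayes_factor = P(data | universalism) / P(data | relativism).
    Uses the odds form of Bayes' rule: posterior odds = prior odds * factor.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Hypothetical strengths of evidence, for illustration only.
HIGH_CONSENSUS = 9.0      # data favor universalism 9 to 1
LOW_CONSENSUS = 1 / 9.0   # data favor relativism 9 to 1

# Hypothetical priors for universalism in each domain.
priors = {"beauty": 0.02, "moral": 0.5, "science": 0.98}

for domain, prior in priors.items():
    hi = update(prior, HIGH_CONSENSUS)
    lo = update(prior, LOW_CONSENSUS)
    print(f"{domain:8s} high={hi:.2f} low={lo:.2f} shift={hi - lo:.2f}")
```

Under these made-up numbers the gap between the high and low consensus conditions is large for the domain with an even prior and much smaller for the two domains with lopsided priors. On this toy picture a strong prior dampens the influence of consensus rather than removing it entirely; how closely that matches the observed pattern depends on how extreme people's actual priors are.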

2. The Moral/Conventional Distinction and Consensus

Thus far, I have focused on measures that are supposed to test directly for universalism. But there is a much more intensively studied measure of moral judgment, the authority dependence task (e.g., Blair 1995; Smetana 1985;
Turiel 1983; for discussion, see Kelly et al. 2007; Kumar 2015). This task was designed by developmental psychologists trying to see whether young children had an appreciation of the distinction between morality and mere convention. The inspiration for the task is that actions that are wrong because of conventional rules depend on an authority; but moral wrongs seem to be authority independent. In a typical study using this measure, participants are told about some presumed transgression that a child does in a school setting (e.g., hitting another child). They are then asked both a permissibility question: 1. Was it okay for S to φ? and an authority dependence question:

2. What if the teacher didn’t have a rule against φ-ing, would it be okay to φ?

The familiar pattern of results is that children (and adults) will say that both a conventional transgression (e.g., chewing gum) and a moral transgression (e.g., hitting another child) are impermissible. But a substantial domain difference emerges for the authority dependence question: both children and adults are much less likely to assent to authority dependence for canonical moral cases than for canonical conventional cases. So, for instance, most people deny that if the teacher didn’t have a rule against hitting it would be okay to hit, but people are much more inclined to accept such a claim about chewing gum. It’s plausible that there is a strong link between universalism and authority independence. For if it’s a universal truth that it’s wrong to φ, then whether it’s wrong to φ can’t depend on what the teacher says. Thus, insofar as high consensus is evidence for universalism, it should also be evidence against authority dependence. We looked at this directly, examining whether consensus information influences responses on an authority dependence measure. Given the previous findings that consensus influences judgments of universalism, and the idea that universalism implies authority independence, we predicted that high consensus will make participants more likely to deny authority dependence. In addition, of course, it is of independent interest whether authority dependence judgments can be affected by consensus information. As we’ll see, they certainly can be. Borrowing from Smetana (1985), adult participants were told that they would be asked to make judgments about some behavior, but the behavior
would be referred to with an unfamiliar term, “zibbet.” In a between-subjects design, subjects were presented with either a high or low consensus condition using the following text: About 95 percent (55 percent) of people think you should NOT zibbet. This was followed with a graphic depiction of the opinion population, using different colored dots to reflect different opinions. Following this, they were asked the Authority Dependence question: Mark is a child in a preschool class. His teacher has a number of rules that they were taught at the beginning of the year. But the rules did not say anything about whether it is okay to zibbet in the class.

Do you think that it’s okay to zibbet in the class? The studies recounted above (in Section 1.5) show that under high consensus participants are more likely to regard a moral issue as having universal status. So, if participants believe that almost everyone thinks that some action is morally wrong, the participants will be more likely to think that the action is universally wrong. And, as observed above, if an action is taken to be universally wrong, it should also be regarded as authority independent. Hence, insofar as high consensus that φ-ing is wrong is treated as evidence that φ-ing is universally wrong, high consensus should also lead people to think that φ-ing is wrong independent of the teacher’s rules. And that’s what we found. Under conditions of high consensus, participants were more likely to think that the action was wrong even if the teacher didn’t have a new rule about it. The study counts as a kind of an indirect confirmation of the earlier results on universalism. But it also, of course, shows that high consensus about the wrongness of an act leads people to think that the act is wrong in an authority-independent way. As such, it provides a rational learning account of the acquisition of this part of the competence that children have with the moral/conventional distinction.

3. Substantive Assumptions

I started this chapter with the question, Why do people think that at least some moral claims are universally true while aesthetic claims are only
relatively true? I’ve argued that considerations of fit and flexibility might provide the basis for a rational learning account of these beliefs. Insofar as most people make the same judgment about some issue, this counts as prima facie evidence that the judgment is universally true. In our experiments, we found that people were sensitive to consensus information in ways that were appropriate. However, the extent to which this counts as a rational inference depends on some substantive assumptions. Part of the goal of the statistical learning accounts I’ve been promoting in this book has been to show that ordinary people form moral beliefs in rationally appropriate ways. In the present case, we are exploring the rationality of using consensus information to infer the meta-evaluative status of judgments or sentences. However, the situation vis-à-vis rationality is a bit complicated. For consensus will only count as good evidence provided certain substantive assumptions hold. I will discuss three of these assumptions.¹³

3.1 Tracking Evaluative Properties

In order for consensus to be good evidence for the meta-evaluative status of judgments it must be the case that people regard other people as tolerably good at tracking the (first-order) evaluative facts. If you don’t think people are any good at determining whether something is good or wrong, then you shouldn’t take their opinions as evidence of whether evaluative judgments hold universally or are only true in certain contexts. It’s only if you think that people are reasonably good at determining what is good or wrong that you should take consensus as evidence.¹⁴ Is there reason to think that people do expect others to be good at detecting evaluative properties? For at least some evaluative properties, the answer seems clearly to be yes, as reflected in the widespread use of customer reviews. People consult customer reviews for a huge range of products and services. And there is evidence that this has an impact on decisions across several areas including hotels (Ye et al. 2009), restaurants (Zhang et al. 2010), and movies (Chintagunta et al. 2010). Presumably, people take the fact that most people liked the movie as indicating something about the quality of the movie. What about for something closer to moral properties? Work on conformity indicates that people will use others’ actions and normative judgments as evidence about the right thing to do. This is found both in experimental games (Bicchieri & Xiao 2009: 198) and in more natural behaviors like littering and volunteering (Cialdini et al. 1990, 1999). If a subject thinks that others think that one should split evenly on a dictator game, the subject herself is more likely to split evenly. An even starker result has emerged in developmental psychology. Knowing that members of a certain group typically eat green berries leads kids to say it is “not okay” for a member of the group to eat red berries (Roberts et al. 2017: 585–86). Thus, there is reason to think that people do often regard others as being good at tracking what’s good and right. So the tracking condition seems like it is often met.

¹³ I am setting aside issues about how people come to form their beliefs about what the consensus is. There are known biases here, however. For instance, one robust effect in the social psychology of social cognition is that people tend to project their own views on to others. This generates what is called a “false consensus effect.” This phenomenon emerges not just for beliefs; people also seem to have inflated estimates about the extent to which others share their desires and preferences (e.g., Krueger & Clement 1994). Obviously the extent to which people’s inferences about universalism are driven by biases in estimates of consensus will affect the extent to which we can regard their overall inferences as rational. But my purpose in this chapter is to defend the inference from consensus information to meta-evaluative judgment, not the inference to consensus beliefs themselves.

¹⁴ In many cases, the key question is whether people take others to be their epistemic peers. In some cases, of course, people will only grant a subpopulation as their epistemic peers.

3.2 Independence

As we just saw, in order for consensus to indicate the meta-evaluative status of a sentence or judgment (as universal or relative), people must be good at descrying the relevant first-order properties. A closely related assumption is that the judgments of individuals must be, to some significant extent, independent.¹⁵ It can’t be that everyone is just blindly copying the opinion of one guy. The individuals must provide independent evaluative data points. Again, for our purposes, the primary question is whether people regard the constituents of consensus (or dissensus) as providing independent evaluations. And, at least in some evaluative domains, it’s plausible that people’s inferences from consensus tacitly assume independence. Witness the natural concern about whether consumer reviews are faked by companies. If people learned that all the gushing reviews of some indie movie were generated by one industrious fanboy, this would presumably diminish their trust in the evidentiary force of the reviews. It’s harder to know whether people also expect others’ moral judgments to be largely independent. It is plausible that for some kinds of evaluative judgments, people do typically make somewhat independent judgments. For instance, if an evaluative judgment is based in an emotional reaction, like an assessment of the propriety of guilt in a certain context, then people probably answer these questions by consulting their own emotional reactions rather than just following the crowd. But what really matters at this juncture is people’s assumptions about the independence of moral judgments. That’s all that matters for evaluating whether or not their inferences from consensus are rational. And it is currently quite unclear to what extent people think that others’ judgments about moral issues are independent.

¹⁵ Cf. Condorcet’s Jury Theorem.
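The jury theorem cited in note 15 shows why both the tracking and the independence assumptions carry weight. A minimal sketch, assuming each judge is independently correct with the same probability p (an idealization; the function name is hypothetical), computes how likely a majority verdict is to be right as the number of judges grows.

```python
from math import comb


def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent judges is right,
    when each judge is right with probability p (n is assumed to be odd)."""
    need = n // 2 + 1  # smallest number of correct judges that forms a majority
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(need, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 11, 101):
        print(n, round(majority_correct(n, 0.6), 3))
```

When judges track the truth even modestly (p above one half), adding independent judges makes the majority more reliable; when they track it worse than chance, adding judges makes the majority less reliable. If everyone simply copies one person, the crowd is in effect a single judge, which corresponds to n = 1, and the size of the consensus adds nothing.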

3.3 Hypothesis Space

I have assumed that the agent is deciding between two hypotheses, universalism and relativism, and that the priors for these hypotheses are equal. Thus, the hypothesis space excludes, inter alia, error theoretic and noncognitivist views on which there are no moral facts. In addition, I’ve been contrasting relativism with universalism. But there are important distinctions within relativism that are elided by treating relativism as simply the denial of universalism, and this obscures central issues. However, in the next section, we will see that even within relativist views, people treat consensus as evidence.

4. Consensus and Subjectivism I’ve argued that high consensus is evidence for universalism and low consensus is evidence for relativism. And our studies indicate that participants do use consensus to make inferences about universalism and relativism, in the expected directions. When told of high consensus, people make more universalist judgments, and when told of low consensus, people make more relativist judgments. But what about the cases where people affirm relativist responses? For example, in Goodwin and Darley (2008), we saw that people tend to give relativist responses to questions about abortion. How should we understand that relativism? Relativism is in fact a capacious category. At one extreme of relativism, the truth of a claim is indexed simply to the attitudes


of the subject. According to such a “simple subjectivist” view of moral judgment, what makes a moral judgment true is simply that the person who makes the judgment sincerely believes it. On that view, if a person sincerely endorses a moral view, that person can’t be making a mistake, since the truth of such a judgment is entirely a function of the person’s sincere attitudes. Some normative utterances might allow for a subjectivist reading. For instance, if I say, “this dish is tasty,” I might mean tasty relative to me, and this would insulate me from certain external criticisms. Thus, one possibility is that under conditions of low consensus about some issue, participants infer that subjectivist relativism holds for the issue. However, simple subjectivism clearly isn’t the only kind of relativist view available. In a haze of jet lag, I might think it’s 8PM, but my sincere belief that it’s 8PM doesn’t make it true. The truth of a sentence about time of day is relative to the speaker’s location; it isn’t simply a matter of the speaker’s sincere belief. Obvious cases like this show that there are some contextually relativized sentences that cannot be accounted for by simple subjectivism. When people embrace a non-universalist view, do they default to simple subjectivism? Ayars and I (Ayars & Nichols 2020) investigated this by exploiting consensus once again. We presented participants with a scenario involving the moral beliefs of aliens. All of the aliens thought that the worst moral offense was drawing some figure in the sand, but there was diversity in which figure was deemed the worst. One group thought it was circles, a second group thought it was squares, and a third group thought the worst was triangles. In one condition, these three groups (A, B, and C) were equal in size; in the other condition, one group (C) was tiny and the other two groups were equal in size. As we expected, in both conditions, participants rejected universalism. 
That is, they maintained that if two of the aliens held different views, it was not the case that one of them had to be wrong. Our primary interest was in how participants would think about the third group. We found that participants were much more likely to think that a member of the third group was mistaken when the third group was tiny. In effect, they tended to think that if the group was so small, the people in that group were probably making a kind of performance error. Participants’ written responses reflect this. Here are some representative explanations for why they said that people in the tiny group are likely wrong: “If a large majority sees something as wrong, then it must be considered. However, a few people might hold some extreme opinion, such as drawing a triangle, but that cannot always be taken as fact.”


[Figure 6.4 Different relativist models for split consensus. Two panels, a and b; the only other label recovered from the figure is “noise.”]


“There is a definite large number of aliens who agree with Alien A, while there are only two aliens who believe drawing the square is wrong. So, I would favor the majority.” “I think that since only two aliens from group C agree on what they believe then they must be wrong. The numbers are against them and for the others.” We predicted this result based again on considerations about treating consensus as evidence. Even if relativism holds, the number of adherents in each group plausibly also matters. If we have a population divided among three positions, A, B, and C, then if A and B are endorsed by large groups and C by a tiny group, we might reject universalism while also treating the C responses as noise (cf. Figure 6.4a). And we might treat the C responses as noise precisely because appealing to a third fact introduces excessive flexibility (cf. Figure 6.4b). The earlier example with summer months can be repeated. We know that many people think July is a summer month and many think July is a winter month. This is good reason to reject universalism about which months are summer months. But what if we find out that a few people maintain that July is a spring month? We could add a third relativized fact to accommodate these few. But instead, we would dismiss these few as making a mistake in thinking that July is a spring month, and we would do this while granting


that it’s only a relative truth that July is a summer month. Our participants were sensitive to this issue, as reflected in their overall judgments that the tiny third group was mistaken. Moreover, as reflected in the explanations quoted above, many participants seemed explicitly to think that the fact that the third group was so small was the reason to treat it as noise. This is, of course, a good reason to treat it as noise, given the theoretical shortcomings of overly flexible hypotheses. So, the algorithms that people are using to solve these tasks seem to be sensitive to at least some of the rationally relevant considerations.
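The reasoning about the tiny third group can be illustrated with a small Bayesian model comparison. The sketch below is only an illustration under assumptions that are not in the text: the function names, the uniform Dirichlet priors, and the fixed noise rate of 0.02 are hypothetical choices, and this is not the analysis used in the studies. It compares a model with three relativized facts against a model with two relativized facts plus noise, and returns the log Bayes factor in favor of the latter.

```python
from math import lgamma, log


def log_beta_fn(alphas):
    """Log of the multivariate Beta function B(alpha_1, ..., alpha_k)."""
    return sum(lgamma(a) for a in alphas) - lgamma(sum(alphas))


def log_ml_three_facts(n_a, n_b, n_c):
    """Log marginal likelihood of the responses under three relativized facts.

    Assumption: the proportions of the three groups get a uniform
    Dirichlet(1, 1, 1) prior, so the model can fit any three-way split.
    """
    return log_beta_fn([n_a + 1, n_b + 1, n_c + 1]) - log_beta_fn([1, 1, 1])


def log_ml_two_facts_plus_noise(n_a, n_b, n_c, noise_rate=0.02):
    """Log marginal likelihood under two relativized facts (A and B) plus noise.

    Assumption: each respondent gives a C response by mistake with a fixed
    probability noise_rate; the A/B split gets a uniform Dirichlet(1, 1) prior.
    """
    split = log_beta_fn([n_a + 1, n_b + 1]) - log_beta_fn([1, 1])
    return n_c * log(noise_rate) + (n_a + n_b) * log(1 - noise_rate) + split


def log_bayes_factor_noise_vs_third_fact(n_a, n_b, n_c, noise_rate=0.02):
    """Positive values favor treating the C responses as noise."""
    return (log_ml_two_facts_plus_noise(n_a, n_b, n_c, noise_rate)
            - log_ml_three_facts(n_a, n_b, n_c))


if __name__ == "__main__":
    print(log_bayes_factor_noise_vs_third_fact(49, 49, 2))    # tiny third group
    print(log_bayes_factor_noise_vs_third_fact(34, 33, 33))   # three equal groups
```

Under these assumptions, a 49/49/2 split favors the two-facts-plus-noise model, while a roughly even three-way split favors three relativized facts. Where the crossover falls depends on the assumed noise rate.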

5. Reprise

Before moving to the implications, I’d like to review the main proposals of this part of the book, starting with the current chapter.


Universalism and Relativism

People tend to think it’s universally true that it’s wrong to steal but only relatively true that it’s wrong to wear jeans to school. On the account presented above, when considering whether some judgment that P is universally or only relatively true, people can use information about consensus to decide between two hypotheses:

(HUniversalism) There is a single fact about whether P, and this fact (partly) explains the pattern of people’s judgments.

(HRelativism) There is no single fact about whether P; rather, whether P holds is relative to context or culture, and this relativity (partly) explains the pattern of people’s judgments.

When there is low consensus regarding whether the claim that P is true (i.e., many people think it is true and many people think it isn’t), this can count as evidence that the claim is only relatively true. But when there is high consensus that the claim that P is true, that provides reason to think that the claim is not relativistically qualified in the same way. In that case, a small minority is reasonably regarded as noise. The statistical principle in operation here is the tradeoff between fit and flexibility. If opinion is largely split, then, unless we have prior reason to


think it’s a universal domain or the opinion-holders are unreliable on the topic, we should infer relativism as a means of fitting the data. However, if opinion is almost universal, then, unless we have prior reason to think it’s a relativist domain or the opinion holders are unreliable on the topic, we should maintain universalism rather than take on a flexible account like relativism. Our experiments show that for claims about color and about morality, people do treat high consensus as evidence for universalism and low consensus as evidence for relativism. That is, in these domains, people are more likely to make universalist judgments when consensus is high and relativist judgments when consensus is low. So people are sensitive in the expected way. Now, when it comes to the claim that it’s wrong to steal, there is widespread consensus, so there is no pressure to treat this claim as relative. By contrast, there is no such consensus about the claim that it’s wrong to wear jeans to school, and this suggests that the claim about attire must be relativized to context; the fact that these actions are forbidden in some contexts and allowed in others provides evidence that claims regarding these actions are relative.
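The tradeoff between fit and flexibility in the two-hypothesis case can be sketched numerically. The code below is a minimal illustration, not the model reported in the experiments. Its assumptions are mine: under universalism each respondent tracks the single fact with a hypothetical reliability of 0.9, under relativism the proportion of respondents for whom P holds gets a uniform prior, and the two hypotheses have equal priors (as assumed in Section 3.3). All function and parameter names are hypothetical.

```python
from math import comb


def likelihood_universalism(k, n, reliability=0.9):
    """Probability of a response sequence with k of n respondents affirming P.

    Assumption: there is one fact about P (true or false with equal
    probability), and each respondent judges it correctly with the given
    reliability.
    """
    if_p_true = reliability ** k * (1 - reliability) ** (n - k)
    if_p_false = (1 - reliability) ** k * reliability ** (n - k)
    return 0.5 * (if_p_true + if_p_false)


def likelihood_relativism(k, n):
    """Probability of the same sequence when P holds relative to context.

    Assumption: the proportion of respondents whose context makes P true is
    uniform on [0, 1]; integrating it out gives 1 / ((n + 1) * C(n, k)).
    """
    return 1.0 / ((n + 1) * comb(n, k))


def posterior_universalism(k, n, prior=0.5, reliability=0.9):
    """Posterior probability of universalism given k of n affirmations."""
    like_u = likelihood_universalism(k, n, reliability)
    like_r = likelihood_relativism(k, n)
    return prior * like_u / (prior * like_u + (1 - prior) * like_r)


if __name__ == "__main__":
    print(posterior_universalism(19, 20))  # high consensus
    print(posterior_universalism(10, 20))  # evenly split opinion
```

With 19 of 20 respondents agreeing, the posterior for universalism is about 0.74 under these assumptions; with a 10/10 split it is close to zero. The flexible relativist hypothesis fits any split, but it pays for that flexibility when the data are lopsided.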

Moral/Conventional Distinction

The foregoing account of universalist and relativist judgments can be extended to explain judgments about authority independence. Insofar as people judge an action universally wrong, it follows that they should regard it as wrong independent of authority. Indeed, as we saw in Section 2, people do use consensus information as evidence about whether a normative claim is authority independent. People are more likely to judge an action wrong in an authority-independent way under high consensus than under low consensus.

Scope

Children tend to treat social and moral rules as act-based rather than consequence-based. They typically treat the rules as saying that agents shouldn’t produce certain consequences, as opposed to saying that agents should try to minimize such consequences. For instance, the moral rule regarding lying says that one shouldn’t lie rather than that one should


minimize lying. Children aren’t explicitly told things like “it’s wrong to lie, but one is not required to minimize others’ lying,” so how does the child acquire such narrowly constrained rules? In Chapter 3, I suggested that, when a child is learning a prohibitory rule, her hypothesis space includes these alternatives: (HAct) Act-based: the rule being taught prohibits agents from producing a certain consequence.


(HConsequence) Consequence-based: the rule being taught prohibits agents from producing a certain consequence and also from allowing such a consequence to come about or persist.

Any behavior that counts as a violation on HAct will also count as a violation on HConsequence, so a consequence-based rule will characteristically comprise a much larger set than an act-based rule. Now, when learning a new rule by sample violations, if all of the sample violations are consistent with an act-based rule (that is, if all of the examples are cases in which a person violates the rule by producing the consequence), then this is evidence that the rule is act-based. Otherwise it’s a suspicious coincidence that all the examples fall into the smaller hypothesis that the rule is act-based. The operative statistical principle is the size principle, which entails that, when deciding between two hypotheses, one of which is nested within the other, if all the evidence is consistent with the smaller hypothesis, then that smaller hypothesis has a higher likelihood. Since the set of consequences produced by an agent is a subset of the set of consequences either produced or allowed by an agent, if all of the sample violations are cases in which the violator produced the consequence, this is evidence that the rule applies only to producing the consequence. Our experiments show that people are sensitive to sample violations in appropriate ways. When all the sample violations are cases in which the person produces the consequence, people infer that the rule is act-based; when the sample violations are a mix of producing and allowing, people infer that the rule is consequence-based. In addition, our analysis of a corpus of child-directed speech indicates that the overwhelming majority of violations marked by adults are cases in which the violator has produced the consequence. Thus, here again we have a learning theoretic explanation for an important part of moral cognition: the acquisition of act-based rules rather than consequence-based rules.
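The size principle lends itself to a short worked computation. The sketch below assumes strong sampling (each sample violation is drawn uniformly from the set of violations that the true rule picks out) and uses made-up set sizes: 10 possible violations by producing, plus 10 more by allowing. The function name, the sizes, and the equal prior are hypothetical and serve only to show the shape of the inference.

```python
def posterior_act_based(n_producing, n_allowing,
                        size_act=10, size_consequence=20, prior_act=0.5):
    """Posterior probability that the rule is act-based.

    size_act: number of possible violations under the act-based rule
        (producing the consequence); a hypothetical figure.
    size_consequence: number of possible violations under the
        consequence-based rule (producing or allowing); it is larger because
        the act-based violations are nested inside it.
    """
    n = n_producing + n_allowing
    # Strong sampling: each example has probability 1/size under a hypothesis
    # that includes it, so the smaller hypothesis gains with every example.
    like_consequence = (1.0 / size_consequence) ** n
    # An act-based rule cannot generate a violation by allowing.
    like_act = (1.0 / size_act) ** n if n_allowing == 0 else 0.0
    numerator = prior_act * like_act
    return numerator / (numerator + (1.0 - prior_act) * like_consequence)


if __name__ == "__main__":
    print(posterior_act_based(0, 0))  # no evidence: the prior
    print(posterior_act_based(3, 0))  # three violations, all by producing
    print(posterior_act_based(2, 1))  # a mix of producing and allowing
```

With three sample violations that are all cases of producing, the posterior for the act-based rule rises from 0.5 to 8/9 under these assumptions. A single violation by allowing rules the act-based hypothesis out.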


Priors

People show a pronounced prior in favor of expecting new rules to be act-based. In Chapter 4, I described how this prior might be explained as the result of learning an overhypothesis. The candidate hypotheses are as follows:

(HAct-Overhyp) Overhypothesis that rules tend to be act-based.

(HConsequence-Overhyp) Overhypothesis that rules tend to be consequence-based.


(HNo-Overhyp) No overhypothesis about whether rules tend to be act- or consequence-based. The operative statistical principle here is that if most known instances of rules are act-based, then one should adopt the overhypothesis that rules tend to be act-based. Alternatively, if most known instances of rules are consequence-based, then one should adopt the overhypothesis that rules tend to be consequence-based. Our experiments show that people make these kinds of inferences—when exposed to consequence-based rules, they are more likely to expect a new rule to be consequence-based. Further, at least in our culture, most rules that people know do seem to be act-based, so it is rational for them to have an overhypothesis—a prior—that rules tend to be act-based, and they should accordingly expect new rules to be act-based.
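A very simple stand-in for overhypothesis learning is a beta-binomial update on the proportion of act-based rules. Full overhypothesis models are hierarchical, so the sketch below is a deliberate simplification and not the model from Chapter 4. The function name and the uniform Beta(1, 1) prior are assumptions made for illustration.

```python
def prob_next_rule_act_based(n_act, n_consequence, a=1.0, b=1.0):
    """Posterior predictive probability that a new rule is act-based.

    n_act, n_consequence: counts of known rules of each kind.
    a, b: parameters of a Beta prior on the proportion of act-based rules;
        Beta(1, 1) corresponds to having no overhypothesis before any rules
        have been learned.
    """
    return (n_act + a) / (n_act + n_consequence + a + b)


if __name__ == "__main__":
    print(prob_next_rule_act_based(0, 0))  # no known rules
    print(prob_next_rule_act_based(9, 1))  # mostly act-based rules
    print(prob_next_rule_act_based(1, 9))  # mostly consequence-based rules
```

A learner who has encountered nine act-based rules and one consequence-based rule expects the next rule to be act-based with probability 10/12; reversing the counts gives 2/12.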

Closure

People, at least in our culture, tend to think that they enjoy the liberty to do many things that have never been mentioned in explicit tuition of rules. One way to explain this tendency is that people accept a closure principle of Residual Permission (or “Liberty”) that specifies that whatever isn’t expressly forbidden is permissible. This kind of principle is quite different from other elements of the normative system. But in Chapter 5, I argued that there is a natural explanation for how this principle might be acquired. We take the relevant hypotheses for the learner to be as follows:

(HResPerm) The closure principle is one of Residual Permission: if an action-type is not expressly forbidden, then acts of that type are permitted.


(HResProh) The closure principle is one of Residual Prohibition: if an action-type is not expressly permitted, then acts of that type are prohibited.

(HNoClosure) There is no closure principle.


We draw on the idea that a learner might exploit pedagogical sampling. A teacher who regards the learner as rational would aim for efficient instruction, and as a result, a learner might rightly expect such a teacher to provide efficient training examples. If a teacher presented the learner with exclusively prohibition rules, that would be most efficient if the other kinds of actions in the domain were allowed. Based on this idea, we predicted that participants would infer the Residual Permission Principle when trained on prohibitions, and they would infer the Residual Prohibition Principle when trained on permissions. This is exactly what we found across several experiments. If, as seems likely, most moral rules that people are taught are prohibition rules, it would be apt, based on the foregoing principle, for them to infer a Liberty principle for the moral domain.
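The appeal to pedagogical sampling can be given a toy rendering. In the sketch below, a teacher is more likely to state a rule the more informative it is under the closure principle actually in force: under Residual Permission a stated prohibition is informative and a stated permission is redundant, and the reverse holds under Residual Prohibition. The softmax teacher, the rationality parameter beta, and all names are my assumptions for illustration; they are not taken from the experiments described above.

```python
from math import exp

HYPOTHESES = ("residual_permission", "residual_prohibition", "no_closure")


def prob_teacher_states_prohibition(hypothesis, beta=2.0):
    """Probability that an efficient teacher states a prohibition rule.

    Assumption: the teacher picks the rule type with softmax probability,
    where an informative rule has utility 1 and a redundant rule utility 0.
    """
    p_informative = exp(beta) / (exp(beta) + 1.0)
    if hypothesis == "residual_permission":
        return p_informative        # prohibitions carry the information
    if hypothesis == "residual_prohibition":
        return 1.0 - p_informative  # prohibitions would be redundant
    return 0.5                      # no closure: both rule types informative


def posterior_over_closure(rules, beta=2.0):
    """Posterior over closure hypotheses given the rule types taught.

    rules: a sequence of "prohibition" or "permission" labels.
    Assumption: the three hypotheses start with equal prior probability.
    """
    scores = {}
    for hypothesis in HYPOTHESES:
        p = prob_teacher_states_prohibition(hypothesis, beta)
        likelihood = 1.0
        for rule in rules:
            likelihood *= p if rule == "prohibition" else 1.0 - p
        scores[hypothesis] = likelihood / len(HYPOTHESES)
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


if __name__ == "__main__":
    print(posterior_over_closure(["prohibition"] * 4))
    print(posterior_over_closure(["permission"] * 4))
```

Trained only on prohibitions, this learner puts most of its posterior on Residual Permission; trained only on permissions, it favors Residual Prohibition. The strength of the effect depends on the assumed value of beta.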


PART III


PHILOSOPHICAL IMPLICATIONS


7 Moral Empiricism

To what extent is morality prewired into our minds? The idea that morality is built into us is perhaps as old as philosophy itself, receiving a critical treatment already in Plato. Moral nativism has seen a resurgence of late. Indeed, the prevailing systematic account of how we acquire complex moral representations is a nativist view inspired by arguments in Chomskyan linguistics. If the statistical learning accounts I’ve defended in Part II of this book are right, we have incipient empiricist explanations for important aspects of human morality. This chapter will offer a sustained defense of a moral empiricist view in the face of the Chomskyan challenge.¹


1. Moral Nativism

Linguistic nativism has received the bulk of attention in contemporary innateness debates because Chomsky made a case for linguistic nativism characterized by unprecedented rigor. Hence it is not surprising that recent attempts to revive the thesis that we have innate moral knowledge have drawn on Chomsky’s framework. In particular, several philosophers and psychologists have suggested that there is an innate domain-specific learning device for the acquisition of moral principles (Dwyer 1999; Harman 1999; Hauser 2006; Levine et al. 2018; Mikhail 2011). The most detailed treatment of the moral Chomskyan view is to be found in Mikhail’s The Elements of Moral Cognition. In the concluding pages of the book, Mikhail draws out the abiding promise of the linguistic analogy. He writes,

Linguists and cognitive scientists have argued that every normal human being is endowed with innate knowledge of grammatical principles . . . Both the classical understanding of the law of nature and the modern idea of human rights . . . rest at bottom on an analogous idea: that innate moral principles are a common human possession. (Mikhail 2011: 317)

¹ In this chapter, I won’t try to explain why children receive the input they do from their parents (and others in the community). The obvious beginning of an answer is that children get this input because their parents are manifesting rules and distinctions that they themselves hold. But then there is the further question about why their parents have the rules and distinctions that they do. I will take this question up, in a limited fashion, in Chapter 8, Sections 3 and 4. For present purposes, I am focused on engaging the Chomskyan nativists. As we’ll see, their Poverty of the Stimulus argument starts from the input the child gets, and argues that the child can’t transition from this input to the moral grammar that is acquired. Thus, to parry this nativist argument, I don’t need to have an account of the genesis of the input.


Mikhail is cautious about affirming an argument for this nativist view (71, 349), but he rightly regards it as a deep question thrown into relief by the Chomskyan program in moral psychology. With the linguistic analogy, Mikhail suggests, “The existence and character of a natural moral law, which ancient belief held is written in every heart, is, or can be, a problem of ordinary science” (318). Since the linguistic analogy is rooted in Chomskyan linguistics, let’s start there. Chomsky characterizes the investigation of language in terms of three questions: (i) What constitutes knowledge of language? (ii) How is knowledge of language acquired? (iii) How is knowledge of language put to use? (Chomsky 1986: 3) The moral Chomskyans have focused on the first two questions, so I will do the same and largely ignore the third question (see, e.g., Mikhail 2011: 88ff). As for what constitutes knowledge of language, Chomsky says, “The answer . . . is given by a particular generative grammar, a theory concerned with the state of the mind/brain of the person who knows a particular language” (1986: 3). Chomskyans typically maintain that this grammar is largely composed of rules, which are richly structured mental representations.² Similarly, moral Chomskyans maintain that knowledge of morality is constituted by a moral grammar, a set of rules which are richly structured mental representations (e.g., Mikhail 2011: 88).

² Chomsky himself has, in places, distanced himself from the view that grammars are composed of representations (for discussion, see Rey 2003 and Chomsky 2003). But the moral Chomskyans have worked with a representational understanding of the linguistic and moral grammars, and I will adopt this understanding as well.


For the second question—how language is acquired—Chomsky says the answer “is given by a specification of [Universal Grammar] along with an account of the ways in which its principles interact with experience to yield a particular language; [Universal Grammar] is a theory of the ‘initial state’ of the language faculty, prior to any linguistic experience” (1986: 3–4). Chomsky’s appeal to an initial state of the language faculty reflects the nativist view that language acquisition depends on some domain specific device. It’s not just that the language acquisition device is a separate mechanism. The great interest of the nativist proposal is that different acquisition devices run different programs: the device for learning to perceive faces is supposed to be quite different from the device for learning a grammar. The moral Chomskyans follow suit and maintain that, just as the acquisition of the complex linguistic grammar depends on a domain-specific language acquisition device, so too the acquisition of the complex moral grammar depends on a domain-specific morality acquisition device. As Susan Dwyer puts the view: The child’s mindbrain contains (at some level of abstraction) a morality acquisition device (or moral faculty) that makes possible the acquisition of all and only humanly possible moralities. The moral faculty is characterized in terms of a set of rules, principles, and constraints (universal moral grammar) that determine what aspects of her environment a child needs to pay attention to, and, together with what she hears and sees around her, determines her mature moral competence, which we can call her I-morality, or moral idiolect. (Dwyer 2006: 242)

The moral acquisition device is, of course, different from the language acquisition device, and both are different from any domain general learning device like associationist or statistical inference mechanisms. The work my collaborators and I have done on moral cognition is entirely in keeping with the moral Chomskyan view about what constitutes moral knowledge (e.g., Lopez et al. 2009; Mallon & Nichols 2010; Nichols & Mallon 2006; Nichols et al. 2016). I heartily agree that the moral system is partly composed of a complex set of richly structured representations. But I want to resist the arguments for a moral acquisition device. Moral nativism has certainly been a popular target for criticism (Dupoux & Jacob 2007; Nichols 2006; Prinz 2007; Sterelny 2010), and I think many of these criticisms are good. But there has been no systematic alternative that attempts to explain the same array of phenomena as the Chomskyans. Insofar as moral


knowledge is constituted by this complex system of rule-based moral judgment, if we are to reject the moral Chomskyans’ nativism, we need a systematic alternative account of acquisition. That is the project I hope to contribute to. In Part II of this book, I argued that several aspects of moral cognition can be explained with statistical learning models. This can provide the rudiments for a systematic alternative to moral nativism. In this chapter, I will say more precisely how the work on statistical learning bears on the nativist/empiricist debate regarding the moral system.


2. The Argument from Poverty of the Moral Stimulus

As noted in Chapter 1, the debate between empiricism and nativism concerns the nature of the operative mechanisms of acquisition. As also noted there, the empiricist/nativist debate must always be considered for a particular domain. Some capacities, like the segmentation of word boundaries, might be explicated in terms of domain general mechanisms like statistical learning. Other capacities, like birdsong in the swamp sparrow, clearly depend on innate domain specific mechanisms.³ To show that moral knowledge depends on an innate, domain specific mechanism, moral Chomskyans apply a poverty of the stimulus (POS) argument to the moral domain, again following Chomsky’s example.

2.1 Poverty of the Stimulus Arguments

Like most toweringly influential arguments in philosophy, Chomsky’s POS argument is at its core quite simple. We start by assuming that empiricist learning proceeds by applying domain-general learning mechanisms (e.g., association, hypothesis testing) to the available data. For the case of linguistic knowledge, the POS tries to show that the stimuli to which the child is exposed don’t contain enough information to reliably enable an empiricist learner to acquire the linguistic competence that children exhibit (Laurence & Margolis 2001; see also Botterill & Carruthers 1999; Cowie 1999). Let’s see how this works in more detail.

³ There has been much discussion about how to define innateness (see, e.g., Cowie 1999; Samuels 2002), but I am happy enough to rely on exemplars of innate traits (e.g., ears) and noninnate traits (e.g., scars) as a rough guide to whether a cognitive trait is innate (see Laurence & Margolis 2001: 219–20).


The challenge of the POS argument isn’t as simple and ill-defined as “How do babies go from zero to grammar?” If that were the challenge, it would be rather easier to ignore. Instead, the challenge is much more precise. In particular, the challenge grants the young child a good deal of knowledge in the starting state, and then asks the empiricist to explain how the child goes from that state of knowledge to, for example, the sophisticated knowledge of grammar we find later. By the time a child is acquiring her competence of English grammar, it’s plausible that she already possesses a great deal of information. For instance, she knows how to distinguish one word from another, and she likely knows the names of several individuals and the words for several objects. The debate about the acquisition of grammar grants such a background of presumed knowledge. To frame the debate in a way that is neutral between empiricism and nativism, the explanandum is how a child gets from a certain specified starting state to some interesting subsequent state. How does the learner transition from this starting state of knowledge to a subsequent state of interest, e.g., the mature or “steady” state (Chomsky 1980: 37)? An empiricist explanation of the transition maintains that, given the evidence available to the child, domain-general learning devices suffice to generate the steady state from the starting state (Figure 7.1). If the empiricist succeeds in giving such an account, all he has done is to explain, in an empiricist way, the transition from the specified starting state to the steady state. It’s important to note that such an explanation is consistent with the view that the general-purpose learning devices are innate. 
Moreover, such an empiricist explanation is also consistent with the starting state itself being innate (see Section 4).⁴ The challenge that the empiricist seeks to meet is just to explain the transition from starting to steady state.⁵ The linguistic nativist rejects the empiricist explanation and argues that the transition between the starting state and the steady state cannot be explained by general-purpose learning devices. This is where the POS argument enters.

[Figure 7.1 Empiricist model of learning. Diagram labels: Starting State, Evidence, Domain general learning devices, Steady State.]

⁴ The notion of a starting state, as I’m using it here, is weaker than Chomsky’s “initial state of the language faculty” (1986: 3). Chomsky’s “initial state” is supposed to provide the innate contribution of the language faculty. The notion of a starting state, by contrast, leaves open whether that state is innate or acquired through empiricist learning.

⁵ This characterization of the challenge plausibly applies even to the ur-text for nativism, the Meno. In the Meno, Socrates leads the boy to make a sophisticated judgment regarding geometry, even though the boy had never been taught geometry. The argument begins with Socrates determining what the boy already explicitly knows: he knows (1) what a square is, (2) that the lines of a square are equal, and (3) that a square can be of any size. It is common ground between the nativist and the empiricist that the boy knows this much. The innovation in the Meno is to reveal that the boy actually knows a lot more about geometry than these basic facts, and to argue that the only explanation for the sophisticated geometric knowledge is that it is innate; it can’t be explained in terms of the boy’s experience. (Aristotle’s empiricist alternative holds that the boy’s additional knowledge about geometry is the product of reasoning, presumably in a domain-general way, from his more basic knowledge of geometric facts.)

Birdsong provides a nice illustration for how a successful POS argument works. When reared in isolation, the song sparrow and swamp sparrow get what is effectively the same stimulus, but the isolate swamp sparrow ends up producing a different song from the isolate song sparrow, and the different songs resemble the different songs produced by non-isolated members of their respective species. Members of other bird species, not to mention humans or cats, wouldn’t develop the isolate swamp sparrow’s song if given the same evidence. This indicates that a domain-general learning device can’t explain how the isolate swamp sparrow arrives at its distinctive song. The natural conclusion is that the swamp sparrow has an innate domain-specific acquisition device with song-related information. This would be akin to Universal Grammar: a universal swamp-sparrow song template. The nativist argument regarding songbirds capitalizes on the fact that isolate birds from different species produce different songs. Since Chomskyans maintain that linguistic competence is both species universal and species specific, there is no closely related species that we can observe for differences. Hence, POS arguments often appeal to imagined species that match (or exceed) us in intelligence, but lack our species-specific grammatical endowment. Chomskyans maintain that if such an alien species with superlative empiricist learning abilities were exposed to the same linguistic information as the child, the alien would come to different views about what the right grammar is. It will be useful to have in place an instance of a POS argument from Chomsky, and I’ll use the most familiar example, which concerns question


formation via auxiliary fronting (1980: 39–40; for more recent examples, see Hornstein 2009; Lidz & Gagliardi 2015).⁶ The argument is delightfully concrete. Imagine an alien scientist in the position of the child trying to figure out the rule for how to form questions. The child is exposed to data, in the form of sentence transformations from declaratives to interrogatives, like these:

(1) The man is here.—Is the man here?
The man will leave.—Will the man leave? (Chomsky 1980: 39)

The alien scientist has such data and is trying to figure out the underlying rule for forming questions. He will consider various hypotheses. Chomsky outlines two hypotheses that would generate the proper formation of the interrogatives from the declaratives in (1):

H₁: process the declarative from beginning to end (left to right), word by word, until reaching the first occurrence of the words is, will, etc.; transpose this occurrence to the beginning (left), forming the associated interrogative.

H₂: same as H₁, but select the first occurrence of is, will, etc., following the first noun phrase of the declarative. (Chomsky 1980: 39)

Chomsky calls H₁ a “structure-independent rule” and H₂ a “structure-dependent rule.” He writes, “A scientist observing English speakers, given such data as (1), would naturally select hypothesis H₁ over the far more complex hypothesis H₂, which postulates abstract mental processing of a nontrivial sort beyond H₁” (1980: 39). Although H₁ does fine with the data in (1), it produces the wrong result for more complex sentences like “The man who is here is tall.” H₂ generates “Is the man who is here tall?” whereas H₁ would generate “Is the man who here is tall?” Again, the data in (1) are consistent with both H₁ and H₂. Further, while children have lots of evidence regarding sentences like those in (1), it’s likely that children aren’t taught the relevant facts about more complicated sentences, like the proper way to convert “The man who is here is tall” into an interrogative. However,

⁶ The example is contested (see, e.g., Perfors et al. 2011). But for present purposes, I will just assume that this example works for linguistic nativism, since my point in using the example is to explicate the linguistic analogy, not to evaluate the argument for linguistic nativism.


Chomsky observes, although children make lots of mistakes when acquiring language, they never say things like “Is the man who here is tall?” (1980: 40) (see also Crain & Nakayama 1987). The point is that, if the child were just trying out various possibilities based on the evidence, the unstructured rule H₁ should have been a natural thing to try out. But apparently this never happens. Apparently children don’t even try the structure-independent rule. Chomsky suggests that the foregoing argument supports a thesis about Universal Grammar. He writes,

. . . all known formal operations in the grammar of English, or of any other language, are structure-dependent. This is a very simple example of an invariant principle of language, what might be called a formal linguistic universal . . . Given such facts, it is natural to postulate that the idea of “structure-dependent operations” is part of the innate schematism applied by the mind to the data of experience. (Chomsky 1972: 30; see also Crain & Nakayama 1987: 522)

Thus Chomsky presents a POS argument against the empiricist explanation for language acquisition, as well as a specific positive proposal—that the child favors the structure-dependent rule because that bias is part of the innate Universal Grammar.⁷
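The contrast between H₁ and H₂ can be made concrete with a small sketch. The Python below is only an illustration under strong assumptions: sentences are lowercase token lists, the auxiliary set is limited to the two words that appear in (1), and the boundary of the first noun phrase is supplied by hand rather than parsed. The names `AUXILIARIES`, `h1`, `h2`, and `render` are hypothetical, and nothing here is a claim about how a child or Chomsky's scientist actually represents either rule.

```python
# Illustrative only: a toy rendering of Chomsky's two hypotheses.
# Assumption: the auxiliaries are just the ones shown in example (1).
AUXILIARIES = {"is", "will"}


def h1(words):
    """Structure-independent rule: scan the flat word string left to right
    and move the first auxiliary encountered to the front."""
    for i, word in enumerate(words):
        if word in AUXILIARIES:
            return [word] + words[:i] + words[i + 1:]
    raise ValueError("no auxiliary found")


def h2(first_noun_phrase, rest):
    """Structure-dependent rule: move the first auxiliary that FOLLOWS the
    first noun phrase. The noun phrase boundary is given by hand here;
    a real learner would need structured representations to find it."""
    for i, word in enumerate(rest):
        if word in AUXILIARIES:
            return [word] + first_noun_phrase + rest[:i] + rest[i + 1:]
    raise ValueError("no auxiliary found after the first noun phrase")


def render(words):
    """Join tokens into a capitalized question."""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "?"


if __name__ == "__main__":
    # Simple sentence from (1): the two rules agree.
    print(render(h1("the man is here".split())))
    print(render(h2(["the", "man"], ["is", "here"])))
    # Complex sentence: the two rules come apart.
    print(render(h1("the man who is here is tall".split())))
    print(render(h2("the man who is here".split(), ["is", "tall"])))
```

On the simple sentence both functions yield "Is the man here?", which is why the data in (1) cannot decide between the hypotheses. On the complex sentence the flat scan yields "Is the man who here is tall?", while the version that respects the noun phrase yields "Is the man who is here tall?".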

2.2 Poverty of the Moral Stimulus

Moral Chomskyans use the same kind of reasoning for an argument from the poverty of the moral stimulus. Mikhail writes:

If the relevant principles [of moral competence] can be shown to emerge and become operative during the course of normal development, but to be

⁷ One can distinguish two claims drawn from the POS argument, a negative and a positive claim (see, e.g., Laurence & Margolis 2001: 248). If the linguistic POS argument works at all, it delivers the negative conclusion that the acquisition of grammar can’t be explained by the empiricist proposal. But typically nativists also promote a positive proposal about what does explain acquisition, e.g., an innate language acquisition device which specifies a Universal Grammar. Although it’s important to acknowledge the distinction between negative and positive claims, moral Chomskyans advance both the negative claim that moral competence exceeds empiricist learning capacities and the positive claim that the missing piece is an innate morality-acquisition device. As a result, in the interests of simplifying the discussion, I will not carefully mark this distinction in the text (but see Nichols 2005).


neither explicitly taught nor derivable in any obvious way from the data of experience, then there would appear to be at least some evidence supporting an argument from the poverty of the stimulus in the moral domain. (Mikhail 2011: 82)

Just as linguistic Chomskyans take a successful POS to provide evidence for a Universal Grammar, the moral Chomskyans take a successful Poverty of the Moral Stimulus argument to provide evidence for a Universal Moral Grammar (Dwyer 1999: 185; Harman 1999: 114; Mikhail 2011: 73, 318). The Poverty of the Moral Stimulus has been expressly developed for two different aspects of moral cognition—the moral/conventional distinction and the scope of moral rules. I’ll sketch out both of these arguments in the remainder of this section before turning to empiricist alternatives in Section 3.⁸

Moral/Conventional Distinction

As discussed in Chapter 6, a robust tradition in developmental moral psychology investigates moral judgment by exploring the basic capacity to distinguish moral violations from conventional violations (for reviews, see Smetana 1993 and Tisak 1995). From a young age, children distinguish canonical moral violations (e.g., hitting, pulling hair) from canonical conventional violations (e.g., talking during story-time) on a number of dimensions. Perhaps most interestingly, conventional rules, unlike moral rules, are viewed as dependent on authority. For instance, if at another school the teacher has no rule against talking during story-time, children will judge that it’s not wrong to talk during story-time at that school; but even if the teacher at another school has no rule against hitting, children claim that it’s still wrong to hit.⁹ Susan Dwyer takes this early appreciation of the moral/conventional distinction as the acquirendum for her argument. Like other researchers, she focuses largely on the fact that children take morality to enjoy a

⁸ The moral reactions of infants (see, e.g., Hamlin et al. 2007) and nonhuman animals (see, e.g., de Waal et al. 2008) are naturally interpreted in terms of moral badness (or goodness). Kiley Hamlin et al. (2007) showed babies events in which one agent either helps or hinders another agent. They found that babies preferred helpers to hinderers. A plausible explanation of the phenomenon is that the babies assign a negative valence to hinderers, and this emerges so early that it seems to demand a nativist explanation. But even if babies have an innate preference for helpers over hinderers, it is a further question whether babies judge that it was wrong for the agent to hinder, and Hamlin does not argue for that richer interpretation of the data. My focus in this chapter is on the moral Chomskyans’ nativist account of judgments of wrongness.
⁹ The psychological depth of the moral/conventional distinction has been debated (e.g., Kelly et al. 2007; Kumar 2015), but for present purposes I will simply take for granted that the research reveals an important moral competence.


distinctive kind of authority independence. In her first article on moral nativism, she writes:

[T]he recognition of a distinction between moral and conventional domains, and the belief that moral considerations are imbued with special force and authority appear to be universal features of human life . . . (Dwyer 1999: 169–70)

Similarly, in her subsequent treatment of moral nativism, she writes:

Three- to four-year-olds understand that moral rules differ from conventional rules in terms of two main criteria: the former have force that is independent of any particular authority (e.g., God, parents, social custom) and are closely tied up with considerations of harm and injury . . . (Dwyer 2006: 237)

Thus, a key acquirendum for Dwyer is the recognition that moral rules, unlike conventional rules, are authority independent. Dwyer goes on to propose that a Poverty of the Stimulus argument indicates that the recognition of the moral/conventional distinction has an innate basis (Dwyer 1999: 171–7; 2006: 239–42). According to Dwyer, “the fundamental mistake” of empiricist accounts like social learning theory is “the assumption that all the information the child needs to achieve moral maturity is available in her environment” (1999: 172). She continues:

Absent a detailed account of how children extrapolate distinctly moral rules from the barrage of parental imperatives and evaluations, the appeal to explicit moral instruction will not provide anything like a satisfactory explanation of the emergence of mature moral competence. What we have here is a set of complex, articulated abilities that (i) emerge over time in an environment that is impoverished with respect to the content and scope of their mature manifestations, and (ii) appear to develop naturally across the species. (1999: 173)

According to Dwyer, just as empiricist accounts can’t explain the child’s linguistic competence, empiricist accounts can’t explain the child’s moral competence, as revealed by their grasp of the moral/conventional distinction (Dwyer 2006: 239–40). That is, Dwyer maintains that the child’s moral competence exceeds what an empiricist learner would be able to achieve


given the information available in the environment. She concludes that “we all come into the world equipped with a store of innate moral knowledge which, together with our experience, determines our mature moral competence” (1999: 176–7). Given the widespread distribution of the moral/conventional distinction across cultures, Dwyer speculates that children are “in possession of some knowledge that primes them for recognizing two normative social domains” (1999: 177). Thus, the innate contribution to moral competence can be thought of as a Universal Moral Grammar (1999: 185).

Scope Distinctions (PDE)

While Dwyer promotes a POS argument for the moral/conventional distinction, other moral nativists have focused on the scope of rules, that is, what the rules apply to. As we saw in Chapter 3, moral Chomskyans like Harman and Mikhail suggest that the Principle of Double Effect (PDE) is reflected in the pattern of intuitions people have about trolley cases (Harman 1999: 113–14; Mikhail 2011: 360). The PDE holds that it can be permissible to bring about a bad consequence when that consequence is foreseen but not intentional. If this principle is represented in people’s normative system, this would partly explain why people judge it permissible to pull the switch to save five people, knowing that one person on the side track will be killed. With this explanation for lay judgments about trolley cases, Harman suggests that moral nativism is required to explain why people’s judgments conform to the PDE. If the PDE is “adequate to an ordinary person’s I-morality [the moral idiolect of an individual]” this would provide reason to think that the principle is part of universal moral grammar:

An ordinary person was never taught the principle of Double-Effect . . . , and it is unclear how such a principle might have been acquired from the examples available to the ordinary person. This suggests that the relevant principle is built into I-morality ahead of time, in which case we should expect it to occur in all I-moralities (or be a default case, or something of the sort). In other words, the principles should be part of universal moral grammar. (Harman 1999: 225)

In short, Harman claims that it is implausible that people are taught the PDE. So, if this principle is encoded in the moral system, then Harman suggests, it must be a built-in element of a Universal Moral Grammar. These two cases—the moral/conventional distinction and the PDE—are the most prominent instances of Poverty of the Moral Stimulus arguments. But one can easily imagine such arguments developed for the other elements


of moral cognition that we have investigated, like the prior for act-based rules (Chapter 4) and the belief in the principle of liberty (Chapter 5). Where we are unable to explain how some feature of our normative systems is acquired, Poverty of the Stimulus arguments will be an available option.

3. Empiricist Learning of Moral Systems

In Part II, I sketched statistical learning accounts of the acquisition of several features of rule systems. In Chapter 3, I argued that we can give a statistical learning account of how children come to think that a rule is act-based rather than consequence-based, based on the size principle. In Chapter 4, I argued that we can also give a statistical learning account of how children come to expect that a new rule will be act-based, based on the formation of overhypotheses. Chapter 5 offered a statistical learning account for how people come to believe a principle of liberty, according to which whatever is not expressly prohibited is permitted. In Chapter 6, I argued that people might rationally infer universalism or relativism about some claim, based on evidence from consensus; I also argued that similar considerations could provide a statistical learning explanation for how people come to think that moral claims are true independent of authority. All of these accounts are empiricist accounts. That is, they explain the transition between a starting state and a steady state in terms of a domain-general mechanism operating over the available evidence. The domain-general learning mechanisms are, of course, statistical learning mechanisms. Their domain generality is revealed in the fact that they can be used across wildly different domains. For instance, as we’ve seen, one can form overhypotheses about anything from the color of marbles in bags to the character of rules in a norm system.
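To make the appeal to the size principle concrete, here is a minimal sketch of the inference in its standard textbook form, where the likelihood of n examples sampled from a hypothesis with extension size |h| is (1/|h|)^n. The extension sizes, the equal priors, and the names `SIZES`, `PRIOR`, `likelihood`, and `posterior` are all hypothetical choices made for illustration. They are not parameters taken from the studies discussed in Part II.

```python
from fractions import Fraction

# Hypothetical extension sizes: the act-based rule covers only cases in
# which an agent produces the consequence, so its extension is a proper
# subset of the consequence-based rule, which also covers allowings.
SIZES = {"act-based": 10, "consequence-based": 20}

# Assumed equal prior probability for the two hypotheses.
PRIOR = {"act-based": Fraction(1, 2), "consequence-based": Fraction(1, 2)}


def likelihood(hypothesis, n_acts, n_allowings):
    """Size principle: each example sampled from a hypothesis has
    probability 1/|h|; an example outside the hypothesis has probability 0."""
    if hypothesis == "act-based" and n_allowings > 0:
        return Fraction(0)  # an allowing cannot violate an act-based rule
    n = n_acts + n_allowings
    return Fraction(1, SIZES[hypothesis]) ** n


def posterior(n_acts, n_allowings):
    """Bayesian posterior over the two hypotheses given sample violations."""
    unnormalized = {
        h: PRIOR[h] * likelihood(h, n_acts, n_allowings) for h in SIZES
    }
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


if __name__ == "__main__":
    print(posterior(1, 0))  # one sample violation, an action
    print(posterior(3, 0))  # three sample violations, all actions
    print(posterior(2, 1))  # the sample includes an allowing
```

Under these assumed numbers, the posterior for the act-based hypothesis rises from 2/3 after one action example to 8/9 after three, because the narrower hypothesis makes an all-action sample less of a coincidence. A single allowing in the sample rules the act-based hypothesis out entirely.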

3.1 Moral Empiricism

Insofar as the proposals in Part II are empiricist accounts, what I aim to provide is a partial sketch of a moral empiricist theory.¹⁰ But there are

¹⁰ Elsewhere (Nichols 2006), I have argued, contra the nativist, that emotional responses might explain the emergence of the moral/conventional distinction without appealing to a domain-specific morality-acquisition device. The statistical learning explanation I offer here can be used to supplement or replace the emotion-based account.


several ways in which the approach taken here differs from other empiricist accounts. First, unlike many classical empiricist procedures, the procedures implicated in statistical learning are rational. Associative principles, by contrast, are characteristically not regarded as rational (Mandelbaum 2017). For example, in Hebbian learning, if two mental states (e.g., hearing the word “salt” and hearing the word “pepper”) tend to co-occur for an organism, the organism will develop an association between those mental states, and the activation of one state will facilitate the activation of the other. But this is nothing like rational inference. It’s just a brute link between the two states. By contrast, the domain-general mechanisms that I’ve invoked are taken to be rational principles of statistical inference.

Another difference from other empiricist accounts concerns the kinds of mental states involved. Contemporary empiricist accounts are often austere in the kinds of states to which they appeal, sometimes foreswearing symbolic representations altogether (e.g., Elman et al. 1996; Rumelhart & McClelland 1986). Relatedly, as noted in Chapter 1, many cognitive scientists have tried to explain key aspects of moral judgment with austere resources like aversions and habit learning (Cushman 2013; Greene 2008). By contrast, I have been making liberal use of representations. Indeed, I’ve maintained, along with the moral Chomskyans, that the moral system is composed of richly structured representations. The statistical learning mechanisms are supposed to manipulate and generate such representations (cf. Perfors et al. 2011: 308).

The statistical learning accounts can actually corroborate the claim that the moral system involves richly structured representations. There are significant difficulties in empirically demonstrating that moral systems really do involve complexly structured rules. For what we typically measure is a person’s judgment about a case, and even if rules play a part, judgments also depend on several other factors. People’s judgments about moral scenarios are affected by the emotions elicited before the judgment (e.g., Strohminger et al. 2011; Valdesolo & DeSteno 2006). Judgments are also affected by how the questions are worded, and by the order in which the questions appear (Petrinovich et al. 1993). And people’s moral judgments are affected by the weight of competing values; for instance, people are more likely to say it’s okay to kill an innocent to save twenty people than to save five people (Bartels 2008; see also Nichols & Mallon 2006). Given these disparate influences on people’s moral judgments, it is difficult to isolate the unique contribution of the rule and hence to show that the rule does indeed have a complex structure.


A successful statistical learning account can help. For if it is true that, given the evidence and the statistical principles that are available to the child, she should acquire the complexly structured rule that has been posited as the acquirendum, this lends credence to the hypothesis that this is indeed what gets acquired and represented. Similarly, if it is true that, given the evidence from consensus and the statistical principles that are available, children should think that certain normative claims are relative, that provides reason to think that meta-evaluative qualifications really are encoded in moral representations.

3.2 Limitations

Although we have a sketch of an empiricist account of moral learning, there are several salient limitations. As I’ll discuss in more detail below (Section 4), all of the proposals are only locally empiricist. In addition, the experimental work my collaborators and I have done has typically not used actual moral cases. Rather, in order to reduce the influence of prior expectations, our learning studies typically exploit artificial rule systems. In using artificial systems, we follow the example of research on word and grammar learning, where it is typical to use artificial words and artificial grammars to illuminate the basic capacity for learning in these domains (e.g., Gomez & Gerken 1999). Nonetheless, one might worry that the results don’t illuminate the acquisition of the moral system, but only the kinds of artificial systems that we teach participants. The fact that most of the evidence I have marshaled doesn’t use specifically moral rules does limit the strength of the conclusion somewhat. However, it’s important to see that the central considerations bearing on the difficulty of acquisition aren’t morally specific in any way. Moral Chomskyans emphasize the complexity and subtlety of the structures that are acquired. And these kinds of structures—act-based rules, closure principles, relativizing parameters—aren’t unique to morality. As a result, learning that moral rules have act-based scope, that moral systems are characterized by a principle of liberty, that moral claims tend to be universal—these achievements would only need to be particular applications of a general talent for statistical inference over the hypothesis space.

A related qualification is that my explanations for the acquisition of these aspects of rule systems are only “how possible” explanations. We don’t attempt longitudinal studies that examine the processes implicated at the


moment when a given child comes to regard a moral rule as act-based. Thus, even if the aspects of moral systems we explore are learnable—even if the information is in the signal and the ability is within us—we don’t have sufficiently fine-grained longitudinal studies of moral learning to show that kids actually learn the distinctions in an empiricist fashion. This is a limitation to be sure, but offering “how possible” explanations can shift the direction of the debate. The development of research in speech perception provides an instructive precedent. Early work in speech perception indicated that people make categorical discriminations (e.g., between /da/ and /ta/) which ignore intermediate sounds. When presented with many sounds along an acoustic continuum, people move abruptly from perceiving one sound (presumed to be perceived as a phoneme) to perceiving another sound (a different phoneme), rather than having a continuous series of acoustic perceptions that parallels the acoustic continuum. The sensible interpretation of this is that people hear either one phoneme or the other. It was initially thought that this categorical perception could not be acquired from acoustic processing without a special-purpose mechanism for speech perception; as a result, it was suggested that the ability to make these discriminations was uniquely human (Liberman et al. 1972). The idea was that it’s not possible for an organism to make the categorical discriminations characteristic of phoneme perception without a dedicated speech-perception mechanism. But it turned out that chinchillas could learn to make similarly abrupt discriminations, paralleling human performance on phoneme discrimination (Kuhl & Miller 1975). Thus, the information to generate this behavior is plausibly available to general acoustic abilities. This undercuts the “how possible” argument that it’s not possible for non-human animals to produce human-like behavior on categorical perception tasks.
But there remains a “how actual” question. The fact that other animals can learn to exhibit such behavior doesn’t mean that they do it the same way as humans (see, e.g., Trout 2001: 530–1). Categorical perception of phonemes in humans might really be a product of a dedicated and species-specific mechanism. Thus, current debates focus on whether the mechanisms that people use in categorical perception are the same as the mechanisms used by other animals in acoustic discrimination tasks (e.g., Holt & Lotto 2008; Pinker & Jackendoff 2005; Trout 2003). The precedent from speech perception suggests that we must be cautious about moving from results on how moral learning might proceed via statistical learning to claims about how moral learning actually proceeds. The fact that there is a local empiricist explanation for how children could


come to learn act-based rules does not entail that this is how children actually come to acquire the act-based rules. Still, the “how possible” accounts strengthen the case for the moral empiricist. The kinds of POS arguments marshaled by moral Chomskyans are themselves pitched at the level of possibility—they maintain that it’s not possible to explain how children acquire a moral system without appealing to an innate domain-specific contribution. At a minimum, the fact that we have empiricist “how possible” explanations undercuts the force of such arguments. Indeed, insofar as the nativist challenge is to explain how we could acquire the complex features of moral systems, the learning theories demystify the problem. Thus, the moral empiricist has a quite general explanation for how we acquire complex features of our moral systems. This is hardly the end of the nativist/empiricist debate, but the moral Chomskyan must now give further reasons to make their nativist view plausible. In particular, they need to provide evidence that even though it’s possible for an empiricist learner to acquire these aspects of moral systems, as a matter of fact, humans do not acquire them in this way. In any case, I hope it’s clear that the cases explored in Part II reveal an impressive empiricist toolkit for the moral psychologist. A wide range of techniques of statistical inference seems to be within the ambit of ordinary people. These techniques include the size principle, overhypothesis formation, pedagogical sampling, and trading off fit and flexibility. The fact that people seem to have such resources available to them in moral learning should give us some optimism about the explanatory promise of moral empiricism.

4. Starting States

Although the proposals I’ve made are all based on empiricist mechanisms (namely, principles of statistical learning), I hasten to emphasize that the empiricism is local—it doesn’t go all the way down. The acquisition problems pursued in Part II start in medias res. For instance, the starting states presuppose that the learner already has important representational resources, like representations for agent and wrong; also, insofar as the content of the rules will be framed over harms and intentions, there must also be representations for these categories. For all I’ve said, all of those representations could be innately specified. Indeed, there is a more general issue here. Statistical learning, as I’ve sketched it here, starts with a


hypothesis space. And even if a local empiricist account of the acquisition of, e.g., act-based rules is right, it’s obviously a further question how the hypothesis space gets generated (see, e.g., Nichols et al. 2016: 550; Xu & Tenenbaum 2007b: 251). The fact that the accounts are locally empiricist doesn’t undermine their relevance to the debate. Even in Chomsky’s POS arguments, the empiricist hypotheses under consideration are local. Consider Chomsky’s example of auxiliary fronting discussed in Section 2. The simple hypothesis (H₁) doesn’t start from nothing. For instance, it uses the notion of words and it implicitly assumes that the acquisition device will cluster auxiliary words like “is” and “will” together. The challenge for the empiricist is to explain how the child with mastery of the notion of word (among other things) will come to choose the more complex rule H₂. That is an important challenge even if the concept of word is innate. In moral Chomskyianism, we also find the empiricist alternatives to be local. Mikhail carefully distinguishes between the representations of types of actions and the rules that can be defined over those types. He gives formal characterizations of the representations of types of actions like purposeful homicide and knowing homicide (2011: 134–6), and he uses these to build formal characterizations of the rules—prima facie prohibitions against purposeful homicide and knowing homicide (2011: 150). Importantly, the representations of types of actions are not themselves moral rules. So even if these representations were innate, moral-rule nativism wouldn’t directly follow. Indeed, the essence of the local empiricist account is that, even if the representations of types of actions are innate, the moral rules that are acquired—the rules that guide judgment and action—can be explained in terms of statistical learning over the evidence. Local empiricist accounts do presuppose hypothesis spaces. 
But we can try to understand the bases of these hypothesis spaces. In at least some of the cases we’ve explored, the hypothesis spaces aren’t unique to the moral domain. Consider, for instance, the hypothesis space for whether moral claims are universally or only relatively true. These different possibilities are not unique to the moral domain. We also need to be able to sort nonmoral claims into these categories. Scientific claims will typically be universally true and claims about the time of day will only be relatively true. So we don’t need any special pleading to motivate the existence of the hypothesis space for universalism vs. relativism about moral claims. Something similar applies for the authority independence of morality. The hypothesis that the truth of some claim is not dependent on authority is also implicated in how


we think about factual domains. Whether the earth is round doesn’t depend on the teacher’s rules. So again, we don’t need to posit a special moral hypothesis space for the idea that moral claims are authority independent. Let’s turn next to the hypothesis space for the scope of rules. I suggested that the hypothesis space is composed of two live options:

(i) the rule prohibits an agent producing a consequence, or (ii) the rule requires minimizing such consequences. These options presuppose that the learner has categories for consequences that an agent produces and consequences. These two categories naturally compose a subset structure. We saw that if all the sample violations are actions, people will tend to think the rule is act-based, whereas if the sample violations include allowings, people tend to think that the rule is consequence-based. I will not attempt to explain how people acquire the categories of consequences an agent produces and consequences. But even taking those categories for granted, we face a further question. Why are these categories the hypotheses that people consult when learning a rule? Why not the category of things that are allowed to happen? It’s possible to formulate hypotheses over actions, allowings, or consequences. Yet the hypothesis space that people seem to consider consists of act-based and consequence-based rules. When given a single example of a violation that is an agent producing a consequence, participants tend to think the rule is act-based, and when given a single example of a violation that is an agent allowing a consequence, participants tend to think the rule is consequence-based. But if given a single example in which an agent allows a consequence, why not infer that the rule is allow-based, such that it’s wrong to allow the consequence but permissible to produce the consequence? One way to explain why allow-based rules aren’t in the live hypothesis space is to turn once again to overhypotheses. We have been exposed to lots of act-based rules. And we have been exposed to some consequence-based rules. But we have been exposed to virtually no allow-based rules. This affords the empiricist the option of maintaining that we have learned to exclude allow-based rules from our hypothesis space. 
If that is true, then we should think of the narrowed hypothesis space as a flexible prior rather than an innate constraint. The reason people tend to focus on act-based vs. consequence-based rules is that they have virtually never seen allow-based rules. That, anyway, is a natural explanation for an empiricist to offer. It is a
further question whether it’s possible for people to learn allow-based rules, which we consider in the next section.
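
The overhypothesis idea in this section, that a learner's expectations about novel rules reflect the kinds of rules encountered so far, can be illustrated with a small sketch. This is only one way to model it, not a model proposed in the text: it assumes a symmetric Dirichlet prior over three rule types, and the exposure counts and the `pseudo_count` parameter are invented for illustration.

```python
from fractions import Fraction

# Hypothetical exposure counts, invented for illustration (not data from the text):
# many act-based rules, some consequence-based rules, no allow-based rules.
observed_rules = {"act-based": 40, "consequence-based": 9, "allow-based": 0}


def learned_prior(counts, pseudo_count=1):
    """Expected probability of each rule type for the next novel rule.

    Assumes a symmetric Dirichlet prior with `pseudo_count` per rule type, so a
    type that has never been observed keeps a small, nonzero probability.
    """
    total = sum(counts.values()) + pseudo_count * len(counts)
    return {kind: Fraction(n + pseudo_count, total) for kind, n in counts.items()}


prior = learned_prior(observed_rules)
for kind, p in prior.items():
    print(f"{kind}: {float(p):.3f}")
```

On these assumed counts the learned expectation for allow-based rules is very low but not zero. That is the sense in which a narrowed hypothesis space can be a flexible prior rather than a fixed constraint: enough contrary evidence could still raise the probability of an allow-based rule.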

5. Humanly Possible Moral Systems

Thus far, I’ve argued that local empiricist accounts are available for explaining how people could have acquired the complex moral systems that we seem to possess. This serves as an answer to the nativist challenge to explain how it’s possible for us to acquire the moral system we possess. But the linguistic analogy also generates a distinctive and provocative prediction: that there are constraints on the kinds of moral systems we can acquire. Just as Chomskyans maintain that the range of possible human languages is a subset of the logically possible languages, the moral Chomskyans maintain that the range of possible human moralities is a subset of the logically possible norm systems. That is, there are constraints on the kinds of moral grammars we can acquire. It is certainly a familiar feature of Chomskyan linguistics that it maintains that there are constraints on humanly possible grammars. These constraints are typically presented as deriving from Universal Grammar (Lidz & Gagliardi 2015: 334). As we saw above (Section 2), Chomsky suggests that it’s part of the Universal Grammar that grammatical operations are structure dependent, and this excludes unstructured rules like H₁. Such unstructured rules are not available in humanly possible grammars. The linguistic analogy generates a similar proposal about morality. The point has been made by a number of scholars. Here’s Stephen Stich:

One of the more intriguing possibilities suggested by the analogy between grammatical theory and moral theory is that, as we learn more about the mental representations underlying moral judgment, we may find that they sustain a similar sort of “argument from the poverty of the stimulus.” Thus it may be that “humanly possible” moral systems are a very small subset of the logically possible systems, and that much of the structure of moral systems is innate, not acquired. (Stich 1993: 96)

Something similar is suggested by Joshua Cohen and Joel Rogers: The hypothesis of moral modularity would explain the acquisition of a system of moral understanding in part in terms of a set of intrinsic features
of mind that are specific to morality. A characterization of the moral module would of course need to be consistent with the variety of moral systems, but it would also impose limits on possible human moralities. (Cohen & Rogers 1991: 9)

More recently, Susan Dwyer and colleagues write: Drawing inspiration from the empirical and theoretical methodologies of generative linguistics, [the linguistic analogy] seeks a description of the mental structures and computations that implement the ubiquitous and apparently unbounded human capacity for making moral judgments. In the process, proponents of [the linguistic analogy] hope to explain how and why every child develops a moral sense, and how this capacity constrains the range of humanly possible moral systems . . . .

Just as the set of humanly possible languages are likely to be a subset of logically possible languages, we hypothesize that the range of humanly possible moralities is a subset of the logically possible moralities. (Dwyer et al. 2010: 487, 502; see also Hauser 2006: 54)

Despite the enthusiasm for the possibility of innate constraints on moral systems, theorists haven’t given us much guidance about what the constraints might be. They’ve said little about which moral systems are humanly impossible. At a minimum, if we are to follow the linguistic analogy, presumably the limitations derive from formal constraints on the kinds of principles or rules that can enter into the learner’s hypothesis space. The constraints will not be about whether the rules are somehow unreasonable, counterproductive, or stupid. Rather, the linguistic analogy suggests that rules with certain kinds of structures would simply be unlearnable. The person with the most detailed proposal is, once again, John Mikhail. He writes: Just as with language . . . , some invented normative systems—for example, a completely strict liability moral system in which neither fault nor culpable mental states matter, or a system in which negligent homicide is judged to be systematically worse than intentional homicide—appear virtually “unlearnable” because they violate principles of UMG [Universal Moral Grammar]. One could perhaps deliberately acquire or internalize such a norm system, of course, but not simply by allowing one’s natural moral sentiments to unfold. (Mikhail 2012: 170–1)

Mikhail offers two examples here. The first is that no moral system could be entirely framed over strict liability. This is not a distinctive prediction of Universal Moral Grammar (UMG). Given our natural interest in intentions, the role of intention in predicting future behavior, and the evidence intention provides regarding quality of will, it would be surprising on quite general grounds if all of morality ignored intention in favor of strict liability. There is a nearby prediction that would be distinctive to UMG—that strict liability rules are unlearnable. However, this seems to be false. Most people seem to learn that laws against speeding and statutory rape operate by strict liability. And recent work on moral intuitions indicates that people’s intuitions about speeding, incest, and statutory rape are in fact more in line with strict liability (Giffin & Lombrozo 2016). Mikhail’s second example is more promising—that principles of UMG would exclude learning a system in which negligent homicide is systematically judged worse than intentional homicide. There are two elements of the example to distinguish. One concerns homicide itself. Insofar as a community cares about deterring homicide, there are good reasons to condemn intentional homicide at least as much as negligent homicide. That is, it would be unreasonable to condemn negligent homicide more than intentional homicide. As a result, this isn’t a distinctive prediction of UMG. But once again there is a distinctive prediction nearby—viz., that it’s impossible to learn rules that condemn negligently produced consequences more strongly than intentionally produced consequences. This prediction coheres with the fact, registered in the previous section, that there are certain kinds of rules that seem to be excluded from the hypothesis space. 
People have a bias such that with minimal evidence, they will think that a novel rule is an act-based rule (Figure 7.2a); but if they get evidence that allowing the consequence to occur is a violation, then people will easily infer a consequence-based rule (Figure 7.2b). However, it seems that people never infer that a novel rule is an allow-based rule, i.e., a rule that prohibits allowing certain consequences but not producing those consequences (Figure 7.2c). In the previous section, I offered an empiricist-friendly explanation for how we might have arrived at this narrowed hypothesis space. Mikhail’s application of the linguistic analogy suggests quite a different alternative. His proposal suggests that there is an innate constraint against rules in which negligently produced consequences are morally worse than intentionally produced consequences. If that’s right, then presumably there is also a constraint against allow-based rules. For an allow-based rule says that the

Figure 7.2 Hypothesis space for scope of rules (shaded regions indicate violation zones). Panels a, b, and c each show the regions Allowings [consequences allowed or tolerated by agent], Acts [consequences brought about by agent], and Consequences.

intentional production of the consequence isn’t even wrong. (So of course, it would follow that allowing the consequence is worse than producing the consequence.) This makes allow-based rules natural candidates for the kind of rule that would be excluded by the moral grammar. Indeed, given the emphasis on principles like the PDE, one might think that there is an “intention constraint” that always includes intentional violations in the set of violations. That is, a learner won’t even consider a rule that does not prohibit intentionally producing the consequence. We wanted to see whether people could learn allow-based rules (Millhouse et al. 2018). Of course we couldn’t use moral rules, since prior moral beliefs would bias the learning task. As in other studies, we had participants learn a novel rule (called “nib weigns”) based on examples of violations. Participants were presented with a list of ten items that might or might not be violations of the rule. All of the items involved a ball on a shelf; five of the candidate violations were actions (e.g., “Nick puts a ball on the shelf”) and five were allowings (e.g., “Claude sees a ball on the shelf and leaves it there”). Next, participants were told that two of the actual violations on the list would be revealed. In the act condition, the two revealed violations were actions. In the allow condition, the two violations were allowings. Participants were explicitly informed that any items not revealed to be violations may or may not be violations, with the following instruction: “We now show some of the actual violations of nib weigns, selected at random from among the violations. The unmarked cases may or may not be violations of nib weigns.” Not surprisingly, we found that when the sample violations were exclusively act-based examples people generalized only to other actions as violations. 
But we also found that when the sample violations were exclusively allow-based examples, people often generalized only to other cases of allowing. Moreover, their explicit articulations of the rule in the different conditions suggested that they had in fact interpreted the rules as act- or allow-based, in the appropriate ways. In the act-example condition, participants
indicated that the rule said that it was wrong to “put a ball on the shelf,” “knowingly put a ball on the shelf,” and “place a ball on the shelf.” In the allow-example condition, articulations of the rule tended to be restricted to allowing such that the rule said it was wrong to “leave a ball on the shelf,” “see a ball on the shelf and leave it there,” and “see a ball on a shelf and just leave it there.” Thus, allow-based rules, despite their cross-cultural rarity and counterintuitive character, can be learned by examples without explicit instruction or explanation. The problem with allow-based rules is not that they are unlearnable (much less un-representable). Instead, the reason we don’t find allow-based rules in our normative systems is that such rules are unreasonable. They are bad rules for achieving normal ends given other features of our motivational, emotional, and moral systems. This stands in sharp contrast with the kinds of constraints that Chomskyans posit for grammar. The constraints against structure-independent rules are not that they are unreasonable. It’s not that the kid thinks “Tut tut. It is unreasonable to say ‘Is the man who tall is here’.” The innate grammar is thought to constrain the hypothesis space in a more brute way. But that’s not what we find with allow-based rules. These results on learnability are limited in important ways of course. Our studies used novel non-moral rules, and perhaps allow-based rules are unlearnable when the context is specifically moral. But in light of the fact that the rules are easily learned for novel rules, we would need a more explicit defense of the claim that there are innate constraints on the learnability of allow-based rules in the moral domain. A more general limitation of our exploration of unlearnability is that we focused on one kind of counterintuitive rule: allow-based rules. Perhaps there are other kinds of rules that will prove unlearnable. 
But in the absence of detailed examples of unlearnability, we see no reason to think that there are innate formal constraints on the kinds of rules that can be acquired by moral learners.
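
To make the learning task in this section concrete, here is a minimal Bayesian sketch of it. It is an illustration under stated assumptions, not the analysis reported in the study: it assumes the two revealed violations are drawn uniformly at random, without replacement, from the items that violate the rule (in line with the instruction quoted above), and both priors (`uniform` and `skeptical`) are hypothetical values chosen for the example. The item counts (five actions, five allowings) come from the task description.

```python
from fractions import Fraction
from math import comb

# Item counts taken from the task described above: five actions, five allowings.
ITEM_COUNTS = {"act": 5, "allow": 5}

# Which kinds of item each candidate rule counts as violations.
RULES = {
    "act-based": {"act"},
    "allow-based": {"allow"},
    "consequence-based": {"act", "allow"},
}


def likelihood(rule, revealed):
    """Probability of the revealed items if they were drawn at random,
    without replacement, from the items that violate `rule`."""
    violating_kinds = RULES[rule]
    if any(kind not in violating_kinds for kind in revealed):
        return Fraction(0)  # the rule cannot explain a revealed violation
    n_violations = sum(ITEM_COUNTS[kind] for kind in violating_kinds)
    return Fraction(1, comb(n_violations, len(revealed)))


def posterior(revealed, prior):
    """Bayes' rule over the three candidate rules."""
    unnormalized = {rule: prior[rule] * likelihood(rule, revealed) for rule in RULES}
    total = sum(unnormalized.values())
    return {rule: weight / total for rule, weight in unnormalized.items()}


# Hypothetical priors: one indifferent, one strongly biased against allow-based rules.
uniform = {rule: Fraction(1, 3) for rule in RULES}
skeptical = {
    "act-based": Fraction(80, 100),
    "consequence-based": Fraction(18, 100),
    "allow-based": Fraction(2, 100),
}

allow_condition = ["allow", "allow"]  # both revealed violations are allowings
for name, prior in (("uniform", uniform), ("skeptical", skeptical)):
    result = posterior(allow_condition, prior)
    print(name, {rule: round(float(p), 3) for rule, p in result.items()})
```

In this sketch, two allowing examples rule out the act-based hypothesis entirely and favor the narrower allow-based rule over the broader consequence-based rule in the likelihood, because a sample containing only allowings would be a coincidence if actions were violations too. Even the prior that gives allow-based rules only 2 percent leaves them with a substantial posterior after two examples.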

6. Empiricism, Plasticity, and Rationality

The suggestion that there is a Universal Moral Grammar, that the moral law is written into the hearts of all humans, can be a comforting thought. The statistical learning accounts I’ve sketched offer no such comfort. What is written into the heart seems to be far less than the moral law. While statistical learning accounts don’t support the idea that moral principles are written into our hearts, they can be reassuring in other ways.

First, statistical learning is flexible learning. Different evidence will produce different inferences. We’ve seen this in the experiments on rule learning (Part II). People draw different inferences about whether a rule is act-based or consequence-based depending on the evidence they get; they draw different inferences about whether a closure principle is Liberty or Residual Prohibition depending on the evidence they get; and they draw different inferences about whether a claim is universally true depending on the evidence they get. In the last section, we saw another illustration of this flexibility: even counterintuitive rules with utterly unfamiliar structures are easy to learn depending on the evidence. The fact that statistical learning is flexible can help explain why it is that people in different cultures and contexts are so adept at learning the local rules. In addition, on the whole, the plasticity of moral learning is partly to be celebrated. Where our rules are oppressive or counterproductive, at least we have the capacity to learn different rules. Moreover, we have some idea about how to change which rules children learn: give them different evidence. A second way in which the empiricism I’ve promoted is reassuring is that statistical learning is rational learning. Traditional rationalists in early modern philosophy promote a vindicatory view of philosophical judgments. Descartes, for example, argues for a kind of rationalism which involves the recognition of innate truths (see, e.g., Newman 2016: sec. 1.5). On that view, we use reason to uncover the innate truths. Insofar as statistical learning accounts are empiricist, they aren’t rationalist in this fully Cartesian sense, but they are rationalist in an evidentialist sense. According to evidentialism, S’s belief is rational or justified just in case it is supported by S’s evidence. 
This is exactly what statistical learning accounts affirm: the learner is using statistical principles to make appropriate inferences from the available evidence. The fact that statistical learning is rational learning puts it in contrast with debunking accounts of philosophical judgments (see Chapter 8). Further, despite the fact that Chomskyan nativism is sometimes identified as rationalist (Chomsky 2009), the evidentialist rationalism of statistical learning also puts it in contrast with Chomskyanism. Chomskyan arguments for innate acquisition devices are emphatically not rationalist in the evidentialist sense (see, e.g., Mikhail 2011: 32). A crucial commitment of any Poverty of the Stimulus argument is that rational inference can’t suffice to explain how the organism transitions from the available evidence to her actual knowledge. For instance, Chomskyans maintain that in the case of the acquisition of
grammar, the child jumps to conclusions about the grammar that aren’t warranted by the evidence, as reflected by the fact that a rational alien would not make the same leaps as the human child. Thus, in important ways, the statistical learning accounts of moral systems are considerably more optimistic about human rationality than is the linguistic analogy. For the statistical learning accounts suggest that, given the evidence available to the child, she should infer the kinds of rules that she acquires.

8 Rational Rules and Normative Propriety

Are our moral beliefs justified? In this chapter, I’ll give an optimistic verdict on this, for central parts of our moral system. I begin with a caveat. In asking “are our moral beliefs justified,” some might take the “our” to pick out the part of us as philosophers that is completely divorced from the moral attitudes we hold in virtue of being decent persons. I won’t be making that restriction. I take the question, “are our moral beliefs justified” to include in its scope the justificatory status of the beliefs of our parents, children, and non-academic friends. I’ll argue first that Chapters 3–5 provide reason to think that many central beliefs about socio-moral rules are acquired through rational processes and that this confers on the beliefs an essential kind of rational credential. The beliefs are doxastically rational. For instance, it’s doxastically rational for children to think that rules are framed over acts rather than consequences. However, there is an obvious limitation to this. Even if the beliefs about the rules are doxastically rational, the rules themselves might be counterproductive for achieving our ends. In Section 3, I explore the factors that contribute to whether a rule is good or bad in terms of ecological rationality, that is, how well the rule works given our minds and environments. Then in Section 4, I argue that act-based rules, beloved of common sense and deontology, are more ecologically rational than consequence-based rules. Finally in Section 5, I consider some modest lessons for moral progress.

1. Doxastic Rationality and Morality

On an evidentialist notion of justification, if children are justified in their beliefs about what the rules are, then at a minimum it must be the case that they acquired these beliefs in a rational way, based on the evidence. The evidence the child gets might be misleading and the beliefs she arrives at might be false. But if we can show that her beliefs are based on the evidence, then at least we have a constructive response to the most radical
challenges to moral epistemology (e.g., Baron 1994; Greene 2008; Singer 2005; Unger 1996).

1.1 Doxastic Rationality

According to evidentialism, in order for a belief to be justified, the belief has to be acquired in a way that is responsive to the evidence. In addition, if a belief is responsive to (or “based on”) the evidence, then it’s justified. Being responsive to the evidence is both necessary and sufficient for a belief to be justified. That’s what makes a belief doxastically rational. At the risk of belaboring the point, let’s return to the engineers and lawyers example from Chapter 1. Imagine Todd is shown a bag with one hundred descriptions of individuals in it. He is told that thirty of those hundred are engineers and seventy are lawyers. Now he pulls out one of the descriptions, which fits the engineer stereotype: Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles. (Kahneman & Tversky 1973)

Todd is asked whether Jack is an engineer or a lawyer. Here are some possibilities: (1) Todd ignores the base rate (30 percent engineers/70 percent lawyers) and believes that Jack is probably an engineer because he finds the description somewhat more characteristic of engineers. It so happens that Jack is an engineer. But Todd’s belief is not doxastically rational because he ignored the critical evidence from base rates. (2) Todd believes that Jack is probably a lawyer because he attends to the base rates. It so happens that Jack is an engineer, so Todd’s belief is false. But Todd’s belief is still doxastically rational. (3) Todd believes that Jack is probably a lawyer because he attends to the base rates. But the whole set up is a cleverly disguised lie, and none of the descriptions actually pick out real people. Todd’s belief is still doxastically rational.

Just because Todd’s belief is true (as in (1)) doesn’t make it doxastically rational. And just because Todd’s belief is false (as in (2) and (3)) doesn’t prevent it from being doxastically rational. What matters is whether Todd based his belief on the evidence.
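
The role of the base rate in this example can be made explicit with a short calculation. The sketch below is illustrative only. The 30 percent base rate comes from the example, but the two likelihood values are hypothetical numbers chosen to stand in for a description that is "somewhat more characteristic of engineers."

```python
from fractions import Fraction


def probability_engineer(base_rate, p_desc_if_engineer, p_desc_if_lawyer):
    """Bayes' rule: probability that Jack is an engineer given his description."""
    engineer = base_rate * p_desc_if_engineer
    lawyer = (1 - base_rate) * p_desc_if_lawyer
    return engineer / (engineer + lawyer)


BASE_RATE = Fraction(30, 100)        # from the example: 30 of 100 are engineers
P_DESC_IF_ENGINEER = Fraction(3, 5)  # hypothetical likelihood, for illustration
P_DESC_IF_LAWYER = Fraction(2, 5)    # hypothetical likelihood, for illustration

# Attending to the base rate, as Todd does in cases (2) and (3).
with_base_rate = probability_engineer(BASE_RATE, P_DESC_IF_ENGINEER, P_DESC_IF_LAWYER)

# Ignoring the base rate, as in case (1), amounts to treating the two
# professions as equally likely in advance.
ignoring_base_rate = probability_engineer(Fraction(1, 2), P_DESC_IF_ENGINEER, P_DESC_IF_LAWYER)

print(f"with base rate: {float(with_base_rate):.3f}")
print(f"ignoring base rate: {float(ignoring_base_rate):.3f}")
```

Under these assumed likelihoods, the probability that Jack is an engineer stays below one half once the base rate is taken into account, so believing that Jack is probably a lawyer is the evidence-responsive conclusion, whatever Jack's actual profession turns out to be.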

1.2 Doxastic Rationality and Moral Learning

The same conditions of doxastic rationality plausibly apply in the case of learning rules. The child needs to learn the rules of his household, school, culture, and so on. That is the task of acquisition. So if the child acquires the rules by an epistemically defective process, then even if the rules are the one true moral rules, the child isn’t justified in thinking those are the rules. And even if the rules the child learns are bad or mistaken rules, so long as the child is using the evidence appropriately, the child’s belief that those are the rules is doxastically rational. Note that this applies not just to moral rules but to more quotidian rules. For instance, the child might acquire the belief that at home it’s against the rules to wear shoes inside. If that belief is acquired in a way that is appropriately responsive to the child’s evidence, then the belief is doxastically rational. In Chapter 3, I drew a distinction between discerning what the rule is in some context and adopting the rule. I can discern that there is a rule against wearing white after Labor Day without ever making the judgment that one shouldn’t wear white after Labor Day. That’s a rule that I discern without adopting. The rational learning accounts I’ve given are intended to explain how children discern what the rules are. Accordingly, it is the child’s knowledge of what the rules are that enjoys doxastic rationality. The move from discerning what the rule is to adopting the rule does not obviously enjoy the same epistemic virtues. Often when I learn a rule I go directly from discerning the rule to adopting it. When I discern the rule prohibiting shoes in the temple, I immediately adopt the rule in both judgment and action: I judge that one shouldn’t wear shoes in the temple and I take off my shoes before entering. This transition between discerning the rule and adopting it is not an evidentially rational process. 
The process that takes a learner from discerning what the rule is to adopting the rule is opaque. There is reason to think that it is automatic, at least in a wide range of cases. In a series of now classic studies, Michael Tomasello, Hannes Rakoczy, Marco Schmidt, and their collaborators found that children will adopt and enforce rules based on very limited information
(Rakoczy et al. 2008; Schmidt et al. 2011, 2016). In some studies, the information the child gets is purely descriptive, and yet they still infer a rule and sanction others for violating it (see also Roberts et al. 2017). We will see more details about the studies later (Chapter 10, Section 2.2), but for now the important point is that children often seem to naturally and immediately adopt rules that they discern. At least in a central range of cases, there seems to be an automatic transition from discerning a rule to adopting it in judgment (cf. Rakoczy & Schmidt 2013). Although this process is not evidentially rational, there are reasons to think that it’s a good feature of our psychology that we adopt rules so readily and quickly. It’s likely that the automaticity of rule adoption played a critical role in the accumulation of cultural knowledge and expertise (see, e.g., Henrich 2017; Laland 2017). But clearly an important antecedent to automatically adopting rules is having a keen ability to discern what the rules are. That’s where I suggest the statistical learning approaches make their contribution. As we’ve seen, some of the most significant moral distinctions and principles might be acquired by rationally appropriate processes, given the evidence available to the child. We can explain the acquisition of act-based rules (rather than consequence-based rules) in terms of the avoidance of suspicious coincidences (Chapter 3). Most instances of violations that children are exposed to (e.g., by being scolded) are instances where an agent produces the consequence (rather than allows it to persist). Thus, if allowings counted as violations, it would be a suspicious coincidence that the evidence base is devoid of such examples. Hence it’s rationally appropriate for the child to think that the rules are act-based rather than consequence-based. 
It’s not just that children have act-based rules; it’s also the case that people expect new rules to be act-based rather than consequence-based. This, too, might be explained in terms of rational inference. Given that the vast majority of known rules are act-based rules, one can form a generalization that rules tend to be act-based (Chapter 4). We’ve also seen that people can quickly make inferences about whether or not a particular rule system is characterized by a principle of liberty (such that whatever isn’t explicitly forbidden is permitted). And such inferences are rationally explicable in terms of the kinds of information one might expect an efficient teacher to provide (Chapter 5). The foregoing suggests that important aspects of children’s beliefs about the rules are doxastically rational. Their beliefs about key rules and distinctions are rationally appropriate. This doesn’t mean that it’s doxastically rational when they adopt these rules in their judgments. In addition, there
is a further question about how the rules are integrated into overall moral verdicts. When it comes to moral judgment, it’s plausible that the rules have to be weighed against each other and against other values, and nothing in the rational learning theory implies that this integration process is doxastically rational. We might want to criticize children (and adults) for particular moral judgments that they make. For any particular judgment, it might be that a person fails to give the proper amount of weight to the various rules and values that they endorse. But this is a familiar problem that besets theories of all-things-considered judgment. Indeed, in Ross’s (2002) influential pluralist account of moral duties, he doesn’t try to solve the problem of how to weigh the different duties to arrive at an all-things-considered verdict. The answer here will likely be holistic (cf. Fodor 1983, 2000). As a result, even if the child’s belief about what the rules are is doxastically rational, there is still plenty of room to challenge how they reach their final judgments. The statistical learning approach offered here suggests that the way people come to draw moral distinctions derives in a significant part from their rational faculties. Insofar as sentimentalists eschew any role for reason in the genesis of moral distinctions, they will be missing a critical element of human moral judgment. This point applies more immediately to recent work on moral judgment discussed in Chapter 1. Perhaps the most widely discussed view in moral psychology is that our “non-utilitarian” judgments about moral dilemmas like Footbridge and Bystander are generated when primitive emotions interfere with the kind of rational cognition epitomized by utilitarian reasoning. Thus, it is suggested, primitive emotions distort our rationally appropriate utilitarian reasoning, and hence we should discount those non-utilitarian judgments (Greene 2008; Singer 2005; Unger 1996). 
The statistical learning approach paints quite a different picture. On this view, people’s judgments about moral situations depend critically on structured rules, not primarily on primitive emotions. The rules themselves are not utilitarian rules, as they enshrine distinctions like that between acting and allowing. But these non-utilitarian rules are not acquired through rationally defective processes. Rather, they are acquired through evidentially responsive statistical learning procedures. Indeed, pace the utilitarian debunkers, given the evidence that is available to the child, it would be statistically irrational for her to infer utilitarian rules.


2. On the Normative Propriety of Common-Sense Rules

The learning processes I’ve invoked are, by standard accounts, rational. This insulates moral judgment from important charges of irrationality. However, even if we grant that children learn the rules in a way that makes their beliefs doxastically rational, there is a natural further question. Are the rules themselves justified? Just as we can acquire false factual beliefs through rationally appropriate means, we can also acquire misguided or inappropriate rules through rationally appropriate means. Often, we have no idea why we have the kinds of rules that we do.¹ Do we have any reason to think the rules kids learn aren’t simply mistaken or counterproductive?² Of course different ethical theories will have different views about what makes a rule mistaken or inappropriate. This is hardly the book (and I am hardly the person) to present detailed arguments on the philosophical merits of various ethical theories. My treatment of the propriety of our rules is restricted. First, I will focus on exploring the propriety of act-based rules. As we’ve seen, common-sense ethics favors act-based rules over consequence-based rules. Children acquire rules like “don’t steal” and “keep your promises” rather than “minimize stealing” and “maximize promise keeping.”³ The propriety of act-based rules is a central locus of debate between deontologists and consequentialists. Deontologists maintain that the distinction between doing something and allowing such a thing to happen is a morally important distinction. Consequentialists often maintain that this act/allow distinction is morally irrelevant (for discussion, see, e.g., McNaughton & Rawling 1991; Nozick 1974; Parfit 1984). The second restriction in my treatment is that I will focus on factors that are, broadly speaking, consequentialist. This is not because I am rejecting deontology. I intend to be ecumenical. If act-based rules have good consequentialist justification (as I’ll suggest), this doesn’t count against such rules also being the right deontological rules. A rule can be both deontologically right and consequentially good.

¹ As Jerry Gaus observes, the functions of the rules we learn are often causally opaque to us (Gaus, forthcoming; see also Henrich 2017: 102–4).

² One way to think about this is as a Garbage In Garbage Out problem. If the rules the children are exposed to are bad rules, then the rational learning story suggests that children will do a good job at acquiring those bad rules. So we would need some further kind of reason to think that the rules they acquire are not bad rules. That is the central issue I will take up in the remainder of the chapter.

³ This is not to say that there are no consequence-based rules. It’s plausible that we do have consequence-based rules regarding vulnerable populations. For instance, it’s not just that you should not push a toddler into a pool; if you see a toddler about to fall into a pool, you are obligated to intervene. So, I don’t mean to deny the presence or significance of such rules. Nonetheless, the preponderance of rules that we learn do seem to be act-based.

2.1 Consequentialist Challenges to Act-Based Rules

If act-based rules are learned in rational ways, this provides a defense against the idea that the act/allow distinction can be dismissed as the byproduct of defective online processing. However, a consequentialist might maintain that the distinction itself is the product of historical factors that strip the distinction (and the accompanying act-based rules) of any normative weight. This is exactly what Roger Crisp does in his evolutionary debunking argument against the act/allow distinction. He writes:


It is clear that a group cannot function well if its members are permitted to harm one another, whereas the survival value of a prohibition on allowing others to suffer is more dubious. Given that such reactions have been contingently engendered in us by evolution, we should not endanger the rationality and impartiality of our normative theory by allowing them to interfere with our judgement. (Crisp 2006: 21)

Crisp maintains that once we see the evolutionary explanation for the differential reaction to producing harm and allowing harm, we can see the “rational emptiness behind” our reactions and that we shouldn’t let such considerations influence our judgment. This kind of argument is also affirmed by Katarzyna de Lazari-Radek and Peter Singer (2014: 190).

This evolutionary account is rather limited as an explanation for the existence of the act/allow distinction. Crisp’s characterization of the evolutionary origins of the distinction is specific to harming each other. But act-based rules are not restricted to the kinds of harms that would have been relevant to our distant ancestors. Rather, act-based rules prevail across the moral and social domain. We have act-based rules against lying, littering, and queue jumping. This holds for rules that are clearly not evolutionary in origin—virtually all traffic and pedestrian laws are act-based. There is a rule against passing in no-passing zones, but no rule that we must minimize such passing. Similarly, the laws against jaywalking say “don’t jaywalk,” not “minimize jaywalking.” All else equal, we should want an explanation for the preponderance of act-based rules that would extend to the vast range of cases where we find act-based rules. Such a general account of the act/allow distinction might plausibly apply as well to the domain of harm, displacing the evolutionary account promoted by Crisp, and de Lazari-Radek and Singer.⁴

⁴ Harman has a different account of the act/allow distinction, based on power dynamics. Harman writes:
The rich, the poor, the strong, and the weak would all benefit if all were to try to avoid harming one another. So everyone could agree to that arrangement. But the rich and the strong would not benefit from an arrangement whereby everyone would try to do as much as possible to help those in need. The poor and weak would get all of the benefit of this latter arrangement. Since the rich and the strong could foresee that they would be required to do most of the helping and that they would receive little in return, they would be reluctant to agree to a strong principle of mutual aid. A compromise would be likely and a weaker principle would probably be accepted. In other words, although everyone could agree to a strong principle concerning the avoidance of harm, it would not be true that everyone would favor an equally strong principle of mutual aid. It is likely that only a weaker principle of the latter sort would gain general acceptance. So the hypothesis that morality derives from an understanding among people of different powers and resources can explain (and, according to me, does explain) why in our morality avoiding harm to others is taken to be more important than helping those who need help. (Harman 1975: 13)
Like Crisp’s account, Harman’s proposal here wouldn’t explain the wide prevalence of act-based rules, even when the rules do not implicate the differences in resources that matter for the power-dynamical explanation (e.g., rules about litter and traffic).

A very different problem with the kind of debunking argument favored by Crisp, and de Lazari-Radek and Singer, is that it sets a dangerously high bar in ethics for when our rules and distinctions enjoy propriety. Crisp writes that, once we recognize how evolutionary contingencies led to differential reactions to acting and allowing, “we should not endanger the rationality and impartiality of our normative theory by allowing them to interfere with our judgement” (2006: 21). The suggestion seems to be that “rationally empty” factors should be excluded from normative ethics. However, it might be that none of our moral beliefs can be given an ultimate rational justification. One familiar line of reasoning to this conclusion is a simple regress argument. In many cases, we can give a rational justification for one moral belief in terms of other moral beliefs. For instance, we might explain our moral belief that it’s wrong to drive drunk in terms of moral beliefs about risks of harming other people. But at a certain point—probably not too long after we articulate risks of harming people—we will run out of reasons. We can’t keep giving further reasons forever, on pain of an infinite regress (see, e.g., Sauer 2017: 91–3; Sinnott-Armstrong 1996: 9; Timmons 1999: 216). One response to the regress worry, the one I favor, is to reject the idea that the normative authority of morality depends on there being some foundational rational justification for moral principles. Another tack, though, is to maintain that there are moral principles that are foundational and rational. This is the option suggested by de Lazari-Radek and Singer, who advert to Sidgwick’s principle of benevolence (2014: 119–20; see also Crisp 2006: 7). Sidgwick characterizes the principle as follows:


the abstract principle of the duty of Benevolence, so far as it is cognizable by direct intuition that one is morally bound to regard the good of any other individual as much as one’s own, except in so far as we judge it to be less, when impartially viewed, or less certainly knowable or attainable. (Sidgwick 1884: 381–2)

Sidgwick calls this the Principle of Rational Benevolence, and he maintains that the principle is “an indubitable intuition of the practical Reason” which “would not direct us to the pursuit of universal happiness alone, but of Truth, Freedom, Beauty as well, as ends ultimately desirable for mankind generally” (398).⁵ I’m skeptical of such an appeal to self-evident rational principles. The reason we favor benevolence, I think, involves our emotional endowment, not just our rational one (e.g., Nichols 2004c, 2008). Given our constitution and history, we have a certain moral system, and benevolence is an important part of that system. But that system doesn’t carry normative authority over rational amoralists who lack our emotional and conative tendencies. Creatures who lack our inclinations towards benevolence might reject the principle of benevolence without thereby being irrational. Some of these rational amoralists might not be merely hypothetical but psychopaths within our midst. If this is right, we can’t expect there to be a rational bedrock for morality. The charge that some moral claim is “rationally empty” will hold, at the most basic level, for all of morality. Rejecting all moral claims that are rationally empty threatens ethical nihilism (Nichols 2014).

⁵ Before presenting the Benevolence principle, Sidgwick makes two points. First, he says that it is self-evident that from “the point of view of the universe,” my own good is no more important than the good of anyone else. Then he says that “as rational beings we are bound to aim at good generally . . . not merely at this or that part of it” (1884: 381).

2.2 Consequentialist Defenses of Rules

Thus, my own view is that it is not necessarily irrational for a psychopath to disregard the suffering of others. Others disagree, of course (e.g., de Lazari-Radek & Singer 2014; Smith 1994). The Sidgwickian advocate of rational intuition might insist that the psychopath exhibits a rational failure in not caring about the suffering of others. I am not going to press the issue here. Instead, I want to use these pages on some questions that seem more constructive. Even if our moral system lacks a rational foundation, we can still ask whether, within the system that we have, some rules are better than others; we can ask what makes some rules better than others; we can ask whether the rules we have are better in a way that makes it rational for us to sustain them. Importantly, consequentialists care about these questions. Consequentialists want to promote rules that will produce the best consequences. This is obviously true for rule consequentialists, who maintain that we ought morally to follow the rules that will produce the best consequences; those rules determine whether an act is right or wrong (e.g., Hooker 2000). Rules also play a vital role for act-consequentialists, from Sidgwick to Singer. Unlike rule-consequentialists, act-consequentialists maintain that what determines whether an act is right is simply whether the act produces the best consequences. Nonetheless, act-consequentialists don’t typically think that we should use this account of right-making characteristics as a procedure for making decisions. For instance, de Lazari-Radek and Singer write, “‘Maximize the good’ is not the best decision procedure” (2014: 312). Instead, they maintain that we should use rules: “[W]e should encourage people to keep to a publicly known set of rules, to be truthful, to improve their character, and not to focus on maximizing the good all the time” (2014: 312–13; see also Hare 1981).
This constitutes a consequentialist account which maintains that what makes an act right is simply whether it produces the best consequences, but that the decision procedure that will bring about the best consequences is one that uses rules (for discussion, see Hooker 2016). Not only do both act- and rule-consequentialists think that the best decision procedures implicate rules, they also maintain that the rules need to be internalized so that they become intuitive and effortlessly applied (e.g., de Lazari-Radek & Singer 2014: 289; Hare 1981: 38–9, 45–7; Hooker 2016: 32). The guiding idea here is that internalizing certain rules will lead to better consequences overall even if internalizing the rule entails that people will follow the rule in instances where this doesn’t produce the best consequences. Since consequentialists acknowledge that rules are important for moral decisions, it should be no surprise that consequentialists care about the characteristics of rules. Already with Sidgwick we see close attention to issues about what kinds of rules we should promote (Sidgwick 1884: 477).


In the following section, I want to describe several considerations that bear on the effectiveness of rules. Then in Section 4, I will apply those considerations to the act/allow distinction.


3. Ecological Rationality and Rules

Evidentialism provides one way to think about rationality—the rationality of an individual responding to evidence. But in evaluating human institutions and practices, it’s useful to countenance a notion of rationality rooted in considerations of our ecology—the characteristics of our minds and our environments. Insofar as an institution or practice is effective for achieving our ends in our actual ecological settings, we can call that institution or practice ecologically rational. In cognitive science, Gerd Gigerenzer has developed the most extensive research program that goes under the label ecological rationality. He maintains that certain “fast and frugal” heuristics that do not respect evidentialist strictures are nonetheless rational in this ecological sense, because the heuristics tend to be accurate in the natural environment. He writes, “In general terms, a heuristic is ecologically rational to the degree that it is adapted to the structure of an environment” (Gigerenzer 2019; see also Todd & Gigerenzer 2007: 168). On this approach, “the key to good performance [resides] in the ability to select and match the mind’s tools to the current social or nonsocial environment” (Hertwig & Grüne-Yanoff 2017: 4). While the notion of ecological rationality that I have in mind is consistent with this, my interests are broader than those of the fast and frugal heuristics program. First of all, the domain of interest is not just individual performance, but also the social effectiveness of institutions and practices. Secondly, when evaluating the effectiveness of institutions, an important part of the ecology consists of features of our minds themselves, including conative features of our minds. On this construal, certain policies recommended by behavioral economics would count as ecologically rational, given our interests. For instance, people are less likely to select unhealthy food when it is displayed below eye level (Levy et al. 2012), and this is presumably due to both appetitive and attentional characteristics of people. Since these appetitive and attentional characteristics are presumably stable features of people, if a primary goal is to discourage the consumption of unhealthy food, it’s ecologically rational to adopt policies about where we place unhealthy food on shelves.⁶

One way to explicate why ecological rationality deserves the term “rational” is that ecologically rational institutions are those that would be selected by a rational agent who knew our ends, our minds, and our environments. Were a rational agent in a position to choose our rules on our behalf, the rules he would choose would be those that are effective for us, given our minds and environments.

A few clarificatory remarks are in order. First, ecological rationality is framed against a background of particular ends that we want to aim for. So if we know that we want to decrease unhealthy eating, we can investigate which practices are effective at achieving that end given our ecological trappings. Second, ecological rationality is not a form of all-in rationality. Even if our goal to decrease unhealthy eating makes it ecologically rational to change food placement in cafeterias, there might be other considerations that override this goal (for instance, in some places the changes might be prohibitively expensive). Finally, one might treat ecological rationality as criterial—e.g., one might maintain that only the most effective institution gets to be dubbed “ecologically rational.” But in what follows, I will treat the notion of ecological rationality as comparative—one institution is more ecologically rational than another insofar as it is more effective given our minds and environments.

In this section, I’ll set out various factors that would contribute to the ecological rationality of rules. But let’s start with some uncontroversial examples of ecologically rational practices in teaching arithmetic. First, if you want to teach children arithmetic, it’s ecologically rational to use Arabic numerals rather than Roman numerals. The multiplication algorithms for Arabic numerals are easier, less error-prone, and quicker.
Arabic numerals are simply better than Roman numerals for multiplication. Second, it’s more ecologically rational to teach arithmetic with base 10 rather than base 2, given human limitations for storing and manipulating large numeral strings. If I am book shopping and want to know the total of the three books I’ve selected, I add $11, $26, and $43, an easy problem. By contrast, base-2 pricing would be $1011, $11010, and $101011, and the limitations of working memory will make computation in this format more error prone. Finally, given the environment that children are raised in, it’s more ecologically rational to teach arithmetic using base 10 than base 11. By the time children are in grade school, most of them already have some exposure to base-10 numerals, which means they have a head start with a particular numeral system. Exploiting that pre-existing familiarity will be more efficient than teaching them a new system. Moreover, an important part of the mission of teaching children arithmetic is to enable them to deal with arithmetic problems that arise in their environment. Since that environment never uses base 11, and almost always uses base 10, it’s ecologically appropriate to favor teaching the skills for calculating with base-10 numerals.

⁶ A closely related use of the term “ecological rationality” can also be found in Vernon Smith (2003: 511). As I’m using the notion of ecological rationality, it has important affinities with “state of nature” justifications (e.g., Craig 1990, 2007: 191; Fricker 2008).

Thus we can see different reasons that particular practices in arithmetic education are ecologically rational. One kind of reason is internal to the practice itself—Arabic numerals are simply better suited than Roman numerals for arithmetic, based on representational properties of the system. Another kind of reason has to do with characteristics of the human mind—base 10 is better for arithmetic than base 2 because of the nature of human storage capacities. And another kind of reason has to do with the environment—base 10 is better than base 11 because of the actual world that kids navigate.

Our interest here, of course, is the ecological rationality of moral and social rules. Part of the background for this investigation is that many of the rules we have—rules against lying, stealing, maiming, and so on—plausibly facilitate better consequences than would be had without rules. So, in assessing the ecological rationality of our current rules, we are, to a great extent, exploring whether the rules we have should be replaced by different rules.
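The base-10 versus base-2 comparison in the book-shopping example can be checked mechanically. The following Python sketch is purely illustrative: it converts the three prices to base 2 and counts how many digits a shopper would have to keep track of in each format. The helper name `to_base` and the use of digit count as a rough stand-in for working-memory load are assumptions introduced for this illustration, not claims made in the text.

```python
# Prices from the book-shopping example, in dollars.
prices = [11, 26, 43]


def to_base(n: int, base: int) -> str:
    """Return the numeral for a non-negative integer n in a base from 2 to 10.

    Hypothetical helper written for this illustration.
    """
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))  # least significant digit first
        n //= base
    return "".join(reversed(digits))


total = sum(prices)

decimal_numerals = [to_base(p, 10) for p in prices]
binary_numerals = [to_base(p, 2) for p in prices]

# Assumption: total digit count is only a crude proxy for the load
# that each format places on working memory.
decimal_load = sum(len(s) for s in decimal_numerals)
binary_load = sum(len(s) for s in binary_numerals)

print("base 10:", decimal_numerals, "total", to_base(total, 10))
print("base 2: ", binary_numerals, "total", to_base(total, 2))
print("digits to track:", decimal_load, "versus", binary_load)
```

On this crude measure the same three prices require more than twice as many digits in base 2, which fits the point that the format, and not the underlying quantities, is what strains a limited memory.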
Sidgwick himself espouses a rather conservative view about revising the moral rules we have, for he worries that it is risky to try to replace the rules we have with more felicific rules: it is easier to pull down than to build up; easier to weaken or destroy the restraining force that a moral rule, habitually and generally obeyed, has over men’s minds, than to substitute for it a new restraining habit, not similarly sustained by tradition and custom . . . . For just as the breaking of any positive law has an inevitable tendency to encourage lawlessness generally, so the violation of any generally recognized moral rule seems to give a certain aid to the forces that are always tending towards moral anarchy in any society. (Sidgwick 1884: 477)


When we consider replacing rules, we need to consider features of rules that make them effective or ineffective for us. Factors like learnability and resilience will obviously be important for whether a rule is ecologically rational. Rules that are unlearnable by individuals or unsustainable in communities will fare poorly on the dimension of ecological rationality. Let’s now turn to look more closely at factors that bear on the ecological rationality of rules.


3.1 Resource Limitations: Computational and Motivational Boundedness

Humans have substantially limited computational resources, and our computational boundedness will loom large in any account of ecological rationality (e.g., Simon 1990: 7). Our institutions, practices, and rules can’t be so computationally demanding that they are beyond our cognitive capacities. This point about computational boundedness was not lost on utilitarians, who noted that any rule that will be effective must not be too complex to be learned. Here’s Brandt: “the complexity of the conduct enjoined or banned is limited by the intellectual capacities of the average person” (Brandt 1979: 287). More recently, Hooker writes: “Although learning a code can fall short of being able to recite it, there remain limits on what we can learn. And even within the class of learnable codes, the costs of learning the ones with more rules and more complex rules will be higher. So the ideal code will contain rules of limited number and limited complexity” (Hooker 2000: 97; see also Brandt 1963; Gert 1998).

A different kind of resource limitation applies in the case of rules—motivational boundedness. Effective rules cannot demand too much of people. Here’s Brandt again: “What these rules may require is limited by the strain of self-interest in everyone” (Brandt 1979: 287; see also Hooker 2000: 98). Some caution is required here. Some effective rules might have excessively demanding implications that people don’t readily recognize. Indeed, it’s possible that our rules for rescue are like this—they carry demands that outstrip our motivational bounds for conformity to the rules (cf. Singer 1972). Still, it’s plausible that rules that are extremely demanding in a way that is entirely manifest will suffer in terms of their ecological rationality.

These points about resource limitations were also already appreciated by Sidgwick. He warns that even if a new rule is more felicific, it might tax the abilities of ordinary people: “It may be too subtle and refined, or too complex and elaborate: it may require a greater intellectual development, or a higher degree of self-control, or a different quality or balance of feelings, than is to be found in an average member of the community” (Sidgwick 1884: 477; see also de Lazari-Radek & Singer 2014: 286, 292). Consequentialists have thus long recognized the importance of these factors when it comes to evaluating the suitability of rules.

3.2 Affective Factors


A quite different set of factors that contribute to the sustainability of a rule concerns how the rule fits with our natural emotional endowment. Many different kinds of emotions—e.g., anger, fear, jealousy, disgust, and sympathy—are characteristic features of human psychology. Rules that resonate with our natural emotional reactions will be more resilient. For instance, norms that prohibit actions that are independently likely to be aversive will be more likely to survive. We can frame the basic idea here in terms of an “affective resonance” principle regarding proscriptive rules:

Ceteris paribus, proscriptive rules against actions that we naturally find affectively aversive (or are easily led to find aversive) will be more likely to survive than rules against actions that are not naturally aversive.

For example, etiquette norms prohibiting the display of bodily fluids seem to be preserved once they are introduced into the culture, and a plausible explanation for this is that these prohibitions resonate with our natural proclivity to feel disgust at bodily fluids (Nichols 2004c). Affective resonance also plausibly applies to prescriptive rules, and so we can articulate another principle:

Ceteris paribus, prescriptive rules for actions that we naturally find affectively attractive (or are easily led to find attractive) will be more likely to survive than rules for actions that we do not naturally find attractive.

This principle plausibly applies to the case of retributive rules. The rule that wrongdoers should be punished resonates with our natural anger-driven motivation to retaliate against (perceived) wrongdoers. We want to retaliate against wrongdoers. This anger-driven motivation would likely contribute to the cultural heft of a norm that prescribes inflicting harm on wrongdoers (see, e.g., Nichols 2015: 128–9). There’s a flip side as well, where our emotions run counter to the rules. For instance, rules that promote actions that are independently likely to elicit negative affect will be less likely to survive. We might put this as an “affective dissonance” principle:

Ceteris paribus, prescriptive rules for actions that we naturally find affectively aversive (or are easily led to find aversive) will be less likely to survive than rules for actions that we do not naturally find aversive.⁷


The dissonance principle might explain the decline of the handkerchief. For centuries, norms regarding nose blowing involved a reusable handkerchief, which is returned to the pocket after blowing your nose. With the advent of disposable tissue, people have moved away from using handkerchiefs to something less likely to be disgusting. In general, then, the cultural resilience of a rule, and hence its ecological rationality, will depend partly on whether the rule is resonant or dissonant with our emotional endowment.

3.3 Perceived Unfairness and Sucker Aversion An additional important factor in the sustainability of rules turns on the potential for perceived unfairness. Ceteris paribus, if following a rule is likely to trigger a perception of unfairness, that rule will be less sustainable. There are various reasons why perceived unfairness would bear on the sustainability of a rule. In some cultures, there is a general aversion to inequitable distributions, even when the inequity benefits the self. One study showed that 8-year-old children from the Boston area tended to reject inequitable divisions of candies even when it benefited them (Blake & McAuliffe 2011). The sentiment is captured in the title of the paper: “I had so much it didn’t seem fair.” A recent cross-cultural study found this kind of aversion to advantageous inequity in the U.S., Canada, and Uganda, but not in Senegal, India, Mexico, or Peru. Aversion to disadvantageous inequity, when the other person gets more than the participants, was found in all ⁷ This principle presumably applies when the rule promotes an action that generates negative affect in either actors or observers.

the cultures by middle childhood (Blake et al. 2015). Both kinds of inequity aversions are likely to contribute to the sustainability of rules. Insofar as following a rule will lead to perceptions of aversive inequity, those rules will be less sustainable. One form of perceived unfairness that is especially salient is sucker aversion.⁸ If there’s one thing people hate, it’s feeling like a sucker—someone who is bearing the burden of free riders. People dislike this so much that they are willing to sacrifice goods for themselves just so that they won’t carry free riders. Norbert Kerr puts this as a kind of principle: “If one has a partner who appears to be free riding on one’s efforts, one should reduce one’s own efforts rather than play the sucker” (Kerr 1983: 820; see also Mulvey & Klein 1998). In one experiment on the phenomenon, participants were put in individual booths and led to think that they were engaging in a task with another person in a neighboring booth. The task was to pump air into a spirometer by rapidly squeezing a rubber bulb. After four practice trials, participants were told that they would be cooperating with the other player in nine performance trials. On each trial, if either of the players pumped in enough air to reach a criterion, both players would receive $0.25 (Kerr 1983: 822–3). The participants were given fake feedback on the performance of their (fake) partner. In one condition, they were led to think that the partner succeeded at reaching criterion in the practice trials but then doesn’t reach criterion when it comes to the actual cooperative performance trials. In that case, participants were less likely to reach criterion. Kerr writes, “Apparently, subjects sometimes preferred to fail at the task rather than be a sucker and carry a free rider” (Kerr 1983: 823; see also Jackson & Harkins 1985).⁹ We find a similar phenomenon in economic games. Gächter and colleagues looked at people’s trust attitude and cooperation. 
They found that people who think they will be exploited by others contribute less in a public goods game. Gächter and colleagues write:

when we relate the trust attitude question to the cooperation decision in the public good experiment, we find that people who believe that most others are fair and do not exploit others make significantly higher contributions to the public good than those who believe that they will be exploited by others . . . These findings are consistent with the observation that most people do not want to be the suckers in cooperative enterprises in jeopardy of free riding. (Gächter et al. 2004: 507)

⁸ Perhaps sucker aversion falls under the affective dissonance rubric above. But sucker aversion is so significant that it deserves its own place at the table.

⁹ They did not find this effect, however, when participants were led to think that the partner failed on the training trials and hence lacked the ability to reach criterion. Hence, Kerr writes, “it appears that subjects were willing to carry an incapable partner” (1983: 824). So it’s not that people are unwilling to do the work, they are just unwilling to do the work to benefit free riders.

In these cases, the aversion to being a sucker again apparently leads participants to be less cooperative. The participants would rather be uncooperative than be suckers. Why are we so averse to being suckers? At least part of the explanation probably goes to concerns about status. To carry someone else’s load suggests diminished status relative to the other person. Status itself is part of the furniture of most human societies. For instance, in small-scale societies, there is often a widely respected hierarchy in which elders have an elevated status. Work on cultures of honor, like those in the Middle East, indicates that status is a critical resource that affects who will cooperate with you, who will work with you, and who will marry your children. In many Arab cultures, there is a constant struggle for rank that is reflected in the proverb: “Always be sure to claim all due respect for what you have and deserve” (Salzman 2008). Status continues to preoccupy us in large industrial societies. Concerns about status often trump even monetary concerns. Just to take one example from the empirical literature, indignation at low salaries is largely about threat to status, not buying power (Berger et al. 1972; Layard 2006). In street vernacular, outrage at status threats among peers is familiar from the rhetorical question, “You think you’re better than me?!” These abiding concerns about status likely contribute to the power of sucker aversion. Given our aversion to being suckers, rules that would facilitate feeling like a sucker would be less likely to survive. We can put this as a “sucker aversion” principle: Ceteris paribus, rules that are likely to make the agent feel (or anticipate feeling) like a sucker will be less likely to survive than rules that are unlikely to trigger those feelings (and expectations).
In general then, if following a rule is likely to lead an agent to feel (or expect to feel) like he is being taken advantage of—being a sucker—then that rule will be a lot harder to sustain than a rule that does not trigger such feelings.

These psychological factors (e.g., affective forces and sucker aversion) are predictors of which kinds of rules are likely to survive. But of course it’s possible for there to be countervailing factors that help to fix or sustain rules that run against the psychological propensities charted here. For instance, if sufficient punishment is exacted, a rule can be established even if it carries ecological disadvantages (see, e.g., Boyd & Richerson 1992). Also, if the benefits of a rule are recognized to be sufficiently great, this might counteract psychological factors that weigh against the rule. But absent such strong countervailing factors, rules that are extremely complex or demanding or trigger concerns about being taken for a sucker will be unlikely to survive.

3.4 Defection

The foregoing factors—complexity, demandingness, affective resonance/dissonance, and perceived unfairness/sucker aversion—all primarily involve human psychological propensities and limitations. But there is a feature of the social environment that plays a key role in the resilience of rules as well—defection. Believing that others have defected on a rule makes people more likely to defect themselves.¹⁰ This is reflected in research on littering. People are more likely to litter if there is litter on the ground (Cialdini et al. 1990). Indeed, a large-scale study found that, while the presence of litter predicts whether people will litter, signs prohibiting litter do not show that effect (Schultz et al. 2013; see also Keizer et al. 2008; Reiter & Samuel 1980). Given that the presence of defection breeds more defection, rules that are more likely to generate defection will be more likely to unravel completely and lose their hold on a community.¹¹,¹²

¹⁰ At least if they have no opportunity to punish the defectors (see, e.g., Fehr & Gächter 2002).

¹¹ One factor that makes defections more likely is difficulty of detection. As you might imagine, in economic games where players are anonymous, cheating is more common (Bapna et al. 2017; Leider et al. 2010).

¹² There are different reasons that defections might facilitate further defection. One reason is simply that others’ behavior might be taken as evidence about whether there is a rule present at all. That is, people might regard others’ behavior as providing information about what the right thing to do is. An instructive analogy here is the classic bystander effect (Latane & Darley 1968). The likelihood that a person will help an apparently needy stranger is affected by the number of idle bystanders. What is often neglected in reporting this study is the explanation offered by the researchers themselves. It’s not that people are heartless and selfish in the presence of bystanders; rather, it’s that the presence of idle bystanders often leads subjects to wonder whether they are failing to recognize some feature of the situation—the subjects aren’t sure that helping is the right thing to do. Similarly, then, in a context where I think a rule applies, if I see others do things that contravene the rule, I might infer that the rule isn’t operative after all. Even if I am confident that there is a rule, if I expect others to defect on the rule, I might take this as an excuse to cheat. Consider, for illustration, the fact that many of us feel a moral obligation to give to medical charities, and we might feel guilty for spending money in nice restaurants when that money could save lives. However, the fact that so many others in my community do the same thing—eat at nice restaurants when they could give the money to save lives—makes it easier for me to do it. It’s not that I have a merely strategic desire of the form help save lives only if others help. Instead, my expectations of others’ behavior lead me to think that, while I really shouldn’t spend the money in the restaurants, I’m no worse than my friends. In such a case, we have competing motivations—the desire to conform to the moral obligation, “save lives” and the fear of missing out. Our expectations about others’ behavior can tilt the balance away from the moral obligation. In this case, my expectations about others’ behavior can affect whether I defect on a rule that I fully acknowledge.

Consequentialists have long emphasized the importance of having rules that will produce good consequences, and they have attended to the characteristics that bear on the effectiveness of rules. In this section, I’ve used the notion of ecological rationality to articulate features that affect whether candidate rules are good rules. There are several features of human psychology that bear on the ecological rationality of rules, including computational boundedness, motivational boundedness, affective resonance/dissonance, and perceived unfairness/sucker aversion. In addition, there are features of the social environment that will bear on the sustainability of rules including the knowledge or expectation that people won’t follow the rules. Now that we have some of the key factors that affect ecological rationality in place, we can begin to evaluate the ecological rationality of act-based rules.

4. Ecological Rationality and Act-Based Rules

The previous section set out some factors that will contribute to or detract from the ecological rationality of a rule. I now want to use these factors to interrogate the ecological rationality of act-based rules. We saw above that prominent consequentialists disparage the act/allow distinction (Section 2). However, consequentialists are also concerned to promote effective rules, and this means that consequentialists should attend to the ecological rationality of rules. The act/allow distinction is widely reflected in common-sense morality—we tend to have rules that prohibit actions, rather than rules that require minimizing bad consequences. We will consider whether there are general

reasons to think act-based rules are more ecologically rational than consequence-based rules. It will be helpful to have a non-moral example to think this through. Given the rich body of work on littering norms (see, e.g., Cialdini et al. 1990), I will use litter as the ready example where “Don’t litter” is a paradigmatic act-based rule, and “Minimize litter” is a paradigmatic consequence-based rule.

4.1 Consequence- vs. Act-Based Rules

How do act- and consequence-based rules compare on the factors set out in Section 3? On the computational demandingness dimension, there is no appreciable difference between act-based rules and consequence-based rules. As we’ve seen (Chapter 3) both kinds of rules are utterly trivial to learn. When all the sample violations are actions, people infer an act-based rule; but when some of the violations are allowings, people infer a consequence-based rule. When it comes to motivational demandingness, it’s clear that consequence-based rules ask for more than act-based rules, since consequence-based rules ask for everything that act-based rules do, and then more besides. Still, we tend not to have consequence-based rules even when the demands are fairly minimal. In many of our communities, it would be low cost to follow a consequence-based rule to minimize litter—there simply isn’t that much litter around in many residential neighborhoods. In such communities, “minimize litter” is less demanding than some rules we actually follow, like rules of recycling that require us to separate our trash from our recycling and then divide the latter into paper and plastic, and then put the bins out for pickup at different times. So, even though consequence-based rules will be more demanding than act-based rules, some consequence-based rules would be less demanding than lots of the act-based rules we follow. So consequence-based rules are not in principle overly demanding. When we turn to emotional considerations, again the result is somewhat mixed. To be sure, there will be cases where the consequence-based rule will be more likely to trigger negative emotions than act-based rules.
A rule that said “minimize gum on the sidewalk” would require doing things that are much more disgusting than a rule that said “don’t spit your gum on the sidewalk (and if you do spit your gum, pick it up).” Gum that has been in another person’s mouth is much more disgusting to us than gum that has been in our own mouths, and this difference would ramify into the affective

dissonance of the corresponding rules. In these kinds of cases, consequence-based rules will suffer from affective dissonance. However, in many other cases, like picking up litter, again, consequence-based rules don’t seem to be systematically likely to trigger disgust or other negative emotions. A substantial difference emerges, I think, when we turn to perceived unfairness. Consequence-based rules are likely to inspire perceived unfairness, at multiple levels. First, consequence-based rules will require a person to rectify the bad consequences produced by defectors. That will likely trigger perceptions of unfairness. In the case of littering, it’s easy to have the thought, why should I pick up after defectors? Why should I have to endure costs to benefit them? This is enough to make consequence-based rules much less viable than act-based rules. In addition, there is another level of perceived unfairness, which arises from second-order free riders who don’t defect but also don’t rectify bad consequences. For example, for the rule “minimize litter,” a second-order free rider might not litter but neither does he pick up litter left by defectors. Again, it will seem unfair to have to comply with the consequence-based norm in that case—why should I be the one that has to pick up after defectors? Indeed, this concern about second-order free riders will apply even if the litter is there because of a natural process rather than a defector. It’s unfair that I have to be the one to pick up the litter. Note that these perceptions of unfairness tap into deep and persistent concerns people have, like sucker aversion, and there is no reason to think these concerns will abate. Perceived unfairness of this sort plausibly applies to most candidate consequence-based rules. Consider something as simple as rules in the office kitchen. It’s common to see signs like “Your mother doesn’t work here!” or “Self-cleaning kitchen.
Clean up after yourself!” But never “Pick up after your coworkers!” The very idea of such a rule chafes—why can’t my coworkers pick up after themselves?¹³ Or consider a rule like “minimize lying” or “minimize stealing”—why should I have to devote my resources to policing others’ bad behavior? (In addition to the costs of time and effort, in many cases it would be risky to intervene to stop someone from lying or stealing. An agent might think it unfair that she should have to take on those risks.) So even though consequence-based rules are simple to learn, need not be overly demanding, and need not be affectively dissonant, they will facilitate perceptions of unfairness. This will make people reluctant to conform to such rules. This is not to say that consequence-based rules are impossible to generate and sustain. If sufficient resources are spent monitoring and punishing first- and second-order free riders, a consequence-based rule might be stabilized. But the costs of such monitoring and punishment might themselves be too much to be sustained. In any case it is clear that consequence-based rules will be less ecologically rational, on the dimension of perceived unfairness, than act-based rules. Given the psychological force of perceived unfairness and, especially, sucker aversion, consequence-based rules are less sustainable than act-based rules. The fact that consequence-based rules will likely trigger perceptions of unfairness will facilitate defecting on such rules. This is already a significant defect in such rules, from the perspective of ecological rationality. In addition, defections inspire further defections. If a person thinks that others have defected (or will defect), he will be more likely to defect himself. This all suggests that consequence-based rules would unravel, not because they are complex or demanding, but because they will seem unfair, and this would lead to a spiral of defection leading to the disintegration of the consequence-based rule.¹⁴

¹³ One does sometimes see signs like “Leave things better than you found them” (thanks to Mark van Roojen for this example), but these are plausibly intended (and interpreted) as recommendations rather than requirements.
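The spiral of defection described in this section can be illustrated with a toy threshold model. This is only a sketch under assumptions that are not in the text: each agent is given a hypothetical threshold (the number of defectors they must observe before defecting themselves), and perceived unfairness is modeled, purely for illustration, as lowering those thresholds. The function name and the two threshold lists are invented for the example.

```python
def final_defectors(thresholds):
    """Return how many agents end up defecting in a simple threshold cascade.

    thresholds[i] is the (hypothetical) number of defectors agent i must
    observe before defecting. An agent with threshold 0 defects unprompted.
    """
    defectors = 0
    while True:
        # Everyone whose threshold is met by the current count defects.
        updated = sum(1 for t in thresholds if t <= defectors)
        if updated == defectors:
            return defectors  # no one new defects, so the count is stable
        defectors = updated


# Assumed thresholds for an act-based rule: one unconditional defector,
# everyone else tolerates a fair amount of defection before joining in.
act_based = [0, 5, 5, 6, 7, 8, 9, 9, 10, 10]

# Assumed thresholds for a consequence-based rule: perceived unfairness
# makes each agent quicker to give up once others have defected.
consequence_based = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(final_defectors(act_based))
print(final_defectors(consequence_based))
```

With these made-up numbers, a single defector stays isolated under the first list but sets off a complete unraveling under the second. The model says nothing about real thresholds; it only shows how a modest shift in willingness to defect can separate a stable rule from one that collapses.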

¹⁴ As we saw above (Section 3), Sidgwick worries that if we replace a current rule with a more felicific rule, this might instead lead to the weakening or destruction of the extant rule. In the present context, one might worry that trying to replace a rule like “don’t litter” with “minimize litter” might be counterproductive for reducing litter. People might abandon litter rules altogether rather than follow a consequence-based rule. Another possibility is that if we introduced a rule like “minimize litter” it would morph back into the act-based rule targeted at litter: “don’t litter.” In either case, though, we can expect that consequence-based rules would be less likely to survive than the act-based variant.

¹⁵ Human culture is much older than the earliest written codes, so it’s possible that in earlier communities there were consequence-based codes. However, since we don’t have evidence on the matter, it seems reasonable to rely on the earliest records we have.

4.2 Ecological Rationality and Rationality

One way to think about the ecological rationality of act-based rules is that such rules prevailed in the push and pull of human history. They are better rules that have shown their superior mettle by winning battles in cultural evolution. However, it’s possible that consequence-based rules were always rare in human history. Certainly, such rules are virtually non-existent in early penal codes.¹⁵ Those codes are, however, replete with act-based rules. For instance, in the code of Hammurabi, we find rules like the following:

If any one steal cattle or sheep, or an ass, or a pig or a goat, . . . if they belonged to a freed man of the king he shall pay tenfold.
If any one steal the minor son of another, he shall be put to death.
If a man be guilty of incest with his daughter, he shall be driven from the place (exiled).

And in Nesilim, we find rules like this:

If anyone blind a free man or knock out his teeth, formerly they would give one pound of silver, now he shall give twenty half-shekels of silver.
If anyone injure a man so that he cause him suffering, he shall take care of him. Yet he shall give him a man in his place, who shall work for him in his house until he recovers. But if he recover, he shall give him six half-shekels of silver. And to the physician this one shall also give the fee.
If anyone cause a free woman to miscarry, if it be the tenth month, he shall give ten half-shekels of silver, if it be the fifth month, he shall give five half-shekels of silver.

In the code of Assura:

If the wife of a man go out from her house and visit a man where he lives, and he have intercourse with her, knowing that she is a man’s wife, the man and also the woman they shall [be] put to death.
If a man or a woman practice sorcery, and they be caught with it in their hands, they shall prosecute them, they shall convict them. The practicer of magic they shall put to death.
If a man strike the wife of a man, in her first stage of pregnancy, and cause her to drop that which is in her, it is a crime; two talents of lead he shall pay.

By contrast, fully consequence-based rules are almost entirely absent from these early codes.¹⁶ Thus, we have little reason to think that there were ancient struggles in cultural evolution in which act-based rules prevailed. As far back as we can see, we find little evidence of consequence-based rules. But even if there never were any consequence-based rules, it is still significant that act-based rules are more ecologically rational. Rather than think of the justificatory contribution of ecological rationality being hard won through cultural selection, instead we can think of it in terms of whether it makes sense to embrace a certain practice, given the ecological considerations for or against the practice. Given the ecological considerations canvassed above, it is rational for us to endorse the act-based rules we have inherited from our predecessors. If we could choose the rules to have, given constraints of our nature and environment, it would be rational for us to choose act-based rules (cf. Williams 2002: 34).

¹⁶ A deontologist might maintain that the reason act-based rules predominate is because our distant ancestors realized the agent-relative nature of socio-moral duties. Again, I don’t mean to deny that. Rather, my aim in this section is to show why even a consequentialist should embrace act-based rules.

5. Moral Progress

I’ve argued that certain features of our moral system, in particular the act-based scope of most rules, enjoy two kinds of rationality. First, children’s belief that rules are act-based is doxastically rational given their evidence. Second, having act-based rules is ecologically rational insofar as act-based rules are less likely to collapse than consequence-based rules, and they require fewer resources (e.g., in terms of punishment) to be maintained. So, insofar as there is an end that we want, we are likely better off with an act-based rule than a consequence-based rule for achieving that end. It is grounds for optimism that people can apparently use rational learning principles to infer rules from evidence. Insofar as learners are rational and flexible, if we want to facilitate a change in people’s views, we can do so by drawing on evidentialist considerations. This provides a (modest) source of optimism about moral progress. Perhaps the most familiar proposed example of moral progress is the expanding circle of moral concern (e.g., Buchanan & Powell 2018; Nichols 2004c; Railton 1986; Singer 1980). There are different phenomena here. One phenomenon is simply becoming emotionally attuned to a wider group of individuals. This can be facilitated in various ways—e.g., by attending closely to the responses of others or by taking their perspective. But another way in which the moral circle can be expanded involves, once again, the scope of moral rules. As noted in Chapter 3, a common feature of ethical rules in many societies is that the rules apply to a restricted set of potential moral patients, sometimes limited to the community itself. For instance, in many small-scale societies it’s acceptable to steal from outsiders. The scope of

application here is obviously parochial. In other communities, like the ones we inhabit, the rules against theft apply much more broadly. In the previous section, I argued that act-based rules are ecologically rational. In particular, insofar as act-based rules are less likely than consequence-based rules to trigger perceptions of unfairness, act-based rules fit better with human ecology. The situation is more mixed for parochial rules. Perhaps the most significant ecological advantage of parochial rules is their resonance with our natural tendency to in-groupism (see, e.g., Hamlin et al. 2013; Kinzler et al. 2007). However, it’s also true that our empathic responses are naturally quite expansive. Like many mammals, it is a characteristic of our species that we find the suffering of others, including strangers, aversive. As a result, more inclusive rules enjoy affective resonances with our basic emotional tendencies (see, e.g., Nichols 2004c). Indeed, insofar as we naturally respond to human suffering, parochial rules that permit harming outsiders might run up against affective dissonance. Furthermore, unlike act-based rules, parochial rules seem no more likely than inclusive rules to generate perceptions of unfairness. A rule that says “don’t steal from anyone” doesn’t seem more unfair than a rule that says “don’t steal from people in the community.” The foregoing gives reason to think that, even if inclusive rules are not ecologically superior to parochial rules, they are not ecologically inferior in the same way as consequence-based rules. This point is reinforced by the fact that even though there are few consequence-based rules anywhere, many rules in many cultures are inclusive. Indeed, by and large, the direction of cultural change is towards more inclusive rather than more parochial norms. Inclusive norms don’t seem to be especially ecologically disadvantaged. That is an important piece of the background for moral progress.
Ecologically inferior rules will be poor candidates for making the world a better place. Parochial rules specify a narrow scope of application, e.g., a narrow range of potential victims. As we saw in Chapter 3, people make inferences about the scope of rules based on the kinds of violations to which they’re exposed. In a population that contains two discernable groups, A and B, if a learner encounters several violations, and in each case the victim of the violation was a member of group A, this might be treated as evidence that the potential patients for the rule range only over members of group A. For people in insulated communities, insofar as the sample violations they see all involve actions done to people in their community, it might be rational for them to place a higher probability on a parochial rule than a more inclusive one. In this way, we can explain the acquisition of parochial rules

in terms of statistical learning—if all the evidence of violations that I learn about involves people in my community, there is reason to doubt that the rules also apply to people outside my group. Thus, it’s possible that, just as with act-based rules, the belief in parochial rules is actually the product of rational inference given the evidence. Insofar as parochiality is learned in evidentially rational ways, we can draw on this to understand avenues for encouraging the acquisition of inclusive rules. I don’t have a novel argument against parochial morality. But, like many of us, I endorse inclusive morality. From the perspective of those of us who endorse inclusive morality, one might see the acquisition of parochial morality as the result of sampling error. It’s not that people make the wrong inferences from the evidence; the problem is that the evidence is skewed. When people come to believe in parochial rules, it’s because they only see a restricted range of sample violations. But this also suggests a natural way to encourage inclusive morality—provide the learner evidence of violations in which the victims of the violation fall outside the parochial group. Indeed, it’s plausible that inter-group interactions in the form of trade and alliances did provide groups with evidence against their parochial norms.¹⁷ On the assumption that inclusive rules are morally superior to parochial ones, when learners get more representative evidence, they learn better rules.¹⁸ Statistical learning provides some grounds for optimism that people learn morality in rational ways, and that they can learn new rules if given new evidence. Although I think this gives us some grounds for optimism about moral progress, I don’t want to be Pollyanna. The fact that our rational moral psychology makes moral progress possible doesn’t make it probable. Work in the Heuristics and Biases tradition famously shows that people make systematic mistakes in logical and statistical reasoning. 
The more recent research on statistical learning indicates that despite these mistakes, people have an impressive capacity for statistical reasoning (for discussion, see Cesana-Arlotti et al. 2012; Denison & Xu 2012). I am obviously an enthusiast about this optimistic line of research. However, even if the foibles emphasized by the Heuristics and Biases tradition mask a deeper statistical competence, these foibles interfere with our statistical competence. Moreover, our vulnerabilities to bad reasoning are exploited by advertisers, politicians, and the media. These forces can have a deeply distorting effect on moral outlooks. To be sure, if we want moral progress, we can’t just sit back and hope for statistical learning to do the work. We also need to find ways to offset or circumvent the threats posed to moral progress by our vulnerability to the distortions of marketing, echo chambers, and motivated reasoning. There are clearly obstacles to moral progress, but I think that the evidence on our abilities for flexible rational inference provides some reason to be hopeful.

¹⁷ Elsewhere, I argue that our natural emotional reactions to suffering in others would also facilitate more inclusive norms (Nichols 2004c: ch. 7). I continue to think these affective influences are important for moral change. But here, as throughout this book, I’m focusing on the contribution of rational learning to moral psychology.

¹⁸ If indeed parochial rules are worse rules that are inferred because of sampling error, then insofar as we provide people with better evidence, we wouldn’t be tricking them. In that way, it would differ from nudges that play on foibles of our psychological processes. Rather, it would be something that facilitates our rational responses. In Victor Kumar’s term, it would be a bump rather than a nudge (Kumar 2016).

Conclusion

Thus, we now have a fuller defense for the rationality of common-sense morality. First, as we’ve seen, children plausibly acquire the rules based on rationally appropriate principles; for example, their beliefs that the rules are act-based are doxastically rational. And second, act-based rules are more ecologically rational than their consequence-based counterparts. Given our nature, act-based rules are better rules for us to have than consequence-based rules.¹⁹ This doesn’t deliver a strong rationalist view according to which all rational agents with full information will agree on the same moral rules. But it does deliver a modest rationalist view. Some of our moral rules and distinctions seem to be ecologically rational. In those cases, we rationally learn rules that it is rational for us to endorse.

¹⁹ This is not to say that we ought always to follow the rules, even if they are the best rules we have. For one thing, there will often be competing rules and values that might matter more. In addition, there are persistent disputes between act- and rule-consequentialists about “esoteric” morality (see, e.g., Hooker 2000 and de Lazari-Radek & Singer 2014). There is also the possibility that we will ultimately decide that a rule that is ecologically rational is nonetheless counterproductive given other elements of the normative system. In that case, we might want to reject the rule despite its being ecologically rational. So the case for the rationality of these rules will need to be placed in a context with the competing considerations.

9

Rationalism, Universalism, and Relativism

In this book, I have defended the rationality of a posteriori inferences in the normative domain. In contemporary metaethics, “moral rationalism” occupies a central place. But the kind of rationalism I’ve been developing probably wouldn’t qualify as rationalism in contemporary metaethics. There are two prominent forms of rationalism in metaethics, what we might call epistemic rationalism and practical rationalism. According to epistemic rationalism, moral judgment is grounded in a priori rational thought. According to practical rationalism, there is a necessary connection between moral judgment and reasons for action. Each of these views offers a reassuring account of the authority of morality. Epistemic rationalism maintains that our moral judgments have the full authority of reason, and practical rationalism maintains that morally right action has the authority of reason. I won’t defend either of these forms of moral rationalism, but I will argue that we can retain some of the attractions of metaethical rationalism nonetheless. In Chapter 10, I’ll discuss motivational issues connected with practical rationalism. But in this chapter I’ll focus on epistemic forms of rationalism. To anticipate, I will argue in Section 2 that the belief in moral universalism may not be evidentially rational, but universalism is a default setting in normative cognition, and this default universalism is ecologically rational. In Section 3, I’ll argue that low consensus often provides appropriate grounds for inferring relativism, and this fits with the common-sense reaction to consensus information. But first, some background on a priori epistemic rationalism.

1. A Priori Epistemic Rationalism

Epistemic moral rationalism has a deep and distinguished philosophical pedigree. In early modern philosophy, the view was articulated and defended by several philosophers, including Samuel Clarke, John Locke, and John Balguy.¹ I want to set out the traditional a priori rationalist view in some detail because it will be the background for the modest rationalist view I’m promoting. Clarke provides one of the earliest sustained defenses of a priori moral rationalism, and I will focus on his treatment. According to a priori rationalists, there are a priori moral principles that are self-evident, and lead inexorably to moral judgment. These self-evident principles also provide the basis for deductive inferences to further moral claims. For instance, Clarke (1728: 239) writes:

the mind of man cannot avoid giving its assent to the eternal law of righteousness; that is, cannot but acknowledge the reasonableness and fitness of men’s governing all their actions by the rule of right or equity: and also that this assent is a formal obligation upon every man, actually and constantly to conform himself to that rule. I might now from hence deduce in particular, all the several duties of morality or natural religion.²

The analogy with mathematics seems irresistible for a priori rationalists. It appears prominently in early modern philosophy and is enjoying a resurgence in the twenty-first century (e.g., Clarke-Doane 2014; Peacocke 2004: 501; see Gill 2019 for discussion). It’s plausible that many mathematical truths are discovered by a priori rationality, and this makes mathematics an appealing model for a priori moral rationalism. Clarke promotes the analogy with simple arithmetic examples. Once you understand the claim that 2 * 2 = 4, you can’t help but appreciate that it is true. Its truth is self-evident and overwhelming. Similarly, Clarke maintains, the idea that you shouldn’t injure an innocent person is self-evident: “it is as absurd and blame-worthy, to mistake negligently plain right and wrong, . . . as it would be absurd and ridiculous for a man in arithmetical matters, ignorantly to believe that twice two is not equal to four” (1728: 232).

¹ For the history of moral rationalism in early modern philosophy, see Gill (2007, 2019).

² In a similar vein, Locke (1975 [1689]: 239) expresses optimism about finding foundations for our duties, and he suggests this will “make morality one of the sciences capable of demonstration.” He continues: Within such a morality, the measures of right and wrong could, I am sure, be derived from self-evident propositions by valid inferences as incontestable as those in mathematics, in a way that would satisfy anyone who was willing to bring to moral studies the same attentiveness and lack of bias that he brings to mathematics. More recently, this a priori rationalist view has been promoted by Christopher Peacocke: “Every moral principle that we know, or are entitled to accept, is either itself a priori, or it is derivable from known a priori moral principles in conjunction with nonmoral propositions that we know” (Peacocke 2004: 500). For a related view, see Audi 2009.

At least for Clarke, the rational capacities required for appreciating fundamental moral truths are supposed to be minimal. When presented with a statement like “it is wrong to injure the innocent,” so long as we understand the terms, we are compelled by rationality to recognize the truth of the claim. Who is capable of recognizing such truths? According to Clarke, virtually everyone: “These things are so notoriously plain and self-evident, that nothing but the extremest stupidity of Mind, corruption of Manners, or perverseness of Spirit, can possibly make any man entertain the least doubt concerning them” (227). Clarke goes on to give a kind of consensus argument for moral universalism. He first adverts to the passage in the Meno that argues that the recognition of geometrical truths doesn’t require tuition. The slave boy, on Plato’s account, is able to express geometric truths without being taught anything directly. One would get the same result, Clarke suggests, for truths regarding matters of right and wrong. Clarke demurs from Plato’s nativist conclusion, but maintains that the fact that unprejudiced, uneducated minds arrive at the same conclusion does have a critical implication:

thus much it proves unavoidably; that the differences, relations, and proportions of things both natural and moral, in which all unprejudiced minds thus naturally agree, are certain, unalterable, and real in the things themselves; and do not at all depend on the variable opinions, fancies, or imaginations of men prejudiced by education, laws, customs, or evil practices. (235)

The fact that all unprejudiced minds come to the same conclusion about matters of mathematics and morality indicates that morality, like mathematics, is a domain of universal truths. If an action is morally wrong, then the property of moral wrongness isn’t relative to context or culture.³

³ In a lovely anticipation of anthropological ethics, Clarke considers an empirical objection to the claim that all unprejudiced minds agree about moral matters. Clarke sets out the objection to his view: There is but one thing, that I am sensible of, which can . . . be objected against what has been hitherto said concerning the necessity of the mind’s giving its assent to the eternal law of righteousness; and that is, the total ignorance, which some whole nations are reported to lie under, of the nature and force of these moral obligations. (1728: 238) Clarke first expresses skepticism about the reports of cultures without morality: “The matter of fact, is not very true” (238). He then goes on to parry the challenge by suggesting that any culture that didn’t understand basic moral truths would likely not understand basic mathematical truths either.

2. Universalism and Rationalism

A priori epistemic rationalism holds that true moral judgments are universally true, and carry the full authority of reason. I am skeptical that morality has such an unassailable epistemic foundation. Moreover, even if high consensus about moral issues counts as some evidence for universalism, this evidence is seriously undermined on reflection. But that doesn’t mean that universalism has nothing going for it. On the contrary, I’ll suggest that the belief in universalism is likely beneficial in significant ways.

2.1 Universalism and the Evidence from Consensus

As we’ve seen, Clarke maintains that the rational abilities of ordinary people are adequate to the task of recognizing a priori moral truths. Throughout the book, I’ve argued that ordinary people make rational inferences regarding basic issues in the moral domain, including metaethical issues. These inferences are a posteriori, of course. In our experiments, participants draw inferences from the evidence we provide, including metaethical inferences about universalism and relativism. People take high consensus as evidence of universalism and low consensus as evidence of relativism. I’ve argued that these inferences from consensus are evidentially rational given certain substantive assumptions (Chapter 6, Section 3), including the following:

(1) The hypotheses under consideration are relativism and universalism.
(2) The sample is representative.
(3) The individuals making up the consensus are reasonably good at tracking the property in the domain.
(4) The individuals making up the consensus are to a significant extent independent.

Insofar as people adopt these assumptions and register high consensus for some moral claims, it’s epistemically rational for them to regard those moral claims as universally true.

In Chapter 6, I focused on whether it’s evidentially rational for people to make the inferences they do. For those purposes what mattered was whether people accepted the above assumptions. I suggested that there is some reason to think that they do. But now we are facing a different question: should we, as theorists, accept the assumptions? Even if it’s evidentially rational for people to take high consensus about a moral claim as indicating that the claim is universally true, it’s less clear that the inference carries through for a more informed audience. Once we are exposed to cultural diversity or start philosophizing, we see important challenges to the idea that high consensus provides compelling evidence for universalism.

First, the fact that there is cultural diversity in moral judgment challenges the assumption that a high consensus sample from one’s own culture is a representative sample. In a state of pre-anthropological innocence, we might have taken our community’s beliefs about incest to hold universally, but then we learn that different cultures have significantly different views about which relatives are permissible mates. Such diversity suggests that our pre-anthropological sense of high consensus might just have been a result of sampling error. Of course, it’s a familiar point that cultural diversity alone doesn’t entail metaethical relativism (see, e.g., Baghramian & Carter 2018). Nonetheless, cultural diversity in moral judgment certainly undermines the inference from consensus to universalism about that judgment. Insofar as our beliefs in universalism depend on evidence from consensus, the discovery of diversity should lower our credence in universalism.

In addition to concerns about cultural diversity, there are reasons to doubt the adequacy of the hypothesis space. For instance, the hypothesis space neglects the possibility of error theory. More importantly, the hypothesis space fails to distinguish different kinds of relativist views, some of which can accommodate high consensus. As sentimentalists have noted since the eighteenth century, high consensus can be explained by appeal to emotional faculties rather than a priori reason.
Sentimentalists would not begrudge Clarke the claim that every normal human being would agree that it is wrong to injure the innocent. But these common judgments are not, according to sentimentalists, the result of recognizing a universal truth about properties in the acts themselves. Rather, we find common moral verdicts among humans because they are based on common human emotions. On a Hutchesonian version of this view, when we see an agent hurting an innocent, we experience a sentiment of disapproval towards the agent. This sentiment might well be a universal component of the human mind, but that doesn’t mean that it’s a universal feature of any rational creature whatever. The sentimentalist can allow that rational aliens with a different emotional make-up might have different views on moral matters, and they wouldn’t be rationally wrong in holding those views (Nichols 2004c, 2008; Prinz 2007). On these sentimentalist accounts, we can explain why there is widespread consensus while denying that this consensus is explained by people tracking some universal moral truth. Thus, although people might infer moral universalism in ways that are epistemically legitimate given their beliefs, there are philosophical reasons for us to be more cautious about inferring universalism from perceived high consensus.
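
The shape of the inference from consensus, and the way a richer hypothesis space weakens it, can be illustrated with a small Bayesian sketch. Everything in it is an illustrative assumption rather than the model used in the experiments: the function names, the 0.9 tracking accuracy, the equal priors, the toy treatment of relativism as a coin flip per respondent, and the stipulation that a hypothetical "shared sentiment" hypothesis predicts exactly the same pattern of agreement as universalism.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k 'wrong' verdicts among n independent judges."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def single_fact_likelihood(k, n, accuracy):
    # Assumption: one fact holds for everyone, it is equally likely to be
    # "wrong" or "not wrong", and each judge tracks it with this accuracy.
    return 0.5 * binom_pmf(k, n, accuracy) + 0.5 * binom_pmf(k, n, 1 - accuracy)


def many_facts_likelihood(k, n):
    # Assumption (toy relativism): the correct verdict varies by context, so
    # each sampled judge is as likely to say "wrong" as "not wrong".
    return binom_pmf(k, n, 0.5)


def posterior(likelihoods):
    """Normalize likelihoods into posteriors, assuming equal priors."""
    total = sum(likelihoods.values())
    return {name: value / total for name, value in likelihoods.items()}


def two_hypotheses(k, n, accuracy=0.9):
    """Hypothesis space restricted to universalism and relativism."""
    return posterior({
        "universalism": single_fact_likelihood(k, n, accuracy),
        "relativism": many_facts_likelihood(k, n),
    })


def with_sentimentalist_rival(k, n, accuracy=0.9):
    # Assumption: shared sentiment predicts the same agreement pattern as
    # universalism, so consensus data cannot separate the two.
    shared = single_fact_likelihood(k, n, accuracy)
    return posterior({
        "universalism": shared,
        "shared_sentiment": shared,
        "relativism": many_facts_likelihood(k, n),
    })


high = two_hypotheses(k=19, n=20)   # 19 of 20 judges agree
low = two_hypotheses(k=11, n=20)    # 11 of 20 judges agree
widened = with_sentimentalist_rival(k=19, n=20)
print(high)
print(low)
print(widened)
```

With only two hypotheses on the table, high consensus strongly favors universalism and low consensus strongly favors relativism. Once a rival that also predicts high consensus is admitted, the same data can no longer push credence in universalism above one half, which is one way of picturing the worry about the adequacy of the hypothesis space.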

2.2 Default Universalism

When we acquire moral beliefs, like the belief that it’s wrong to harm unthreatening dogs and cats, we likely acquire those beliefs as unqualified. We don’t carefully hedge them as holding in our culture or our family. One anecdotal indicator of this presumptive universalism is the shock provoked by discovering moral diversity. Herodotus’ Histories, Sumner’s Folkways, and Westermarck’s The Origin and Development of the Moral Ideas hold an enduring fascination partly because it is so surprising that other cultures have different moral views. Are there really cultures in which people eat their dead parents (Herodotus), the old and sick are buried alive (Sumner 1906: 325), children are killed as sacrifices to the gods (Westermarck 1906: 443), and boys are expected to perform oral sex on adult men (Herdt 1994)? Each of these surely comes as a surprise to us. We are not prepared for such variation in practices.

Presumptive universalism is not restricted to morality. When children learn that dogs have tails, do they encode that as universal or more cautiously as having some as-yet-undetermined relativizing parameter? Likely the former. Neither is presumptive universalism restricted to biological kinds. Children tend to be surprised that it’s a different time in different parts of the world and that water boils at different temperatures depending on the altitude. Indeed, even as an adult it can seem pretty weird that it’s Sunday here and Monday in Australia.⁴

⁴ The claim of default universalism might seem to be in tension with my earlier claim (Chapters 3 and 8) that parochial morality can be a product of rational learning. If the child is rational to infer parochial rules, how can he also regard the rules as universal? It’s plausible that these issues target different dimensions of normative representations. Parochialism concerns the content of the rule, and universalism concerns the status of the rule. To see the distinction, imagine a learner in an insulated community acquiring the rule that it’s wrong to take property from others. When the learner discovers that there are other communities, this raises questions about the content of the rule and the status of the rule. The content question is: Does the rule forbid taking property from those in other communities? The status question is: Does the rule apply universally or is it relativized, e.g., to my community? This status question admits of different answers even after the content question is settled. Suppose that the answer to the content question is that the rule is parochial: it forbids taking from people in my community but not taking from people in other communities. That parochial rule might be taken to be universal in scope, so that it is wrong for anyone, including those in other communities, to take property from people in my community. Alternatively, the parochial rule might be taken to be relativized to my group, such that it’s wrong for people in my group to take from others in my group, while allowing that, relative to the other group, it is not wrong for them to take property from my group.

Universalism is also likely the default in terms of the representational format of our thoughts about right and wrong. Some adjectives have relations built into the very structure of the lexical item, e.g., “foreign” and “perpendicular” (Gillon 2006). To say that a line is perpendicular requires that it be perpendicular to something. Similarly, to say that some language is foreign presupposes a contrasting starting point. By contrast, the thin evaluative terms right and wrong don’t seem to be necessarily relational in this way. Sometimes when we say something is wrong, we do append a modification on the predicate: it is wrong for children to call their teachers by their first names; it is wrong to drop the subject in English; for Muslims, eating pork is wrong. Unlike with terms like “foreign,” these modifications don’t seem to be an essential feature of right and wrong. Rather, when we explicitly relativize a claim about wrongness to some parameter, this might be thought of as a form of predicate restriction, along the lines of domain restriction. In Lewis’s famous example, if I say “all the beer is in the fridge” I’m not saying something about all the beer in the world. There is a tacit restriction to my house. Restricting the domain in this way allows for the words “beer,” “in,” and “fridge” to have their normal meaning, while permitting a charitable interpretation of the utterance. So, if someone objected to my statement by pointing out that the local beer store has shelves of six-packs that aren’t in the fridge, I will make explicit the restriction that was previously tacit. I might do something similar when I say “it’s summer”; if someone objects by saying, “What about New Zealand?” I will make explicit my tacit hemispheric presupposition. Although in many cases when we use the word “wrong” we do impose a relativizing restriction, this isn’t a necessary feature of the word, and predicate restriction is typically absent. Often when we say something is wrong we mean wrong simpliciter:

It is wrong to infer that a false antecedent entails a false consequence.
It is wrong to ignore base rates.
It is wrong to treat subtraction as commutative.

For these statements we don’t presuppose any restriction on the predicate. So there is nothing intrinsic to the adjective wrong that demands relativization. Even full-blown relativists, when going about the normal course of their normative affairs, presumably just use a first-order representation, unencumbered by a relativizing parameter. Take something outside the moral domain, like public nudity. I think public nudity is wrong, but if pressed, I don’t think that it’s wrong in all cultures or that people in cultures in which nudity is the standard practice are behaving badly. Still, my normal representation when I’m making judgments about public nudity is probably not of the form Relative to my culture, public nudity is wrong. Rather, my relativism about nudity emerges when asked for a more reflective judgment. Again, the same probably holds in the non-normative domain. When I think about the seasons, I normally think something like July falls in summer and not Relative to my hemisphere July falls in summer.

2.3 Ecological Rationality of Default Universalism

In Chapter 8, I argued that certain kinds of rules are ecologically rational. An institution or practice is ecologically rational when it is effective for achieving our ends in our actual ecological settings, including the characteristics of our minds. The notion of ecological rationality can also extend to how we represent the status of normative claims. I’ve suggested that when we learn a new norm, we typically do not include relativist qualifications; rather, we effectively presume universalism. In addition, even when we have reflective knowledge that some norm only holds relative to some context, we will often not include that relativist qualification in the format of the norm representation. As a result, universalism (in the sense of the absence of relativist qualification) seems to be the default in both acquisition and representation. This default universalism is, I propose, ecologically rational. Cautiously hedging for some possible but unknown parameter would be computationally inefficient. To explicitly include the relativizing parameter would be a significant cost in terms of processing, and it is often unimportant to make that relativization. Note that this even holds for non-normative representations. Even if it’s true that it’s summer in Australia, that won’t matter much to the seasonal reflections of the average person in the northern hemisphere. It takes effort to keep a relativizing parameter in mind. When it is important, as when planning a trip to Australia, then we Northerners have to build the relativization into our representations.

There is a further reason that default universalism might be ecologically rational. Having universalism as a default in acquisition might carry advantages when it comes to coordination.⁵ As has long been emphasized by game theorists, coordination problems are a persistent feature of our environment and that of our ancestors, and a critical part of human social life involves solving coordination problems (see, e.g., Bicchieri 2006; Gaus 2011; Skyrms 1996). This is familiar from games like Stag Hunt (Figure 9.1). In this game, if we coordinate and both go for the stag, we will both do well, but if you go for a stag and I go for a hare (the lower left cell in Figure 9.1), my reward will be meager and you will go hungry as a result of our failure to coordinate. Game theorists describe several different kinds of situations where coordination itself is especially important. In some of these situations, there is a common interest among all players but there are two options that are equally good, as in choosing which side of the road to drive on (Figure 9.2). In other games, there are conflicting interests but coordination is preferred to discoordination, as in cases where two people want to spend time together but have slightly different preferences for the activity (e.g., one prefers the movies to the theater and the other has the opposite preference). One coordination point is good for both, but better for A; the other coordination point is good for both but better for B. In such games, the worst outcome is failing to coordinate at all.

                 Other(s)
                 Stag    Hare
  Self   Stag    2,2     0,1
         Hare    1,0     1,1

Figure 9.1 The Stag Hunt

                 Other(s)
                 Left    Right
  Self   Left    1,1     0,0
         Right   0,0     1,1

Figure 9.2 Choosing sides

⁵ Kyle Stanford (2018) develops a related idea, though he does so in the context of an account of the evolution of morality. He argues that our tendency to “objectify” or “externalize” moral demands is an adaptation that served to facilitate cooperation.

Having a default assumption of universalism can facilitate coordination. Suppose that you are learning a norm that is in fact important for coordination. You start out assuming that the norm that is being communicated holds universally. Let’s say you learn that the norm is Right. You assume that that is the norm that everyone else holds (or will ultimately hold) as well. If you get evidence that a small minority thinks the norm is Left, you treat that minority as mistaken. However, if you get evidence that the majority thinks the norm is Left, you switch your first-order view and adopt the Left norm. Since you assume that the norm holds universally, if you find out that you hold a minority view, this provides reason to think you have adopted the wrong norm. By contrast, if you start as a relativist, without the assumption that the norm that is communicated holds universally, if you find out that your own view is in the minority, this doesn’t provide reason to switch to the other view. For, as a relativist, you could just think that there are multiple correct views, relativized by context.

The point here is that universalists and relativists will respond to consensus information differently, and one consequence of the difference is that universalists will converge on solutions to coordination problems more effectively than relativists. Imagine two populations, each trying to solve a coordination problem like those above. The people in one population have universalism as a prior expectation: they expect there to be a single right answer about what to do in the case at hand. The people in the other population are relativists: they don’t expect there to be a single right answer. Universalists will be more likely than relativists to treat minority responses as noise.
This is simply because the universalists have the prior that there is a single fact. To make this a little more concrete, imagine that 65 percent of people in each population think the correct solution is A and 35 percent think it’s D. The universalists will put considerable credence in the view that 35 percent of people are in error. The relativist, by contrast, will be more likely to think that there are two facts here: A is right for the subpopulation reflected in the 65 percent of the sample who said A, and D is right for the subpopulation reflected in the 35 percent of the sample who said D. Given the willingness of universalists to treat minority responses as noise, consensus information will drive a population of universalists to converge on a solution more quickly than a population of relativists, for the simple reason that relativists are quick to affirm the propriety of multiple different solutions. Justin Sytsma, Ryan Muldoon, and I ran a series of simulations that confirmed this. In the simulations, the universalist agents readily changed their first-order views if given information that their view was in the minority; the relativist agents were much more conservative about changing their first-order views, and it took a much higher level of consensus to get relativists to switch. As expected, we found that the universalist populations converged much more quickly on solutions to coordination problems than relativist populations. And where these solutions corresponded to differential payoffs, the universalists accrued much higher payoffs than the relativists (Sytsma et al., ms.).

The upshot of this is that, where coordination is important, it’s better to err on the side of universalism. In human society, coordination is often essential, and the greater the importance of coordination, the greater the advantage of presuming universalism. This means that having a default of universalism is often ecologically advantageous. One way to think about this operationally is that it’s advantageous to be excessively sensitive to consensus information: it’s advantageous to switch one’s first-order views to conform to the majority more readily than is warranted by probabilistic considerations.⁶ Note that the kind of universalism that facilitates coordination need not be an especially strong form of universalism. For instance, one needn’t have any thoughts about whether some new norm applies to rational aliens.⁷ What matters instead is just that one expects that there is a single right answer about what the norm is, and this doesn’t require thought or commitment regarding far-flung populations.

I’ve proposed that our normal mode of acquiring and representing norms is not qualified by a relativizing parameter. We learn norms and represent them in a way that defaults to universalism. I’ve argued that this is in many ways a good thing.
It’s plausibly more efficient for acquisition and processing: acquisition isn’t hampered by a search to fill a relativizing parameter, and processing isn’t impeded by tokening such parameters. In addition, I’ve argued that default universalism will facilitate coordination. Insofar as coordination problems are central to human social life, this is a significant practical advantage of universalism.

At the same time, at least for many norms and practices, the truth lies with relativism. The relativist will often be right to think that there is no single fact (this is surely true in many coordination games). The benefits of coordination will often make it preferable to be a naïve universalist. However, there are also situations in which coordination is not desired. In some domains, we prize our liberty to go our own way (see, e.g., Muldoon et al. 2012). More disturbingly, rigid universalism would entail a tyranny of the majority and promote prejudice against minority opinion. Thus, even though there are considerable advantages to having a bias in favor of universalism, it’s also important to be able to move past default universalism. As we saw in Chapter 6, when given evidence of low consensus, people do move past default universalism and, as I’ll argue in the next section, it can be rational to do so.

⁶ Imagine, for instance, that in some particular case, the probabilistic considerations entail that universalism should be the preferred view if 85 percent of the population shares the same first-order view, and relativism should be the preferred view if the consensus is less than 85 percent. A bias towards universalism would set that threshold lower than 85 percent. For example, a universalist bias might set the threshold at 75 percent, and an agent with such a bias would switch their first-order view to conform to the majority view more readily than probabilistic considerations warrant.

⁷ By contrast, Stanford’s claim that “externalizing” moral demands facilitates cooperation relies on a stronger notion, according to which “we regard such demands as imposing unconditional obligations not only on ourselves, but also on any and all agents whatsoever, regardless of their preferences and desires” (2018: 1).
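The threshold idea in note 6 can be made concrete with a toy sketch. This is not the Sytsma et al. simulation, whose details are not given here. It is a deliberately minimal, deterministic illustration under assumed rules: every agent holds view "A" or "B", sees the exact share of the population holding the opposing view, and switches when that share reaches the agent's threshold. The function names, the two-view setup, and the population size are all hypothetical; only the 85 and 75 percent figures come from note 6.

```python
from typing import List, Optional


def step(views: List[str], switch_threshold: float) -> List[str]:
    """One synchronous round: an agent switches views when the share of the
    population holding the opposing view is at least switch_threshold.
    Assumes switch_threshold > 0.5, so only minority holders can switch."""
    n = len(views)
    share = {"A": views.count("A") / n, "B": views.count("B") / n}
    updated = []
    for view in views:
        other = "B" if view == "A" else "A"
        updated.append(other if share[other] >= switch_threshold else view)
    return updated


def rounds_to_converge(
    views: List[str], switch_threshold: float, max_rounds: int = 50
) -> Optional[int]:
    """Number of rounds until everyone holds the same view,
    or None if the population has not converged after max_rounds."""
    for round_number in range(max_rounds + 1):
        if len(set(views)) == 1:
            return round_number
        views = step(views, switch_threshold)
    return None


# Hypothetical population: 80 percent hold view A, 20 percent hold view B.
population = ["A"] * 80 + ["B"] * 20

# Agents with a universalist bias (threshold 0.75, as in note 6) coordinate.
biased = rounds_to_converge(population, switch_threshold=0.75)

# Agents using the unbiased threshold from note 6 (0.85) stay divided.
unbiased = rounds_to_converge(population, switch_threshold=0.85)
```

With 80 percent agreement, the lower threshold is met and the minority conforms in a single round, while the higher threshold is not met and the disagreement persists. A more realistic model would add noisy sampling of other agents and payoffs for coordination, which this sketch omits.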


3. Relativism and Rationalism

When people get evidence of low consensus, they retreat from universalism. This holds in both moral and non-moral domains (Ayars & Nichols 2020; Goodwin & Darley 2012). What kind of non-universalism do people move to? It’s not entirely clear; indeed, it might not be determinate. But it doesn’t seem to be a simple form of subjectivism. As recounted in Chapter 6, we found that when presented with three different claims about the morally worst action, people responded as relativists about which thing was morally worst. But when one of the three positions was held by a tiny minority, people tended to say that the view held by that minority was mistaken. They did not say that anything goes or that the mere fact that the minority held the belief self-ratified the belief. Again, this makes sense given the kind of evidence consensus provides. If the third group is only a tiny minority, it makes more sense to regard their views as the product of performance error rather than treating them as responding to some third relativized fact.

3.1 Relativism in Non-moral Domains

There are good theoretical reasons for maintaining that low consensus is evidence for relativism where the truth of the relevant judgments depends
on the standards of a group. Consider legal judgments. The law explicitly states standards, which are relativized to a particular region. It’s illegal to sell alcohol to 18-year-olds in the U.S., but it’s not illegal to do so in Mexico. Thus, one kind of relativism that applies to part of the normative domain is standards relativism (cf. Sturgeon 1994; Wong 2006: 36). If people in different communities make different judgments about what the law is, it will often be the case that the proper inference to draw is that the communities have different laws. Something similar holds for etiquette, which is partly composed of standards. Although the standards of etiquette are often less explicit than those of the law, it’s often clear enough when there are different standards. It’s clear enough, for instance, that it’s wrong to use the left hand for the fork in the U.S., but not in Europe. There are also good theoretical reasons for thinking that low consensus is evidence of relativism where the truth of the relevant judgments depends on the sensibilities of a group. This applies to judgments about humor and deliciousness. Whether something is funny seems to depend on the sensibilities of the audience, and those sensibilities will vary with, among other things, the age of the audience. Thus, another kind of relativism that is plausible for part of the normative domain is sensibility relativism (cf., e.g., D’Arms & Jacobson 2006, 2010; Shoemaker 2017: 488).⁸ Suppose I want to decide whether to watch the movie, Jackass. I notice a bimodal distribution in the reviews. So I dig into the comments. The lovers say that the stunts are hilarious. Here are some examples from reviews on Amazon: “Hilarious!!!!!!!!! Too funny for words.” “Awesome!!!” “I laughed so hard that I gave myself a migraine headache.” The haters say that the stunts are disgusting and pointless. Here are some of those comments: “Makes me sick: I could not get past the first 10 mins. 
It was sickening.” “Lowbrow, immature and beyond stupid!”

⁸ In Chapter 6, I noted that relativist hypotheses are penalized because of their flexibility. This penalty is blunted when the groups are carved up in non-arbitrary ways. So, if we can identify robust and stable differences in the sensibilities of two groups, this will alleviate some worries about rampant flexibility.


“Awful: Don’t even waste your time. This is nothing but a bunch of stupid unsafe antics replete with swearing and inappropriate nudity.” It’s prima facie rational to draw a relativist conclusion from this lack of consensus: Jackass is funny for one subpopulation and not for another, presumably because the different subpopulations have different amusement sensibilities. Note that neither standards relativism nor sensibility relativism is as stark as simple subjectivism, according to which normative judgments are relativized to each individual’s sincere evaluation. Rather, judgments are relativized to the standards or sensibilities of a group rather than simply to an individual. Someone in the U.S. who makes the sincere judgment that it’s legal to sell alcohol to 18-year-olds is just making a mistake. Similarly, a fatigued person can be mistaken about what counts as funny, given their sensibility. This nonsubjectivism fits with the common-sense verdicts we saw earlier (Chapter 6, Section 4). Under low consensus, participants retreated from universalism, but did not descend into a simple subjectivist view. The fact that many normative judgments are based on either standards or sensibilities can make it legitimate to infer relativism regarding normative judgments under certain conditions. In particular, when standards or sensibilities are the bases for relativist groupings, low consensus provides good evidence for relativism. The problems noted above (Section 2.1) for the inferences from consensus are much less significant when we are using consensus to carve up groups by standards and sensibilities. The law is known to vary by country, and so the idea that the laws must be universal isn’t part of my hypothesis space to begin with. Similarly, it is a platitude that different people have different senses of humor and different palates, and we expect people’s judgments to reflect those different sensibilities. 
The threat of unknown cultural diversity isn’t much of an issue either. Insofar as I am interested in the rules of law and etiquette in my own community, I needn’t worry that I am getting a culturally local sample; that is in fact the kind of sample that I want. Sensibility relativism is similarly receptive to cultural diversity. If other groups have different sensibilities, then the truth of their judgments is determined by their sensibilities, not ours. So we don’t have to worry that we are being irrationally parochial in our own evaluative judgments.⁹

A third concern about inferences from consensus was that people need to be reasonably good at tracking the features of interest. Again, this isn’t much of a problem for standards relativism. People are presumably good at tracking laws that matter to them (e.g., laws that are likely to generate penalties for otherwise appealing behaviors). Sensibility relativism also seems to deal with the worry about tracking. If we presuppose that the judgments are true relative to a sensibility, then tracking requires that people are good at tracking the proper outputs of their own sensibility. So, for example, if we are to rely on consensus as evidence for whether something is funny, people (those whose opinions we are relying on) need to be able to track whether something reliably tickles their sense of humor.¹⁰ But that seems right for judgments about the funny and the delicious. People are generally good at tracking what their sense of humor responds to and what their palate delights in. That’s how they decide which sitcoms to watch and which restaurants to frequent.

The independence assumption raises more interesting issues. In order for consensus to provide good evidence, the responses of the individuals must be to some significant extent independent of one another. One familiar threat to independence comes from information cascades. As Bicchieri characterizes the notion,

Informational cascades occur when it is optimal for an individual, having observed the actions of other individuals, to follow their behavior regardless of his own preferences or information. Once an individual acts only on the information obtained from others’ actions, his decision conveys no truthful information about his private information or preferences. (Bicchieri 2006: 197; see also Bikhchandani et al. 1992: 994)

⁹ If we sample from a population that shares the same sense of humor (say, a coarse sense of humor) and we find consensus regarding some movie, that’s evidence that given that sensibility, the movie is funny. (Of course we still might have ended up with a misrepresentative sample of the population: maybe we got a batch of atypical coarse-humored folk. But that is an issue that arises whenever we make inferences from samples to populations.)

¹⁰ There are delicate issues about what exactly is supposed to be tracked. For many emotions, it’s useful to distinguish the actual domain of the emotion (the things that reliably trigger the emotion) from the proper domain of the emotion (the things that should trigger the emotion, given the function of the emotion) (see, e.g., Kamtekar & Nichols 2019). So, harmless spiders systematically trigger fear (harmless spiders are in the actual domain) but we might think that fear is not an apt response to harmless spiders (such that harmless spiders aren’t in the proper domain). For any given emotion, it’s overwhelmingly likely that people are good at tracking things in the actual domain of their emotion. And so long as the margin of false positives in the actual domain isn’t too great, reliably tracking things in the actual domain will also entail reliably tracking things in the proper domain. It’s also plausible that for many emotions, people are often good at tracking which triggers in the actual domain are also in the proper domain. I reliably feel fear at coiled rattlesnakes on the trail, and I think that’s the right thing to feel.


Such cascades can be beneficial or detrimental. And they can occur for beliefs just as much as behavior. Information cascades constitute violations of independence since people are just copying others rather than contributing independent information to the consensus. There are numerous cautionary tales about information cascades, from the dot-com fiasco to Tulipmania.

How much do we need to fear that the consensus information we get about the law is the product of information cascades? Not very much, at least when the laws carry significant penalties, for people are less likely to generate information cascades when the stakes are high (Munshi 2004; Surowiecki 2005: 63). Information cascades might also be blunted in the case of sensibilities. One’s sense of humor surely depends on upbringing, education, and culture. But the result of this developmental history is a standing sense of humor that remains relatively stable over years or even decades. And the responses people offer about whether a skit is funny are likely to be largely based on their sense of humor. It seems reasonable to be optimistic that most people trust their own sense of humor, providing a check against rampant information cascades. Thus, when sensibilities and standards are in play, low consensus really does provide good evidence for relativism.¹¹

¹¹ Low consensus is not the only rational basis for drawing relativist conclusions. Resilience of judgment can provide another source of evidence. If I know that several people have the minority view on an issue and that they know that their view is in the minority, I might take this as evidence that relativism is the proper verdict on the issue. More explicitly, if (1) I’m a rational learner, and (2) I assume others are rational learners, and hence are generally sensitive to consensus as evidence for universalism vs. relativism, and (3) I know that for the belief that P, the minority knows that they are in the minority and have not changed their view on the basis of the prevailing consensus, then I have grounds for thinking that the truth of P is relative to context or group. That is a rather more subtle basis for drawing relativist conclusions, but it might nonetheless be a sound basis.

3.2 Relativism in the Moral Domain

Sensibility relativism is plausible for some normative domains, including humor. Standards relativism is plausible for other normative domains, including the law and etiquette. Both of these views are available to common sense, and common sense makes reasonable inferences about such matters given evidence from consensus. Standards, in the form of moral rules, also play a key role in moral evaluation. And sensibilities that derive from moral emotions are critical to moral responses.¹² Obviously this is not the place to argue for moral relativism, but insofar as morality is grounded in sensibility and standards (rather than some set of a priori truths), low consensus is prima facie evidence for group relativism.

As before, the threat of unknown cultural diversity challenges inferences from high consensus to universalism. We typically have limited evidence of ethical opinions from other cultures and of course no evidence from rational aliens. Thus, it’s rash to draw universalist conclusions from high consensus. But the possibility of cultural diversity doesn’t undermine inferences from low consensus to relativism. If we find evidence that people in one region find some action wrong and people in another region find it acceptable, and if we can exclude factual disagreement (Brandt 1959), this is prima facie evidence that the action is wrong relative to the standards of the first population and not wrong relative to the standards of the second population.

What about tracking and independence? For moral evaluations that are relativized to sensibility, it’s plausible that people are fairly reliable at tracking their own systematic emotional responses. Just as people are generally good at knowing what activates their sense of humor, people are good at knowing which kinds of stimuli are likely to trigger their sympathetic responses. In addition, the internal affective signal helps provide a check against information cascades. A person’s own emotional reaction to witnessing cruelty to animals constitutes an independent source of evidence that influences their judgment that it’s bad to mistreat animals.

When it comes to moral standards, many of these standards are not as explicit as the law. A learner must figure them out.
Much of this book has argued that people are excellent at gleaning the rules (including moral rules) held by their parents and peers. What about the prospect of information cascades? As noted earlier, in domains where the stakes are high for people, information cascades are less likely to occur. Morality is a domain of great personal importance (see, e.g., Prinz & Nichols 2017; Strohminger & Nichols 2014), and so we can expect people to consult their own moral judgments to some considerable extent. In addition, there are obvious costs to committing moral transgressions, including punishment and exclusion. If someone is caught violating a standard that is operative in his community, he is likely to be punished in some fashion. This means that people who assert that it’s permissible to φ have presumably not been punished for φ-ing. This provides an indirect but powerful bulwark against information cascades. The beginning of such a cascade would produce feedback, in the form of punishment, that φ is in fact not permitted.

¹² Many moral evaluations plausibly implicate both standards and sensibilities. There is a sensibility that involves guilt, but guilt is activated by appraisals of wrongdoing, and those appraisals hinge on moral rules regarding wrongdoing. I leave aside this complication for present purposes.

The foregoing suggests that it’s reasonable to take consensus as evidence for moral truth, relativized to standards or sensibilities. Of course, if one has a prior commitment to universalism, perhaps for philosophical reasons, then this might overwhelm any apparent evidence in favor of relativism. Similarly, there might be defeaters for the inference; e.g., it might turn out that the lack of consensus is due to factual disagreement rather than a fundamental difference in standards or sensibilities. But in the absence of such considerations, low consensus can provide good evidence for standards- or sensibility-relativism.

My focus throughout here has been on the acquisition of normative and meta-normative beliefs. I’ve been defending the rationality of the acquisition process. There are, of course, other kinds of rational processes that are important to one’s overall normative system, including “consistency reasoning” (see, e.g., Campbell & Kumar 2012). One might acquire a norm via rational processes and then come to reject the norm because it conflicts with more important moral commitments. This can hold even if one is a relativist, since within a given moral framework, some commitments are more important than others. Similarly, one might think that the lack of consensus about some issue (like prejudicial discrimination on the basis of race) does not show that relativism holds, but rather that one of the groups has not adequately considered competing moral considerations.
Thus, I don’t mean to suggest that evidence from consensus is the final word on the status of a normative system, but only that it is an important part of the evidence we have about morality.

A priori moral rationalism promises a powerful vindication of the authority of common-sense morality. Nothing in this book supports such a strong form of rationalism. Still, common-sense morality is rational in other important ways. First, our default way of learning and processing moral rules is largely unqualified by relativizing parameters. That is, we have a kind of default universalism in the acquisition and processing of moral rules. This likely confers advantages in both the ease of acquisition and the efficiency of processing. In addition, having universalism as a default will facilitate solving coordination problems. In all of these ways, default universalism is ecologically rational. Second, reflective judgments about the meta-evaluative status of moral claims are appropriately sensitive to evidence from consensus, adding relativizing parameters for subpopulations based on the distribution of consensus. And this is often a fitting response to differences in standards and sensibilities.

Copyright © 2021. Oxford University Press USA - OSO. All rights reserved.

4. Reprise

In Chapter 10, I turn to a very different set of issues, those surrounding moral motivation. But first, I’d like to briefly review the central theoretical claims I’ve developed in Chapters 7–9. I’ve argued that moral judgment depends on a system of rules that register subtle distinctions, and these rules are learned in a rational way via basic domain general learning procedures. This provides an empiricist theory of a key part of moral acquisition, since the learning procedures are domain general. It also entails that crucial parts of our moral system enjoy rational credentials since the learning procedures are forms of rational inference. Thus, what I propose is a rational empiricist account of the acquisition of moral and social rules.

Of course, even if moral and social rules are acquired via rational learning, this doesn’t show that the rules themselves are good rules. I argued that one measure of whether a rule is good is whether it is ecologically rational, that is, whether it effectively serves our interests given the kinds of minds and environments we have. No doubt some of our rules are ecologically rational and some aren’t. I focused on one aspect of normative judgment that plays a large role in philosophical ethics: the distinction between acting and allowing. This distinction is enshrined in the scope of our rules as reflected by the fact that our prohibitory rules tend to be act-based rather than consequence-based. I argued that given characteristic features of our minds, including especially our natural aversion to being taken for a sucker, act-based rules are more ecologically rational than consequence-based rules. Finally, in this chapter I argued that our bias in favor of universalism (our default assumption that rules hold generally and not just relative to some subpopulation) is ecologically rational as it facilitates coordination.
Despite this advantage conferred by a universalist bias, it’s also important to be able to recognize issues for which universalism is implausible. Low consensus around an issue often is, and should be, treated as evidence for relativism.


10 Is It Rational to Be Moral?


In the previous chapter, I explored epistemic forms of rationalism. But there is also a practical strand of rationalism in metaethics focused on the motivational consequences of moral reasons.¹ One version of the view holds that it is a necessary truth that if it’s morally right for a person to φ then there is a reason for that person to φ, and if a person recognizes that there is a reason to φ, then she will have some motivation to φ (Smith 1994; van Roojen 2015).² Thus, practical rationalism provides an answer to the question, “Why Be Moral?”: we should be moral because it’s the rational thing to do. But there is a broader set of questions about moral motivation.³ Let’s distinguish three “Why Be Moral?” questions:

(1) Why are we (most humans) moral?
(2) Why should we (most humans) be moral?
(3) Why should all (rational) agents be moral?

The first question here is flatly descriptive. Why do people tend to behave morally? What is it about us that makes us behave morally? The second question is normative, and it can be linked to the first question: Is the basis for our actual moral behavior a good reason for us to be moral? So the first question might be cast as, What is it about the kind of creature I am that inclines me to be moral? And a corresponding version of the second question is, Given that I am that kind of creature with that kind of basis for moral action, should I be moral? The third question asks a deeper normative question. It asks for a justification for being moral that applies to all rational beings. An answer to this question might, of course, also explain why humans should be moral. But the third question asks for reasons for being moral that could be offered to any rational person. This last question resonates down the history of philosophy. We find versions of it pressed by Thrasymachus, Hobbes’s Foole, and Hume’s sensible knave. I will begin with this question, only to set it aside.

¹ Clarke maintains that the rational appreciation of moral truths also carries with it motivation to abide by the moral truths (e.g., Clarke 1728: 198–202), but this is less well developed in his work.

² Michael Smith distinguishes two versions of this rationalist thesis. The conceptual rationalist thesis holds that “our concept of a moral requirement is the concept of a reason for action; a requirement of rationality or reason.” The substantive rationalist thesis holds that this conceptual claim bears out in the world. That is, “there are requirements of rationality or reason corresponding to the various moral requirements” (Smith 1994: 64–5).

³ See also Sinnott-Armstrong (2019), supplement on practical skepticism.


1. Why Should All Rational Agents Be Moral?

On the evidential rationalist approach I have taken, we get no immediate answer to the grand question, Why should all rational agents be moral? Evidentialism (at least as I’m relying on it) is neutral on that because it doesn’t claim that motivation derives from rational inference. Evidentialism is entirely consistent with the Humean view that reason alone carries no motivational force.

Some philosophers maintain that all rational beings should be motivated to do as morality requires because moral requirements are rational requirements and we should be motivated to do as rationality requires (e.g., Nagel 1986; Smith 1994; van Roojen 2018b). This kind of motivational moral rationalism offers a comforting answer for why we should be moral. But many regard the view as improbable and Pollyanna. Blackburn is characteristically acerbic on the matter:

This is the permanent chimaera, the holy grail of moral philosophy, the knock-down argument that people who are nasty and unpleasant and motivated by the wrong things are above all unreasonable: that they can be proved to be wrong by the pure sword of reason. They aren’t just selfish or thoughtless or malignant or imprudent, but are reasoning badly, or out of touch with the facts. (Blackburn 1984: 222)

Blackburn goes on to endorse the sentimentalist alternative: “In reality the motivational grip of moral considerations is bound to depend on desires which must simply be taken for granted, although they can also be encouraged and fostered” (222). Like Blackburn, I think that the sentimentalist has the right end of this. It’s perfectly coherent that an agent can be rational without having any motivation to be moral. It’s because we have certain desires and emotions that we behave morally, and there is nothing necessarily irrational about having different emotions. Hume famously gives a stark example: “’Tis not contrary to reason to prefer the destruction of the whole world to the scratching of my finger” (Treatise 2.3.3.6). Practical moral rationalists reject this narrow notion of rationality. Nagel, for instance, suggests that the Humean is “in the grip of an overnarrow conception of what reasoning is” (1986: 154). It’s unclear whether this is a fundamental disagreement about the true nature of rationality or whether it’s an instance of philosophers talking past each other. I won’t attempt to enter into this fray. I will turn instead to some more tractable problems involving actual moral motivation.


2. Why Are We Moral?

If we could show that it is irrational to be amoral, that would obviously be a philosophical result of the first order. But there are other enduring philosophical issues surrounding moral motivation. From the earliest discussion of moral motivation, philosophers have asked about the psychological basis of moral behavior. We see this in Plato, Hobbes, Hutcheson, and Hume. I think that the sentimentalist can explain an important range of moral motivation. But I’ve maintained throughout this book that rules are critical to moral judgment and behavior. And sentimentalism, in its familiar forms, seems to have limited resources to explain rule-based moral motivation.⁴ To see the limitations, and to place the issue in a familiar context, it will be helpful to review some contemporary metaethical approaches to moral motivation.

2.1 Internalism and Externalism about Moral Motivation

In metaethics, internalism (specifically, “moral judgment internalism”) is typically treated as a conceptual thesis involving a necessary connection between moral judgment and motivation.⁵ However, there is a plausible empirical version of internalism, which claims that it’s an observed fact that moral judgment always brings along with it some motivation to behave in accordance with the judgment. Indeed, some of the metaethical discussion of internalism seems to be about the facts, not the concepts. For instance, in his defense of internalism, Michael Smith writes, “By all accounts, it is a striking fact about moral motivation that a change in motivation follows reliably in the wake of a change in moral judgment, at least in the good and strong-willed person” (1994: 71). This fact is indeed striking and important, but it seems to be about moral motivation itself and not just our concept of it.⁶ How can the fact that there is a reliable connection between moral judgment and moral motivation be explained? Michael Smith articulates two possibilities:⁷

Internalism: the reliable connection between judgement and motivation is to be explained internally: it follows directly from the content of moral judgement itself.

Externalism: the reliable connection between judgement and motivation is to be explained externally: it follows from the content of the motivational dispositions possessed by the good and strong-willed person. (1994: 72)⁸

⁴ In his reply to recent sentimentalist accounts of moral judgment, Joshua May (2018) emphasizes the role of moral reasoning in moral motivation. In a nice twist, May points out that even when we behave immorally, we often deploy normative reasoning to justify our actions (ch. 7). This is an apt point, and I think the right way to accommodate it is for the sentimentalist to allow that people engage in considerable rule-based moral reasoning. This won’t, however, solve the core problem of rule-based motivation that sentimentalists face.

⁵ Thus, Connie Rosati characterizes internalism as “the conceptual claim that a necessary connection exists between sincere moral judgment and . . . motives: necessarily, if an individual sincerely judges that she ought to φ, then she has a . . . motive to φ” (2016: 19). As a conceptual thesis, internalism is challenged by the (apparent) conceptual possibility of an agent (e.g., Satan) that makes moral judgments but isn’t motivated by them (see, e.g., Brink 1989; Nichols 2004c). For the purposes of this chapter, my focus is on empirical claims and not narrowly conceptual ones.

⁶ If Smith’s claim were restricted to concepts it would be more fitting to say something like “it’s a striking fact that we expect a change in motivation to follow reliably in the wake of a change in judgment . . .” Elsewhere in the book, Smith writes, “The question is: how are we to explain the reliability of this connection between judgement and motivation in the good and strong-willed person?” (1994: 72). Again, this seems to be about moral motivation itself and not (or not just) our concept of it. Perhaps there are ways to read these statements as purely conceptual claims, but there really is a key fact here, viz., that there is a reliable connection between moral judgment and motivation. And that fact is important to understand even if it is not in the purview of analytic metaethics.

⁷ Again, I will treat these issues as empirical rather than conceptual. So I am taking Smith’s accounts of internalism and externalism as (at least partly) empirical accounts of moral motivation.

⁸ Rosati casts the externalist account as follows: “Moral motivation occurs when a moral judgment combines with a desire, and the content of the judgment is related to the content of the desire so as to rationalize the action” (2016: 22).


Smith goes on to argue that the externalist explanation fails insofar as it treats moral motivation as “de dicto,” which in this context means that the motivation is indirect or derivative. The idea is that externalism bases moral motivation on a general desire to do the right thing, and more specific moral motivations depend expressly on this general desire. So for instance, on this view, the motivation to keep one’s promises depends on (1) the desire to do the right thing and (2) the specification that the right thing to do is to keep one’s promises. However, Smith maintains, much moral motivation is not at all like this. When I keep my promises, I typically don’t go through the process of thinking “I want to do the right thing; keeping promises is the right thing; thus I want to keep my promise.” Instead, when I have made a promise, I am motivated to keep it simply because I promised. This kind of motivation Smith labels “de re,” that is, direct or non-derivative. Smith is pointing to a psychological fact about moral motivation, and it generates the core causal question of moral motivation: Why does a moral judgment that it’s right to φ directly motivate φ-ing? While moral motivation often seems to be direct (i.e., non-derivative), Smith suggests that the situation is otherwise with etiquette. He maintains that our motivation to conform to rules of etiquette is well explained by an externalist account: our judgements about what etiquette requires of us are at best only externally related to our motivations to act accordingly. Those who reliably do ‘the done thing’—insiders to the subgroup, people like Miss Manners— do seem to desire to do what etiquette requires of them where this is read de dicto [i.e., derivative], and not de re [i.e., non-derivative]. (83)

Smith says something similar about the motivation to follow the law: “someone who is reliably motivated to act in accordance with the law must desire to do what she is legally obliged to do, where this is read de dicto and not de re. She must have this motivation because an independent source of motivation would not explain why the connection is reliable” (82). By contrast, “Morally good people are indeed reliably motivated to do what they believe they should, but only if we read this de re and not de dicto” (83). Thus, externalist motivation makes perfect sense, says Smith. It’s just inadequate as an account of moral motivation, because moral motivation is characteristically direct. Smith’s claim that moral motivation is direct does seem plausible as a psychological claim. Externalists have tried to accommodate this in various


ways. For our purposes, one especially interesting externalist suggestion is that we can come to acquire direct motivation. This idea was developed by Sigrún Svavarsdóttir, who writes:


The presence in the good person of the desire to be moral certainly does not prevent her from forming such a commitment. Although her desire to Φ may initially be derived from her desire to be moral, it may subsequently come to operate psychologically independently of the latter. (Svavarsdóttir 1999: 206; see also Copp 1997)

It is likely that we develop non-derivative motivations to behave morally through the process Svavarsdóttir suggests. This might be achieved through the kind of habit learning we saw in Chapter 2. When we consistently behave in certain ways, we can develop habitual reactions. So, I might come to think it right to make a monthly donation to Save the Children, and this belief combined with my desire to do the right thing motivates me to donate. As the months wear on, my motivation becomes self-sustaining and I no longer rehearse the derivative reasoning. I am intrinsically motivated to make the donation. It’s also plausible that we acquire direct motivations for moral behavior through reinforcement learning. Just as you can get rats to find it aversive to step on a tile, you can get children to find it aversive to divide a windfall unequally. If you scold selfish children enough (and if their teachers and peers do the same), this would naturally engender unpleasant associations with dividing unequally. Parents and peers certainly do scold a child who keeps a disproportionate amount of a good to be shared, and this provides some reason to expect that young children would come to have some nonderivative aversion to hoarding. Thus, processes of habit-formation and reinforcement learning can secure direct (i.e., non-derivative) motivations to do moral things. But it’s important to note that there is nothing distinctively moral about these processes. The same processes can generate direct motivations to follow the rules of etiquette and school rules. Furthermore, pace Svavarsdóttir, Smith might maintain that there is something different with moral belief, in that changes in moral belief immediately produce direct motivation. The motivation that follows on change in moral belief doesn’t depend on the comparatively slow processes of habit formation or reinforcement learning. Where do sentimentalists land on whether moral motivation is direct? 
It is taken to be a virtue of sentimentalism that it suggests a tight link between


moral evaluation and moral motivation. It’s relatively uncontroversial that the sentiments (e.g., sympathy, guilt, pride) are directly motivating, and so insofar as the sentiments are implicated in moral evaluation, these evaluations have a motivational factor built in to the evaluations themselves. Indeed, Hume took this as a primary consideration in favor of sentimentalism (Treatise 3.1.1). Sentimentalists can certainly explain much of moral motivation in the direct way. When I see someone in need and I feel sympathy, this motivates my helping and underwrites my evaluation that helping is good. This is great as far as it goes, but it doesn’t go far enough to solve the problem. Indeed, it doesn’t even solve the case of promising. I am motivated to keep my promises, yet there is no natural emotion for keeping promises, and so there is no primitive sentiment to directly motivate promise-keeping.⁹ Instead, as with so much of moral cognition, our moral motivation to keep promises depends on rules. We seem to be motivated directly by moral rules in a way that is too specific to be accounted for by our emotions, which lack the precise specificity of the rules. Indeed, the content of moral motivation seems to be isomorphic to the content of moral rules. This points to a major lacuna in the sentimentalist account of moral motivation.

2.2 Automaticity of Rule-Based Motivation

I think that we can get part of the solution to this problem if we step back from specifically moral motivation and consider rule-based motivation more generally. That is, part of the answer to why we are moral will be because of the nature of our rule-representations. My proposal is that motivation is an automatic concomitant of the primary form of rule acquisition.¹⁰ This proposal outstrips the evidence, I admit. But I’ll lay my bets. I’ll begin with recent evidence from developmental psychology which suggests that children learn rules extremely quickly and are motivated to follow them.¹¹ Children are motivated to conform to adult behavior in

⁹ This is a central observation in Hume’s discussion of justice as an artificial virtue in the Treatise (see, e.g., 3.2.5, para. 7). ¹⁰ This idea is anticipated by Sripada and Stich (2006). They propose that norms are intrinsically motivating. The evidence that they use to support their view is rather different than the evidence I’ll recount below, but I take my elaborations to be broadly consistent with their view. See also Kelly and Davis (2018). ¹¹ I’m grateful to Hannes Rakoczy for discussion of the material in this section.


surprisingly detailed ways, as illustrated in work on “overimitation.” When children observe an adult engaging in some goal-oriented task, children will imitate features of the action that are irrelevant to achieving the goal. For instance, in one study, children watch an experimenter open a cage. The experimenter first uses a handle on the side of the cage to rotate the cage 180° around its central axis and then unscrews a cap on the top of the cage and opens the cage. The first action (rotating the cage) is irrelevant to opening the cage (as is clear to an adult observer); nonetheless most of the children copied the irrelevant action (Lyons et al. 2007: 19754; see also McGuigan et al. 2007; Whiten et al. 2009; Whiten & Flynn 2010). The phenomenon is quite robust: even when children were explicitly praised for identifying irrelevant actions, they still showed overimitation (Lyons et al. 2007).¹² Of course, overimitation need not reflect the internalization of a normative rule. To make the link, let’s turn to explicit measures of normative judgment. Children pick up rules very quickly. In one study with 2- and 3-year-olds, the experimenter tells the child that she is going to show him a game called “daxing” and she proceeds to demonstrate daxing by pushing a wooden block along a Styrofoam board until the block lands in a gutter attached to the board, at which point the experimenter says, “Now I’ve daxed.” The experimenter then puts the block back on the board and slowly tilts the board up until the block slides into the gutter at which point she exclaims, “Oops! That’s not how daxing goes!” After this the child observes a puppet (controlled by a different experimenter) say “Now I’m gonna dax,” and the puppet proceeds to tilt the board so that the block slides into the gutter. When this happens children intervene both physically (trying to prevent the puppet from tilting the board) and with verbal protestations (e.g., “No! It does not go like this!”; “No! Don’t do it that way!”) (Rakoczy et al. 2008: 879). In a subsequent study using this kind of task (Schmidt et al. 2011), the experimenter didn’t use any normative language but merely demonstrated daxing (without using the word) as if he was quite familiar with the action. The experimenter then gave the block and board to the child and said “Now you can have it.” Children tended to produce the same behavior as the experimenter (2011: 5). For the next phase of the experiment, a puppet tilts

¹² In his impressive book, Darwin’s Unfinished Symphony, Kevin Laland (2017) argues that humans are the only species that engages in “high-fidelity” imitation (of the kind reflected in over-copying) and that this explains why human cultures produced (inter alia) technology so vastly superior to any other species.


the board (as above). As in the earlier study, the puppet’s action triggers protestations from the child (2011: 5). Thus, witnessing a distinctive act leads to both conforming behavior and to sanctioning of individuals who diverge from the action. A natural interpretation here is that the child internalizes a rule which he follows (in conformity) and enforces (in protestation) (see also Roberts et al. 2017). What are the psychological underpinnings of this rule-based motivation? One possibility is a conditional or indirect model, on which the motivation to follow the rule is conditional on one’s other desires and beliefs. This is akin to the externalist model of motivation we saw above.¹³ So in the case of the daxing study, perhaps the child has a desire to play the daxing game and beliefs about the rules of the game. An alternative model is that the motivation to follow a rule is often an unconditional concomitant of acquisition— the rule is intrinsically motivating. Call this the unconditional model.¹⁴ On this view, it is an automatic sequela upon learning a rule—including a rule of etiquette—that one is motivated to follow the rule. Suggestive evidence for the unconditional model comes from recent work in experimental economics on what we might call naïve rule-following. In one such study, participants completed a task on a computer in which they would drag and drop balls into one of two buckets. They were told that for each ball they put into the yellow bucket they would receive $0.10 and for each ball they put into the blue bucket they would get $0.05. The earnings were updated at the top of the screen after each ball was deposited. The rule component of the study was simple. 
After being told about the payoffs, participants were told, with no further explanation, “The rule is to put the

¹³ The most prominent contemporary treatment of social norms in philosophy, due to Cristina Bicchieri, is a conditional model on which one prefers to conform to a norm only if one believes that others will conform (2006: 20). Bicchieri’s approach differs from the one pursued here because she takes a dispositional view of preference rather than the internal-representation approach that I favor (2006: 10; 2016: 6). More importantly for present purposes, on Bicchieri’s account, these norms are explicitly conditional. The only reason I prefer to follow the norms is because others follow them and think that they should be followed. It doesn’t go deeper than that. On her view, I don’t treat the judgments of others as evidence about the right thing to do, in some unconditional way. As a result, Bicchieri contrasts the way people think about social norms with the phenomenon of “social proof,” when we take others’ judgments to reveal something about the right thing to do or choose (2016: 23). On the view I’m sketching, in rule-learning, we often do treat others’ judgments and behaviors as social proof, i.e., evidence of the right thing to do. ¹⁴ Note that the sense of unconditional operative here is motivational. As noted in Section 3 in Chapter 2, rules can also be unconditional or non-hypothetical in their application such that they can apply to a person regardless of their attitude toward the rule (e.g., Foot 1972). The rules that children learn are plausibly unconditional in both senses, but my interest in this chapter is in the motivational side of things.


balls into the blue bucket” (Kimbrough & Vostroknutov 2018: 148). Thus, the rule goes against their monetary interests and no justification for the rule is offered. The central finding of the study was that people who showed greater naïve rule-following (i.e., put more balls in the blue bucket) were more prosocial in a Dictator game (see also Kimbrough & Vostroknutov 2016). For present purposes, a further point is of particular interest: in each of five different countries (the U.S., Canada, Netherlands, Italy, and Turkey), more than 70 percent of the participants engaged in naïve rule-following at least some of the time.¹⁵ This doesn’t show that the unconditional model is right, but the unconditional model can easily make sense of the results— telling people there is a rule induces some motivation to conform to the rule, even in the absence of any known reason for following the rule. The most sensational version of the unconditional model would hold that you can’t not be motivated by every rule you learn.¹⁶ But this seems to run afoul of obvious counterexamples. As a Northerner in South Carolina, I learned that “Ma’am” is often the appropriate form of address, but given the stock of rules I already had about address, this new one didn’t incline me in the least to use that expression. Similarly, we might have a detached attitude towards many of the state laws we learn as adults. In addition, we might be less likely to be motivated to follow a rule introduced by strangers.¹⁷ However, these examples don’t reflect the normal course of rule-learning with children. Children are typically taught the rules in their home, their school, and their community. In those cases, rule-learning is not like the distanced exercise of an anthropologist. It’s the quick acquisition of the rules the child needs to live by. This process of rule acquisition, I suggest, automatically carries motivation to follow the rule. There is an automaticity of normativity. 
If some rule representations automatically carry motivation and some don’t, there must be a difference in the two classes of rule representations. Presumably the precise way that rules are encoded won’t perfectly map to

¹⁵ See also Henrich (2017), who argues that our propensity to follow social norms is an adaptation. ¹⁶ This formulation is based on a similarly sensational claim from Daniel Gilbert: “you can’t not believe everything you read” (Gilbert et al. 1993). See also Mandelbaum (2014). ¹⁷ We know that children are less likely to imitate those with foreign accents (Kinzler et al. 2011) and those with “low prestige” (Chudek et al. 2012). They are also less likely to accept advice from those known to be error prone (for a review, see Sobel & Kushnir 2013). Perhaps this sensitivity to features of the model will carry over to rule-learning such that children are less likely to be automatically motivated by the rule expressions of people regarded as ignorant or as having low prestige.


distinctions in linguistic grammar. But a natural way to characterize the distinction is between representing a rule as a declarative statement and representing it as an imperative. So I might know that my mom has the rule it is impermissible to run in the living room or I might internalize the imperative no running in the living room. When we learn about the rules of another culture we might acquire the declarative representation of the rule. But when we learn the rules in our home, as a child, it’s likely that we learn them as imperatival. At a minimum, it’s clear that rules are often taught with the imperatives. Indeed in the CHILDES database, most of the instructions about how to behave were given with imperatives (e.g., “don’t throw anything,” “brush your teeth,” “don’t do that”). If imperatival rule-learning brings motivation along automatically, then refraining from rule-following should involve some amount of impulse control. None of the previous studies indicates any such role for impulse control in rule-based motivation. There is, however, reason to suspect that impulse control is involved in refraining from rule-following, based on a very different line of research. One characteristic test of impulse control is whether a child can refrain from carrying out an imperative. Consider the game Simon says where children are supposed to carry out a command (e.g., “touch your toes”) if and only if that command is preceded by “Simon says.” Children have difficulty not following an imperative even when it isn’t embedded in “Simon says” (e.g., Strommen 1973). In a variant of this kind of game, Bear/Dragon (Reed et al. 1984), children are told to do what the bear-puppet commands, but not what the dragon-puppet commands. Again, children have trouble not doing what the dragon commands. The main use of these results is to measure the development of impulse control (e.g., Carlson 2005). 
But the fact that it’s used to measure impulse control reveals something important for us. The default is to follow commands. Impulse control is required to override the default.¹⁸ The experiments on impulse control might not involve rule-learning per se, but they certainly

¹⁸ There is an interesting parallel here with Eric Mandelbaum’s (2014) treatment of believing versus rejecting statements. As he notes, when participants under cognitive load are presented with novel statements that are flagged as true or false, participants tend to remember the true statements as true (as one would expect), but they also tend to remember the false statements as true (65–6). The fact that participants tend to treat statements as true even when they have been flagged as false suggests that there is a bias in favor of treating all statements as true. (Mandelbaum takes this as evidence that believing is automatic, while rejecting takes effort.) In the Bear/Dragon experiment, we find a bias in favor of conforming to imperatives since children tend to comply with imperatives even when they are flagged as coming from the dragon.


suggest that representing an imperative automatically brings motivation along. It’s plausible that conforming to an imperatival rule is more like a reflex and defying such a rule is more like a deliberate response. I’ve reviewed a diverse range of evidence on rule-following and motivation. The evidence isn’t decisive, but it provides some reason to favor the unconditional model of rule-based motivation. Obviously we can’t draw the conclusion that all rule-learning has this character. But the evidence indicates that sometimes, even with very limited information, and even for nonmoral rules, acquisition of a rule carries with it the motivation to follow the rule. That’s where I’m placing my bets. On this view, at least often when we teach rules to our children, they don’t learn the rules in the indirect, conditional fashion associated with externalism. As we saw above, Michael Smith rejects the conditional (externalist) model of moral motivation, but he promotes the conditional model for etiquette. That is, he maintains that the motivation to follow the norms of etiquette is conditional: “Those who reliably do ‘the done thing’—insiders to the subgroup, people like Miss Manners—do seem to desire to do what etiquette requires of them where this is read de dicto, and not de re” (1994: 83). I suspect instead that for much of etiquette, the motivational effects of rules are not de dicto or conditional. When children learn to say “please” and “thank you,” when they learn to close their mouth when chewing, when they learn to close the door behind them, they don’t learn these things conditional on some further desire, like the desire to be like Miss Manners or some general desire to be a rule-follower. Rather, when we teach our children a rule of etiquette, the rule often enters the psyche as a representation that is directly motivating.

2.3 Automaticity of Moral Motivation

Why, then, are we motivated to conform to the content of our moral rules? I’ve suggested that the motivation to follow rules isn’t something that comes from calculation or hard work. It’s an automatic concomitant of acquiring the rule. If rule-learning brings along motivation automatically, perhaps this means that the representations of rules (in this basic motivational form) are not beliefs but conative states (e.g., Gibbard 1992). Or perhaps the representations are simultaneously beliefs and conative states (e.g., Little 1997). Or perhaps the representations are beliefs that motivate, in opposition to the Humean view that beliefs can never motivate. I won’t try to sort any of this


out. The main point I want to emphasize is that regardless of which of these views is right, there is reason to think that even non-moral rules often come with a kind of automatic motivation. In Chapter 2, I argued that we need a rule-based account to explain the specificity of moral judgment (e.g., the judgment that it’s wrong for parallel cousins to marry). The same kind of specificity is found for moral motivation (e.g., the motivation not to marry one’s parallel cousin). Rule representations provide the unifying factor for these aspects of normative psychology. We acquire rule representations specifying what is impermissible. These rule representations drive both the judgment that an act is impermissible and some motivation not to do the specified act. As we’ve seen, this account of how rules motivate diverges from externalism, because the motivation is an intrinsic feature of the representation, not something that is conditional on other desires. In addition, the account isn’t sentimentalist in any traditional way since there is no claim that rules motivate in virtue of some natural emotion. Finally, on the view I’ve (tentatively) proposed here, the basic kind of intrinsic motivation that we find associated with moral rules turns out not to be restricted to the moral domain but is rather a general feature of the acquisition of rules. Of course, moral considerations and moral rules might weigh more heavily than non-moral rules. There are several reasons we might find stronger motivation associated with moral rules. (1) Insofar as moral rules align with our natural emotions, that would bring additional motivation. For instance, the fact that we find suffering aversive plausibly increases the motivational power associated with moral rules prohibiting harm (see, e.g., Nichols 2004c). (2) Insofar as moral rules resonate with strong values, like the value we assign to human life, we can expect those rules to be especially motivationally effective. (3) Finally, even though externalist accounts of moral motivation fail to provide a complete account of moral motivation, there is something to the idea that it matters whether we categorize a potential action as “morally impermissible”. In Chapter 2, I noted that people care about whether a particular action falls under a proscribed category of actions. For instance, people care about whether some prospective sexual encounter would fall under the category incest. It’s plausible that something similar holds for the category morally wrong. If a student finds out that his peers categorize cheating on a test as immoral, that might augment the motivational force of the rule proscribing cheating. Thus, I don’t mean to suggest that there is nothing at all special about moral motivation. Clearly the motivation associated with moral rules is


complex and multifarious. My proposal is that there is a fundamental kind of intrinsic (if weak) motivation associated with rules, and this basic kind of intrinsic motivation attaches not just to moral rules but also to rules much more broadly, including arbitrary conventions like the rules of etiquette.


3. Why Should We Be Moral?

And now, the last of our “Why be moral” questions: Given the kinds of creatures we (most humans) are, why should we be moral? Why should we keep our promises? Why should we refrain from lying and cheating? Before addressing this question, I want to reiterate that I am not trying to answer the question Why should all rational agents be moral? Rather, the question is Why should we, constituted as we are, be moral? In addition, I’m not trying to give an answer to this question that would entail that morality should always be overriding for us. It’s a matter of debate whether moral considerations should always be overriding for us (e.g., Foot 1978; Shiffrin 1999; Williams 1985). I will focus instead on a more basic question: Why should moral considerations matter to us at all? The classical sentimentalist already has an answer for part of this question. Why should helping others matter to us in decision making? Because our natural emotions often make us value helping others. If I see someone suffering, I feel sympathy, and this motivates helping; the sympathetic emotional reaction informs my decision. And there’s a clear sense in which this is rational: my feelings of sympathy constitute part of the values that guide my informed decision-making. Imagine hiking and hearing a stranger just behind you trip and hurt himself. Your sympathetic reactions will likely motivate you to go back and check on him, even though this delays your own hike. To ignore his moans and continue hiking would actually require self-restraint. Your motivational set will include helping the hiker as a significant felt value. As Hume (1998 [1751]: 81) put it, “the immediate feeling of benevolence and friendship, humanity and kindness, is sweet, smooth, tender, and agreeable, independent of all fortune and accidents.” What the classical sentimentalist does not have, I suggested, is an account of how moral rules motivate action.
My own proposal is that for a basic form of rule acquisition, acquiring the rule carries with it the motivation to follow the rule, and this will apply broadly to many moral and conventional rules. When children learn a rule they simultaneously acquire a motivation to follow the rule. This means that we are naturally inclined to follow moral rules. It is part of what we value. So asking “Why should moral rules factor in our decisions at all?” is a little like asking “Why listen to music?” or “Why eat ice cream?” or “Why have a beer?” There might be good competing reasons not to listen, eat, drink, or follow a moral rule. But absent competing reasons, presumably we should follow our natural inclinations. Indeed, if we are naturally, unconditionally, motivated to follow moral rules, this can factor into a familiar rational choice framework. What I should do, according to rational choice theory, is maximize my expected utility. And if we place some disutility on violating moral rules, as suggested by the unconditional account of rule-based motivation, this disutility should factor into my calculation of the expected utilities for candidate actions. To ignore this disutility would in fact be irrational. So one reason we should follow moral rules is that, like feelings of benevolence, moral rules matter to us, and what matters to us should inform our decisions. While our moral rules might not carry epistemic or practical authority for any conceivable rational creature, they do carry some epistemic and rational authority for us. To that extent, it is rational for us to be moral.
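
The rational-choice point above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the account itself: the `Action` class, the two example actions, their probabilities and utilities, and the `rule_disutility` parameter are hypothetical. The sketch computes expected utility as the probability-weighted sum of outcome utilities and subtracts a fixed cost from any action that violates a learned rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A candidate action (hypothetical structure, for illustration only)."""
    name: str
    # Each outcome is a (probability, utility) pair; probabilities sum to 1.
    outcomes: tuple[tuple[float, float], ...]
    violates_rule: bool = False


def expected_utility(action: Action, rule_disutility: float) -> float:
    """Probability-weighted utility, minus a cost if the action breaks a rule."""
    base = sum(p * u for p, u in action.outcomes)
    penalty = rule_disutility if action.violates_rule else 0.0
    return base - penalty


def best_action(actions: list[Action], rule_disutility: float) -> Action:
    """Return the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, rule_disutility))


# Made-up numbers: breaking the promise pays more on average,
# but it violates a rule the agent has acquired.
keep_promise = Action("keep promise", ((1.0, 5.0),))
break_promise = Action(
    "break promise", ((0.5, 10.0), (0.5, 4.0)), violates_rule=True
)
options = [keep_promise, break_promise]

print(best_action(options, rule_disutility=0.0).name)
print(best_action(options, rule_disutility=3.0).name)
```

With these invented values, breaking the promise has the higher expected utility (7.0 against 5.0) when rule violation carries no weight, and keeping the promise comes out ahead once the disutility of violating the rule is set to 3.0. Nothing hangs on the particular numbers; the sketch only shows that a calculation which leaves the disutility out is a calculation over the wrong values.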


References

Adams, F. & Steadman, A. (2004). Intentional action in ordinary language: core concept or pragmatic understanding? Analysis, 64, 173–81.
Amit, E. & Greene, J. (2012). You see, the ends don’t justify the means. Psychological Science, 23(8), 861–8.
Audi, R. (2009). The Good in the Right: A Theory of Intuition and Intrinsic Value. Princeton, NJ: Princeton University Press.
Ayars, A. (2016). Can model-free reinforcement learning explain deontological moral judgments? Cognition, 150, 232–42.
Ayars, A. & Nichols, S. (2017). Moral empiricism and the bias for act-based rules. Cognition, 167, 11–24.
Ayars, A. & Nichols, S. (2020). Rational learners and metaethics: universalism, relativism, and evidence from consensus. Mind & Language, 35(1), 67–89.
Baghramian, M. & Carter, J. (2018). Relativism. The Stanford Encyclopedia of Philosophy (Winter 2018 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2018/entries/relativism/.
Bapna, R., Gupta, A., Rice, S., & Sundararajan, A. (2017). Trust and the strength of ties in online social networks: an exploratory field experiment. MIS Quarterly: Management Information Systems, 41(1), 115–30.
Baron, J. (1994). Nonconsequentialist decisions. Behavioral and Brain Sciences, 17, 1–10.
Bartels, D. (2008). Principled moral sentiment and the flexibility of moral judgment and decision making. Cognition, 108(2), 381–417.
Bartels, D. & Pizarro, D. (2011). The mismeasure of morals. Cognition, 121, 154–61.
Batson, C. D. (1991). The Altruism Question: Toward a Socio-Psychological Answer. Hillsdale, NJ: Lawrence Erlbaum.
Berger, J., Zelditch, M., Bo, A., & Cohen, B. (1972). Structural aspects of distributive justice: a status value formulation. In J. Berger et al. (eds.), Sociological Theories in Progress, 119–46. Boston, MA: Houghton Mifflin.
Berker, S. (2009). The normative insignificance of neuroscience. Philosophy & Public Affairs, 37, 293–329.
Berwick, R. C. (1986). Learning from positive-only examples: the subset principle and three case studies. In R. S. Michalski, J. C. Carbonell, & T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. 2, 625–45. Los Altos, CA: Morgan Kaufmann.
Bicchieri, C. (2006). The Grammar of Society: The Nature and Dynamics of Social Norms. Cambridge: Cambridge University Press.
Bicchieri, C. (2016). Norms in the Wild: How to Diagnose, Measure, and Change Social Norms. Oxford: Oxford University Press.




Bicchieri, C. & Xiao, E. (2009). Do the right thing: but only if others do so. Journal of Behavioral Decision Making, 22(2), 191–208.
Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100(5), 992–1026.
Blackburn, S. (1984). Spreading the Word: Groundings in the Philosophy of Language. New York: Oxford University Press.
Blair, R. J. R. (1995). A cognitive developmental approach to morality. Cognition, 57(1), 1–29.
Blake, P. R. & McAuliffe, K. (2011). “I had so much it didn’t seem fair”: eight-year-olds reject two forms of inequity. Cognition, 120, 215–24.
Blake, P. R., McAuliffe, K., Corbit, J., Callaghan, T. C., Barry, O., Bowie, A., Kleutsch, L., Kramer, K. L., Ross, E., Vongsachang, H., & Wrangham, R. (2015). The ontogeny of fairness in seven societies. Nature, 528(7581), 258–61.
Blanchard, T., Lombrozo, T., & Nichols, S. (2018). Bayesian Occam’s razor is a razor of the people. Cognitive Science, 42(4), 1345–59.
Bloom, P. (2000). How Children Learn the Meanings of Words. Cambridge, MA: MIT Press.
Bloomfield, P. (2001). Moral Reality. New York: Oxford University Press.
Botterill, G. & Carruthers, P. (1999). The Philosophy of Psychology. Cambridge: Cambridge University Press.
Boyd, R. & Richerson, P. J. (1992). Punishment allows the evolution of cooperation (or anything else) in sizable groups. Ethology and Sociobiology, 13(3), 171–95.
Brandt, R. (1959). Ethical Theory. Englewood Cliffs, NJ: Prentice Hall.
Brandt, R. (1963). Toward a credible form of utilitarianism. In H.-N. Castañeda & G. Nakhnikian (eds.), Morality and the Language of Conduct, 107–43. Detroit, MI: Wayne State University Press.
Brandt, R. (1979). A Theory of the Good and the Right. Oxford: Clarendon Press.
Bratman, M. (1984). Two faces of intention. Philosophical Review, 93, 375–405.
Brink, D. (1989). Moral Realism and the Foundation of Ethics. Cambridge: Cambridge University Press.
Buchanan, A. & Powell, R. (2018). The Evolution of Moral Progress: A Biocultural Theory. Oxford: Oxford University Press.
Byrne, A. & Hilbert, D. R. (2003). Color realism and color science. Behavioral and Brain Sciences, 26(1), 3–21.
Cameron, C. D., Payne, B. K., & Doris, J. M. (2013). Morality in high definition: emotion differentiation calibrates the influence of incidental disgust on moral judgments. Journal of Experimental Social Psychology, 49(4), 719–25.
Campbell, R. & Kumar, V. (2012). Moral reasoning on the ground. Ethics, 122(2), 273–312.
Carlson, S. M. (2005). Developmentally sensitive measures of executive function in preschool children. Developmental Neuropsychology, 28(2), 595–616.
Cesana-Arlotti, N., Téglás, E., & Bonatti, L. L. (2012). The probable and the possible at 12 months: intuitive reasoning about the uncertain future. Advances in Child Development and Behavior, 43, 1–25.


Chintagunta, P. K., Gopinath, S., & Venkataraman, S. (2010). The effects of online user reviews on movie box office performance. Marketing Science, 29(5), 944–57.
Chomsky, N. (1972). Problems of Knowledge and Freedom. New York: Fontana/Collins.
Chomsky, N. (1980). On cognitive structures and their development: a reply to Piaget. In M. Piatelli-Palmarini (ed.), Language and Learning: The Debate between Jean Piaget and Noam Chomsky, 35–52. Cambridge, MA: Harvard University Press.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. Westport, CT: Greenwood Publishing Group.
Chomsky, N. (2003). Replies. In L. M. Antony & N. Hornstein (eds.), Chomsky and His Critics. Oxford: Blackwell.
Chomsky, N. (2009). Cartesian Linguistics: A Chapter in the History of Rationalist Thought (3rd edition). Cambridge: Cambridge University Press.
Chudek, M., Heller, S., Birch, S., & Henrich, J. (2012). Prestige-biased cultural learning: bystander’s differential attention to potential models influences children’s learning. Evolution and Human Behavior, 33(1), 46–56.
Cialdini, R. B., Reno, R. R., & Kallgren, C. A. (1990). A focus theory of normative conduct: recycling the concept of norms to reduce littering in public places. Journal of Personality and Social Psychology, 58(6), 1015–26.
Cialdini, R. B., Wosinska, W., Barrett, D. W., Butner, J., & Gornik-Durose, M. (1999). Compliance with a request in two cultures. Personality and Social Psychology Bulletin, 25, 1242–53.
Clarke, S. (1728). A Discourse Concerning the Unchangeable Obligations of Natural Religion, and the Truth and Certainty of Christian Revelation (7th edition). London: James & John Knapton.
Clarke-Doane, J. (2014). Moral epistemology: the mathematics analogy. Noûs, 48(2), 238–55.
Cohen, J. & Nichols, S. (2010). Colours, colour relationalism and the deliverances of introspection. Analysis, 70(2), 218–28.
Cohen, J. & Rogers, J. (1991). Knowledge, morality and hope: the social thought of Noam Chomsky. New Left Review, 187(May/June), 5–27.
Copp, D. (1997). Belief, reason, and motivation: Michael Smith’s “the moral problem.” Ethics, 108(1), 33–54.
Cosmides, L. & Tooby, J. (1995). From function to structure. In M. Gazzaniga (ed.), The Cognitive Neurosciences, 1199–210. Cambridge, MA: MIT Press.
Cosmides, L. & Tooby, J. (1996). Are humans good intuitive statisticians after all? Cognition, 58(1), 1–73.
Cowie, F. (1999). What’s Within? Oxford: Oxford University Press.
Cowie, F. (2008). Innateness and language. The Stanford Encyclopedia of Philosophy (Fall 2017 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/fall2017/entries/innateness-language/.
Craig, E. (1990). Knowledge and the State of Nature: An Essay in Conceptual Synthesis. Oxford: Clarendon Press.
Craig, E. (2007). Genealogies and the state of nature. In A. Thomas (ed.), Bernard Williams, 181–200. Cambridge: Cambridge University Press.




Crain, S. & Nakayama, M. (1987). Structure dependence in grammar formation. Language, 63(3), 522–43.
Crisp, R. (2006). Reasons and the Good. Oxford: Oxford University Press.
Crockett, M. J. (2013). Models of morality. Trends in Cognitive Sciences, 17(8), 363–6.
Cushman, F. (2013). Action, outcome, and value: a dual-system framework for morality. Personality and Social Psychology Review, 17(3), 273–92.
Cushman, F., Gray, K., Gaffey, A., & Mendes, W. B. (2012). Simulating murder: the aversion to harmful action. Emotion, 12(1), 2–7.
Cushman, F., Young, L., & Hauser, M. (2006). The role of conscious reasoning and intuition in moral judgment. Psychological Science, 17(12), 1082–9.
Dancy, J. (2009). Moral particularism. The Stanford Encyclopedia of Philosophy (Spring 2009 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2017/entries/moral-particularism/.
D’Arms, J. & Jacobson, D. (2006). Anthropocentric constraints on human value. Oxford Studies in Metaethics, 1, 99–126.
D’Arms, J. & Jacobson, D. (2010). Demystifying sensibilities: sentimental values and the instability of affect. In P. Goldie (ed.), The Oxford Handbook of Philosophy of Emotion, 585–614. Oxford: Oxford University Press.
de Lazari-Radek, K. & Singer, P. (2014). The Point of View of the Universe: Sidgwick and Contemporary Ethics. Oxford: Oxford University Press.
De Neys, W. & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–99.
de Waal, F. B., Leimgruber, K., & Greenberg, A. R. (2008). Giving is self-rewarding for monkeys. Proceedings of the National Academy of Sciences of the USA, 105(36), 13685–9.
Dean, R. (2010). Does neuroscience undermine deontological theory? Neuroethics, 3(1), 43–60.
Denison, S., Bonawitz, E., Gopnik, A., & Griffiths, T. L. (2013). Rational variability in children’s causal inferences. Cognition, 126(2), 285–300.
Denison, S. & Xu, F. (2012). Probabilistic inference in human infants. Advances in Child Development and Behavior, 43, 27–58.
Dewar, K. & Xu, F. (2010). Induction, overhypothesis, and the origin of abstract knowledge: evidence from 9-month-old infants. Psychological Science, 21(12), 1871–7.
Dreier, J. (1990). Internalism and speaker relativism. Ethics, 101(1), 6–26.
Dupoux, E. & Jacob, P. (2007). Universal moral grammar: a critical appraisal. Trends in Cognitive Sciences, 11(9), 373–8.
Dwyer, S. (1999). Moral competence. In K. Murasugi & R. Stainton (eds.), Philosophy and Linguistics, 169–90. Boulder, CO: Westview Press.
Dwyer, S. (2006). How good is the linguistic analogy? The Innate Mind, 2, 237–56.
Dwyer, S., Huebner, B., & Hauser, M. D. (2010). The linguistic analogy: motivations, results, and speculations. Topics in Cognitive Science, 2(3), 486–510.
Elman, J. L., Bates, E., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking Innateness: Connectionism in a Developmental Framework. Cambridge, MA: MIT Press.


Ericsson, K. A. & Simon, H. A. (1984). Protocol Analysis: Verbal Reports as Data. Cambridge, MA: MIT Press.
Fehr, E. & Gächter, S. (2002). Altruistic punishment in humans. Nature, 415(6868), 137.
Feinberg, J. (1984). Harm to Others. New York: Oxford University Press.
Finlay, S. (2007). Four faces of moral realism. Philosophy Compass, 2(6), 820–49.
Fisher, M., Knobe, J., Strickland, B., & Keil, F. C. (2017). The influence of social interaction on intuitions of objectivity and subjectivity. Cognitive Science, 41(4), 1119–34.
Fodor, J. (1983). Modularity of Mind. Cambridge, MA: MIT Press.
Fodor, J. (2000). The Mind Doesn’t Work That Way. Cambridge, MA: MIT Press.
Fodor, J. & Pylyshyn, Z. (1988). Connectionism and cognitive architecture. Cognition, 28(1–2), 3–71.
Fontanari, L., Gonzalez, M., Vallortigara, G., & Girotto, V. (2014). Probabilistic cognition in two indigenous Mayan groups. Proceedings of the National Academy of Sciences, 111(48), 17075–80.
Foot, P. (1967). The problem of abortion and the doctrine of the double effect. Oxford Review, 5. Reprinted in Virtues and Vices, 19–32. Oxford: Oxford University Press.
Foot, P. (1972). Morality as a system of hypothetical imperatives. The Philosophical Review, 81(3), 305–16.
Foot, P. (1978). Are moral considerations overriding? In Virtues and Vices and Other Essays in Moral Philosophy, 181–8. Oxford: Clarendon Press.
Forst, R. (2005). Political liberty. In J. Christman & J. Anderson (eds.), Autonomy and the Challenges to Liberalism, 226–42. Cambridge: Cambridge University Press.
Fricker, M. (2008). Scepticism and the genealogy of knowledge: situating epistemology in time. Philosophical Papers, 37(1), 27–50.
Gächter, S., Herrmann, B., & Thöni, C. (2004). Trust, voluntary cooperation, and socio-economic background: survey and experimental evidence. Journal of Economic Behavior & Organization, 55(4), 505–31.
Gaus, G. (2011). The Order of Public Reason: A Theory of Freedom and Morality in a Diverse and Bounded World. Cambridge: Cambridge University Press.
Gaus, G. & Nichols, S. (2017). Moral learning in the open society: the theory and practice of natural liberty. Social Philosophy and Policy, 34(1), 79–101.
Gaus, G. (forthcoming). The open society and its complexities: an essay in moral science.
Gert, B. (1998). Morality: Its Nature and Justification. New York: Oxford University Press.
Gibbard, A. (1992). Wise Choices, Apt Feelings: A Theory of Normative Judgment. Cambridge, MA: Harvard University Press.
Gibson, E., Bergen, L., & Piantadosi, S. T. (2013). Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences, 110(20), 8051–6.
Giffin, C. & Lombrozo, T. (2016). Wrong or merely prohibited: special treatment of strict liability in intuitive moral judgment. Law and Human Behavior, 40(6), 707–20.




Gigerenzer, G. (1991). How to make cognitive illusions disappear. European Review of Social Psychology, 2(1), 83–115.
Gigerenzer, G. (2019). Axiomatic rationality and ecological rationality. Synthese, 1–18, https://doi.org/10.1007/s11229-019-02296-5.
Gilbert, D. T., Tafarodi, R. W., & Malone, P. S. (1993). You can’t not believe everything you read. Journal of Personality and Social Psychology, 65(2), 221–33.
Gill, M. B. (2007). Moral rationalism vs. moral sentimentalism: is morality more like math or beauty? Philosophy Compass, 2(1), 16–30.
Gill, M. B. (2009). Indeterminacy and variability in meta-ethics. Philosophical Studies, 145(2), 215–34.
Gill, M. B. (2019). Morality is not like mathematics: the weakness of the math-moral analogy. The Southern Journal of Philosophy, 57(2), 194–216.
Gillon, B. (2006). English relational words, context sensitivity and implicit arguments, https://semanticsarchive.net/Archive/jk5ZjU1O/implicit-argument.pdf.
Girotto, V. & Gonzalez, M. (2008). Children’s understanding of posterior probability. Cognition, 106(1), 325–44.
Goldberg, S. (2010). Relying on Others: An Essay in Epistemology. Oxford: Oxford University Press.
Goldman, A. (1979). What is justified belief? In G. Pappas (ed.), Justification and Knowledge. Dordrecht: Reidel.
Goldman, A. & Beddor, B. (2016). Reliabilist epistemology. The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2016/entries/reliabilism/.
Gomez, R. & Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70(2), 109–35.
Goodman, N. (1955). Fact, Fiction, and Forecast. Cambridge, MA: Harvard University Press.
Goodman, N., Tenenbaum, J., Feldman, J., & Griffiths, T. (2010). A rational analysis of rule-based concept learning. Cognitive Science, 32(1), 108–54.
Goodwin, G. & Darley, J. (2008). The psychology of meta-ethics: exploring objectivism. Cognition, 106, 1339–66.
Goodwin, G. & Darley, J. (2010). The perceived objectivity of ethical beliefs: psychological findings and implications for public policy. Review of Philosophy and Psychology, 1, 1–28.
Goodwin, G. & Darley, J. (2012). Why are some moral beliefs perceived to be more objective than others? Journal of Experimental Social Psychology, 48, 250–6.
Greene, J. (2008). The secret joke of Kant’s soul. In W. Sinnott-Armstrong (ed.), Moral Psychology, Vol. 3, 59–66. Cambridge, MA: MIT Press.
Greene, J., Sommerville, R. B., Nystrom, L., Darley, J., & Cohen, J. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293(5537), 2105–8.
Greene, J. T. (1969). Altruistic behavior in the albino rat. Psychonomic Science, 14(1), 47–8.
Griffiths, T. L., Kemp, C., & Tenenbaum, J. B. (2008). Bayesian models of cognition. In R. Sun (ed.), Cambridge Handbook of Computational Psychology, 59–100. Cambridge: Cambridge University Press.


Griffiths, T. L., Lieder, F., & Goodman, N. D. (2015). Rational use of cognitive resources: levels of analysis between the computational and the algorithmic. Topics in Cognitive Science, 7(2), 217–29.
Gwynne, S. (2010). Empire of the Summer Moon. New York: Simon & Schuster.
Haidt, J. (2001). The emotional dog and its rational tail. Psychological Review, 108(4), 814–34.
Hamlin, J. K., Mahajan, N., Liberman, Z., & Wynn, K. (2013). Not like me = bad: infants prefer those who harm dissimilar others. Psychological Science, 24(4), 589–94.
Hamlin, J. K., Wynn, K., & Bloom, P. (2007). Social evaluation by preverbal infants. Nature, 450(7169), 557–9.
Hare, R. (1981). Moral Thinking: Its Levels, Method, and Point, Vol. 21, No. 41,111. Oxford: Clarendon Press.
Harman, G. (1975). Moral relativism defended. The Philosophical Review, 84(1), 3–22.
Harman, G. (1976). Practical reasoning. Review of Metaphysics, 79, 431–63.
Harman, G. (1977). The Nature of Morality. Oxford: Oxford University Press.
Harman, G. (1985). Is there a single true morality? In D. Copp & D. Zimmerman (eds.), Morality, Reason and Truth: New Essays on the Foundations of Ethics. Totowa, NJ: Rowman & Allanheld.
Harman, G. (1999). Moral philosophy and linguistics. In K. Brinkmann (ed.), Proceedings of the 20th World Congress of Philosophy, Vol. 1: Ethics. Philosophy Documentation Center, 107–15. Reprinted in his Explaining Value, Oxford: Oxford University Press (2000), 217–26.
Harman, G. & Thomson, J. (1996). Moral Relativism and Moral Objectivity. Oxford: Blackwell.
Hauser, M. (2006). Moral Minds: How Nature Designed Our Universal Sense of Right and Wrong. New York: Ecco/HarperCollins Publishers.
Hauser, M., Cushman, F., Young, L., Kang-Xing Jin, R., & Mikhail, J. (2007). A dissociation between moral judgments and justifications. Mind & Language, 22, 1–21.
Heibeck, T. H. & Markman, E. M. (1987). Word learning in children. Child Development, 58, 1021–34.
Heiphetz, L. & Young, L. L. (2016). Can only one person be right? The development of objectivism and social preferences regarding widely shared and controversial moral beliefs. Cognition, 167, 78–90.
Henrich, J. (2017). The Secret of Our Success: How Culture is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter. Princeton, NJ: Princeton University Press.
Herdt, G. (1994). Guardians of the Flutes, Vol. 1: Idioms of Masculinity. Chicago, IL: University of Chicago Press.
Hertwig, R. & Grüne-Yanoff, T. (2017). Nudging and boosting: steering or empowering good decisions. Perspectives on Psychological Science, 12(6), 973–86.




Hillinger, C. & Lapham, V. (1971). The impossibility of a Paretian liberal: comment by two who are unreconstructed. Journal of Political Economy, 79(6), 1403–5.
Holt, L. L. & Lotto, A. J. (2008). Speech perception within an auditory cognitive science framework. Current Directions in Psychological Science, 17(1), 42–6.
Hooker, B. (2000). Ideal Code, Real World: A Rule-Consequentialist Theory of Morality. Oxford: Oxford University Press.
Hooker, B. (2016). Rule consequentialism. The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2016/entries/consequentialism-rule/.
Hopkins, K. (1980). Brother-sister marriage in Roman Egypt. Comparative Studies in Society and History, 22(3), 303–54.
Horne, Z. & Powell, D. (2016). How large is the role of emotion in judgments of moral dilemmas? PloS One, 11(7), e0154780.
Hornstein, N. (2009). A Theory of Syntax: Minimal Operations and Universal Grammar. Cambridge: Cambridge University Press.
Hume, D. (1998 [1751]). An Enquiry Concerning the Principles of Morals. New York: Oxford University Press.
Jackson, F. (1998). From Metaphysics to Ethics: A Defence of Conceptual Analysis. Oxford: Oxford University Press.
Jackson, J. M. & Harkins, S. G. (1985). Equity in effort: an explanation of the social loafing effect. Journal of Personality and Social Psychology, 49(5), 1199–206.
Jacobson, D. (2012). Moral Dumbfounding and Moral. Oxford Studies in Normative Ethics, Volume 2, 2, 289.
Joyce, R. (2001). The Myth of Morality. Cambridge: Cambridge University Press.
Kahneman, D. & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–51.
Keizer, K., Lindenberg, S., & Steg, L. (2008). The spreading of disorder. Science, 322(5908), 1681–5.
Kelly, D. & Davis, T. (2018). Social norms and human normative psychology. Social Philosophy & Policy, 35(1), 54–76.
Kelly, D., Stich, S., Haley, K. J., Eng, S. J., & Fessler, D. M. (2007). Harm, affect, and the moral/conventional distinction. Mind & Language, 22(2), 117–31.
Kelly, T. (2011). Consensus gentium. In K. Clark & R. Vanarragon (eds.), Evidence and Religious Belief, 135–56. Oxford: Oxford University Press.
Kemp, C., Perfors, A., & Tenenbaum, J. B. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10(3), 307–21.
Kerr, N. L. (1983). Motivation losses in small groups: a social dilemma analysis. Journal of Personality and Social Psychology, 45, 819–28.
Kimbrough, E. O. & Vostroknutov, A. (2016). Norms make preferences social. Journal of the European Economic Association, 14(3), 608–38.
Kimbrough, E. O. & Vostroknutov, A. (2018). A portable method of eliciting respect for social norms. Economics Letters, 168, 147–50.
Kinzler, K. D., Corriveau, K. H., & Harris, P. L. (2011). Children’s selective trust in native-accented speakers. Developmental Science, 14(1), 106–11.
Kinzler, K. D., Dupoux, E., & Spelke, E. S. (2007). The native language of social cognition. Proceedings of the National Academy of Sciences, USA, 104, 12577–80.


Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual statistical learning in infancy: evidence for a domain general learning mechanism. Cognition, 83, B35–42.
Knobe, J. (2003). Intentional action and side effects in ordinary language. Analysis, 63, 190–3.
Knobe, J. (2010). Person as scientist, person as moralist. Behavioral and Brain Sciences, 33(4), 315–29.
Koenigs, M., Young, L., Adolphs, R., Tranel, D., Cushman, F., Hauser, M., & Damasio, A. (2007). Damage to the prefrontal cortex increases utilitarian moral judgements. Nature, 446(7138), 908–11.
Krueger, J. & Clement, R. (1994). The truly false consensus effect: an ineradicable and egocentric bias in social perception. Journal of Personality and Social Psychology, 67, 596–610.
Kuhl, P. K. & Miller, J. D. (1975). Speech perception by the chinchilla: voiced-voiceless distinction in alveolar plosive consonants. Science, 190(4209), 69–72.
Kumar, V. (2015). Moral judgment as a natural kind. Philosophical Studies, 172(11), 2887–910.
Kumar, V. (2016). Nudges and bumps. Georgetown Journal of Law & Public Policy, 14, 861–76.
Kushnir, T., Xu, F., & Wellman, H. M. (2010). Young children use statistical sampling to infer the preferences of other people. Psychological Science, 21(8), 1134–40.
Laland, K. N. (2017). Darwin’s Unfinished Symphony. Princeton, NJ: Princeton University Press.
Landy, J. F. & Goodwin, G. P. (2015). Does incidental disgust amplify moral judgment? Perspectives on Psychological Science, 10(4), 518–36.
Latane, B. & Darley, J. (1968). Group inhibition of bystander intervention in emergencies. Journal of Personality and Social Psychology, 10(1968), 215–21.
Laurence, S. & Margolis, E. (2001). The poverty of the stimulus argument. The British Journal for the Philosophy of Science, 52(2), 217–76.
Layard, R. (2006). Happiness and public policy: a challenge to the profession. The Economic Journal, 116, C24–33.
Leider, S., Mobius, M., Rosenblat, T., & Do, Q.-A. (2010). What do we expect from our friends? Journal of the European Economic Association, 8(1) (March), 120–38.
Leslie, A. M., Knobe, J., & Cohen, A. (2006). Acting intentionally and the side-effect effect: theory of mind and moral judgment. Psychological Science, 17(5), 421–7.
Levine, S. & Leslie, A. (forthcoming). Preschoolers’ representation of intention in moral judgment.
Levine, S., Leslie, A. M., & Mikhail, J. (2018). The mental representation of human action. Cognitive Science, 42(4), 1229–64.
Levy, D. E., Riis, J., Sonnenberg, L. M., Barraclough, S. J., & Thorndike, A. N. (2012). Food choices of minority and low-income employees: a cafeteria intervention. American Journal of Preventive Medicine, 43(3), 240–8.
Liberman, A. M., Mattingly, I. G., & Turvey, M. T. (1972). Language codes and memory codes. In A. W. Melton & E. Martin (eds.), Coding Processes in Human Memory, 307–34. Washington, DC: Winston.




Lidz, J. & Gagliardi, A. (2015). How nature meets nurture: universal grammar and statistical learning. Annual Review of Linguistics, 1(1), 333–53. Lieberman, D., Tooby, J., & Cosmides, L. (2003). Does morality have a biological basis? An empirical test of the factors governing moral sentiments relating to incest. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1517), 819–26. Little, M. (1997). Virtue as knowledge: objections from the philosophy of mind. Noûs, 31, 59–79. Locke, J. (1975 [1689]). An Essay Concerning Human Understanding. Edited by P. Nidditch. Oxford: Oxford University Press. Lombrozo, T. (2007). Simplicity and probability in causal explanation. Cognitive Psychology, 55(3), 232–57. Lopez, T. (2013). The moral mind: emotion, evolution, and the case for skepticism. Ph.D. thesis, University of Arizona. Lopez, T., Zamzow, J., Gill, M., & Nichols, S. (2009). Side constraints and the structure of commonsense ethics. Philosophical Perspectives, 23(1), 305–19. Lyons, D. E., Young, A. G., & Keil, F. C. (2007). The hidden structure of overimitation. Proceedings of the National Academy of Sciences, 104(50), 19751–6. McClelland, J. L., Rumelhart, D. E., & Hinton, G. E. (1986). The appeal of parallel distributed processing. In Rumelhart, McClelland, & the PDP Research Group (eds.), Parallel Distributed Processing, Vol. 1. Cambridge, MA: MIT Press. MacFarlane, J. (2014). Assessment Sensitivity: Relative Truth and Its Applications. Oxford: Oxford University Press. McGuigan, N., Whiten, A., Flynn, E., & Horner, V. (2007). Imitation of causally opaque versus causally transparent tool use by 3- and 5-year-old children. Cognitive Development, 22(3), 353–64. Machery, E. (2008). The folk concept of intentional action: philosophical and experimental issues. Mind & Language, 23(2), 165–89. MacKay, D. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press. Mackie, J. (1977). 
Ethics: Inventing Right and Wrong. London: Penguin. McNaughton, D. & Rawling, P. (1991). Agent-relativity and the doing-happening distinction. Philosophical Studies, 63(2), 167–85. MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (3rd edition). Hillsdale, NJ: Lawrence Erlbaum Associates. Mallon, R. & Nichols, S. (2010). Rules. In J. Doris (ed.), The Moral Psychology Handbook, 297–320. New York: Oxford University Press. Mallon, R. & Nichols, S. (2011). Dual processes and moral rules. Emotion Review, 3(3), 284–5. Mandelbaum, E. (2014). Thinking is believing. Inquiry, 57(1), 55–96. Mandelbaum, E. (2017). Associationist theories of thought. The Stanford Encyclopedia of Philosophy (Summer 2017 Edition), E. Zalta (ed.), https://plato.stanford.edu/archives/sum2017/entries/associationist-thought/. Marler, P. (2004). Science and birdsong: the good old days. In P. Marler & H. Slabbekoorn (eds.), Nature’s Music, 1–38. San Diego, CA: Elsevier Academic Press.


Marr, D. (1982). Vision. Cambridge, MA: MIT Press. Masserman, J. H., Wechkin, S., & Terris, W. (1964). “Altruistic” behavior in rhesus monkeys. The American Journal of Psychiatry, 121(6), 584–5. Maund, B. (2012). Color. The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), Edward N. Zalta (ed.), http://plato.stanford.edu/archives/win2012/entries/color/. May, J. (2014). Does disgust influence moral judgment? Australasian Journal of Philosophy, 92(1), 125–41. May, J. (2018). Regard for Reason in the Moral Mind. Oxford: Oxford University Press. Mikhail, J. (2007). Universal moral grammar: theory, evidence and the future. Trends in Cognitive Sciences, 11(4), 143–52. Mikhail, J. (2011). Elements of Moral Cognition: Rawls’ Linguistic Analogy and the Cognitive Science of Moral and Legal Judgment. Cambridge: Cambridge University Press. Mikhail, J. (2012). Moral grammar and human rights: some reactions on cognitive science and enlightenment rationalism. In R. Goodman, D. Jinks, & A. Woods (eds.), Understanding Social Action, Promoting Human Rights, 160–202. Oxford: Oxford University Press. Mill, J. S. (1859). On Liberty. London: J. W. Parker. Mill, J. S. (1989). “On Liberty” and Other Writings. Cambridge: Cambridge University Press. Millhouse, T., Ayars, A., & Nichols, S. (2018). Learnability and moral nativism: exploring Wilde rules. In J. Suikkanen & A. Kauppinen (eds.), Methodology and Moral Philosophy, 73–88. New York: Routledge. Muldoon, R., Borgida, M., & Cuffaro, M. (2012). The conditions of tolerance. Politics, Philosophy & Economics, 11(3), 322–44. Mulvey, P. W. & Klein, H. J. (1998). The impact of perceived loafing and collective efficacy on group goal processes and group performance. Organizational Behavior and Human Decision Processes, 71, 62–87. Munshi, K. (2004). Social learning in a heterogeneous population: technology diffusion in the Indian green revolution. Journal of Development Economics, 73, 185–213. Murray, D. (2020).
Maggots are delicious, sunsets hideous: false, or do you just disagree? In T. Lombrozo, J. Knobe, & S. Nichols (eds.), Oxford Studies in Experimental Philosophy, Vol. 3, 64–96. Oxford: Oxford University Press. Murray, D. & Nahmias, E. (2014). Explaining away incompatibilist intuitions. Philosophy and Phenomenological Research, 88(2), 434–67. Nadelhoffer, T. (2006). Bad acts, blameworthy agents, and intentional actions: some problems for juror impartiality. Philosophical Explorations, 9(2), 203–19. Nagel, T. (1986). The View from Nowhere. Oxford: Oxford University Press. Newman, L. (2016). Descartes’ epistemology. The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2016/entries/descartes-epistemology/. Nichols, S. (2004a). After objectivity: an empirical study of moral judgment. Philosophical Psychology, 17, 3–26.




Nichols, S. (2004b). Is religion what we want? Journal of Cognition and Culture, 4(2), 347–71. Nichols, S. (2004c). Sentimental Rules: On the Natural Foundations of Moral Judgment. Oxford: Oxford University Press. Nichols, S. (2005). Innateness and moral psychology. In P. Carruthers, S. Laurence, & S. Stich (eds.), The Innate Mind: Structure and Content, 353–70. New York: Oxford University Press. Nichols, S. (2008). Moral rationalism and empirical immunity. Moral Psychology, 3, 395–407. Nichols, S. (2014). Process debunking and ethics. Ethics, 124(4), 727–49. Nichols, S. (2015). Bound: Essays on Free Will and Responsibility. New York: Oxford University Press. Nichols, S. (2019). Debunking and vindicating in moral psychology. In A. Goldman & B. McLaughlin (eds.), Metaphysics and Cognitive Science, Chapter 4. New York: Oxford University Press. Nichols, S. & Folds-Bennett, T. (2003). Are children moral objectivists? Children’s judgments about moral and response-dependent properties. Cognition, 90(2), B23–32. Nichols, S. & Gaus, G. (2018). Unspoken rules: resolving underdetermination with closure principles. Cognitive Science, 42(8), 2735–56. Nichols, S., Kumar, S., Lopez, T., Ayars, A., & Chan, H. (2016). Rational learners and moral rules. Mind & Language, 31, 530–54. Nichols, S. & Mallon, R. (2006). Moral dilemmas and moral rules. Cognition, 100(3), 530–42. Nichols, S. & Pinillos, N. (2018). Skepticism and the acquisition of “knowledge”. Mind & Language, 33, 397–414. Nichols, S. & Samuels, R. (2017). Bayesian psychology and human rationality. In T. Hung & T. Lane (eds.), Rationality: Constraints and Contexts, 17–36. San Diego, CA: Elsevier. Nichols, S. & Ulatowski, J. (2007). Intuitions and individual differences: the Knobe effect revisited. Mind & Language, 22(4), 346–65. Nisbett, R. E., Krantz, D. H., Jepson, C., & Kunda, Z. (1983). The use of statistical heuristics in everyday inductive reasoning. Psychological Review, 90(4), 339–63. Nozick, R. (1974).
Anarchy, State, and Utopia. New York: Basic Books. Nozick, R. (1981). Philosophical Explanations. Cambridge, MA: Harvard University Press. Parfit, D. (1984). Reasons and Persons. Oxford: Oxford University Press. Partington, S., Nichols, S., & Kushnir, T. (2020). When in Rome, do as Bayesians do: statistical learning and parochial norms. Conference paper. Peacocke, C. (2004). Moral rationalism. The Journal of Philosophy, 101(10), 499–526. Pellizzoni, S., Siegal, M., & Surian, L. (2010). The contact principle and utilitarian moral judgments in young children. Developmental Science, 13(2), 265–70. Perfors, A., Tenenbaum, J. B., Griffiths, T. L., & Xu, F. (2011). A tutorial introduction to Bayesian models of cognitive development. Cognition, 120(3), 302–21.


Petrinovich, L., O’Neill, P., & Jorgensen, M. (1993). An empirical study of moral intuitions: toward an evolutionary ethics. Journal of Personality and Social Psychology, 64(3), 467–78. Pinillos, N. Á., Smith, N., Nair, G. S., Marchetto, P., & Mun, C. (2011). Philosophy’s new challenge: experiments and intentional action. Mind & Language, 26(1), 115–39. Pinker, S. & Jackendoff, R. (2005). The faculty of language: what’s special about it? Cognition, 95(2), 201–36. Prinz, J. (2007). The Emotional Construction of Morals. Oxford: Oxford University Press. Prinz, J. (2008). Is morality innate? In W. Sinnott-Armstrong (ed.), Moral Psychology: The Evolution of Morality, Vol. 1: Adaptations and Innateness, 367–406. Cambridge, MA: MIT Press. Prinz, J. & Nichols, S. (2017). Diachronic identity and the moral self. In J. Kiverstein (ed.), Routledge Handbook of Social Mind, 449–64. New York: Routledge. Proft, M., Dieball, A., & Rakoczy, H. (2019). What is the cognitive basis of the side-effect effect? An experimental test of competing theories. Mind & Language, 34(3), 357–75. Pullum, G. K. & Scholz, B. C. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 18(1–2), 9–50. Quinn, W. (1993). Morality and Action. Cambridge: Cambridge University Press. Railton, P. (1984). Alienation, consequentialism, and the demands of morality. Philosophy & Public Affairs, 13, 134–71. Railton, P. (1986). Moral realism. The Philosophical Review, 95(2), 163–207. Railton, P. (2014). The affective dog and its rational tale: intuition and attunement. Ethics, 124(4), 813–59. Railton, P. (2017). Moral learning: why learning? Why moral? And why now? Cognition, 167, 172–90. Rakoczy, H. & Schmidt, M. F. (2013). The early ontogeny of social norms. Child Development Perspectives, 7(1), 17–21. Rakoczy, H., Warneken, F., & Tomasello, M. (2008). The sources of normativity. Developmental Psychology, 44(3), 875–81. Rawls, J. (2001). Justice as Fairness: A Restatement.
Edited by Erin Kelly. Cambridge, MA: Harvard University Press. Raz, J. (1970). The Concept of a Legal System. Oxford: Clarendon Press. Reed, M. A., Pien, D. L., & Rothbart, M. K. (1984). Inhibitory self-control in preschool children. Merrill-Palmer Quarterly, 30(2), 131–47. Reiter, S. M. & Samuel, W. (1980). Littering as a function of prior litter and the presence or absence of prohibitive signs. Journal of Applied Social Psychology, 10(1), 45–55. Rey, G. (2003). Chomsky, intentionality, and a CRTT. In L. M. Antony & N. Hornstein (eds.), Chomsky and his Critics, 105–39. Oxford: Blackwell. Roberts, S. O., Gelman, S. A., & Ho, A. K. (2017). So it is, so it shall be: group regularities license children’s prescriptive judgments. Cognitive Science, 41, 576–600.




Rosati, C. (2016). Moral motivation. The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2016/entries/moral-motivation/. Rose, D. & Nichols, S. (2019). From punishment to universalism. Mind & Language, 34(1), 59–72. Ross, W. D. (2002). The Right and the Good. New York: Oxford University Press. Rozin, P. (1986). One-trial acquired likes and dislikes in humans: disgust as a US, food predominance, and negative learning predominance. Learning and Motivation, 17(2), 180–9. Rumelhart, D. E. & McClelland, J. L. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press. Ryan, J. A. (2003). Moral relativism and the argument from disagreement. Journal of Social Philosophy, 34(3), 377–86. Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–8. Saffran, J., Johnson, E., Aslin, R., & Newport, E. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27–52. Salzman, P. (2008). Culture and Conflict in the Middle East. Amherst, NY: Humanity Press. Samuels, R. (2002). Nativism in cognitive science. Mind & Language, 17(3), 233–65. Sarkissian, H., Park, J., Tien, D., Wright, J. C., & Knobe, J. (2011). Folk moral relativism. Mind & Language, 26(4), 482–505. Sauer, H. (2017). Moral Judgments as Educated Intuitions: A Rationalist Theory of Moral Judgment. Cambridge, MA: MIT Press. Schaller, M. (1992). In-group favoritism and statistical reasoning in social inference. Journal of Personality and Social Psychology, 63, 61–74. Schmidt, M. F., Butler, L. P., Heinz, J., & Tomasello, M. (2016). Young children see a single action and infer a social norm: promiscuous normativity in 3-year-olds. Psychological Science, 27(10), 1360–70. Schmidt, M. F., Rakoczy, H., & Tomasello, M. (2011).
Young children attribute normativity to novel actions without pedagogy or normative language. Developmental Science, 14(3), 530–9. Schultz, P. R., Bator, J., Large, L. B., Bruni, C. M., & Tabanico, J. J. (2013). Littering in context: personal and environmental predictors of littering behavior. Environment and Behavior, 45(1), 35–59. Shafer-Landau, R. (2003). Moral Realism: A Defence. New York: Oxford University Press. Shafto, P., Goodman, N., & Griffiths, T. (2014). A rational account of pedagogical reasoning: teaching by, and learning from, example. Cognitive Psychology, 71, 55–89. Shiffrin, S. V. (1999). Moral overridingness and moral subjectivism. Ethics, 109(4), 772–94. Shoemaker, D. (2017). Response-dependent responsibility; or, a funny thing happened on the way to blame. Philosophical Review, 126(4), 481–527.


Sidgwick, H. (1884). The Methods of Ethics (3rd edition). London: MacMillan & Company. Simon, H. (1990). Invariants of human behavior. Annual Review of Psychology, 41(1), 1–20. Singer, P. (1972). Famine, affluence, and morality. Philosophy & Public Affairs, 1, 229–43. Singer, P. (1980). The Expanding Circle. New York: Farrar, Straus & Giroux. Singer, P. (2005). Ethics and intuitions. Journal of Ethics, 9, 331–52. Sinnott-Armstrong, W. (1996). Moral skepticism and justification. In W. Sinnott-Armstrong & M. Timmons (eds.), Moral Knowledge? New Readings in Moral Epistemology, 3–48. New York: Oxford University Press. Sinnott-Armstrong, W. (2019). Moral skepticism. The Stanford Encyclopedia of Philosophy (Summer 2019 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/sum2019/entries/skepticism-moral/. Skyrms, B. (1996). The Evolution of the Social Contract. Cambridge: Cambridge University Press. Smetana, J. (1985). Preschool children’s conceptions of transgressions: effects of varying moral and conventional domain-related attributes. Developmental Psychology, 21(1), 18–29. Smetana, J. G. (1993). Understanding of social rules. In M. Bennett (ed.), The Development of Social Cognition: The Child as Psychologist, 111–41. New York: Guilford Press. Smith, L., Jones, S., Landau, B., Gershkoff-Stowe, L., & Samuelson, L. (2002). Object name learning provides on-the-job training for attention. Psychological Science, 13(1), 13–19. Smith, M. (1993). Realism. In P. Singer (ed.), A Companion to Ethics, 399–410. Cambridge, MA: Blackwell. Smith, M. (1994). The Moral Problem. Oxford: Blackwell. Smith, V. (2003). Constructivist and ecological rationality in economics. American Economic Review, 93(3), 465–508. Snare, F. E. (1980). The diversity of morals. Mind, 89(355), 353–69. Sobel, D. M. & Kushnir, T. (2013). Knowledge matters: how children evaluate the reliability of testimony as a process of rational inference. Psychological Review, 120(4), 779–97. Sober, E.
(2015). Ockham’s Razors. Cambridge: Cambridge University Press. Sparks, E., Schinkel, M. G., & Moore, C. (2017). Affiliation affects generosity in young children: the roles of minimal group membership and shared interests. Journal of Experimental Child Psychology, 159, 242–62. Spelke, E. & Kinzler, K. (2007). Core knowledge. Developmental Science, 10, 89–96. Sperber, D. & Wilson, D. (1995). Relevance: Communication and Cognition (2nd edition). Malden, MA: Blackwell. Sripada, C. & Stich, S. (2006). A framework for the psychology of norms. In P. Carruthers, S. Laurence, & S. Stich (eds.), The Innate Mind, Vol. 2: Culture and Cognition, 280–301. New York: Oxford University Press. Stanford, P. K. (2018). The difference between ice cream and Nazis: moral externalization and the evolution of human cooperation. Behavioral and Brain Sciences, 41, 1–49.




Stanley, M. L., Yin, S., & Sinnott-Armstrong, W. (2019). A reason-based explanation for moral dumbfounding. Judgment and Decision Making, 14(2), 120–9. Stein, E. (1996). Without Good Reason. Oxford: Clarendon Press. Sterelny, K. (2010). Moral nativism: a sceptical response. Mind & Language, 25(3), 279–97. Stich, S. (1990). The Fragmentation of Reason. Cambridge, MA: MIT Press. Stich, S. (1993). Moral philosophy and mental representation. In M. Hechter, L. Nadel, & R. E. Michod (eds.), The Origin of Values, 215–28. Hawthorne, NY: Aldine de Gruyter. Stone, J. (1964). Legal System and Lawyers’ Reasonings. Stanford: Stanford University Press. Strohminger, N., Lewis, R. L., & Meyer, D. E. (2011). Divergent effects of different positive emotions on moral judgment. Cognition, 119(2), 295–300. Strohminger, N. & Nichols, S. (2014). The essential moral self. Cognition, 131(1), 159–71. Strommen, E. A. (1973). Verbal self-regulation in a children’s game: impulsive errors on “Simon Says.” Child Development, 44(4), 849–53. Sturgeon, N. L. (1994). Moral disagreement and moral relativism. Social Philosophy and Policy, 11(1), 80–115. Sumner, W. (1906). Folkways. Boston, MA: Athenium Press. Surowiecki, J. (2004). The Wisdom of Crowds. New York: Doubleday. Surowiecki, J. (2005). The Wisdom of Crowds. New York: Anchor. Svavarsdóttir, S. (1999). Moral cognitivism and motivation. The Philosophical Review, 108(2), 161–219. Swain, S., Alexander, J., & Weinberg, J. (2008). The instability of philosophical intuitions. Philosophy and Phenomenological Research, 76(1), 138–55. Sytsma, J., Muldoon, R., & Nichols, S. (ms.). The meta-wisdom of crowds. Tenenbaum, J. & Griffiths, T. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24, 629–41. Thomson, J. J. (1976). Killing, letting die, and the trolley problem. The Monist, 59(2), 204–17. Thomson, J. J. (1985). The trolley problem. The Yale Law Journal, 94(6), 1395–415. Timmons, M. (1999).
Morality without Foundations: A Defense of Ethical Contextualism. New York: Oxford University Press. Timmons, M. (2008). Towards a sentimentalist deontology. In W. Sinnott-Armstrong (ed.), Moral Psychology, Vol. 3: The Neuroscience of Morality: Emotion, Brain Disorders, and Development. Cambridge, MA: MIT Press. Tisak, M. (1995). Domains of social reasoning and beyond. Annals of Child Development, 11(1), 95–130. Todd, P. M. & Gigerenzer, G. (2007). Environments that make us smart: ecological rationality. Current Directions in Psychological Science, 16(3), 167–71. Tolman, E. (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189–208. Tooby, J. & Cosmides, L. (1992). The psychological foundations of culture. In J. Barkow, L. Cosmides, & J. Tooby (eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture, 19–136. New York: Oxford University Press.


Trout, J. D. (2001). The biological basis of speech: what to infer from talking to the animals. Psychological Review, 108(3), 523–49. Trout, J. D. (2003). Biological specializations for speech: what can the animals tell us? Current Directions in Psychological Science, 12(5), 155–9. Turiel, E. (1983). The Development of Social Knowledge: Morality and Convention. Cambridge: Cambridge University Press. Turiel, E. (1998). The development of morality. In W. Damon & N. Eisenberg (eds.), Handbook of Child Psychology (5th edition), Vol. 3: Social, Emotional, and Personality Development. Hoboken, NJ: Wiley. Turnbull, C. (1972). The Mountain People. New York: Simon & Schuster. Unger, P. (1996). Living High and Letting Die. Oxford: Oxford University Press. Uniacke, S. (1998). The principle of double effect. In E. Craig (ed.), Routledge Encyclopedia of Philosophy, Vol. 3. Oxford: Taylor and Francis. Valdesolo, P. & DeSteno, D. (2006). Manipulations of emotional context shape moral judgment. Psychological Science, 17(6), 476–7. van Roojen, M. (2014). Metaethics: A Contemporary Introduction. London: Routledge. van Roojen, M. (2015). Metaethics. New York: Routledge Press. van Roojen, M. (2018a). Moral cognitivism vs. non-cognitivism. The Stanford Encyclopedia of Philosophy (Fall 2018 Edition), E. Zalta (ed.), https://plato.stanford.edu/archives/fall2018/entries/moral-cognitivism/. van Roojen, M. (2018b). Rationalist metaphysics, semantics and metasemantics. In K. Jones & S. Francois (eds.), The Many Moral Rationalisms, 167–86. Oxford: Oxford University Press. Wellman, C. H. (1995). On conflicts between rights. Law and Philosophy, 14(3), 271–95. Welzl, H., D’Adamo, P., & Lipp, H. P. (2001). Conditioned taste aversion as a learning and memory paradigm. Behavioural Brain Research, 125(1–2), 205–13. Westermarck, E. (1906). The Origin and Development of Moral Ideas. London: Macmillan. Whiten, A. & Flynn, E. (2010).
The transmission and evolution of experimental microcultures in groups of young children. Developmental Psychology, 46(6), 1694–1709. Whiten, A., McGuigan, N., Marshall-Pescini, S., & Hopper, L. M. (2009). Emulation, imitation, over-imitation and the scope of culture for child and chimpanzee. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1528), 2417–28. Williams, B. (1985). Ethics and the Limits of Philosophy. London: Fontana Press. Williams, B. (2002). Truth & Truthfulness: An Essay in Genealogy. Princeton, NJ: Princeton University Press. Wong, D. B. (1984). Moral Relativity. Berkeley: University of California Press. Wong, D. B. (2006). Natural Moralities: A Defence of Pluralistic Relativism. New York: Oxford University Press. Wright, J. C. & Bartsch, K. (2008). Portraits of early moral sensibility in two children’s everyday conversations. Merrill-Palmer Quarterly, 54(1), 56–85.




Wright, J. C., Grandjean, P. T., & McWhite, C. B. (2013). The meta-ethical grounding of our moral beliefs: evidence for meta-ethical pluralism. Philosophical Psychology, 26(3), 336–61. Wright, J. C., McWhite, C., & Grandjean, P. (2014). The cognitive mechanisms of intolerance. Oxford Studies in Experimental Philosophy, 1, 28–61. Xu, F. & Denison, S. (2009). Statistical inference and sensitivity to sampling in 11-month-old infants. Cognition, 112(1), 97–104. Xu, F. & Garcia, V. (2008). Intuitive statistics by 8-month-old infants. Proceedings of the National Academy of Sciences, 105(13), 5012–15. Xu, F. & Kushnir, T. (2013). Infants are rational constructivist learners. Psychological Science, 22(1), 28–32. Xu, F., Kushnir, T., & Benson, J. (2012). Rational Constructivism in Cognitive Development. Cambridge, MA: Academic Press. Xu, F. & Tenenbaum, J. B. (2007a). Sensitivity to sampling in Bayesian word learning. Developmental Science, 10(3), 288–97. Xu, F. & Tenenbaum, J. B. (2007b). Word learning as Bayesian inference. Psychological Review, 114(2), 245–72. Yang, C., Crain, S., Berwick, R. C., Chomsky, N., & Bolhuis, J. J. (2017). The growth of language: universal grammar, experience, and principles of computation. Neuroscience & Biobehavioral Reviews, 81, 103–19. Ye, Q., Law, R., & Gu, B. (2009). The impact of online user reviews on hotel room sales. International Journal of Hospitality Management, 28(1), 180–2. Zagzebski, L. (2011). Epistemic self-trust and the consensus gentium argument. In K. Clark & R. Vanarragon (eds.), Evidence and Religious Belief, 135–56. Oxford: Oxford University Press. Zamzow, J. L. (2015). Rules and principles in moral decision making: an empirical objection to moral particularism. Ethical Theory and Moral Practice, 18(1), 123–34. Zelazo, P. D., Helwig, C. C., & Lau, A. (1996). Intention, act, and outcome in behavioral prediction and moral judgment. Child Development, 67(5), 2478–92. Zhang, Z., Ye, Q., Law, R., & Li, Y. (2010). 
The impact of e-word-of-mouth on the online popularity of restaurants. International Journal of Hospitality Management, 29(4), 694–700.


Index


Act/allow distinction 50–51, 60, 64, 170–71, 174, 183 Act-based rules 52, 53, 62, 64, 82–83, 89–94, 134, 150, 152–56, 164–70, 184–91, 210 Adams, F. 72 Audi, R. 193n2 Affect 6–9, 37–43, 61, 89, 107, 114, 120, 122, 123, 125, 127, 151, 178–90, 208 Amit, E. 7 Automaticity of rule-based motivation 167, 217, 220, 222 Ayars, A. 36, 41n11, 84, 87, 88, 91, 111, 112, 120–22, 130, 203 Baghramian, M. 196 Bapna, R. 182 Baron, J. 165 Bartels, D. 7, 151 Bartsch, K. 50 Batson, C. x Beddor, B. 12 Berger, J. 181 Berker, S. 7n1 Bicchieri, C. 95, 128, 200, 206, 219n13 Bikhchandani, S. 206 Blackburn, S. 110, 212 Blair, R. J. R. x, 3, 25–26, 124 Blake, P. R. 179–80 Blanchard, T. 114 Bloom, P. 86 Bloomfield, P. 110 Botterill, G. 142 Boyd, R. 182 Brandt, R. 177, 208 Bratman, M. 72 Brink, D. 110, 214n5 Buchanan, A. 188 Byrne, A. 121 Cameron, C.D. 8–9 Campbell, R. 209 Carlson, S. 221

Carruthers, P. 142 Carter, J. 196 Cesana-Arlotti, N. 190 Chintagunta, P.K. 128 Chomsky, N. 11, 65, 139–43, 155, 162 Chudek, M. 220n17 Cialdini, R. B. 113, 128, 182, 184 Clarke-Doane, J. 193 Clarke, S. 192–96, 211 Clement, R. 127n13 Cohen, J. 121–22 Cohen, J. 157–58 Consensus as evidence 131–33, 195, 206–209 Consequentialism 52–56, 60–64, 83–94, 133–35, 150, 156, 159, 162–69, 184–91, 210 Copp, D. 216 Cosmides, L. 13 Cowie, F. 20, 142 Craig, E. 175n6 Crain, S. 146 Crisp, R. 170–72 Crockett, M. J. 26, 32 Cushman, F. 4, 26–27, 32, 34–36, 72, 151 D’Arms, J. 204 Dancy, J. 25 Darley, J. 5, 8–10, 111, 118–20, 122–23, 129, 203 de Lazari-Radek, K. 171–73, 178, 191 de Waal, F. B. 147n8 Dean, R. 7 Debunking 5, 8–10, 35, 162, 170–71 DeNeys, W. 19 Denison, S. 17, 190 Dewar, K. 86–87 Doris, J.M. 9 Doxastic rationality 164–69, 188, 191 Dreier, J. 109n2 Dupoux, E. 141 Dwyer, S. 49–51, 139–41, 147–49, 158




Ecological rationality 164, 174–79, 183, 186, 188, 199 Elman, J.L. 151 Ericsson, K.A. 19


Fehr, E. x, 182n10 Finlay, S. 110 Fisher, M. 8 Flexibility/fit trade-off 112 Fodor, J. 26, 168 Fontanari, L. 18 Foot, P. 3, 30, 30n65, 219n14, 224 Forst, R. 95 Fricker, M. 175n6 Gächter, S. x, 180–82 Gaus, G. 45, 95–98, 101, 104, 107, 169, 200 Gert, B. 177 Gibson, E. 17 Giffin, C. 159 Gigerenzer, G. 174 Gilbert, D.T. 220n16 Gill, M.B. 11, 193, 198 Girotto, V. 17–19 Glumicic, T. 19 Goldberg, S. 99 Goldman, A. 12 Gomez, R. 152 Gonzalez, M. 17–19 Goodman, N. 85–86, 88–89, 92 Goodwin, G. 5, 8–10, 111, 118–20, 122–23, 129, 203 Greene, J. ix–x, 3, 6–7, 26, 33, 39, 72, 151, 165, 168 Griffiths, T. 15, 57, 113 Haidt, J. x, 3, 6–7, 39–40 Hamlin, J.K. 147n8, 189 Hare, R. 173 Harman, G. x, 3, 66, 68, 71, 72, 110, 139, 147, 149, 171n4 Hauser, M. 139, 158 Heibeck, T.H. 86 Henrich, J. 167, 169, 220n15 Heiphetz, L. 111 Herdt, G. 197 Hertwig, R. 174 Hilbert, D. R. 121 Hillinger, C. 95 Holt, L. 153

Hooker, B. 173, 177, 191n19 Hopkins, K. 41 Horne, Z. 8, 234 Hornstein, N. 145 Hume, D. 3, 212–13, 217, 224 Jackson, F. 110, 180 Jacob, P. 141 Jacobson, D. 40 Joyce, R. 110 Kahneman, D. 15–16, 165 Keizer, K. 182 Kelly, D. 112, 125, 147n9, 217 Kemp, C. 17, 86 Kerr, N.L. 180 Kimbrough, E.O. 220 Kinzler, K.D. 189, 220n17 Kirkham, N.Z. 21 Knobe, J. 68, 72–74 Koenigs, M. 235 Krueger, J. 127n13 Kuhl, P. 153 Kumar, V. 125, 147n9, 190n18, 209 Kushnir, T. 78, 220n17 Laland, L. 167, 218 Landy, J.F. 9–10 Lapham, V. 95 Latane, B. 182 Laurence, S. 20, 142, 146n7 Layard, R. 181 Leider, S. 182 Leslie, A. 49, 51, 72 Levine, S. 49, 51, 139 Levy, D.E. 174 Liberman, A.M. 153 Lidz, J. 145, 157 Lieberman, D. 41 Little, M. 222 Locke, J. 11, 192–93 Lombrozo, T. 24n18, 159 Lopez, T. 8, 52, 141 Lotto, A.K. 153 Lyons, D.E. 218 MacFarlane, J. 109n2 Machery, E. 72 Mackay, D. 114 Mackie, J. 8, 110


MacWhinney, B. 50 Mallon, R. 36, 45, 141, 151 Mandelbaum, E. 151, 220n16, 221n18 Margolis, E. 20, 142, 146n7 Markman, E.M. 86 Marler, P. 22 Marr, D. 44, 165, 181, 223 Masserman, J.H. 39 Maund, B. 121 May, J. 3, 10, 213n4 McAuliffe, K. 179 McClelland, J.L. 25, 151 McGuigan, N. 218 Mikhail, J. 3, 11, 30, 49, 51, 65, 73, 96, 102n3, 106n4, 139–40, 146–47, 149, 155, 158–59, 162 Mill, J. 153, 160 Miller, J.D. 153 Moral empiricism 20, 22, 139, 142–43, 150, 154, 161–62 Moral judgment 3–10, 25–50, 110–118, 124, 129, 130, 147, 151, 157–58, 168–69, 192–96, 208–15 Moral learning 3, 15, 20, 25, 49, 82, 95, 109, 139, 152–54, 162–66, 192, 211 Moral nativism 139–49, 155, 162 Moral psychology 3, 23, 34, 45, 93, 106, 109, 140, 168, 190 Moral/conventional distinction 124, 126, 133, 147–50 Motivational externalism 213–15, 222–23 Motivational internalism 213–14 Muldoon, R. 202–203 Mulvey, P.W. 180 Munshi, K. 207 Murray, D. 109n2 Nadelhoffer, T. 72 Nagel, T. 212–13 Natural liberty 96–108 Naughton, D. 56, 169 Newman, L. 162 Nichols, S. x, 3, 5, 8, 12, 14, 19, 22, 25, 36, 42, 44–45, 49, 51, 54, 62, 68, 72, 82, 84, 87–88, 91, 95–96, 98, 101, 104, 107, 109–12, 120–22, 130, 141, 146n7, 150n10, 155, 164, 172, 178–79, 188–89, 190n17, 196, 203, 206, 208, 211, 214n5, 223 Nisbett, R. 85–86 Nozick, R. 23n17, 95, 169


Overhypothesis 82–94, 108, 135, 154 Parfit, D. 56, 169 Parochial rules 74–81, 93, 189–90, 197, 198, 205 Partington, S. 77, 79 Peacocke, C. 193 Pedagogical sampling 98–108, 154 Pellizzoni, S. 49, 51 Perfors, A. 20, 114, 145, 151 Petrinovich, L. 151 Pinillos, N.A. 72 Pinker, S. 153 Plasticity 161–62 Poverty of the stimulus argument 139, 142, 147–50, 157, 162 Powell, D. 8n2 Powell, R. 188 Principle of double effect 64–66, 71–74, 93, 149, 160 Prinz, J. x, 3, 110, 141, 196, 208 Priors 17, 82, 92, 94, 122–24, 129, 135 Proft, M. 72 Pullum, G.K. 22 Quinn, W. 73 Railton, P. 38–40, 44, 188 Rakoczy, H. 166, 167, 217, 218 Rationalism 192–95, 203, 209, 211–12 Rawling, P. 56, 169 Rawls, J. 95 Raz, J. 96 Reed, M.A. 222 Reiter, S. 182 Relativism 113–23, 130–33, 150, 155, 192–210 Residual permission principle 97–100, 106–108 Residual prohibition principle 96–100, 136 Rey, G. 84, 140 Richerson, P. J. 182 Roberts, S. 128, 167, 219 Rosati, C. 213–14 Rose, D. 8 Rozin, P. 43 Rule representation 26, 29–32, 37, 43, 44, 45, 66, 220, 223 Rumelhart, D.E. 151




Saffran, J. 21 Salzman, P. 181 Samuels, R. 14nn9,10, 19, 142 Sarkissian, H. 111 Sauer, H. 3, 5, 171 Schmidt, M. F. 166–67, 218 Scholz, B.C. 22 Schultz, P.R. 182 Shafer-Landau, R. 110 Shafto, P. 99, 106 Shiffrin, S.V. 224 Shoemaker, D. 204 Sidgwick, H. 172–73, 176–78, 186 Simon, H. 177 Simon, H.A. 19 Singer, P. 7, 111, 165, 168, 170–73, 177–78, 188, 191 Sinnott-Armstrong, W. 56, 74, 171, 211 Size principle 57–82, 134, 150, 154 Skyrms, B. 200 Smetana, J. 124–25, 147 Smith, L. 17, 86 Smith, M. 10, 30, 110, 173, 211, 212, 214–16 Smith, V. 175n6 Snare, F. E. 74 Sobel, D. 220n17 Sparks, E. 75 Spelke, E. 75 Sperber, D. 99 Sripada, C. 217 Stanford, K. 200n5, 202n7 Stanley, M. L. 41n10 Statistical learning 15, 18, 20–22, 47, 50, 80, 82, 127, 142, 150–55, 161–68, 190–91 Steadman, A. 72 Stein, E. 12, 145 Sterelny, K. 141 Stich, S. 157, 217 Stone, J. 96 Strohminger, N. 151, 208 Strommen, E. A. 221 Sturgeon, N. L. 204 Sucker aversion 179–86 Sumner, W. 197

Surowiecki, J. 113, 207 Svavarsdóttir, S. 216 Swain, S. ix Sytsma, J. 202 Tenenbaum, J.B. 15, 17, 19, 54, 57–59, 63, 155 Thomson, J. J. 3, 65 Tisak, M. 147 Timmons, M. 7n1, 171 Todd, P. M. 174 Tolman, E. 27 Tooby, J. 13 Trout, J. D. 153 Turiel, E. 4, 125 Turnbull, C. 74 Tversky, A. 15–16, 165 Unger, P. 61, 165, 168 Uniacke, S. 73 Universalism 8–9, 109–33, 150, 155, 192–210 Valdesolo, P. 151 Value representation 25–46 Van Roojen, M. 10, 30, 185, 211–12 Vostroknutov, A. 220 Wellman, C. H. 45 Welzl, H. 43 Westermarck, E. 197 Whiten, A. 218 Why be moral? 211, 224 Williams, B. 188, 224 Wong, D. B. 110, 204 Wright, J. 5, 50, 111, 118–20, 123 Xu, F. 86–87, 155, 190 Yang, C. 20 Zagzebski, L. 112 Zamzow, J. L. 25 Zelazo, P. D. 36 Zhang, Z. 128
