Advances in Experimental Epistemology (Series: Advances in Experimental Philosophy)
ISBN: 9781472507372, 9781472594143, 9781472512390

Experimental epistemology uses the experimental methods of the cognitive sciences to shed light on debates within epistemology.


English | 222 pages | 2014


Table of contents:
Cover
Half-title
Title
Copyright
Contents
Notes on Contributors
Introduction
1 Experimental Evidence Supporting Anti-intellectualism About Knowledge
1 Introduction
2 General methodology
3 Our hypothesis and Anti-intellectualism about knowledge
4 Prior work
4.1 Negative results
4.2 Positive results
5 Our studies
5.1 Study 1: Evidence-seeking experiments, stakes, and probabilities
5.2 Experiment 1 (Water purifier)
5.3 Experiment 2 (Airplane)
6 Study 2: Agreement/disagreement with knowledge ascriptions
7 Study 3: Knowledge reason principle
7.1 Study 3: Experiment 1 (Do People Accept RKP?)
7.2 Study 3: Experiment 2 (Do People Accept ACTION?)
7.3 Discussion of both experiments
8 Buckwalter and Schaffer objection
8.1 Beliefs, guesses, and hopes
9 Conclusion
Notes
References
2 Winners and Losers in the Folk Epistemology of Lotteries
1 Introduction
2 Experiment 1: Skeptical judgment in basic lottery cases and the justification account’s demise
3 Experiment 2: Nonskeptical judgment in testimonial lottery cases and the chance account’s demise
4 Experiment 3: Skeptical judgment in other statistical cases and the statistical account’s demise
5 Experiment 4: Relenting skeptical judgment in nonstereotypical cases and the formulaic account’s promise
6 Experiment 5: Relenting skeptical judgment in qualitatively comparative cases
7 General discussion
Notes
References
3 Contrasting Cases
1 Background: Experiments and context
2 DeRose on joint versus separate evaluation of contexts
3 Experimenting with separate and joint evaluation
4 Why is contrast a problem?
5 Further case studies on separate and joint evaluation
6 Which type of evaluation generates better evidence for contextualism and anti-intellectualism?
7 Conclusion: Two explanatory projects
Notes
References
4 Salience and Epistemic Egocentrism: An Empirical Study
1 Introduction
2 Philosophical and psychological background
3 Our studies
3.1 Study 1: Salience effects
3.2 Study 2: Epistemic egocentrism and the curse of knowledge, Round 1
3.3 Study 3: Epistemic egocentrism and the curse of knowledge, Round 2
3.4 Study 4: Salience, epistemic egocentrism, and motivation
Notes
References
5 Semantic Integration as a Method for Investigating Concepts
1 Introduction
2 The methods of experimental philosophy
2.1 Pragmatic cues in experimental materials
2.2 Demand characteristics
3 Semantic integration
3.1 Memory and language processing research
3.2 Using semantic integration to investigate philosophical concepts
3.3 Two experiments using semantic integration
4 Pragmatic considerations and demand characteristics
5 Caveats
5.1 The structure of concepts
5.2 Mental processes and semantic integration
5.3 Impure semantic integration
6 Alternate experimental designs and surveys
6.1 Similar experimental paradigms
6.2 Surveys and semantic integration
7 Conclusion
Appendix
Notes
References
6 The Mystery of Stakes and Error in Ascriber Intuitions
1 Professional intuitions and experimental data
2 Bank experiments and epistemic contextualism
2.1 DeRose challenges the data
2.2 Meeting DeRose’s challenges
3 Evidence-seeking experiments and IRI
3.1 Pinillos challenges the data
3.2 Meeting Pinillos’ challenges
4 Toward solving the mystery
5 Implications and philosophical importance
6 Conclusion
Acknowledgments
Notes
References
7 Is Justification Necessary for Knowledge?
1 Sartwell’s argument
2 Objections to Sartwell
2.1 Kvanvig’s objections
2.2 Lycan’s objection
3 Empirical studies
4 Conclusion
Notes
References
8 The Promise of Experimental Philosophy and the Inference to Signal
Notes
References
Index

Advances in Experimental Epistemology

Advances in Experimental Philosophy

Series Editor: James R. Beebe, Associate Professor of Philosophy, University at Buffalo, USA

Editorial Board:
Joshua Knobe, Yale University, USA
Edouard Machery, University of Pittsburgh, USA
Thomas Nadelhoffer, College of Charleston, USA
Eddy Nahmias, Neuroscience Institute at Georgia State University, USA
Jennifer Nagel, University of Toronto, Canada
Joshua Alexander, Siena College, USA

Experimental philosophy is generating tremendous excitement, producing unexpected results that are challenging traditional philosophical methods. Advances in Experimental Philosophy responds to this trend, bringing together some of the most exciting voices in the field to understand the approach and measure its impact in contemporary philosophy. The result is a series that captures past and present developments and anticipates future research directions. To provide in-depth examinations, each volume links experimental philosophy to a key philosophical area, providing historical overviews alongside case studies, reviews of current problems, and discussions of new directions. For upper-level undergraduates, postgraduates, and professionals actively pursuing research in experimental philosophy, these are essential resources.

New titles in the series include:
Advances in Experimental Moral Psychology, edited by Hagop Sarkissian and Jennifer Cole Wright
Advances in Experimental Philosophy of Mind, edited by Justin Sytsma

Advances in Experimental Epistemology

Edited by James R. Beebe

Series: Advances in Experimental Philosophy

Bloomsbury Academic
An imprint of Bloomsbury Publishing Plc
LONDON • OXFORD • NEW YORK • NEW DELHI • SYDNEY

Bloomsbury Academic
An imprint of Bloomsbury Publishing Plc

50 Bedford Square, London, WC1B 3DP, UK
1385 Broadway, New York, NY 10018, USA

www.bloomsbury.com

Bloomsbury is a registered trade mark of Bloomsbury Publishing Plc

First published 2014
Paperback edition first published 2015

© James R. Beebe and Contributors 2014

James R. Beebe has asserted his right under the Copyright, Designs and Patents Act, 1988, to be identified as the Editor of this work.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.

No responsibility for loss caused to any individual or organization acting on or refraining from action as a result of the material in this publication can be accepted by Bloomsbury or the author.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: HB: 978-1-4725-0737-2
PB: 978-1-4742-5705-3
ePDF: 978-1-4725-1239-0
ePub: 978-1-4725-0531-6

Library of Congress Cataloging-in-Publication Data
Advances in experimental epistemology / edited by James R. Beebe.
pages cm. – (Advances in experimental philosophy)
Includes bibliographical references and index.
ISBN 978-1-4725-0737-2 (hardback) – ISBN 978-1-4725-0531-6 (epub) – ISBN 978-1-4725-1239-0 (epdf)
1. Knowledge, Theory of–Research. I. Beebe, James R., editor of compilation.
BD143.A38 2014
121–dc23
2013041401

Typeset by Deanta Global Publishing Services, Chennai, India
Printed and bound in Great Britain

Table of Contents

Notes on Contributors  vi
Introduction  James R. Beebe  1
1 Experimental Evidence Supporting Anti-intellectualism About Knowledge  Ángel Pinillos and Shawn Simpson  9
2 Winners and Losers in the Folk Epistemology of Lotteries  John Turri and Ori Friedman  45
3 Contrasting Cases  Nat Hansen  71
4 Salience and Epistemic Egocentrism: An Empirical Study  Joshua Alexander, Chad Gonnerman, and John Waterman  97
5 Semantic Integration as a Method for Investigating Concepts  Derek Powell, Zachary Horne, and N. Ángel Pinillos  119
6 The Mystery of Stakes and Error in Ascriber Intuitions  Wesley Buckwalter  145
7 Is Justification Necessary for Knowledge?  David Sackris and James R. Beebe  175
8 The Promise of Experimental Philosophy and the Inference to Signal  Jonathan M. Weinberg  193
Index  209

Notes on Contributors

Joshua Alexander is an assistant professor of philosophy at Siena College, where he also directs the cognitive science program. His work focuses primarily on the nature of philosophical cognition and intellectual disagreement. He is the author of Experimental Philosophy: An Introduction (Polity 2012).

James R. Beebe is an associate professor of philosophy and a member of the Center for Cognitive Science at the University at Buffalo. He has published papers on reliabilism, skepticism, and the a priori in mainstream epistemology and is actively engaged in the empirical study of folk epistemic intuitions. He also has a keen interest in the psychology of moral, political, and religious beliefs.

Wesley Buckwalter is a postdoctoral researcher in the department of philosophy at the University of Waterloo in Ontario, Canada. Before that he completed his PhD in philosophy at the City University of New York, Graduate Center. His current research lies at the intersection of epistemology, psychology, and philosophy of cognitive science. He has authored or coauthored papers in Noûs, Philosophical Studies, Mind & Language, Philosophical Psychology, Philosophical Topics, Philosophy Compass, and the Annual Review of Psychology on a wide range of philosophical categories and concepts, including knowledge, belief, action, luck, intuitions, philosophical methodology, functionalism, phenomenal consciousness, emotion, and fiction.

Ori Friedman is an associate professor of psychology at the University of Waterloo. He researches topics including people’s attributions of knowledge, and their reasoning about ownership of property.

Chad Gonnerman is currently a postdoctoral research associate at Michigan State University. His current research focuses on the nature of concepts in psychology and philosophy, the use of intuitions as evidence in philosophy, egocentric biases in mindreading, and philosophical and communicative issues arising in cross-disciplinary research.

Nat Hansen is a lecturer in philosophy at the University of Reading. His research focuses on experimental evidence in semantics and pragmatics, the nature of context sensitivity, and the meaning of color terms.

Zachary Horne is a doctoral student in the philosophy department at the University of Illinois at Urbana-Champaign. He conducts empirical and computational research on belief updating, relational reasoning, abstract concepts, and explanation.

Ángel Pinillos is an associate professor of philosophy at Arizona State University. He has written on experimental philosophy, epistemology, and philosophy of language.

Derek Powell is a doctoral student in the psychology department at the University of California–Los Angeles. There he conducts empirical and computational research on learning, relational and abstract reasoning, and on a variety of other topics related to higher-order cognition.

David Sackris is completing his dissertation on the semantics of epistemic modal terms at the University at Buffalo. He also has interests in epistemology and experimental philosophy. He is an instructor at John A. Logan College in southern Illinois.

Shawn Simpson is a doctoral student in the philosophy program at the Graduate Center, City University of New York. His research is primarily in philosophy of science, philosophy of language, and logic.

John Turri is an assistant professor of philosophy at the University of Waterloo (Canada). A specialist in philosophy and cognitive science, he has published dozens of articles in leading journals such as Philosophical Review, Noûs, Philosophy and Phenomenological Research, and Cognition. He is the author of Epistemology: A Guide (Wiley-Blackwell 2013) and currently holds an Early Researcher Award from the Ontario Ministry of Economic Development and Innovation.


John Philip Waterman is a graduate student at Johns Hopkins University. His research focuses on understanding the psychological foundations of skeptical doubt.

Jonathan Weinberg is an associate professor of philosophy at the University of Arizona. He works in experimental philosophy, esthetics, and the philosophy of mind and cognitive science.

Introduction

James R. Beebe

Experimental philosophers attempt to bring the experimental methods of the cognitive and social sciences to bear on questions of perennial philosophical concern.1 Despite having precursors in empirically informed philosophy and cognitive science, experimental philosophy as a recognizable discipline did not emerge until the turn of the twenty-first century with the groundbreaking work of Weinberg et al. (2001), Nichols et al. (2003), and Knobe (2003a, 2003b). Weinberg et al. focused their initial investigations on folk intuitions about epistemic matters—that is, those pertaining to knowledge and evidence. The present volume presents some of the latest, cutting-edge research in the subfield that has developed in the wake of Weinberg, Nichols, and Stich’s seminal work.2

One strand of experimental epistemology has followed Weinberg et al.’s lead in investigating folk intuitions about Gettier cases—that is, cases where justified true belief is allegedly present without knowledge. The results, however, have been equivocal. Weinberg et al. (2001) reported that a significant majority of Western participants judged that a subject in a Gettier-style thought experiment “only believes” rather than “really knows” a certain true proposition. However, Starmans and Friedman (2012) found that—contrary to received philosophical wisdom—participants were quite willing to say that subjects in Gettier situations “really know.” When Cullen (2010) replicated the Gettier studies of Weinberg et al. but instructed participants to choose between saying that the protagonist in the thought experiment “knows” and “does not know” (instead of “really knows” vs. “only believes”), he found that less than half of the participants chose “knows.” More recently, Nagel et al. (2013) found that participants reliably distinguish between justified true beliefs in Gettier cases and unGettiered controls.


To make matters more complicated, Beebe and Shea (2013) report that whether participants attribute knowledge in a Gettier case can depend upon the moral valence of the actions described in the case.

In one of the contributions to this volume (“Semantic Integration as a Method for Investigating Concepts”), Powell et al. describe and recommend a new method for investigating folk epistemological concepts that they hope will overcome some of the limitations of the vignette-based methods commonly used in experimental philosophy and lead to a better understanding of folk epistemic judgments in Gettier cases and other cases like them. Their method of semantic integration relies upon a memory task, in which participants are asked to read a story and perform a recall task after some delay. For example, in a story that describes Dempsey’s evidence for Will’s guilt, the target sentence “Whatever the ultimate verdict would be, Dempsey thought Will was guilty” appears. Powell et al. found that when Dempsey is described as having good evidence for his belief in a normal (as opposed to Gettierized) situation, participants were significantly more likely to fill in the blank in the following sentence with “knows” than when Dempsey was described as being in a Gettierized situation: “Whatever the ultimate verdict would be, Dempsey _____ Will was guilty.” Powell et al. suggest that the implicit measure at the heart of the semantic integration task may have advantages over more commonly used explicit measures, in which participants are simply asked directly whether or not a subject knows or really knows some proposition. Given the current debate within experimental epistemology, this new and powerful tool will certainly be a welcome addition.

A second important topic of research within experimental epistemology focuses on the question of whether or not it is correct to attribute knowledge when skeptical possibilities have been raised, if one’s present evidence seems incapable of ruling out those possibilities. Weinberg et al. (2001) found that Western participants were generally disinclined to attribute knowledge when they were told that a subject would be unable to tell if her evidence was misleading. Nagel et al. (2013) obtained corroborating results. In this volume Joshua Alexander, Chad Gonnerman, and John Waterman (“Salience and Epistemic Egocentrism: An Empirical Study”) provide evidence that participants’ disinclination to attribute knowledge in these cases may be due to epistemic egocentrism—that is, the tendency to overattribute our
own beliefs and concerns to others. Alexander, Gonnerman, and Waterman show that regardless of whether a possibility of error is described as being entertained by a subject or whether it is simply reported by an omniscient narrator while remaining unknown to the subject, participants take the error possibility to affect whether the subject has knowledge. The authors also show how understanding epistemic egocentrism is important when trying to adjudicate claims about whether data about folk epistemic intuitions provide confirmation or disconfirmation for invariantist or contextualist accounts of the semantics of “knows.”

Perhaps the most active area of research and debate within experimental epistemology concerns the relative merits of invariantism, contextualism, and interest-relative invariantism—at least as they purport to square with empirical data about folk epistemic intuitions. According to contextualists (e.g., DeRose 2011), the strictness of epistemic standards varies across conversational contexts, and these standards determine how strong one’s evidence must be in order to have knowledge. According to those who defend interest-relative invariantism (e.g., Hawthorne 2004; Stanley 2005; Fantl and McGrath 2009), a subject needs to have stronger evidence in order to have knowledge in a high-stakes situation than in a low-stakes situation. In a series of articles from Buckwalter (2010), May et al. (2010), Feltz and Zarpentine (2010), and Phelan (forthcoming), researchers reported failing to find evidence that folk attributions of knowledge or rational belief vary in ways that were allegedly predicted by epistemic contextualism and interest-relative invariantism. The authors of these papers attempted to raise epistemic standards in two different ways, each of which has led to a somewhat different strand of research in experimental epistemology. The first kind of manipulation—used by Buckwalter (2010) and May et al. (2010)—involves raising the possibility that a protagonist’s belief might be mistaken, even though it is justified. The second manipulation—used by Buckwalter (2010), May et al. (2010), Feltz and Zarpentine (2010), and Phelan (forthcoming)—involves raising the costs to a believer of having a false belief about some practical matter.

In regard to error possibilities, even if raising them does not always lead participants to refrain from attributing knowledge, it has been shown to have this effect at least some of the time (cf., e.g., the results of Weinberg et al. (2001), Nagel et al. (2013), and Alexander, Gonnerman,
and Waterman described above). The situation regarding stakes, however, is more complex, and there has not yet emerged any consensus as to whether knowledge attributions are sensitive to stakes. In response to the first wave of studies that failed to reveal an effect of stakes on knowledge attributions, Pinillos (2012) decided to eschew asking participants whether or not a hypothetical subject had knowledge and instead asked them how many times a subject needed to proofread a paper or count the coins in a jar in order to know that there were no typographical errors in the paper or know the number of coins in the jar. When the stakes in each of these situations were varied, Pinillos found there was a significant difference between the numbers reported by participants in the contrasting conditions.

Buckwalter and Schaffer (forthcoming) have argued that Pinillos’ (2012) results may tell us nothing in particular about the folk conception of knowledge on the grounds that the same kind of stakes effect can be found with belief ascriptions. In this volume, Buckwalter (“The Mystery of Stakes and Error in Ascriber Intuitions”) expands upon his work with Schaffer and argues that the primary factor responsible for observed differences in folk knowledge attributions—when they have been observed—is how salient the possibility of error is to the person ascribing knowledge and not how high or low the stakes are. Buckwalter contends that researchers who failed to find an effect for error possibilities simply did not present those possibilities concretely or vividly enough. In this volume, Pinillos and Simpson (“Experimental Evidence in Support of Anti-Intellectualism About Knowledge”) extend Pinillos’ earlier work by examining the extent to which it matters that a subject in a high-stakes situation is aware of this fact. After replicating Pinillos’ original results, Pinillos and Simpson show that participants think that subjects who are not aware that they are in high-stakes situations should check the basis for their beliefs significantly more times than subjects in low-stakes situations do. Pinillos and Simpson also attempt to respond to Schaffer and Buckwalter’s objections with a combination of philosophical argument and experiment.

In a paper mentioned above, Phelan (forthcoming) reports (i) that in between-subject experiments—that is, when each participant only sees one version of a thought experiment—participants do not show sensitivity to raised or lowered stakes but (ii) that in within-subject experiments—for
example, when each participant sees both a low- and a high-stakes vignette—participants do show the kind of sensitivity predicted by contextualists and interest-relative invariantists. Buckwalter (this volume) notes that most experiments investigating folk epistemic intuitions have a between-subject design but that philosophical discussions of stakes and error possibilities always have a within-subject structure. Taking these facts as his starting point, Nat Hansen’s contribution to this volume (“Contrasting Cases”) argues that if we want to understand how ordinary people make epistemic judgments, we need to obtain data from both within- and between-subject experiments. However, drawing upon research in the heuristics and biases tradition of cognitive psychology, Hansen makes the case that folk epistemic assessments will be more reflective and rational if they are made in within-subject or “joint evaluation” contexts and that data obtained from these contexts will constitute better evidence for or against contextualism or various forms of invariantism. Arguing that some epistemic properties are difficult to evaluate in between-subject conditions, Hansen argues for the use of more within-subjects designs.

An additional area of debate within experimental epistemology concerns the question of what the necessary conditions on the folk conception of knowledge are. Myers-Schulz and Schwitzgebel (2013) report the results of a study in which participants readily attributed knowledge in the absence of belief. While Rose and Schaffer (forthcoming) argue that Myers-Schulz and Schwitzgebel’s data show only that knowledge may not entail occurrent belief, Beebe (2013) reports data that suggest knowledge may not even entail dispositional belief, at least as far as folk conceptions are concerned. In this volume, David Sackris and James Beebe (“Is Justification Necessary for Knowledge?”) make a contribution to this area of debate by reporting the results of studies in which participants attributed knowledge to subjects who lacked good evidence but had true beliefs.

Despite the fact that the question of whether one can know that one will lose a fair lottery on the basis of the very long odds against winning has received considerable attention in the mainstream epistemology literature (cf., e.g., Hawthorne 2004), experimental epistemologists have not yet contributed to the discussion. In this volume, John Turri and Ori Friedman (“Winners and Losers in the Folk Epistemology of Lotteries”) report the results of the
first experimental investigation of folk epistemic intuitions about lottery cases. Epistemologists have assumed that everyone agrees that you cannot know your ticket will lose on the basis of the odds alone, but there has never been any solid data to support this assumption. Turri and Friedman not only provide evidence in support of this contention, but also carefully examine and design studies to test various explanations of what underlies this judgment. They conclude that ordinary participants deny knowledge in lottery cases due to formulaic expression—that is, expressions that are characterized by stereotyped intonation and rhythm, familiarity, predictability, and unreflective automaticity.

In his contribution to this volume, Jonathan Weinberg (“The Promise of Experimental Philosophy and the Inference to Signal”) steps back from particular sets of studies and results in experimental epistemology and considers some broad questions about the kinds of data that need to be obtained in order for experimental philosophy to make substantive contributions to first-order philosophical debates. If we take our ordinary capacities to make judgments about knowledge to have a default and defeasible reliability, and we want to determine whether some factor should be incorporated into our philosophical theory of knowledge, we need various ways of distinguishing truth-tracking judgments from non-truth-tracking ones—ways to distinguish genuine signal from the accompanying noise. Weinberg cautions that tests for statistical or psychological significance will not be sufficient for this task, inasmuch as philosophical significance is distinct from either of these. He then offers some suggestions on how philosophers might establish measures of philosophically significant effect sizes, which might be modeled after the Mohs scale for ranking the hardness of minerals or the Scoville scale of gustatory heat.

Notes

1 Cf. Knobe and Nichols (2008) and Alexander (2013) for helpful overviews of the field of experimental philosophy.
2 Cf. Alexander and Weinberg (2007), Pinillos (2011), Buckwalter (2012), and Beebe (2012) for helpful overviews of research in experimental epistemology.


References

Alexander, J. (2013), Experimental Philosophy: An Introduction. Cambridge: Polity Press.
Alexander, J. and Weinberg, J. M. (2007), “Analytic epistemology and experimental philosophy”. Philosophy Compass, 2, 56–80.
Beebe, J. R. (2012), “Experimental epistemology”, in A. Cullison (ed.), Companion to Epistemology. London: Continuum, pp. 248–69.
—. (2013), “A Knobe effect for belief ascriptions”. The Review of Philosophy and Psychology, 4, 235–58.
Beebe, J. R. and Shea, J. (2013), “Gettierized Knobe effects”. Episteme, 10, 219–40.
Buckwalter, W. (2010), “Knowledge isn’t closed on Saturday: A study in ordinary language”. Review of Philosophy and Psychology, 1, 395–406.
—. (2012), “Non-traditional factors in judgments about knowledge”. Philosophy Compass, 7, 278–89.
Buckwalter, W. and Schaffer, J. (forthcoming), “Knowledge, stakes, and mistakes”. Noûs.
Cullen, S. (2010), “Survey-driven romanticism”. Review of Philosophy and Psychology, 1, 275–96.
DeRose, K. (2011), The Case for Contextualism: Knowledge, Skepticism, and Context, Vol. 1. New York: Oxford University Press.
Fantl, J. and McGrath, M. (2009), Knowledge in an Uncertain World. New York: Oxford University Press.
Feltz, A. and Zarpentine, C. (2010), “Do you know more when it matters less?” Philosophical Psychology, 23, 683–706.
Hawthorne, J. (2004), Knowledge and Lotteries. New York: Oxford University Press.
Knobe, J. (2003a), “Intentional action and side-effects in ordinary language”. Analysis, 63, 190–3.
—. (2003b), “Intentional action in folk psychology: An experimental investigation”. Philosophical Psychology, 16, 309–24.
Knobe, J. and Nichols, S. (eds) (2008), Experimental Philosophy. New York: Oxford University Press.
May, J., Sinnott-Armstrong, W., Hull, J. G. and Zimmerman, A. (2010), “Practical interests, relevant alternatives, and knowledge attributions: An empirical study”. Review of Philosophy and Psychology, 1, 265–73.
Myers-Schulz, B. and Schwitzgebel, E. (2013), “Knowing that p without believing that p”. Noûs, 47, 371–84.
Nagel, J., San Juan, V. and Mar, R. A. (2013), “Lay denial of knowledge for justified true beliefs”. Cognition, 129, 652–61.
Nichols, S., Stich, S. and Weinberg, J. M. (2003), “Metaskepticism: Meditations in ethno-epistemology”, in S. Luper (ed.), The Skeptics. Burlington, VT: Ashgate Press, pp. 227–47.
Phelan, M. (forthcoming), “Evidence that stakes don’t matter for evidence”. Philosophical Psychology.
Pinillos, N. Á. (2011), “Some recent work in experimental epistemology”. Philosophy Compass, 6, 675–88.
—. (2012), “Knowledge, experiments and practical interests”, in J. Brown and M. Gerken (eds), Knowledge Ascriptions. New York: Oxford University Press, pp. 192–219.
Rose, D. and Schaffer, J. (2013), “Knowledge entails dispositional belief”. Philosophical Studies, 166 (1 Supplement), 19–50.
Stanley, J. (2005), Knowledge and Practical Interests. New York: Oxford University Press.
Starmans, C. and Friedman, O. (2012), “The folk conception of knowledge”. Cognition, 124, 272–83.
Weinberg, J. M., Nichols, S. and Stich, S. (2001), “Normativity and epistemic intuitions”. Philosophical Topics, 29, 429–60.

1 Experimental Evidence Supporting Anti-intellectualism About Knowledge

Ángel Pinillos and Shawn Simpson1

1 Introduction

According to the traditional conception of knowledge, knowledge is a purely intellectual concept. In this chapter, we give some evidence against the orthodox view. We think that knowledge is, in part, a practical concept. To be specific, whether someone who believes P also counts as knowing P may depend on more than just the evidence available to her, or other intellectual features of her situation. It may also depend on practical facts, such as the cost of being wrong about P.

Anti-intellectualism about knowledge (AIK) has recently been defended by a number of authors.2 These authors have pursued various strategies for defending the thesis. In this chapter we pursue a strategy that is in some ways old and deeply entrenched, but in other ways, new. Where we stay with tradition is in holding that judgments about hypothetical (or actual) cases give us powerful evidence for first-order theses about knowledge and other fundamental concepts. Where we move away from tradition is in how we collect and analyze these judgments. We rely here on the judgments of people who have not prejudged the issue.

2 General methodology

According to the traditional way of utilizing judgments about cases, a philosopher will make a judgment about a hypothetical case and use the
content of the judgment as a premise in an argument.3 Making the judgment is a private or semiprivate affair. The philosopher may or may not consult other people, including peers, on the matter (but this further consulting, if it happens, is often informal and unsystematic—for example, the philosopher may or may not consider variations of the thought experiment before settling on the judgment they will use in theorizing). Now, focusing on judgments about knowledge, there is a reasonable suspicion that the traditional method of collecting and deploying intuitions may be less reliable than previously thought. First, some philosophers have admitted that the judgments of philosophers may very well be biased. Goldman (2007, p. 15) writes:

philosophers are leery about trusting the intuitions of other philosophical analysts who have promoted general accounts of the analysandum (e.g., knowledge or justification). Commitment to their own favored account can distort their intuitions, even with respect to their own (pre-theoretical) concept.

The possibility of biased judgments (which Goldman is calling “intuitions”) even among experts is also echoed by Schaffer (2006, p. 90), who admits that his own judgments about knowledge may be biased: “Perhaps my intuitions are unusual, and no doubt are theoretically biased.” These self-reports raise worries about the traditional practice of harnessing judgments.

Second, empirical research on folk judgments gives some evidence that judgments about knowledge may vary with features that may seem to some to be irrelevant to the subject matter under investigation. For example, some research supports the idea that certain judgments about knowledge vary across gender and cultural lines.4 We should not always take for granted then that philosophers themselves are immune from these effects.5 We believe there is a legitimate worry that the traditional method of harnessing judgments may sometimes yield results that are more a reflection of the philosopher’s background or idiosyncratic features than of knowledge. This is then another reason to be cautious about the traditional practice of collecting intuitions.

Despite these and related issues, we do not think philosophers should give up on the traditional method of collecting and deploying judgments about cases. What we think is warranted, however, is a pluralism in methods—where
the new methods we devise aim to get around some of the shortcomings of traditional philosophical methodology. This doesn’t mean that the new methods are better than the old methods. In fact, they may even be less reliable overall. But even if this is the case, it does not mean that they cannot give us philosophically important information in certain cases. It seems to us that, for certain instances, we can benefit from pursuing multiple strategies.

In this chapter we employ the tools of experimental philosophy and the behavioral sciences. We seek to collect judgments about knowledge in controlled experiments not just from one or two individuals, but from hundreds of people who have no vested interest in this or that theory in epistemology. We are thereby more likely to avoid the bias Alvin Goldman and Jonathan Schaffer warned us about. We should also be able to identify idiosyncratic judgments. What is more, if we discover patterns of judgments that vary along demographic dimensions, we can use these data to make a more informed judgment about the quality of the judgments in question. A further benefit of the method we employ is that we can now consider a broader range of judgments. For example, we can attempt to elicit reactions which take the form of numerical responses to some questions about hypothetical scenarios. Individual responses, even from experts, may not all converge on a unique numerical answer. They may admit of a degree of inter- or even intrapersonal variability. Nevertheless, we may discover philosophically informative tendencies toward certain types of responses. Whether there is such a tendency (as opposed to just a lot of noise) is something that can be discerned with the help of modern statistical techniques, and likely cannot be discerned by the use of traditional armchair methods alone. We will see examples of this approach in this chapter.

Of course, the experimental method has drawbacks. Experimenters often rely on philosophically naive populations (as we do in this chapter). Although this method has the advantages we saw above, it is still problematic in situations where the judgments under investigation concern a concept which requires advanced training to master, or where the hypothetical scenario concerns difficult or technical subject matter. But not all cases are like that. In this chapter, for example, we seek judgments concerning a widely used and nontechnical concept, “knowledge,” and make use of simple hypothetical scenarios. Moreover, we use various checks to ensure our subjects are
responding to what is being asked of them. Thus, we think that the responses of our subjects can be used in philosophical analysis in a similar way that judgments are used in traditional philosophical theorizing.

3 Our hypothesis and Anti-intellectualism about knowledge

Suppose you have an unimportant haircut appointment today at 3 p.m. You quickly glance at your calendar on your way out and form the belief that it is at 3 p.m. This may suffice for you to know the appointment is at 3 p.m. Now modify the case so that the appointment is a matter of life and death. In that case, if you form the same belief in the same way, would you now know that the appointment is at 3 p.m.? It may seem to some that quickly glancing at the calendar in the second case is not enough to attain knowledge. If this is right, then AIK may be true, since, apparently, the only relevant differences between the cases are practical and nonintellectual features of the situation.

Let us define AIK as the thesis that whether someone who believes P also counts as knowing P may sometimes depend on practical features concerning the agent’s relation to P, including what is at stake for the agent. We need an easier way of talking about this, so we will often just describe the thesis as saying that knowledge is sometimes sensitive to practical interests. We will also include in the definition a claim about the direction of the effect. In cases where stakes matter, for example, higher stakes raise the bar for knowledge. As Weatherson (2012) has pointed out, the thesis has an existential form. It doesn’t claim that for every difference in practical interest facts, there will be a corresponding difference in knowledge facts. Rather, it just claims that there are cases in which differences in practical interest facts correspond to differences in knowledge facts.

As mentioned at the outset, we defend AIK by appealing to judgments about cases. But we are adopting the experimental method for collecting and analyzing those intuitions. The hypothesis we seek evidence for is (H) (where “sensitive to practical interests” is given the gloss from above):

(H) Folk attributions of knowledge are sometimes sensitive to practical interests.

(H) by itself does not deductively entail AIK. But (H) does entail AIK if we add the assumption that the attributions of knowledge mentioned in (H) semantically express propositions that are true or true with respect to the hypothetical cases presented to the folk. Unless we accept broad skepticism about our ordinary capacity to attribute mental states to others, this assumption should be prima facie accepted—at least for the simple cases we will be discussing. The prima facie assumption, of course, can be defeated. For example, the folk could be making performance errors, their attributions may be tracking conversational implicatures that diverge from the semantically expressed propositions, or perhaps the attributions could be explained by other theories in epistemology. These and related worries, if actual, will sever the connection between (H) and AIK. These worries are infinite and there is no way that we can address all of them. So our strategy will be to provide some new evidence for (H) and at the same time give reasons to think that some of the worries are not so worrisome. This will be enough to give new support for AIK.

4 Prior work

As of the writing of this manuscript, there have been over half a dozen papers reporting on experiments that are directly relevant to assessing (H). The “first wave” reported results that disfavor (H).6 The second wave reported results that challenge the first-wave papers and are positive for (H).7 Finally, two further papers challenge the second-wave results. In a later section we will address in detail the challenge to the second-wave results.8 But for now, we want to make some general remarks about the first- and second-wave studies.

4.1 Negative results

The negative first-wave papers report on experiments, most of which follow a certain pattern. The researchers presented subjects with pairs of vignettes that differ only in what is at stake (practical interests) for the protagonist (low vs. high stakes). The subjects were then asked to record how much they agreed with a particular knowledge claim ascribed to the protagonist. In general it was found that there were no statistically significant differences in the mean
level of agreement with the knowledge attribution across the level of stakes. This was thought to constitute some evidence against (H). One may worry that since (H) is an existential thesis, the fact that differences in reactions to knowledge attributions were not discovered for some cases does not impugn (H). However, this defense of (H) is weak, since the researchers used vignettes similar to those that defenders of AIK claimed illustrated their thesis. But there are other worries with these experiments. We will focus on two issues, the “Awareness of Stakes” problem and the “Same Evidence” problem.

The “Awareness of Stakes” problem: As mentioned above, all the experiments in the negative papers involved comparing subjects’ responses to a high-stakes scenario with ones typically involving a low-stakes situation. However, with the exception of one probe, all the high-stakes vignettes depicted a protagonist who was aware that the stakes were high. This may have a distorting influence in that subjects may expect the protagonist in the high-stakes situation to be anxious or less confident than his low-stakes counterpart (who is in turn aware of his low-stakes situation). This is enough to create a possible confound in the experiment, since the probes in question may differ not just with respect to stakes. They may also differ in perceived levels of anxiety, confidence, and whatever else may be involved when a person is aware that the stakes are high. In sum, awareness of stakes on the part of the protagonist may weaken the evidentiary force of these experiments against our hypothesis (H).9 We note, moreover, that these problems need not just arise in the experimental setting. They may also arise when philosophers construct thought experiments for themselves.10

The “Same Evidence” problem: There is a general worry about keeping the perceived evidence available to the protagonist constant across the low- and high-stakes vignettes (recall that most of these experiments are between-subject studies). For example, in Buckwalter (2010), subjects are told about a protagonist who claims to know that a bank will be open on Saturday based on the evidence that the protagonist was at the bank last week. But this simple statement describing the evidence possessed can be interpreted in different ways. For example, the protagonist might be thought to have asked a worker about the bank hours, or perhaps he asked another customer, or he might have just quickly glanced at the hours posted. This issue is exacerbated by the fact that in most of the probes, the protagonist is aware of the stakes and
sincerely claims that he knows the relevant proposition. Supposing knowledge is sensitive to stakes and subjects are aware of this, subjects must think that the evidence the protagonist in the high-stakes situation has is sufficient to meet the threshold for knowledge in a high-stakes case (as opposed to the threshold for a low stakes case)—after all, the protagonist knows the stakes are high yet still knows the relevant proposition. If this is right, then the perceived evidence would not be the same across the high- and low-stakes scenarios. The confound then may weaken the evidentiary force of these experiments against our hypothesis (H).11

4.2 Positive results

Pinillos (2012) reports on some new experiments that aim to minimize the two problems above. First, he developed a type of probe he dubbed “evidence seeking,” where the evidence available to the protagonist of a vignette is not fixed by the experimenter. Instead, subjects are asked their opinion about how much evidence the protagonist would need to collect before he counts as knowing. The hypothesis (H) would gain support if subjects in the high-stakes conditions tend to say that more evidence is needed to know than subjects in the low-stakes condition. This type of probe then would alleviate, somewhat, some of the issues surrounding the “Same Evidence” problem. Second, in one of his studies, the high-stakes vignette was “ignorant” in the sense that the protagonist in question was not aware of the high stakes. This type of probe alleviates, somewhat, some of the issues surrounding the “Awareness of Stakes” problem. Overall, Pinillos’ results support (H).

Sripada and Stanley (2012) reported on experiments that also support (H). They used the direct method from the first-wave experiments where subjects are asked about their level of agreement with an attribution of knowledge of the form “X knows P” (concerning a given vignette). However, their experiments were carefully designed to avoid some worries from earlier studies. For example, the protagonists of the vignettes are not aware of what is at stake, nor do they self-attribute knowledge.

We do not want here to engage in a detailed discussion of prior work. We do wish to point out, however, that there is an asymmetry in the data collected so far. The data that have been taken to support (H), including the work of
Pinillos (2012) and Sripada and Stanley (2012), involve experiments where statistical significance was found (at the standard p level of 0.05) and hence the null hypothesis (no stakes effect) was rejected. In contrast, the experiments from the first wave, which are taken to count against (H), all invoke null results. That is, they involve experiments where statistical significance was not found, and so the null hypothesis (no stakes effect) fails to be rejected. It is well known that a null result does not, in general, support the null hypothesis nearly to the same degree that a significance finding disconfirms it. In many cases, it may hardly support the null hypothesis at all. Hence, the researcher who wants to use statistical insignificance to defend the null hypothesis (no stakes effects) needs to do quite a bit more if she wants to make her point. This is especially so in our case where there are other statistically significant findings that tend to disconfirm the null hypothesis. (Of course, we cannot criticize the first-wave authors for not taking into account data from the second-wave studies. Our point mainly concerns what we should conclude after the two waves, and not so much how we should criticize the first-wave studies.) There are a variety of statistical techniques that can be used by the researcher wishing to confirm the null hypothesis.12 In particular, one may report the chances of making a Type II error. That is, the chances of accepting the null hypothesis (no stakes effect) in the case that the null hypothesis is false. The higher this probability, the less prone we should be to accept the null hypothesis based on statistically insignificant results. Without this information, which can be gotten through a power analysis, it is highly unclear to what extent nonsignificant findings can support the null hypothesis (no stakes effects).13 Stevens (2007, p. 111) makes the point: Researchers not sufficiently sensitive to the power problem may interpret non- significant results from studies as demonstrating that “treatments” made no difference. In fact, however, it may be that treatments did make a difference, but that the researchers had poor power for detecting the differences. The poor power may result from small sample size and/or from small effect size. The danger of low power studies is that they may stifle or cut off further research in an area where effects do exist, but perhaps are more subtle (as in personality, social, or clinical psychology).14

Following this advice, we can raise a concern about drawing strong conclusions from the first-wave results (not that the authors drew strong conclusions—they

Experimental Evidence Supporting Anti-intellectualism About Knowledge

17

were appropriately cautious). Although all three papers use null results to cast doubt on (H), only Feltz and Zarpentine’s papers directly address power. They worry (correctly) that their studies did not have enough statistical power to detect a stakes effect. As a corrective, they collapse the data sets, thus getting a larger sample size. Indeed, when they consider the aggregate (454 in highstakes and 184 in low-stakes participants), Feltz and Zarpentine do discover a statistically significant difference between high- and low-stakes conditions (a fact that has not been often reported in the subsequent literature). But they claim this does not support (H) since the effect size is too small15: the effect size approaches triviality at .01. The small effect size indicates that the variance in knowledge attributions explained by the stakes, however real, is very small. . . . the results of this analysis suggest that the practical facts in these situations do not qualitatively change knowledge attributions and they are not likely to be a fundamental or important feature of our ordinary knowledge attributions. (p. 17)

We raise two worries about this response. First, let us grant that the effect size might be small, but we don’t think we can conclude that the effect of practical interests is “not likely to be a fundamental or important feature of ordinary knowledge attributions.” For the purposes of epistemology and hence philosophy, we do not require the effects of practical interests to be psychologically important. Philosophical importance is not psychological importance. A subtle psychological effect is still an effect. And so may be enough to support the thesis that knowledge is sensitive to practical interests. However, we grant that a small effect may weaken the case for (H). Second, many of the studies that Feltz and Zarpentine aggregate have the problems I mentioned at the outset (“awareness of stakes” and “same evidence” problems). Hence, the fact that statistically significant differences were found in these studies (even with a small effect), despite the problems mentioned, should be encouraging to the defender (H).

5 Our studies In this chapter, we follow Pinillos (2012) in developing new evidence-seeking probes. However, we go well beyond that work in a number of ways, including

18

Advances in Experimental Epistemology

(1) testing to see whether some other pragmatic features, beyond stakes, play a role in knowledge ascriptions. In particular, we probe whether the probability of what could go wrong if the agent is mistaken plays a role in knowledge ascriptions; (2) we develop new “ignorant” probes in which both the highand low-stakes protagonists are unaware of what is at stake for them; and (3) we develop a new way to test folk attitudes concerning the connection between knowledge and action, a connection that plays a central role in Antiintellectualist theories of knowledge.

5.1 Study 1: Evidence-seeking experiments, stakes, and probabilities We describe two “evidence-seeking” experiments aimed to test (H). In both experiments we ask subjects their opinions about how much evidence is required before a fictional agent counts as knowing some proposition P. We predict that as the practical interests become more pressing for the fictional agents, subjects’ responses will reflect a more stringent evidentiary condition on knowledge. Experiment 1 tests whether folk attributions of knowledge are sensitive to stakes. Experiment 2 tests whether they are sensitive to stakes and the likelihood that potential negative repercussions will actually happen. In addition, Experiment 2 attempts to control for a further confound concerning ignorance of stakes/probabilities.

5.2 Experiment 1 (Water purifier) Method A total of 141 subjects from Amazon Turk were paid 15 cents (US) each to take exactly one of two surveys. The two surveys (conditions) are dubbed “Low Stakes” and “High Stakes.” They both concern a protagonist, Brian, who is installing a water purifier at home because he does not like the taste of the tap water. However, in High Stakes, it is also the case that the water supply has been poisoned. If Brian fails to assemble the water purifier properly, he and his family might die. However, Brian is ignorant of these high stakes. For both vignettes, subjects are told that Brian has gone to the other room to get the water purifier instructions off the internet. He copies them down on a piece of


paper and heads to the water faucet. All subjects were then asked to respond to the following prompt:

(Opinion question) Suppose Brian goes back and compares his entire written copy to the instructions online, and he can do this as many times as he wants. After how many comparisons will Brian know he has written them down correctly? Please write your answer in the box below. This should be a whole number. (Note: If you think Brian already knows, write "0." If you think he'll never know, no matter how many times he checks, write "never.")

Participants were given 10 minutes to respond. As explained above, in accordance with (H), we expect that the numerical answers for High Stakes will be higher than the numerical answers for Low Stakes.

Results

We discarded 47 surveys because the participants either failed to follow instructions, failed a reading comprehension check, or did not write a numerical response to the main prompt. The reading comprehension checks were placed before the target question and included a question to see if subjects were aware of what was at stake. We computed statistics for N = 94 subjects. The results are as follows: Low Stakes (N = 46, m = 0.72, sd = 0.72), High Stakes (N = 48, m = 1.29, sd = 1.254). The difference in means was statistically significant, t(75.54) = 2.70, p < 0.01, Cohen's d = 0.54 (this is a medium-size effect).
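For readers who wish to see how figures like these are computed, here is a minimal sketch (in Python with NumPy and SciPy; the response arrays are hypothetical placeholders, not our data). The fractional degrees of freedom reported above indicate Welch's unequal-variances t-test:

    # Minimal sketch (hypothetical data): Welch's t-test and Cohen's d for
    # two sets of evidence-seeking responses.
    import numpy as np
    from scipy import stats

    low = np.array([0, 1, 0, 1, 2, 0, 1, 0])    # placeholder low-stakes responses
    high = np.array([1, 2, 0, 3, 1, 2, 1, 2])   # placeholder high-stakes responses

    t, p = stats.ttest_ind(high, low, equal_var=False)  # Welch's t-test

    # Cohen's d using the pooled standard deviation
    pooled_sd = np.sqrt(((len(low) - 1) * low.var(ddof=1)
                         + (len(high) - 1) * high.var(ddof=1))
                        / (len(low) + len(high) - 2))
    d = (high.mean() - low.mean()) / pooled_sd
    print(t, p, d)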

Discussion

These results support (H). Subjects seem to think it takes more evidence to know something when the stakes are high (compared with when stakes are low). This experiment improves on previous studies. First, the fact that the experiment is "evidence seeking" minimizes the "Same Evidence" problem. This is because instead of recording a subject's agreement with a knowledge claim—a claim we think is in many cases likely influenced by prior judgments concerning the amount of evidence possessed by the protagonist—we record a subject's judgment concerning how much evidence would actually be needed for such a claim. Second, the fact that the protagonist in high stakes is ignorant of what is at stake for him makes the "Awareness of Stakes" problem less acute.


In particular, there is less reason to think that the higher responses to high stakes are due to the subjects assuming that the protagonist is anxious or less confident because of his awareness of the stakes.

5.3 Experiment 2 (Airplane)

Experiment 2 goes beyond the first one in two main ways. First, an important feature of the first experiment is that in High Stakes, but not in Low Stakes, the protagonist is ignorant of what is at stake for him. That is, he's not aware of all the relevant factors determining the stakes of his situation. Therefore the conditions differ in more than just what is at stake for the protagonist. They differ in whether there is ignorance. This difference could be the source of a confound. We eliminate it in Experiment 2. Second, recall that Experiment 1 tests whether folk attributions are sensitive to stakes, a dimension of practical interests. However, one's practical interests surrounding a belief may depend on more than just the costs of being wrong. It may also involve the probability of the possible costs coming to fruition. The second experiment then considers not just what happens when the cost of being wrong varies, but also tests what happens when we change the probability of the possible costs being realized.16 The first prediction is that as the cost of being wrong goes up, subjects will raise the bar for knowledge. The second prediction is that as we raise the probability of the thing that may go wrong, subjects will also raise the bar for knowledge. Both predictions are in accordance with (H).

Method

We constructed four "Airplane" vignettes, each corresponding to a condition. They are about an airline steward, Jessie, who is assigned to find a name on a roster of 200 passengers before a flight. In every case, Jessie thinks the name he is being asked to look up belongs to someone who is supposed to be bumped up to first class, and that it wouldn't matter much if he failed to find the name. In every case, Jessie looks through the roster just once and comes to think that the name is not on the list. Subjects are told that in fact the name is not on the list. The four cases differ only in features concerning Jessie's practical interests and do not differ in intellectual features of the situation. In particular,


they differ in what is at stake—what event would happen if Jessie made a mistake. But they also differ in the probability of that event happening. In high stakes–high probability (HSHP), if Jessie is wrong, there is a high probability that the person on the list, a criminal, would hijack the plane. In high stakes–low probability (HSLP), if Jessie is wrong, the probability is low (but nonzero) that the criminal would hijack the plane. In low stakes–high probability (LSHP), if Jessie is wrong, there is a high probability that the person on the list, a nice guy, would accept the invitation to go to first class. In low stakes–low probability (LSLP), there is a low probability that the nice guy would accept the invitation to go to first class. Again, Jessie is unaware that the name belongs to a nice guy/hijacker and unaware of the probability that he would go to first class/hijack the plane. A total of 305 surveys, spread across the four conditions, were randomly distributed to volunteer students taking introductory courses at Arizona State University. The main prompt was as follows:

We are now interested in your opinion about what it would take for Jessie to know that the name is not on the roster (the name of the nice guy/hijacker). Recall that according to the story, Jessie has already surveyed the entire roster once. How many more times do you think Jessie needs to survey the entire roster before he knows the name is not on the list (enter a whole number: 0, 1, 2, 3, . . . etc., or write "never" if you think Jessie will never know)?

The surveys contained reading comprehension checks, which occurred before the target question and included questions checking whether subjects knew what was at stake (and the probabilities). Subjects were given approximately 10 minutes to complete the entire survey.17

Results

We discarded 75 surveys because participants either failed to follow instructions, failed a reading comprehension check, or wrote "never" as a response to the main prompt.18 We computed statistics for N = 230 subjects: LSLP (N = 50, m = 1.6, sd = 0.969), LSHP (N = 58, m = 1.76, sd = 0.823), HSLP (N = 61, m = 1.93, sd = 0.944), and HSHP (N = 61, m = 2.15, sd = 1.152) (Figure 1.1). A one-way ANOVA reveals that there were statistically significant differences between the responses across the conditions, F(3,224) = 3.211, p < 0.05.
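As an illustration, a one-way ANOVA of this kind can be computed as follows (a minimal sketch in Python with SciPy; the four response lists are hypothetical placeholders for the raw data):

    # Minimal sketch (hypothetical data): one-way ANOVA across the four
    # stakes x probability conditions.
    from scipy import stats

    lslp = [1, 2, 1, 2, 2, 1]   # placeholder responses per condition
    lshp = [2, 1, 2, 2, 2, 1]
    hslp = [2, 2, 1, 3, 2, 2]
    hshp = [2, 3, 2, 3, 1, 3]

    f, p = stats.f_oneway(lslp, lshp, hslp, hshp)
    print(f, p)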


Figure 1.1 Airplane mean scores.

This suggests that practical interests play a role in knowledge ascriptions. Tukey's post hoc test reveals statistically significant differences only between the LSLP and HSHP conditions, p < 0.05, Cohen's d = 0.52 (medium effect size). A two-way ANOVA was used for a factorial analysis. Comparing all the low-stakes cases against the high-stakes cases reveals a statistically significant main effect for stakes, F(1,224) = 7.639, p < 0.01, Cohen's d = 0.38 (moderate effect size). No statistically significant main effect was found for probability. And no interaction effect was found.
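The factorial analysis and the post hoc comparisons can be illustrated with a minimal sketch (in Python with pandas and statsmodels; the data frame is a hypothetical placeholder, with three observations per cell for brevity):

    # Minimal sketch (hypothetical data): 2 x 2 factorial ANOVA (stakes x
    # probability) and Tukey's HSD post hoc test.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    df = pd.DataFrame({
        "response": [1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 3, 3],
        "stakes":   ["low"] * 6 + ["high"] * 6,
        "prob":     (["low"] * 3 + ["high"] * 3) * 2,
    })

    model = ols("response ~ C(stakes) * C(prob)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))      # main effects and interaction

    groups = df["stakes"] + "-" + df["prob"]    # four condition labels
    print(pairwise_tukeyhsd(df["response"], groups))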

Discussion

As mentioned above, there was a significant main effect for stakes. This supports (H): that folk attributions of knowledge are sensitive to practical interests. However, there was no significant main effect for probability. Nor were there statistically significant differences between LSLP and LSHP or between HSLP and HSHP. So we failed to get good evidence that probability plays a role in knowledge attributions. However, two facts suggest that further research may discover such a role. First, the mean scores reveal that for each of the stakes conditions (low and high), the higher probability condition corresponds to a higher mean response. Second, as mentioned above, Tukey's post hoc analysis for the four conditions revealed a significant difference only between LSLP


and HSHP. These conditions differ in both stakes and probability. This suggests that probability may be playing some role, though not one that we were able to detect directly.19

6 Study 2: Agreement/disagreement with knowledge ascriptions

The previous experiments made use of evidence-seeking probes. In this study, we attempt a different type of experiment to ensure that the prior results were not merely products of the way we asked the questions. We construct pairs of conditions concerning potential knowers (protagonists) of some proposition P. The conditions differ in what is at stake for the protagonists. Subjects are asked about the extent to which they agree or disagree with a statement that says that the protagonist knows P. As with Experiment 2 from Study 1, in both the low-stakes and high-stakes conditions, the protagonist is ignorant of what is at stake. Moreover, to ensure that the vignettes represent the protagonist as having the same beliefs, matched for accuracy, the protagonist is always mistaken about what is at stake: but it is always the same mistake.

Method

We developed three pairs of vignettes (Coin, Air, and Bridge), each with two conditions for what is at stake for the protagonist (Low and High). The surveys were taken by Amazon Turk workers in the United States. We present the Coin vignettes in full (see the Web appendix for the others):

Coin Low Stakes: Peter is a college student who has entered a contest sponsored by a local bank. His task is to count the coins in a jar. The jar contains 134 coins. Peter mistakenly thinks the contest prize is one hundred dollars. In fact, the prize is just a pair of movie passes for this weekend. Peter wouldn't want them, however, since he is leaving town this weekend. So nothing bad would happen if Peter doesn't win the contest. After counting the coins just once, Peter concludes there are 134 coins in the jar. His friend, who also thinks the prize is one hundred dollars, says to Peter: "you only counted once, even if there are in fact 134 coins in the jar, you don't know there are 134 coins in the jar.20 You should count them again."

Coin High Stakes: Peter is a college student who has entered a contest sponsored by a local bank. His task is to count the coins in a jar. The jar


contains 134 coins. Peter mistakenly thinks the contest prize is one hundred dollars. In fact, the prize is $10,000, which Peter really needs. He would use the money to help pay for a life-saving operation for his mother, who is sick and cannot afford healthcare. So the stakes are high for Peter, since if he doesn't win the contest, his mother could die. After counting the coins just once, Peter concludes there are 134 coins in the jar. His friend, who also thinks the prize is one hundred dollars, says to Peter: "you only counted once, even if there are in fact 134 coins in the jar, you don't know there are 134 coins in the jar. You should count them again."

Along with some comprehension questions, subjects were presented with the following prompt, followed by a 7-point Likert scale (0–6) where 6 is "strongly agree" and 3 is "neutral":

Besides giving Peter advice about what he should do, Peter's friend also said that Peter doesn't know something. He said that since Peter only counted the coins once, Peter doesn't know that there are 134 coins in the jar (even if it turns out there are 134 coins in the jar). We are interested in your opinion about this. To what extent do you agree with the following statement: "PETER KNOWS THERE ARE 134 COINS IN THE JAR"

In accordance with (H), we predicted a higher level of agreement with the knowledge statement in low stakes than in high stakes. We also ran two other probes, Air and Bridge, with 5-point Likert scales (0–4) where 4 is "strongly agree" and 2 is "neutral" (see the Web appendix). We made similar predictions. For these experiments, we also added a "normative" question before the knowledge prompt. For example, in the COIN case we asked whether the subject thought Peter should count the coins again. The main reason we do this is to help prevent subjects from thinking that the knowledge question implicates a question about what Peter should do (since we just asked this question, we would be violating a principle of relevance if we asked it again immediately after). We think this feature of the experiment, together with the fact that the knowledge prompt is designed to help subjects focus on the concept of "knowledge," can help alleviate possible worries about implicatures.

Results

For the statistical analyses, we excluded those who failed comprehension checks or failed to follow instructions. Table 1.1 displays the main results.


Table 1.1 Study 2

                        Low stakes                      High stakes
Coin (Likert 0–6)       N = 87, m = 3.68, sd = 1.80     N = 78, m = 3.06, sd = 1.76
Air (Likert 0–4)        N = 25, m = 2.16, sd = 1.03     N = 30, m = 2.03, sd = 1.0
Bridge (Likert 0–4)     N = 28, m = 2.32, sd = 1.16     N = 31, m = 1.71, sd = 1.13

In accordance with (H), the mean level of agreement with the knowledge attribution is lower in the high-stakes conditions than in the low-stakes conditions. The differences reached statistical significance for the Coin vignettes, t(161.78) = 2.23, p = 0.023, d = 0.35 (small effect size), and the Bridge vignettes, t(57) = 2.053, d = 0.53 (medium effect size), but not the Air vignettes.21

Discussion

The results from this experiment support (H). Mean level of agreement with the knowledge attribution differed across the stakes conditions. Significance was reached for two of the three experiments, suggesting that stakes play a role in knowledge attribution, in accordance with (H). Moreover, we get corroborating data for (H) that do not rely on "evidence-seeking" experiments. This is significant since the "evidence-seeking" results conflict with earlier work, and so it might be thought that those results are simply due to an artifact of the experiment.

7 Study 3: Knowledge reason principle

So far, we have provided some evidence that folk attributions of knowledge are sensitive to practical interests (H). Our perspective is that a good explanation of this purported fact is that knowledge is in fact sensitive to practical interests. Now, it has been noted that AIK plausibly follows from deeper principles connecting knowledge and action. Given that people seem to be using knowledge as if the notion were sensitive to practical interests, we might wonder whether they implicitly accept any one of those principles. One such


principle is the Reason-Knowledge principle defended by Hawthorne and Stanley (2008):

(RKP) Where one's choice is p-dependent, it is appropriate to treat the proposition that p as a reason for acting if and only if you know that p.

The principle connects knowledge with a normative claim about action. The connection is intimate since the principle states an equivalence between the notions (when the possible object of knowledge is relevant for the action). Pinillos (2012) uncovered some evidence that people implicitly accept the principle. What was discovered was that people seemed to replace a question about knowledge with a normative question about action, questions that are deemed equivalent according to the principle. It was also discovered that people responded to questions about knowledge with the same answers other people gave to normative questions about action, where, again, the questions are equivalent according to RKP. To be sure, this sort of evidence does not in any way establish that the folk implicitly accept RKP. But it makes the idea more plausible. We add further to the evidence in Study 3. Experiment 1 of this study reports results that support RKP. It should be flagged, however, that RKP is a strong principle. It is much stronger than is required to make plausible that knowledge is sensitive to practical interests (AIK).22 Consider the weaker principle, ACTION:

(ACTION) (Where P is relevant for action) If X knows P, then it is appropriate for X to act on P.

Experiment 2 will report on some evidence which suggests that the folk accept ACTION.

7.1 Study 3: Experiment 1 (Do People Accept RKP?)

Method

For this experiment, we test whether people implicitly accept RKP. To do this, we revisit Study 1, Experiment 2 (Air). Recall that in all four conditions (LSLP, LSHP, HSLP, HSHP), we asked subjects an evidence-seeking question: "How many more times do you think Jessie needs to survey the entire roster before he knows the name is not on the list." Now, right before we asked


this "knowledge" question, we asked subjects a normative question about the relevant action in the vignette:

Recall that according to the story, Jessie has already surveyed the entire roster once. At least how many more times do you think he should survey the entire roster looking for the name?

If RKP is true, this normative question is equivalent (given the context) to the knowledge question. To be sure, if Jessie needs to survey the roster four times before he knows the name is not on the list, then (according to RKP) it wouldn't be appropriate for Jessie to use the belief that the name is not on the list in the relevant action until he has surveyed the list four times. But what is the relevant action? Given the story and Jessie's task, the relevant action is reporting that the name is not on the list. That is, it wouldn't be appropriate for Jessie to report that the name is not on the list until he has surveyed the roster four times (we are assuming that there are no other reasons available that Jessie could use to justify making the report). At the very least, then, Jessie should survey the roster four times (the response to the normative prompt). In short, if people accept RKP, then responses to the normative prompt should correlate with responses to the knowledge prompt.

Results

Across all four conditions of the Air experiment (N = 228), there is a statistically significant correlation between the knowledge prompt responses (m = 1.87, sd = 0.996) and the normative prompt responses (m = 1.75, sd = 0.994); r = 0.708, p < 0.001. Moreover, the effect size is large. This is some evidence that subjects are treating the normative and knowledge prompts in the same way and also evidence that subjects implicitly accept RKP. As we have seen, this adds plausibility to (H).
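A correlation of this sort is straightforward to compute; here is a minimal sketch (in Python with SciPy; the paired response vectors are hypothetical placeholders for the per-subject data):

    # Minimal sketch (hypothetical data): correlating each subject's answer to
    # the normative ("should check") prompt with their answer to the
    # knowledge prompt.
    from scipy import stats

    knowledge = [1, 2, 2, 3, 1, 2, 0, 3]   # placeholder per-subject responses
    normative = [1, 2, 1, 3, 1, 2, 1, 2]

    r, p = stats.pearsonr(knowledge, normative)
    print(r, p)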

7.2 Study 3: Experiment 2 (Do People Accept ACTION?)

Method

For this experiment, we test whether people implicitly accept the principle ACTION. In order to do this, we revisit the results from the Low-Stakes and High-Stakes conditions for the Coin, Air, and Bridge probes (Study 2). Recall that in these probes we sought responses concerning the level of agreement


with a "knowledge" prompt. In COIN, for example, we asked about the statement "Peter knows there are 134 coins in the jar." Now, right before we asked this question, we also asked a normative question. In the COIN probe, for example, we asked our subjects whether they thought that Peter should count the coins again.23 Here, they only had three options: NO, NEUTRAL, and YES. Coding NO and NEUTRAL responses into one category and YES responses into a second category, we can compare the answers to the knowledge prompt across these two groups. Similar comparisons are made for AIR and BRIDGE. Focusing on the COIN case, if people tend to implicitly accept the ACTION principle, we should see that subjects in the "yes, should count the coins again" category are less likely (compared with the other group) to agree with the knowledge statement from the prompt above. Similar predictions are made about AIR and BRIDGE.

Results

As predicted, we found evidence that people accept the ACTION principle. In the COIN case (7-point Likert where 6 was "strongly agree" and 3 was "neutral"), among those who answered YES SHOULD COUNT AGAIN, the mean response was 3.1 (sd = 1.69); among those who answered NO/NEUTRAL, the mean response was 3.7 (sd = 1.9). The difference was statistically significant, t(163) = 1.91, p = 0.029 (one-tailed), d = 0.3. A similar result was found for AIR (5-point Likert where 4 is "strongly agree" and 2 is "neutral"): YES SHOULD CHECK ROSTER AGAIN (M = 1.96, sd = 0.908), NO (M = 2.88, sd = 1.246), t(53) = 2.5, p = 0.016, d = 0.95. And, again, the same result holds for BRIDGE (5-point Likert as with AIR): YES SHOULD JUST CROSS THE BRIDGE (M = 2.32, sd = 1.06), NO (M = 1.56, sd = 1.19), t(57) = 2.58, d = 0.68. These experiments suggest that those subjects who thought that the protagonist did not have enough information to act on P were less likely to agree that the protagonist knows P. This indirectly supports the idea that people accept ACTION.
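The grouping and comparison just described can be sketched as follows (in Python with NumPy and SciPy; the arrays are hypothetical placeholders; note that the one-tailed test uses SciPy's alternative argument, available in SciPy 1.6 and later):

    # Minimal sketch (hypothetical data): code normative answers into YES vs.
    # NO/NEUTRAL and test whether the YES group agrees *less* with the
    # knowledge statement (one-tailed).
    import numpy as np
    from scipy import stats

    answers = np.array(["YES", "NO", "NEUTRAL", "YES", "NO", "YES", "YES", "NO"])
    agreement = np.array([3, 4, 4, 2, 5, 3, 2, 4])   # placeholder Likert ratings

    yes = agreement[answers == "YES"]
    rest = agreement[answers != "YES"]

    t, p = stats.ttest_ind(yes, rest, alternative="less")
    print(t, p)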

7.3 Discussion of both experiments

The foregoing results give some evidence that people implicitly accept RKP and ACTION. These principles have been used by researchers as major premises to justify AIK. A number of issues arise, which we address in turn.


First, we concede that the level of support these data provide for the claims that people accept RKP and ACTION is not very strong. A number of worries come to mind; in the general discussion below we consider an important objection developed by Buckwalter and Schaffer. Second, although (as defenders of AIK argue) RKP and ACTION may entail (assuming fallibilism) that knowledge is sensitive to practical interests (AIK), it does not follow that folk acceptance of those principles entails (H), that is, that folk attributions of knowledge are sometimes sensitive to practical interests. However, a person who accepts those principles (and fallibilism) but whose attributions of knowledge are not sensitive to stakes is less rational than a person who accepts the principles and whose attributions of knowledge are sensitive to stakes. Barring some special reason to think otherwise, we should think a person is more rational rather than less rational. Hence, in this case, because we seem to lack those special reasons, evidence that the folk accept the principles gives us some further evidence that folk attributions of knowledge are sometimes sensitive to practical interests. Third, suppose it turns out that although the folk implicitly accept principles like RKP and ACTION, (H) is nonetheless false. The implicit folk acceptance of the principles may still directly support AIK, bypassing (H), since the implicit acceptance of those principles will support the principles themselves (in the way that folk belief often supports philosophical claims). And arguably, those principles lead to AIK. Insofar as those principles are thought to be natural, the discovery that they are implicitly accepted by the folk adds to their plausibility.24 In sum, we have gathered some new evidence in favor of the notion that people implicitly accept principles used by theorists to support AIK. We indicated how these data can support (H) and how, even if they do not, they may still support AIK.

8 Buckwalter and Schaffer objection

Buckwalter and Schaffer (MS) have raised an important worry about the evidence-seeking experiments we use here and that also appear in Pinillos (2012). They think that subjects' responses to the evidence-seeking


knowledge prompts reveal very little about their use of "knowledge." And so the evidence-seeking experiments cannot tell us much about whether ordinary uses of knowledge are sensitive to practical interests (H). Buckwalter and Schaffer's criticisms of evidence-seeking experiments can be broken down into three parts. First, they show that replacing the word "knows" in the evidence-seeking experiments with other attitude verbs like "believes," "hopes," and "guesses" yields responses that are not different (as measured by statistical significance) from the original answers to the "knows" question. They conclude that the original responses are not really about knowledge. Second, they provide a positive account of how people are interpreting the evidence-seeking prompts—this interpretation is not about knowledge. Finally, they run another experiment that, they argue, should yield a certain result if the evidence-seeking experiments were really testing folk uses of "knowledge"; in fact, they don't get that result. These are valuable criticisms. Before we address them in detail, we should keep in mind that the experimental evidence in favor of (H) goes beyond the evidence-seeking experiments. Experiments in this chapter, as well as in Pinillos (2012) and Sripada and Stanley (2012), involve non-evidence-seeking experiments which support (H).

8.1 Beliefs, guesses, and hopes

In an attempt to give support for the claim that ordinary knowledge attributions are sensitive to practical interests (H), Pinillos (2012) gave participants a pair of vignettes about a student, Peter, who will soon turn in a paper for an English class. In one condition, the cost of having a single typo is high (he could fail the class); in the other, the cost is low (the teacher won't care much). Pinillos then asked participants the following evidence-seeking question: "How many times do you think Peter has to proofread his paper before he knows that there are no typos? ___ times."

Pinillos discovered that the median answers for the low- and high-stakes conditions were 2 and 5, respectively. The difference was statistically significant. Moreover, Pinillos ran a few variations on this experiment, including some that involved ignorance of what is at stake and others that involved a situation


where the protagonist (Peter) formed the belief that there are no typos right from the start. Pinillos also ran another case involving a person having to count pennies in a jar for a contest. In all these cases, he detected a stakes effect. Pinillos then goes on to conclude (together with further evidence) that these data support AIK. Buckwalter and Schaffer (2013) and Buckwalter (this volume) doubt that these data support Anti-intellectualism because they think the results say very little about how people deploy the concept of knowledge. The main argument they give for this claim comes from some follow-up experiments in which they replicate Pinillos' experiments but replace "knows" in the prompt with "believes," "guesses," and "hopes" (these questions will be referred to as the "Knows," "Belief," "Guess," and "Hopes" questions, respectively): "How many times do you think Peter has to proofread his paper before he believes/guesses/hopes that there are no typos? ___ times."

There are two main findings here. First, they find a stakes effect for each of these constructions. Second, they find no statistically significant differences between responses to these constructions and responses to the original Knows question.25 Buckwalter and Schaffer conclude that the effect of stakes has nothing to do with people's perceptions of knowledge. They go on to offer a hypothesis about why this may be. As Buckwalter and Schaffer discuss, it is possible to explain the results for "believes" and "guesses" in a manner that does not impugn the evidence-seeking experiments.26 Concerning belief, we can follow Williamson (2000, 47) in thinking that knowledge is the norm for belief. For instance, subjects may accept that one should believe P only if one knows P. If so, responses to the Belief and Knows prompts should be similar. Concerning guesses, there are uses of "guess" that are sensitive to the guesser's practical interests. Note first that guesses need not be wholly blind. Consider, for example, the colloquial definition of a scientific hypothesis as an "educated guess." How much evidence will a scientist gather before she makes an educated guess? Plausibly, this may depend on what is at stake—the costs of being wrong. Hence some types of guessing may be thought to be sensitive to practical interests.


Consider also various "guessing" games. Country fairs often have a "guess the weight of the pumpkin" contest (or sometimes an animal, like a pig). Participants in such contests are instructed to guess the weight of the object in front of them. Of course, participants may wish to think hard about the answer and try various strategies for getting the question right, especially if the prize is really big: hence the stakes sensitivity of guessing. If it is relatively easy for the subject to know (if only they gathered a little bit more information), then the extra effort taken before they are willing to guess might just be enough to put them in a position to know. In the Typo vignette, it is not very difficult for Peter to know that there are no typos (by proofreading x times, say), so we would expect Peter to proofread x times even before he is willing to guess there are no typos. In sum, we don't find it very surprising that agents would give the same answers to the Guess and Belief questions as to the Knows question.

The defense we just gave does not seem to be available for "hopes," however. We do not see how correct responses to the Hopes question could be the same as correct responses to the Knows question. Nonetheless, we think these data can be explained in a way that is not problematic for us (or for the thesis advanced in Pinillos (2012)). To see this, we will consider and evaluate three possible explanations of Buckwalter and Schaffer's data. The first is a thesis about the intelligibility of the Hopes prompt. The second is their preferred explanation of the data. The third is the explanation we prefer. We think our explanation vindicates the evidence-seeking probes.

First, one might worry that the Hopes evidence-seeking question ("How many times do you think Peter has to proofread his paper before he hopes that there are no typos? ___ times.") is too difficult to understand. We don't normally expect that hoping for something should depend on how much evidence we gather, so the question seems to presuppose something odd. According to this hypothesis, responses to the Hopes question do not reflect speaker competence, since subjects do not properly understand it. This hypothesis, however, is not supported by the data we collected. We ran an experiment where we presented some subjects with the Hopes question and other subjects with the Knowledge question. We then asked them to report the extent to which they understood the question (we used a Likert scale with endpoints at 1 = not understanding at all and 4 = fully understanding). What we discovered was that there were no differences in the reported level of


understanding between the questions, and that the level of understanding was very much near the "fully understanding" ceiling.27 So this first hypothesis is not supported empirically. Apparently, subjects have no problem understanding the Hopes question. Perhaps, though, we should be cautious concerning self-reports of comprehension.

The second hypothesis we consider is the one put forward by Buckwalter and Schaffer (2013). Here is their explanation of how subjects are interpreting the Typo probes involving the evidence-seeking question ("How many times do you think Peter has to proofread his paper before he believes/guesses/hopes that there are no typos? ___ times."):

. . . Pinillos is . . . seeing a stakes effect on the modal element "has" embedded in his complex probes. This "has" is most naturally read as a deontic modal (a "deontic modal" is one interpreted normatively). For instance, Typo probe know is most naturally read as asking how many times Peter should, given his goals in life and the circumstances he finds himself in, proofread his paper before he knows that there are no typos. It is entirely unsurprising to find a stakes effect on deontic modality. Everyone can agree that the practical consequences of error matter when it comes to considering what one needs to do, normatively speaking. As the stakes get higher, Peter should proofread his paper more times, period. A fortiori he should proofread his paper more times before forming any attitude. Knowledge plays no role whatsoever in the matter, which is why replacing "know" with "believe" or "guess" or "hope" makes no difference whatsoever to the data. (p. . . .)

Buckwalter and Schaffer think that the stakes effect found is attributable to the modal "has" and has little to do with the particular attitude mentioned ("knows," "hopes," etc.). According to this interpretation, the stakes effect discovered reveals only that subjects think Peter should check more times for typos before forming any attitudes when the stakes are high (compared with low stakes). It doesn't reveal much about people's use of "knowledge" per se. Let us call this interpretation of the evidence-seeking question "the modal reading." We now turn to an investigation of the modal interpretation hypothesis.

To test Buckwalter and Schaffer's hypothesis, we ran an experiment where we presented one set of subjects with the low-stakes Typo vignette and another set with the high-stakes Typo vignette. We then asked all subjects both the Knows and Hopes questions concurrently.28 The idea here is that if the responses to


the Knows and Hopes prompts were different for an individual, then that individual is not giving the questions the modal reading (on the modal reading, the Hopes and Knows questions should be given the same answers). Furthermore, if the stakes effect still persists, then we can conclude that the stakes effect is not to be explained by subjects giving the evidence-seeking questions the modal reading. Hence, Buckwalter and Schaffer's modal interpretation hypothesis would be on the wrong track. Our idea is supported by the data. Placing the Knows and Hopes questions side by side reveals that subjects give different answers to these questions. When the Hopes question is presented first (N = 40), the mean response (for both low and high stakes) for the Hopes question is 1.5 (SD = 1.038) and the mean response for the Knows question is 3.7 (SD = 1.09). A paired-samples t-test reveals a statistically significant difference between these responses, t(39) = 12, p < 0.01 (we excluded five outliers with responses above 9). Essentially the same result holds when the Knows question is presented first (N = 29): Hopes mean = 1.97 (SD = 0.9), Knows mean = 3.93 (SD = 1.4); paired-samples t-test: t(28) = 10.06, p < 0.01 (we excluded five outliers with responses above 9). Moreover, the stakes effect still persists for the Knows question (though not for Hopes). The mean for the Knows low-stakes response was 3.31 (N = 39, SD = 0.86). The mean for the Knows high-stakes response was 4.42 (N = 31, SD = 1.36). The difference was statistically significant: t(48.35) = 3.95, p < 0.01. The mean for the Hopes low-stakes response was 1.54 (N = 39, SD = 0.96). The mean for the Hopes high-stakes response was 1.9 (N = 39, SD = 1.0). The difference was not statistically significant: t(68) = 1.53, p = 0.153. There were no ordering effects for the Knows questions: the mean for the Knows questions when they were presented first was 3.93 (SD = 1.43) and 3.7 (SD = 1.09) when they were presented second, N = 69, t(67) = 0.75, p = 0.45. There were no ordering effects for the Hopes questions: the mean for the Hopes questions when they were presented first was 1.5 (SD = 1.03) and 1.97 (SD = 0.9) when they were presented second, N = 69, t(67) = 1.9, p = 0.057.29 We now discuss our preferred explanation of the puzzling data Buckwalter and Schaffer uncovered. We believe that the stakes effect they discovered for the Hopes prompt, and the fact that the Hopes responses did not differ from the Knows responses, is due to the "Anchoring and Adjustment" effect (Tversky and Kahneman 1974).30
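The within-subjects comparison reported here can be sketched as follows (in Python with NumPy and SciPy; the paired vectors are hypothetical placeholders for each subject's two answers):

    # Minimal sketch (hypothetical data): paired-samples t-test comparing each
    # subject's Knows and Hopes answers.
    import numpy as np
    from scipy import stats

    knows = np.array([4, 3, 5, 4, 3, 4, 5, 3])   # placeholder within-subject data
    hopes = np.array([1, 2, 1, 2, 1, 3, 2, 1])

    t, p = stats.ttest_rel(knows, hopes)
    print(t, p)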


The effect, which researchers have called "strikingly pervasive and robust," is exhibited when humans attempt to make an estimate under uncertainty, such as the one required for the evidence-seeking experiments.31 Their responses will tend to gravitate toward an anchor point.32 For example, when subjects are asked to estimate the year George Washington was elected US President (1788), they will first anchor on a known number such as 1776—the year of independence. Their estimate will then tend toward the anchor. In a study by Epley and Gilovich (2006), subjects' responses to this question averaged 1779.67 even though the range of possible answers (given by subjects taken from the same population) was 1777–1784. Hence, the response mean skewed toward the anchor number. Similar findings were replicated for a number of different questions. We think the evidence-seeking experiments applied to "hope" are naturally seen as revealing the anchoring and adjustment bias. When subjects are presented with the Typo vignette and the evidence-seeking question, it is natural for them to anchor on the number of times Peter should proofread his paper before he turns it in. The responses here will naturally be sensitive to stakes. This is because the question is just one about rational decisions, which, under classical decision theory, are sensitive to the costs of being wrong (stakes). Pinillos (2012) shows that the answer to this question was the same as the answer to the Knows question.33 According to Epley and Gilovich's (2006) processing model, subjects' responses will gravitate toward the anchor as a result of a satisficing process: subjects will form a series of hypotheses about possible responses, starting from the anchor and moving away, but terminating when a plausible response is found. We believe that when subjects are presented with the Hopes question, they are anchoring, satisficing, and terminating soon, at or near the anchor. This explains Buckwalter and Schaffer's data. We do not think that subjects are adjusting enough. Subjects are anchoring on the natural answer to the normative question "how many times should Peter proofread?" and not adjusting enough to properly answer the Hopes question. To be clear, the reason we think subjects are making a performance error is that their answers seem obviously mistaken. And the reason why we think the error arises from insufficient adjustment is that this seems to be the best explanation of their mistake.
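To illustrate the satisficing idea (this is our own toy construction, not Epley and Gilovich's model), one can think of adjustment as stepping away from the anchor until the first answer that seems plausible enough is reached; a lax plausibility criterion terminates adjustment too close to the anchor:

    # Toy sketch (our construction): anchoring-and-adjustment as satisficing.
    def adjust(anchor, is_plausible, step=1, max_steps=50):
        estimate = anchor
        for _ in range(max_steps):
            if is_plausible(estimate):
                return estimate   # accept the first plausible value
            estimate += step
        return estimate

    # Anchoring on 1776 when estimating Washington's election year (1788):
    # a lax plausibility test stops adjustment well short of the truth.
    print(adjust(1776, lambda year: year >= 1779))   # -> 1779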


Although we think that subjects presented with the Hopes question in the Buckwalter and Schaffer study are not adjusting enough, we do not believe that the same bias is at play when subjects are presented with the Knows prompt. Epley and Gilovich (2006) show that motivation or willingness to spend mental effort can correct for the anchoring bias in their satisficing model.34 If there were an anchoring bias involved in answering the knowledge question, we would expect less reflective and more reflective participants to give different responses to the Knows prompt. But Pinillos (2012) reported that reflectiveness did not affect answers to the Knows prompts. So we do not think that the anchoring and adjustment bias is at play for the Knows questions.

We now consider Buckwalter and Schaffer's final critique of the evidence-seeking experiments. They developed a pair of modified Typo vignettes, "Two Reads," with essentially the same content except that it is stipulated that Peter has already checked for typos twice. Subjects are then asked for their level of agreement with the statement: "Peter knows that there are no typos in his paper." Recall now that in the original high-stakes evidence-seeking experiment subjects said, on average, that Peter had to proofread five times before he knows there are no typos (in low stakes, subjects said, on average, that Peter had to proofread two times). Given this, we would expect in Two Reads that the level of agreement with the knowledge statement would differ across conditions. And we would also expect people to disagree with the statement in the high-stakes condition. However, this is not what was discovered. In a between-subjects study, Buckwalter and Schaffer (2013) discovered that people tended to agree with this statement for both high- and low-stakes conditions. And more importantly, there were no statistically significant differences across the conditions. This result is trouble for the original evidence-seeking experiments. Buckwalter and Schaffer take this result to further confirm their hypothesis that the evidence-seeking probes are not telling us anything about how people deploy "knowledge." We will consider two responses which put some pressure on Buckwalter and Schaffer's criticism. First, we think that their vignettes trigger an implicature that Peter is satisfied with his proofreading after proofreading twice. Here's their exact wording: "Peter is naturally a pretty good speller, plus he has a dictionary with him which he has already used to check the paper carefully, twice over."


According to Grice's maxim of Quantity, conversational participants should be as informative as required. If, in the story, Peter in fact proofreads three times, it would be a violation of Quantity if the story just said he proofread twice. For this reason we think it is plausible that subjects read the story as implicating that Peter proofread exactly twice. Now if subjects think that proofreading twice is enough to satisfy Peter, then they would likely think that this must be enough for him to know there are no typos. After all, Peter is aware of what is at stake and he is in the best position to judge how carefully he was proofreading. Subjects may interpret the high-stakes vignette as involving two very careful reads of the paper, and this may be enough for Peter to know there are no typos. If this is right, Buckwalter and Schaffer's experiment does not impugn the evidence-seeking probes.

For the second response we revisit the anchoring effect. The crucial difference between the Two Reads probe and the evidence-seeking experiments is that in the former, the anchor (two reads) is provided by the experimenter. This is important. Research indicates that the satisficing process, which leads to insufficient adjusting, best applies to cases which involve self-generated anchors.35 When the anchor is given, the process may be one of selective accessibility.36 According to this processing model, agents will test a hypothesis naturally given by the anchor. This process of testing the hypothesis will cause subjects to selectively access information about the target that is consistent with the hypothesis, and so make it more likely that they will agree with the hypothesis. For example, Mussweiler et al. (2004) report on an experiment where subjects are asked to judge whether the percentage of African nations that are members of the United Nations is higher or lower than 65 percent. When they do so, "they selectively retrieve knowledge from memory that is consistent with this assumption (e.g., 'Africa is a huge continent,' 'There are more African nations than I can keep in mind,' etc.)" (p. 192). Hence their estimate of the relevant percentage will be closer to 65 percent than it would otherwise be, since they will access information about Africa consistent with the given anchor number. We think a similar bias may be at play in Two Reads. Agents consider the hypothesis that Peter knows there are no typos after proofreading just twice. Doing so causes them to construe the vignette so that in fact the proofreads are of sufficient quality to engender knowledge after just proofreading twice.


We think that either the Gricean or the anchoring mechanism is at play in the Two Reads probe. If this is right, then we can meet Buckwalter and Schaffer's criticisms. Evidence-seeking probes still give us good reason to accept (H) and AIK.

9 Conclusion

In this chapter we reported on some new experiments that support (H), the thesis that folk attributions of knowledge are sometimes sensitive to practical interests, and AIK, a substantive thesis in epistemology. We used evidence-seeking experiments as well as more traditional measures to show this. In addition, we gave some evidence that people accept principles like ACTION and RKP, which dovetail with AIK. Finally, we addressed some objections raised in the literature against evidence-seeking experiments and found those objections to be problematic. As a final point, we add that (contrary perhaps to initial impressions) there is a sense in which our method resembles traditional methods in philosophy. We rely on judgments about cases to get at important philosophical questions. Where we depart from tradition is that we use an experimental approach to collecting and analyzing those judgments.

Notes

1 Pinillos, at Arizona State University, is the primary author. Simpson is at the Graduate Center of the City University of New York.
2 See Hawthorne (2003), Stanley (2005), Fantl and McGrath (2002, 2007), Pinillos (2012), and Weatherson (2012).
3 We take the premise in question not to be, in general, a psychologized entity: the premise is the content of the judgment, not the fact that people possess the intuition.
4 Weinberg et al. (2001) and Buckwalter and Stich (2010).
5 See Schwitzgebel and Cushman (2012) for evidence that philosophers are not immune to order effects.
6 Buckwalter (2010), Feltz and Zarpentine (2010), May et al. (2010). Phelan (forthcoming) gives evidence that practical interests don't play a role in


attributions of evidence, but doesn't address knowledge. The term "first wave" is used by Buckwalter and Schaffer (MS) in referring to these first experiments.
7 Pinillos (2012) and Sripada and Stanley (2012).
8 Buckwalter (this volume) and Buckwalter and Schaffer (MS).
9 A version of this problem for the experimental work is raised by DeRose (2011).
10 Nagel (2010).
11 A version of the "Awareness of Stakes" problem can be found in Pinillos (2011, 2012) and Sripada and Stanley (2012).
12 It may be, as James Beebe pointed out to us, that the failure to detect significance is due to the survey materials themselves (as opposed to an insufficient number of subjects).
13 The probability is (1 − power).
14 Stevens (2011), Intermediate Statistics: A Modern Approach.
15 Another reason they give for not claiming that (H) has been supported is that the responses to the low- and high-stakes vignettes are both on the agreement side of neutral (with neutral being 4). So we don't have a switch from knowing to nonknowing responses. We do not find this reason very convincing. If the goal is to show that ordinary knowledge attributions are sensitive to practical interests, we just need to find a difference in responses across the conditions in the predicted direction. We do not also need the stronger claim that the responses must average to a certain range. In fact, a second paper from the first wave, by May et al., also discovers statistically significant differences between low- and high-stakes responses. But since the responses do not switch from knowing to nonknowing, they do not take these results to support the stakes sensitivity of ordinary knowledge ascriptions. Our criticism applies here as well. In fact, since two out of the three first-wave studies that are taken to be trouble for (H) actually find statistically significant differences between responses to low- and high-stakes scenarios, we don't think that the first-wave studies turn out to be very troubling for (H) after all.
16 Of course, the probability of the costs being realized is distinct from the epistemic probability of the belief being true.
17 In addition, right before the knowledge prompt, we asked a normative question: "how many times should Jessie check the roster?" We asked this question right before the knowledge prompt so that people won't take the knowledge prompt to implicate this normative prompt. We think people won't make the implicature because it would violate the principle of relevance. Moreover, this normative question serves a further purpose concerning the connection between knowledge and action. Study 3 addresses this issue.


18 Writing "never" does not necessarily reveal an error on the part of subjects. It may instead reveal a skeptical attitude toward the possibility of knowledge. Giving people an option to write "never" prevents people from writing arbitrarily large numbers, as was found in Pinillos (2012).
19 Note that these are ignorant probes, so the probability of the possible harm is not known by the subject. It is instead known by the attributor. If probabilities are understood subjectively, an interesting question would arise as to whether the possible sensitivity to probabilities in ignorant cases must be understood as supporting attributor sensitivity (contextualism), as opposed to subject sensitivity. We do not explore this issue here.
20 We insert a knowledge denial here so that we make it natural for subjects to evaluate Peter's response to this challenge.
21 Air: t(53) = 0.462, p = 0.646.
22 Fantl and McGrath (2009).
23 The normative prompt for COIN was: "Peter's friend told Peter he should count the coins again. Do you think that Peter should count the coins again?" The normative prompt for AIR was: "Do you think Jason should look through the entire roster at least one more time?" The normative prompt for BRIDGE was: "Do you think that John should just cross the bridge?" For BRIDGE and AIR, subjects were not given a "Neutral" option in responding; it was just a binary "Yes" or "No."
24 Although in Experiment 2 those who think that Peter should count the coins again are less likely to say he knows that there are 134 coins in the jar (thereby supporting folk acceptance of ACTION), we must note that 25 percent of our participants thought that (1) Peter should count the coins again and (2) nonetheless still thought that Peter knows that there are 134 coins in the jar. All of those people are failing to act in accordance with ACTION. Since this number does not approach a majority, we cannot say that it gives us good evidence against ACTION. Yet the number is high enough to merit further investigation.
25 We replicated their studies for "Hopes."
26 The discussion was prompted by a response from Pinillos on the Experimental Philosophy blog and in personal communication. They do not endorse this response.
27 Low-Stakes Hopes (M = 3.81, SD = 0.402), Low-Stakes Knowledge (M = 3.95, SD = 0.218), t(30.828) = 1.43, p = 0.163. High-Stakes Hopes (M = 3.85, SD = 0.489), High-Stakes Knowledge (M = 3.95, SD = 0.229), t(37) = 0.788, p = 0.433.


28 We instructed participants to read both questions before writing down their answers.
29 One may reply that in this experiment only the Knows question gets the modal reading while the Hopes question doesn't. This would explain why subjects give different responses to these questions. It would also explain why there is a stakes effect for Knows but not for Hopes. This response is ad hoc: we see no reason to think that subjects would give the modal reading to one question but not the other when they are presented side by side. Note that there were no ordering effects (see the prior footnote), so one can't argue that order forces a particular reading.
30 Tversky, A. and Kahneman, D. (1974), "Judgment under uncertainty: Heuristics and biases." Science, 185, 1124–31.
31 Mussweiler, T., Englich, B. and Strack, F. (2004), "Anchoring effect," in R. Pohl (ed.), Cognitive Illusions: A Handbook of Fallacies and Biases in Thinking, Judgement, and Memory. London: Psychology Press, pp. 183–200.
32 This anchor point may or may not be self-generated. See Epley, N. and Gilovich, T. (2006), "The anchoring and adjustment heuristic: Why adjustments are insufficient." Psychological Science, 17(4), 311–18. They argue that self-generated anchors give rise to a satisficing mechanism. More on this below.
33 This was verified for the "coin counting" experiment. It wasn't tested for the Typo cases.
34 Epley and Gilovich (2006) show that participants who are under the influence of alcohol, who score lower on the "Need for Cognition" test, or who are under heavy cognitive load are more likely to display the bias. The authors also argue that this phenomenon only holds for self-generated anchors and not for provided anchors. The bulk of research on anchoring involves experimenter-provided anchors.
35 See Epley and Gilovich (2006) and Mussweiler, Englich, and Strack (2004).
36 Mussweiler, T. (1997), A Selective Accessibility Model of Anchoring: Linking the Anchoring Heuristic to Hypothesis-Consistent Testing and Semantic Priming (Psychologia Universalis, Vol. 11). Lengerich, Germany: Pabst; Mussweiler, T. and Strack, F. (1999a), "Hypothesis-consistent testing and semantic priming in the anchoring paradigm: A selective accessibility model." Journal of Experimental Social Psychology, 35, 136–64; Mussweiler, T. and Strack, F. (1999b), "Comparing is believing: A selective accessibility model of judgmental anchoring," in W. Stroebe and M. Hewstone (eds), European Review of Social Psychology (Vol. 10). Chichester, UK: Wiley, pp. 135–67; Strack, F. and Mussweiler, T. (1997), "Explaining the enigmatic anchoring effect: Mechanisms of selective accessibility." Journal of Personality and Social Psychology, 73, 437–46.

References

Buckwalter, W. (2010), "Knowledge isn't closed on Saturday: A study in ordinary language." Review of Philosophy and Psychology, 1, 395–406.
Buckwalter, W. and Schaffer, J. (2013), "Knowledge, stakes, and mistakes." Noûs, 47(3).
Buckwalter, W. and Stich, S. (2010), "Gender and philosophical intuition." Available at SSRN: http://ssrn.com/abstract=1683066.
DeRose, K. (2011), "Contextualism, contrastivism, and x-phi surveys." Philosophical Studies, 156, 81–110.
Epley, N. and Gilovich, T. (2006), "The anchoring and adjustment heuristic: Why adjustments are insufficient." Psychological Science, 17, 311–18.
Fantl, J. and McGrath, M. (2002), "Evidence, pragmatics, and justification." Philosophical Review, 111, 67–94.
—. (2007), "On pragmatic encroachment in epistemology." Philosophy and Phenomenological Research, 75, 558–89.
—. (2009), Knowledge in an Uncertain World. New York: Oxford University Press.
Feltz, A. and Zarpentine, C. (2010), "Do you know more when it matters less?" Philosophical Psychology, 23, 683–706.
Goldman, A. (2007), "Philosophical intuitions: Their target, their source and their epistemic status." Grazer Philosophische Studien, 74, 1–26.
Hawthorne, J. (2003), Knowledge and Lotteries. Oxford: Oxford University Press.
Hawthorne, J. and Stanley, J. (2008), "Knowledge and action." Journal of Philosophy, 105, 571–90.
May, J., Sinnott-Armstrong, W., Hull, J. G. and Zimmerman, A. (2010), "Practical interests, relevant alternatives, and knowledge attributions: An empirical study." Review of Philosophy and Psychology, 1, 265–73.
Mussweiler, T. (1997), A Selective Accessibility Model of Anchoring: Linking the Anchoring Heuristic to Hypothesis-Consistent Testing and Semantic Priming (Psychologia Universalis, Vol. 11). Lengerich, Germany: Pabst.


Mussweiler, T. and Strack, F. (1999a), "Hypothesis-consistent testing and semantic priming in the anchoring paradigm: A selective accessibility model." Journal of Experimental Social Psychology, 35, 136–64.
—. (1999b), "Comparing is believing: A selective accessibility model of judgmental anchoring," in W. Stroebe and M. Hewstone (eds), European Review of Social Psychology, Vol. 10. Chichester, UK: Wiley, pp. 135–67.
Mussweiler, T., Englich, B. and Strack, F. (2004), "Anchoring effect," in R. Pohl (ed.), Cognitive Illusions: A Handbook of Fallacies and Biases in Thinking, Judgement, and Memory. London: Psychology Press, pp. 183–200.
Nagel, J. (2007), "Epistemic intuitions." Philosophy Compass, 2, 792–819.
—. (2010), "Knowledge ascriptions and the psychological consequences of thinking about error." Philosophical Quarterly, 60, 286–306.
Phelan, M. (forthcoming), "Evidence that stakes don't matter for evidence." Philosophical Psychology.
Pinillos, N. Á. (2011), "Some recent work in experimental epistemology." Philosophy Compass, 6, 675–88.
—. (2012), "Knowledge, experiments and practical interests," in J. Brown and M. Gerken (eds), Knowledge Ascriptions. Oxford: Oxford University Press.
Schaffer, J. (2006), "The irrelevance of the subject: Against subject sensitive invariantism." Philosophical Studies, 127, 87–107.
Schwitzgebel, E. and Cushman, F. (2012), "Expertise in moral reasoning? Order effects on moral judgment in professional philosophers and non-philosophers." Mind & Language, 27, 135–53.
Sripada, C. and Stanley, J. (2012), "Empirical tests of interest-relative invariantism." Episteme, 9, 3–26.
Stanley, J. (2005), Knowledge and Practical Interests. Oxford: Oxford University Press.
Stevens, J. (2007), Intermediate Statistics: A Modern Approach (2nd edn). Mahwah, NJ: Erlbaum.
—. (2011), Intermediate Statistics: A Modern Approach (3rd edn). Mahwah, NJ: Erlbaum.
Strack, F. and Mussweiler, T. (1997), "Explaining the enigmatic anchoring effect: Mechanisms of selective accessibility." Journal of Personality and Social Psychology, 73, 437–46.
Tversky, A. and Kahneman, D. (1974), "Judgment under uncertainty: Heuristics and biases." Science, 185, 1124–31.
Weatherson, B. (2012), "Knowledge, bets and interests," in J. Brown and M. Gerken (eds), Knowledge Ascriptions. Oxford: Oxford University Press, pp. 75–103.
Weinberg, J., Nichols, S. and Stich, S. (2001), "Normativity and epistemic intuitions." Philosophical Topics, 29, 429–60.
Williamson, T. (2000), Knowledge and its Limits. Oxford: Oxford University Press.

2

Winners and Losers in the Folk Epistemology of Lotteries1

John Turri and Ori Friedman

Two assumptions anchor most contemporary discussions of knowledge in cases of (large, fair, single-winner) lotteries. First, based on the long odds alone, you don’t know that your ticket lost. Second, based on watching a news report of the winning numbers, you do know that your ticket lost. Moreover, it is often treated as an uncontroversial datum that this is how most people view matters. Explaining why people hold this combination of attitudes is then treated as a criterion for an acceptable theory of knowledge and knowledge attributions. But do people actually hold the views they’re assumed to hold? We did the necessary empirical work to find out. We studied people’s reactions to lottery cases and discovered that they respond as predicted. We report those results here. We also evaluate three previous explanations for why people deny knowledge in lottery cases; none of them seems to work. Finally, we present evidence for a new explanation for why some people deny knowledge in lottery cases. We suggest that they deny knowledge in lottery cases due to formulaic expression.

1 Introduction

Suppose that Smith is considering the fate of a particular ticket in a large, fair lottery.2 After considering the long odds, Smith concludes that the ticket is a loser and, unsurprisingly, Smith is right. Does Smith know that the ticket is a loser, or does he only believe it? As Jonathan Vogel puts it, "No matter how high the odds that the ticket will not win, it strikes us that [Smith] doesn't know that [the] ticket will not win" (Vogel 1990: 292). Call this the skeptical lottery judgment. Now suppose that Brown is considering the fate of that very same lottery ticket. After hearing the winning numbers announced on the nightly news, Brown concludes that the ticket is a loser. Does Brown know that the ticket is a loser, or does she only believe it? As Keith DeRose notes, "after she's heard the winning numbers announced," people "judge that [Brown] does know" (DeRose 1996: 570ff). Call this a nonskeptical lottery judgment.

This combination of skeptical and nonskeptical tendencies is puzzling. After all, mistaken testimony seems much more likely than winning the lottery. Indeed, even if you watch the drawing in person and see the winning number with your own eyes, it's far from clear that misperception is any less likely than a false inference based on the long odds. Nevertheless, people readily judge that you know the ticket lost after being told the results or watching the drawing, whereas they readily judge that you don't know after simply calculating the odds.

This chapter asks two main questions. First, do people actually display the pattern of skeptical and nonskeptical judgment described above? That is, do people share (a) the skeptical lottery judgment in lottery cases involving statistical reasoning and (b) the nonskeptical judgment in lottery cases involving testimony? We find that, yes, people do display this pattern of judgment. Second, why do people judge this way? Many explanations have been proposed. We assess three previous proposals and identify a new factor that we believe contributes to skeptical lottery judgment. The new factor is formulaic expression.

It is widely assumed that people conform to the pattern of skeptical and nonskeptical lottery judgment described above. Indeed, theorists claim that it is "uncontroversial"—a "datum" to be explained—that people conform to the pattern (Hawthorne 2004: 8). In short, philosophers assume that they have identified uncontroversial elements of the folk epistemology of lotteries. But this sweeping empirical generalization has never been tested. This should inspire caution. For experience shows that armchair predictions often misidentify which epistemological judgments are widespread and uncontroversial. For example, until recently it was widely assumed that virtually everyone shares the intuition that Gettier subjects lack knowledge (Gettier 1963; see Turri 2012a for an overview of the literature). But it turns out to be questionable whether people intuit that Gettier subjects lack knowledge (Starmans and Friedman 2012; Turri 2012b; Weinberg et al. 2001).3 And this should be unsurprising in light of the empirical literature on expertise, which shows that experts are especially bad at predicting what novices will do (Hinds 1999). If professional theorists of knowledge are experts in the areas of knowledge and knowledge ascription, then they'll face predictable obstacles in predicting what most people will think or say about knowledge.4

Also absent from the literature is any experimental evidence that prior explanations of lottery judgments identify psychologically relevant factors. And make no mistake about it: philosophers have explicitly said that they're trying to explain "why we typically judge" the way we do in lottery cases (DeRose 1996: 569), and that they're proposing accounts of "the relevant psychological forces driving the relevant" judgments (Hawthorne 2004: 14; see also Vogel 1990: section IV). Many of their proposals generate testable predictions. We test them. The present chapter, therefore, is just as much a matter of armchair psychology meeting experimental philosophy as it is of armchair philosophy meeting experimental psychology.

We will consider three previous proposals offered to explain skeptical lottery judgment.
● The justification account: unjustified belief inhibits knowledge ascription. That is, if people think that you're unjustified in thinking that P,5 they will deny that you know P. In basic lottery cases, people think your belief that the ticket will lose is unjustified, so they deny that you know it will lose (Nelkin 2000; Sutton 2007: 48–53; compare Williamson 2000).
● The chance account: chance of error inhibits knowledge ascription. That is, if people think that there is a chance that you're wrong about P, then they will deny that you know P. In basic lottery cases, people think that there's a chance that you're wrong about the ticket losing, so they deny that you know it will lose (Cohen 1988: 196; Lewis 1996: 557).
● The statistical account: unanchored statistical inference inhibits knowledge ascription. That is, if people recognize that you believe P based on statistical grounds unanchored by relevant observation, then they will deny that you know P. In basic lottery cases, people recognize that you believe that the ticket will lose based on unanchored statistical grounds, so they deny that you know it will lose (Harman 1968; compare Nelkin 2000: 396ff).6

We will also propose and test a new account of our own.

● The formulaic account: formulaic expression inhibits knowledge ascription in basic lottery cases.

Here is the plan for the paper. Section 2 reports an experiment that tests whether people share the skeptical judgment in basic lottery cases. It also provides an initial test of the justification and chance accounts of skeptical judgment. Section 3 reports an experiment that tests whether people share the nonskeptical judgment in testimonial lottery cases. It also provides a more pointed test of the chance account. Section 4 reports an experiment that tests the statistical account. Section 5 reports an experiment that provides an initial test of the formulaic account. Section 6 reports an experiment that further tests the formulaic account. Section 7 is a general discussion of the significance of our findings.

2 Experiment 1: Skeptical judgment in basic lottery cases and the justification account's demise

We begin by reporting a simple experiment designed to test two things. First, it tests whether participants tend to deny knowledge in basic lottery cases. Second, it tests whether either the justification account or the chance account can help to explain skeptical judgment in lottery cases.

Participants (N = 45, 69 percent males) were recruited and tested using an online platform (Qualtrics and Amazon Mechanical Turk) and compensated $0.25 for approximately 2 minutes of their time. Participants were 29 years old on average.7 Participants were located in the United States and 91 percent listed English as a native language. Participants were not allowed to retake the survey and repeat participation was prevented by screening Mechanical Turk Worker IDs. Participants read a simple story and then answered a series of comprehension and test questions, followed by a brief demographic questionnaire. Test and comprehension questions were always asked in the same order; the order of response options was rotated randomly. Participants who failed comprehension questions were excluded from the analysis. We followed these same procedures in all the studies reported in this chapter.

Participants read this story:

Lois is checking out at the grocery store. The clerk says to her, "Do you want to buy a lottery ticket?" Lois answers, "No thanks—I'm not going to buy a losing lottery ticket." And Lois is right: the ticket is a loser.

Participants answered these dichotomous test questions pertaining to knowledge, justification, and the chance of error:

Lois _____ that the ticket is a loser. [knows/only believes]
Lois is _____ in believing that the ticket is a loser. [justified/unjustified]
Was there at least some chance, no matter how small, that the ticket was a winner? [yes/no]

Upon answering the dichotomous knowledge question and the dichotomous justification question, respectively, participants were asked to rate how confident they were in their answer. Responses were collected on a 1–10 scale, anchored with "not at all confident" (= 1) and "completely confident" (= 10). Answers to the dichotomous knowledge and justification questions were scored either +1 (knows, justified) or −1 (only believes, unjustified). In each case, we combined the answer to the dichotomous question with the confidence rating by multiplying them. The result is a weighted knowledge score and a weighted justification score, each of which fell on a 20-point scale ranging from −10 (maximum knowledge or justification denial) to +10 (maximum knowledge or justification ascription).

The results demonstrate that the skeptical lottery judgment is widely shared in basic lottery cases. The vast majority (91 percent) of participants judged that Lois only believes that the ticket is a loser. This is significantly more than could be expected by chance.8 The mean weighted knowledge score (−7.36) was significantly below midpoint (= 0) on the scale (which, again, ranges from −10 through +10).9

In light of that resounding result, let's examine participant response to the justification question. This will tell us whether the justification account might be on the right track. The justification account says that in basic lottery cases people deny knowledge because they think that justification is absent. So if the justification account is correct, very few participants should say that Lois is justified in thinking that the ticket is a loser. That is, almost all participants should answer "no" to the justification question and the mean weighted justification score should be very low. The results were highly unfavorable to the justification account. A very strong majority (80 percent) answered "yes" to the justification question, which is more than could be expected by chance.10 The mean weighted justification score (6.13) was much higher than the mean weighted knowledge score and, importantly, significantly higher than midpoint on the scale.11 These results strongly suggest that when people deny knowledge in lottery cases, it's not because they think justification is absent.

Next let's examine participant response to the chance question. This will tell us whether the chance account might be on the right track. The chance account says that in basic lottery cases people deny knowledge because they think that there's a chance you're wrong. So if the chance account is correct, then we should expect almost all participants to answer that there was a chance that the ticket was a winner. That is, almost all participants should answer "yes" to the chance question. The results were highly favorable to the chance account. The vast majority (96 percent) of participants answered "yes" to the chance question, which is what we would expect if the chance account explains skeptical lottery judgment (Table 2.1).

In summary, our findings from Experiment 1 suggest three things. First, the skeptical lottery judgment is widely shared in basic lottery cases. Second, the justification account of the skeptical lottery judgment is almost certainly false. Third, the chance account does a good job of explaining the skeptical judgment in this case and so merits further consideration.

Table 2.1 Experiment 1: The percentage of participants answering "yes" to the knowledge, justification, and chance questions, as well as the mean weighted knowledge and justification scores (derived by multiplying dichotomous knowledge choice by confidence)

Weighted knowledge score          −7.36
Ascribing knowledge (% yes)       9%
Weighted justification score      6.13
Ascribing justification (% yes)   80%
Chance of error (% yes)           96%
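For readers who want to see the scoring procedure spelled out, here is a minimal sketch of the weighted-score computation and the two significance tests described above. The participant responses are invented for illustration, and the sketch assumes Python with numpy and scipy (version 1.7 or later, for binomtest); it is not the authors' analysis code.

```python
# A minimal sketch (not the authors' actual analysis code) of the scoring
# procedure described in the text, using invented responses.
import numpy as np
from scipy import stats

# Hypothetical data: dichotomous answers (+1 = "knows", -1 = "only believes")
# and 1-10 confidence ratings for a handful of illustrative participants.
answers = np.array([-1, -1, -1, -1, 1, -1, -1, -1, -1, -1])
confidence = np.array([9, 8, 10, 7, 4, 9, 10, 8, 6, 9])

# Weighted knowledge score: dichotomous choice multiplied by confidence,
# yielding values from -10 (maximal denial) to +10 (maximal ascription).
weighted = answers * confidence

# Is the proportion ascribing knowledge different from chance (50/50)?
k = int((answers == 1).sum())
binom = stats.binomtest(k, n=len(answers), p=0.5)

# Is the mean weighted score different from the scale midpoint (0)?
ttest = stats.ttest_1samp(weighted, popmean=0)

print(f"ascribing knowledge: {k}/{len(answers)}, binomial p = {binom.pvalue:.4f}")
print(f"mean weighted score = {weighted.mean():.2f}, "
      f"t({len(weighted) - 1}) = {ttest.statistic:.2f}, p = {ttest.pvalue:.4f}")
```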


3 Experiment 2: Nonskeptical judgment in testimonial lottery cases and the chance account's demise

In this section we report an experiment designed to do three things. First, it seeks to replicate our finding from Experiment 1 on skeptical judgment in basic lottery cases. Second, it tests whether people tend to judge lottery cases differently when the protagonist concludes that the ticket lost based on testimony (as opposed to the long odds). That is, it tests whether people do tend toward nonskeptical judgment in testimonial lottery cases. Third, it further tests the chance account of skeptical lottery judgment.

Participants (N = 143, 51 percent males) were assigned to one of three conditions: Odds, News, and Odd News. Participants were 32 years old on average. Ninety-six percent listed English as a native language. Participants in Odds read a basic lottery case where the ticket owner doesn't watch a newscast but simply bases her belief on the long odds. Participants in News read a story about a ticket owner who watches the evening newscast of the winning lottery numbers; no odds or chances are ever mentioned. Participants in Odd News read a similar story, except that this time the ticket owner recalls the odds of a newscaster misreporting the winning number and bases her belief on that. All participants answered comprehension questions and two test questions, a knowledge question and a chance question, analogous to the questions from Experiment 1. Participants also rated how confident they were about their answer to the knowledge question. Here are the three stories (manipulations italicized):

[NEWS]12 Ellen bought a ticket in this week's Super Lotto. Her numbers are 49-20-3-15-37-29-8. Ellen just finished watching the evening news and they reported that a completely different number won. It was the same newscaster that reports the winning number every week on the local channel that Ellen watches. On that basis, Ellen concludes that her ticket lost. And she is right: her ticket lost.

[ODD NEWS] Ellen bought a ticket in this week's Super Lotto. Her numbers are 49-20-3-15-37-29-8. Ellen just finished watching the evening news and they reported that a completely different number won. And she recalls from her statistics class that there is only a 1-in-10,000,000 (one-in-ten-million) chance that a newscaster will misreport the winning number. On that basis, Ellen concludes that her ticket lost. And she is right: her ticket lost.


[ODDS] Ellen bought a ticket in this week’s Super Lotto. Her numbers are 49-20-3-15-37-29-8. Ellen wasn’t able to watch the evening news where they reported which number won. But she recalls from her statistics class that there is only a 1-in-10,000,000 (one-in-ten-million) chance that a Super Lotto ticket will win. On that basis, Ellen concludes that her ticket lost. And she is right: her ticket lost.

Turning now to analyzing the results, there was an overall effect of condition on knowledge ascription (see Table 2.2).13 Next we'll look for three things. First, we'll check whether the pattern of knowledge ascription in Odds replicates the pattern observed in Experiment 1. Second, we'll check whether knowledge ascription in News displays the predicted nonskeptical pattern. Third, we'll check whether the chance account correctly predicts the overall relationship between knowledge ascription and response to the chance question, paying special attention to the results in Odd News.

First, did the pattern of knowledge ascription in Odds replicate the pattern observed in Experiment 1? Yes, it did. As before, a strong majority (80 percent) denied that Ellen knows that the ticket lost, which falls significantly above chance.14 And the mean weighted knowledge ascription (−5.7) fell significantly below midpoint.15

Second, did knowledge ascription in News display the predicted nonskeptical pattern? It did so beautifully. In News we see the mirror image of Odds. A strong majority in News (80 percent) answered that Ellen knows that the ticket lost, which is significantly above chance.16 Mean weighted knowledge ascription was also significantly above the midpoint (5.78).17

Third, did the chance account correctly predict the overall relationship between knowledge ascription and response to the chance question?

Table 2.2 Experiment 2: Comparison across conditions of the percentage of participants answering "yes" to the knowledge and chance questions, the mean weighted knowledge scores (derived by multiplying dichotomous knowledge choice by confidence), and percentage of participants affirming chance of error

                               Odds    Odd News    News
Weighted knowledge score       −5.7    2.73        5.78
Ascribing knowledge (% yes)    20%     66%         80%
Chance of error (% yes)        88%     90%         39%


The chance account says that people deny knowledge in basic lottery cases because they think that there's a chance you're wrong about the ticket losing. The chance account fits well with the results from the Odds condition: few people ascribed knowledge, and most people affirmed the chance of error. The chance account fits less well with the results from News: most people ascribed knowledge, while a middling percentage affirmed the chance of error. Most importantly, the chance account fits very poorly with the results from Odd News. In Odd News, a majority answered "yes" to the chance question and a majority also ascribed knowledge, both at rates significantly higher than expected by chance.18 The mean weighted knowledge ascription is also significantly above the midpoint.19 This is hard to reconcile with the chance account's claim that people deny knowledge because they think that there's a chance that the protagonist is wrong.

Further difficulties for the chance account arise when comparing responses across conditions. Chance judgments in Odd News and Odds don't differ significantly,20 whereas knowledge ascription in the two conditions does differ significantly.21 Moreover, knowledge ascription in Odd News and News doesn't differ significantly (although by one measure the difference is trending),22 whereas chance judgments in those two conditions do differ significantly.23 If the chance account were on the right track, we shouldn't observe these outcomes. To put these difficulties another way, the chance account predicts that the rate at which participants ascribe knowledge should be inversely proportional to the rate at which they affirm a chance of error. Odds and News roughly fit this pattern, but Odd News doesn't.

We anticipate the following objection to our criticism of the chance account. Arguably a fairer test of the chance account would begin by eliminating from the analysis all participants who denied that there was a chance that Ellen's ticket won. For, it could be argued, those participants rejected a basic premise of the story by rejecting the error possibility.24 With those participants eliminated, the chance account predicts that the remaining participants (who all affirmed a chance of error) should overwhelmingly deny knowledge. However, following through on this suggestion wreaks greater havoc on the chance account (Table 2.3). For after we eliminate participants who answered "no" to the chance question, the rate of knowledge ascription in Odd News and News is identical. Moreover, if we combine the remaining participants in Odd News and News for purposes of analysis, the aggregate rate of knowledge ascription (63 percent) is significantly above chance,25 and the aggregate mean weighted knowledge score (2.26) is significantly above midpoint.26 But now that we're analyzing only participants who affirm the possibility of error, the chance account can't explain the enormous disparity between these results and those observed in Odds.

Table 2.3 Experiment 2: Including only participants who answered "yes" to the chance question. Comparison across conditions of mean weighted knowledge scores and the percentage of participants answering "yes" to the dichotomous knowledge question

                               Odds     Odd News    News
Weighted knowledge score       −6.86    2.09        2.44
Ascribing knowledge (% yes)    14%      63%         63%
N                              35       56          16

In summary, our findings from Experiment 2 taught us three things. First, we replicated the skeptical result from Experiment 1, again observing that a very strong majority share the skeptical judgment in basic lottery cases. Second, a very strong majority share the nonskeptical judgment in basic testimonial lottery cases. This pair of results confirms that philosophers have mainly gotten the folk epistemology of lotteries correct. Third, the chance account of skeptical lottery judgments faces some problems. Of course, it's consistent with our findings that the chance account captures a small part of what explains skeptical lottery judgment. We don't claim to have ruled that out. Neither do we rule out more sophisticated versions of the chance account or more sophisticated ways of testing its viability. Nevertheless, our findings in this experiment motivate us to seek alternatives.
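To make the cross-condition comparisons concrete, here is a hedged sketch of the omnibus and pairwise tests in Python with scipy. The cell counts are approximate reconstructions from the reported percentages and the degrees of freedom in notes 15, 17, and 19 (Odds n = 40, Odd News n = 62, News n = 41); they are not the authors' raw data, and the exact statistics may differ slightly.

```python
# A sketch of the cross-condition comparisons reported in this section,
# using counts reconstructed from the reported percentages (approximate).
import numpy as np
from scipy import stats

# Rows: conditions; columns: [ascribed knowledge, denied knowledge].
odds     = [8, 32]    # ~20% ascribing, n = 40
odd_news = [41, 21]   # ~66% ascribing, n = 62
news     = [33, 8]    # ~80% ascribing, n = 41

# Omnibus test across all three conditions (cf. note 13).
chi2, p, dof, _ = stats.chi2_contingency(np.array([odds, odd_news, news]))
print(f"omnibus: X2({dof}) = {chi2:.2f}, p = {p:.2g}")

# Pairwise comparison, e.g. Odds vs Odd News (cf. notes 20-21).
_, p_pair = stats.fisher_exact([odds, odd_news])
print(f"Odds vs Odd News: Fisher's exact p = {p_pair:.2g}")
```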

4 Experiment 3: Skeptical judgment in other statistical cases and the statistical account's demise

In this section we evaluate the statistical account of skeptical lottery judgment in light of the results from Experiment 2. We then report an experiment designed to further test the statistical account.


It might initially seem that Experiment 2 also provides evidence against the statistical account of skeptical judgment in basic lottery cases. For Ellen’s statistical inference in Odd News very closely resembles her statistical inference in Odds. In each case she recalled that there was a 1-in-10,000,000 chance of error and on that basis concluded that the ticket lost. Despite Ellen’s use of statistical reasoning in each case, people judge them very differently. They judge nonskeptically in Odd News but skeptically in Odds. The statistical account can’t explain why. Tempting as that line of criticism might be, it misconstrues the statistical account. The statistical account doesn’t identify the relevant factor as statistical inference per se. Rather, it identifies unanchored statistical inference as the culprit. Let us explain. Unanchored statistical inference occurs when the relationship between the premises and the conclusion is merely statistical and not explanatory. Inference in testimonial lottery cases arguably involves explanation. Gilbert Harman notes that our “natural non-philosophical” view of true testimonial belief involves two assumptions (Harman 1968: 166–7). First, you believe the truth because an informant told you. Second, your informant believes what he says and “believes as he does because he” has first-hand knowledge (or was told by someone else who does have first-hand knowledge). In Odd News these explanatory assumptions inform Ellen’s statistical inference, or so it is natural to think. The fact that she also relies on statistics doesn’t obscure the explanatory anchoring. By contrast, in the statistical reasoning featured in basic lottery cases, “no explanation is involved” (Harman 1968: 167). In Odds, Ellen’s ticket doesn’t lose because it has only a 1-in-10,000,000 chance of winning. Nor does the ticket have a 1-in-10,000,000 chance of winning because it loses. Nor is it natural to think that Ellen believes such explanatory connections are in place. Let’s understand “observation” broadly to include both perception and consumption of testimony, and let’s put the essential point this way: Ellen’s conclusion is partly based on a relevant explanatory observation in Odd News but not in Odds. The relevant observation is the newscast, which leads us to suppose that there is a “causal or explanatory” connection between Ellen’s belief and “the fact that makes it true” (Nelkin 2000: 398). Thus the statistical account can explain the results from Experiment 2.


The statistical account's explanation of the results from Experiment 2 is ingenious. Moreover, although its proponents haven't touted this fact, it coheres with a well-documented general tendency in human judgment whereby "causes trump statistics" (Kahneman 2011: ch. 16). Decades of experimental research show that causal information drives human judgment in ways that purely statistical information doesn't. People typically underappreciate and often completely neglect statistical base rates when evaluating specific cases. By contrast, people are better at appreciating causal base rates and treat them as information relevant to evaluating specific cases. Even meager causal cues tend to exert more influence than ample statistical evidence does (Ajzen 1977: 307). In Daniel Kahneman's memorable phrase, "A mind that is hungry for causal stories finds nothing to chew on" in statistics about categories. Addicted to causation, averse to statistics—that's the fate of intuitive human judgment. The statistical account generates testable predictions. One prediction is that in lottery cases where the subject clearly bases her conclusion on a relevant explanatory observation, participants will tend to ascribe knowledge to her.27 In a word and vividly: if you feed the causal monster, it will come. And it will chase our inner statistical dullard away. We tested this prediction with the following experiment. Participants (N = 133, 66 percent males) were 29 years old on average. Ninety-six percent listed English as a native language. Participants were randomly assigned to one of three conditions: State Odds, Mafia, and State News. Participants in State Odds read another basic lottery case, this time about the State lottery, in which Ellen performs an unanchored statistical inference based on the 1-in-10,000,000 chance of winning. Participants in State News read a testimonial lottery case in which Ellen bases her belief on the newscast. Mafia is the crucial condition because Ellen bases the conclusion on the observation that the local mafia rigged the lottery such that her ticket has only a 1-in-10,000,000 chance of winning. Although her conclusion is based on statistical inference (the long odds), it's also anchored in the causal-explanatory evidence that the lottery is rigged by the mafia. In short, Mafia involves anchored statistical reasoning, just like Odd News from Experiment 2 did.


Here are the stories:

[STATE ODDS] Ellen bought a ticket in this week's State Lottery. She wasn't able to watch the evening news where they reported which number won. But she is a professional statistician and correctly calculates that there is only a 1-in-10,000,000 (one-in-ten-million) chance that her ticket will win. On that basis, Ellen concludes that her ticket lost. And she is right: her ticket lost.

[MAFIA] Ellen bought a ticket in this week's State Lottery. She wasn't able to watch the evening news where they reported which number won. But she does watch a special report that reveals that the State Lottery is rigged by members of the local mafia, so that there is only a 1-in-10,000,000 (one-in-ten-million) chance that anyone not in the mafia will win. On that basis, Ellen concludes that her ticket lost. And she is right: her ticket lost.

[STATE NEWS] Ellen bought a ticket in this week's State Lottery. She just finished watching the evening news where they reported which number won. It was the same newscaster that reports about the lottery every week, and the number announced as the winner was a completely different number than Ellen's. On that basis, Ellen concludes that her ticket lost. And she is right: her ticket lost.

Participants were then asked a series of comprehension and test questions similar to those in our previous studies. For present purposes, only the knowledge question is relevant. (We will briefly mention one point about the justification question later in the section.) This experiment provides a good test of the statistical account because we can compare whether participant response in Mafia (anchored statistical inference) more closely resembles that of State Odds (unanchored statistical inference) or that of State News (anchored observation). Participants in Mafia were fed a juicy chunk of causal flesh—a rigged lottery!—whereas participants in State Odds weren't. So if participants ascribe knowledge more in Mafia than in State Odds, then it will support the statistical account. By contrast, if there is no difference between Mafia and State Odds, or if participants decline to ascribe knowledge in Mafia, then it will undermine the statistical account. At the same time, we should expect State News to elicit the highest rate of knowledge ascription of all three conditions.

The results undermined the statistical account. There was an overall effect of condition on knowledge ascription (see Table 2.4).28 As expected, knowledge ascription was highest in State News (89 percent, 7.53), far exceeding what could be expected by chance.29 However, knowledge ascription in Mafia (14 percent, −6.5) didn't differ significantly from State Odds (27 percent, −4.11).30 Indeed, knowledge ascription in Mafia was actually lower than in State Odds, and it was well below what could be expected by chance.31 Given that Ellen anchors her statistical inference on an explanatorily relevant observation in Mafia, the statistical account of skeptical judgment can't explain this result.

Table 2.4 Experiment 3: Comparison across conditions of the percentage of participants answering "yes" to the knowledge question, mean weighted knowledge scores (derived by multiplying dichotomous knowledge choice by confidence), and percentage of participants ascribing justification

                                  State Odds    Mafia    State News
Weighted knowledge score          −4.11         −6.5     7.53
Ascribing knowledge (% yes)       27%           14%      89%
Ascribing justification (% yes)   98%           77%      98%

It's worth briefly noting that response to the justification question in State Odds replicates the main finding from Experiment 1 that doomed the justification account. While knowledge ascription in State Odds was very low, a full 98 percent of participants thought that Ellen's belief was nevertheless justified. Indeed, rates of justification ascription in State News and State Odds are identical, even though rates of knowledge ascription in the two conditions differ dramatically. In Mafia too, rates of justification ascription were very high even though knowledge ascription was very low.

In summary, our findings from Experiment 3 cast serious doubt on the statistical account. Proponents of the statistical account might propose a version of the explanatory requirement that avoids these problems and withstands empirical scrutiny; we would welcome such a development. And further work might reveal greater nuance in how people attribute causal relevance to factors and assimilate that information when assessing specific outcomes. Such work could inspire more successful versions of the statistical account; again, we would welcome this. Until then, the statistical account is more a promissory note than a predictive theory, and we're inclined to look elsewhere for an explanation of skeptical lottery judgment.
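Note 28 reports Cramér's V as an effect size for the omnibus test in this experiment. As a sketch of how that statistic is computed, the following Python snippet derives it from a contingency table; the counts are approximate reconstructions from the reported percentages and condition sizes (State Odds n = 44, Mafia n = 44, State News n = 45, inferred from the dfs in notes 29 and 31), not the authors' raw data.

```python
# A sketch of the Cramér's V effect size accompanying a chi-square omnibus
# test (cf. note 28), computed from an approximate, reconstructed table.
import numpy as np
from scipy import stats

def cramers_v(table):
    """Cramér's V for an r x c table: sqrt(X2 / (n * (min(r, c) - 1)))."""
    table = np.asarray(table)
    chi2, _, _, _ = stats.chi2_contingency(table)
    n = table.sum()
    r, c = table.shape
    return np.sqrt(chi2 / (n * (min(r, c) - 1)))

# Rows: State Odds, Mafia, State News; columns: [ascribe, deny].
table = [[12, 32], [6, 38], [40, 5]]  # approximate counts (~27%, ~14%, ~89%)
print(f"Cramér's V = {cramers_v(table):.3f}")
```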


5 Experiment 4: Relenting skeptical judgment in nonstereotypical cases and the formulaic account's promise

In this section we propose an alternative explanation for skeptical lottery judgments. We don't propose that it entirely explains the rate of knowledge denial in basic lottery cases. But we submit that it's probably part of the explanation. We suspected that there is something formulaic and stereotypical about denying knowledge in basic lottery cases. Formulaic expressions are characterized by stereotyped intonation and rhythm, familiarity, predictability, and unreflective automaticity (Van Lancker-Sidtis and Rallon 2004). Advertising campaigns by gaming boards feature formulaic slogans like "Hey, you never know" (Hawthorne 2004: 8). And in our experience people's verbal response to basic lottery cases often comes across as clichéd. This motivated us to hypothesize that although people deny knowledge in lottery cases, they should be more likely to ascribe knowledge in similar scenarios where the protagonist's conclusion does not relate to lotteries. To test this prediction we conducted a simple experiment.

Participants (N = 242, 56 percent males) were 33 years old on average. Ninety-seven percent listed English as a native language. Participants were randomly assigned to one of two conditions: Lotto and Phone. The story for each condition featured two people, Abigail and Stan, discussing the serial number on a ten-dollar bill. Stan specifies that the serial number is extremely likely to not be identical to a certain other number. In response, Abigail flat-out denies that the serial number is identical to the other number. Crucially, the two stories differ in what the other number is. In the story for Lotto, the other number is the winning lottery number; in the story for Phone, it is Barack Obama's mobile phone number. Participants answered comprehension and test questions analogous to those in our earlier studies. Here is the story (variations italicized and separated by a slash):

[LOTTO/PHONE] Abigail is talking with her neighbor, Stan, who is a statistician. Stan hands Abigail a bill and says, "Here is the ten dollars I owe you." Abigail looks at the bill and sees that its serial number is 5-0-6-7-4-1-6-9-8-2. Stan continues, "I made an interesting calculation. If you played that serial number in this week's lottery / dialed that serial number on a telephone, it's 99.999999% certain to lose / not be Barack Obama's mobile phone number." Abigail answers, "That serial number will not win this week's lottery / is not Obama's phone number." And Abigail was exactly right: it was a losing number / wasn't Obama's number.

The principal difference between LOTTO and PHONE is the content of Abigail's conclusion. In the one case, she concludes that the serial number isn't the winning lottery number; in the other, she concludes that it isn't Barack Obama's phone number. The chance of her being right is exactly the same in both cases. If the formulaic account is correct, then participants will ascribe knowledge significantly more in Phone than in Lotto, because Phone is not a lottery case and so shouldn't trigger the formulaic response. This was our prediction for the experiment.

The prediction was true. There was an overall effect of condition on knowledge ascription in the predicted direction for both the dichotomous question32 and the weighted knowledge score (Table 2.5).33 On each measure, it was significantly higher in Phone and also surpassed what could be expected by chance.34 Interestingly, although LOTTO featured unanchored statistical reasoning about losing the lottery, the rate of knowledge ascription in Lotto more than doubled from previous studies to over 50 percent. If we take as our baseline comparison the ~20 percent rate of knowledge ascription observed in earlier basic lottery cases involving statistical reasoning (such as Odds from Experiment 2), then this increase is statistically significant and very surprising.35 The mean weighted knowledge score in Lotto was above midpoint, though not significantly. We observe a similar outcome in Experiment 5 and discuss possible explanations for it toward the end of Section 6.

Table 2.5 Experiment 4: Comparison across conditions of mean weighted knowledge scores, the percentage of participants answering "yes" to the dichotomous knowledge question, and the percentage of participants affirming chance of error. Lotto and Phone conditions are from Experiment 4; Odds and Odd News conditions are from Experiment 2 and listed here for comparison

                               Odds    Lotto    Phone    Odd News
Weighted knowledge score       −5.7    0.083    2.38     2.73
Ascribing knowledge (% yes)    20%     50%      63%      66%
Chance of error (% yes)        88%     88%      79%      90%
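The directional comparison in note 32 is a one-tailed Fisher's exact test. Here is a hedged sketch in Python with scipy; the counts are approximate reconstructions (condition sizes of 121 each, inferred from N = 242 and the df in note 34), not the authors' data.

```python
# A sketch of a one-tailed Fisher's exact test of the directional
# prediction that knowledge ascription is higher in Phone than in Lotto
# (cf. note 32). Counts reconstructed from reported percentages; approximate.
from scipy import stats

phone = [76, 45]   # [ascribe, deny], ~63% of 121
lotto = [61, 60]   # ~50% of 121

# With Phone as the first row, alternative='greater' tests whether the
# odds of ascribing knowledge are higher in Phone than in Lotto.
_, p = stats.fisher_exact([phone, lotto], alternative='greater')
print(f"Fisher's exact (one-tailed) p = {p:.3f}")
```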


This experiment provided some initial support for the formulaic account. The next section tests it further.

6 Experiment 5: Relenting skeptical judgment in qualitatively comparative cases

One question about Experiment 4 is whether the results were driven by framing the probability as "99.999999% certain to lose." A further test of the formulaic account would be to present subjects with similar cases where the probability is framed differently. For example, it could be framed qualitatively in comparison to some non-lottery-related outcome. This section reports an experiment that follows up on this.

Participants (N = 200, 58 percent male) were 29 years old on average. Ninety-seven percent listed English as a native language. Participants were randomly assigned to one of two conditions: Comparative Lotto and Comparative Phone. The story for each condition featured the same two people, Abigail and Stan, discussing the serial number on a ten-dollar bill. Stan specifies that the serial number is "just as likely to be Brad Pitt's mobile phone number as it is to win this week's lottery." The probability is specified qualitatively and comparatively. In response, Abigail flat-out denies that the serial number is identical to one of those two possibilities. Crucially, the two stories differ in which specific possibility Abigail denies and which she ignores in her response. In the story for Comparative Lotto, Abigail denies that the number is the winning lottery number; in the story for Comparative Phone, she denies that the number is Brad Pitt's phone number. Participants answered comprehension and test questions similar to those in our earlier studies. For present purposes, response to only the knowledge question is relevant. Here is the story (variations italicized and separated by a slash):

[COMPARATIVE LOTTO/PHONE] Abigail is talking with her neighbor, Stan, who is a statistician. Stan hands Abigail a bill and says, "Here is the ten dollars I owe you." Abigail looks at the bill and sees that its serial number is 5-0-6-7-4-1-6-9-8-2. Stan continues, "I made an interesting calculation. That serial number is just as likely to be Brad Pitt's mobile phone number as it is to win this week's lottery." Abigail answers, "That combination will not win this week's lottery / is not Brad Pitt's mobile number." And Abigail was exactly right: that combination was a loser / it was not Brad Pitt's number.

If the formulaic account is on the right track, then the rate of knowledge ascription will be significantly higher in Comparative Phone. This was our prediction about the results. By contrast, if Comparative Phone and Comparative Lotto don't differ, then it will undermine the formulaic account.

The prediction was true. There was an effect of condition on knowledge ascription in the predicted direction for both the dichotomous question36 and the weighted knowledge score (Table 2.6).37 On each measure, it was significantly higher in Comparative Phone. Moreover, once again we observed a significant difference between Comparative Lotto and the ~20 percent rate of knowledge ascription observed in earlier basic lottery cases involving statistical reasoning. Rate of knowledge ascription in Comparative Lotto was significantly higher than in Odds, for both the dichotomous question38 and the weighted score.39 These results provide further support for the formulaic account. They also demonstrate that the formulaic account's explanatory potential isn't limited to cases where the probability is explicitly and negatively framed.

Experiments 4 and 5 raise at least one unanswered question: why was knowledge ascription in Lotto and Comparative Lotto significantly higher than in earlier basic lottery cases involving statistical inference, such as Odds (Experiment 2)? Explicit, negative, quantitative framing of the odds can't be the entire explanation because Comparative Lotto omits such framing.

Table 2.6 Experiment 5: Comparison across conditions of mean weighted knowledge scores, the percentage of participants answering "yes" to the dichotomous knowledge question, and the percentage of participants affirming chance of error. Comp. Lotto and Comp. Phone conditions are from Experiment 5; Odds is from Experiment 2 and listed here for comparison

                               Odds    Comp. Lotto    Comp. Phone
Weighted knowledge score       −5.7    −2.62          −0.35
Ascribing knowledge (% yes)    20%     35%            49%
Chance of error (% yes)        88%     97%            78%
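Note 38's comparison against the ~20 percent baseline is a binomial test with a test proportion of 0.2. Here is a minimal sketch in Python; the count is an approximate reconstruction (n = 99, inferred from the df in note 39), so the computed p-value should not be expected to match the reported one exactly.

```python
# A sketch of a binomial test against a fixed baseline proportion
# (cf. note 38): is the ~35% rate of knowledge ascription in Comparative
# Lotto higher than the ~20% rate from earlier basic lottery cases?
from scipy import stats

k, n = 35, 99  # ~35% ascribing knowledge; counts are approximate
result = stats.binomtest(k, n, p=0.2, alternative='greater')
print(f"binomial test vs. 0.2 baseline: p = {result.pvalue:.4f}")
```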


One hypothesis is that Lotto and Comparative Lotto are presented nonstereotypically, whereas Odds follows the stereotypical lottery "script." But this hypothesis is too coarse to fit all the data. For example, it doesn't fit with the extremely low rate of knowledge ascription observed in Mafia (Experiment 3), which seriously violates the stereotypical lottery script. Rigged lotteries are not stereotypical. Another hypothesis attributes the higher rates in Lotto and Comparative Lotto to a much more specific aspect of the stories, namely, the losing number's non-stereotypical source. In Lotto and Comparative Lotto, the source is a ten-dollar bill's serial number, whereas the source in Odds is an actual lottery ticket. Importantly, the source in Mafia is an actual lottery ticket, so the low rate of knowledge ascription in Mafia doesn't threaten this hypothesis. Future research could test this hypothesis by simply matching Lotto or Comparative Lotto with a case that differs in only one respect: make the source a lottery ticket. If rates of knowledge ascription drop significantly when the source is a ticket, then it will support the hypothesis. And if rates don't drop significantly, then the hypothesis is undermined. But even if this particular hypothesis doesn't explain the difference in question, explaining the difference remains part of fully understanding the psychology of skeptical lottery judgment. Further research could also test other explanations for the higher rates of knowledge ascription in Phone and Comparative Phone when compared to Lotto and Comparative Lotto, respectively. We have suggested that the difference is due to formulaic response or habituation in the lottery cases. An alternative explanation for the difference is that people consider successful guesses more plausible in Lotto than in Phone, leading them to be more reluctant in Lotto to ascribe knowledge that the number is a loser (compare Teigen and Keren 2003). Successful guessing might be viewed as plausible in Lotto because people are familiar with actual instances where winning lotto numbers are guessed successfully, and because successful guessing might be viewed as the point or purpose of lotteries. But this alternative explanation is undermined by the extremely low rates of knowledge ascription in Mafia. It's not similarly plausible that Ellen wins a rigged lottery (when she's not the one who rigged it). Nor are people familiar with actual instances where someone wins a rigged lottery that the riggers didn't intend her to win.


In any event, let us reiterate that we don’t think the formulaic account explains the entire difference between skeptical judgment in basic lottery cases and nonskeptical judgment in testimonial cases. Instead, we propose that formulaic expression accounts for part of the difference.40

7 General discussion

We have shown that people share the skeptical judgment in lottery cases involving statistical reasoning, and that they also share the nonskeptical judgment in lottery cases involving testimony. In this regard, people's judgments are consistent with philosophical theorizing and philosophers have gotten the folk epistemology of lotteries mostly correct. We also tested three existing accounts of these judgments, but our findings did not support them. Contrary to the justification account, people viewed protagonists as having justified beliefs in lottery cases involving statistical reasoning. Contrary to the chance account, people viewed a protagonist as knowledgeable in a lottery case involving testimony (Odd News) even while admitting there was a chance the protagonist could have been wrong. And contrary to the statistical account, people denied that a protagonist has knowledge even when the protagonist's belief was based on anchored statistical inference (Mafia).

Our findings regarding the chance account are relevant to several lines of research in theoretical epistemology. The chance account of skeptical lottery judgment is motivated by a more general account of the nature of knowledge. Infallibilists, relevant alternative theorists, and contextualists all view knowledge as, roughly, a cognitive state that rules out chance of error. Infallibilists say that knowledge rules out any chance of error whatsoever, whereas relevant alternative theorists and contextualists say that knowledge rules out any relevant or contextually salient chance of error. But the results from Odd News suggest that this isn't the ordinary view of knowledge: most participants ascribed knowledge even while admitting the chance of error. And many participants in other experiments did the same (e.g., Lotto, Phone, Comparative Lotto, and Comparative Phone).

Our results regarding the justification account should be taken into consideration when evaluating the increasingly popular "knowledge-first" approach in epistemology. The knowledge-first approach tries to explain important epistemic concepts, such as evidence or epistemic probability, in terms of knowledge, thereby inverting the more traditional approach that tries to explain knowledge in terms of those other, supposedly more basic epistemic concepts (Williamson 2000). Perhaps the most radical plank in the knowledge-first platform is the identification of justification with knowledge (Sutton 2007). To the extent that this is supposed to reflect the way people actually think about knowledge, our results undermine the view. For the vast majority of participants in basic lottery cases ascribe justification but deny knowledge. Of course, if knowledge-first epistemology is intended as a prescription, rather than as a description of our actual concept or practice, then our results don't necessarily undermine it.

Our results suggest a further line of research on variations of the traditional "justified true belief" theory of knowledge. Recent empirical work suggests that the ordinary concept of knowledge is, roughly, justified true belief based on "authentic evidence" (Starmans and Friedman 2012). Authentic evidence is evidence genuinely informative about reality. If the ordinary concept of knowledge is authentically justified true belief, then why don't participants ascribe knowledge in basic lottery cases, given that they acknowledge that the protagonist's belief is both true and justified? The presumptive explanation is that merely probabilistic evidence isn't viewed as genuinely informative about reality. But this is complicated by the results from Odd News. For Odd News also features probabilistic evidence even though it elicits high rates of knowledge ascription. It is also complicated by the results from Mafia. For Mafia features a lottery rigged against the protagonist's winning, which arguably is genuinely informative about whether the protagonist will win. In these ways, the "authentically justified true belief" theory of knowledge (K = AJTB) faces challenges similar to those faced by the statistical account. Importantly, all of this points to a potential fifth factor in the ordinary concept of knowledge, beyond belief, truth, justification, and evidential authenticity.

Finally, we also provided initial evidence for a new explanation for why some people deny knowledge in basic lottery cases: the formulaic account. Although further tests are needed to support the account, the main findings were that many people ascribed knowledge in nonstereotypically presented lottery cases and that even more people ascribed knowledge in Phone and Comparative Phone, both lottery-like cases. We also found that the formulaic account makes accurate predictions across contexts where the relevant probabilities are framed differently. This includes contexts where the probabilities are framed (i) quantitatively and explicitly and (ii) qualitatively and comparatively. Regardless of whether the formulaic account is supported in future experiments, this finding is of broader import. The finding suggests that theorists may want to be cautious in proposing any general explanation of why knowledge is not possessed in cases where the protagonist concludes, on purely statistical grounds, that a certain outcome obtains. For the findings from Phone and Comparative Phone show that many people are willing to ascribe knowledge in at least some cases matching this description.41

Notes

1 Authorship is coequal and listed unalphabetically.
2 Contemporary epistemological discussion of lotteries is vast and traces to Kyburg (1961). Influential recent discussions include Nelkin (2000) and Hawthorne (2004).
3 See Buckwalter (2012) for other examples of experimental work calling into question conventional wisdom about what's supposedly obvious to anyone competent with the concept of knowledge. See also Beebe and Buckwalter (2010) and Myers-Schulz and Schwitzgebel (2013).
4 See Buckwalter unpublished ms. for a discussion of professional philosophical intuitions in light of the literature on expertise.
5 We use "P" and "Q" as placeholders for declarative sentences or that-clauses.
6 Cohen (1988: 106) attributes skeptical judgment in lottery cases to "the statistical nature of our reasons." But it turns out that, on Cohen's view, this is just a mechanism for making the chance of error salient, and it is the chance of error that really explains skeptical judgment. Writes Cohen, "When the chance of error is salient, we are reluctant to attribute knowledge. Statistical reasons of the sort [possessed] in the lottery case make the chance of error salient."
7 An omnibus analysis of variance (ANOVA) revealed no effect of sex or age. The same is true of all other experiments reported in this chapter.
8 Binomial test, p < 0.000001.
9 One-sample t-test, t(44) = −8.9, p < 0.000001.
10 Binomial, p < 0.0001.


11 One-sample t-test, t(44) = 5.97, p < 0.000001.
12 We use caps to name narrative elements, and we often name narrative elements after the experimental conditions they were used in. This eases exposition and helps readers keep track of which stories appeared in which conditions, while avoiding confusion between the experimental conditions and the stories. (Participants never saw the labels.)
13 For the dichotomous knowledge question: χ2(df = 2, N = 143) = 33.74, p < 0.000001. For the weighted knowledge ascription: ANOVA, F(2) = 22.87, p < 0.000001.
14 Binomial, p < 0.001.
15 One-sample t-test, t(39) = −5.03, p < 0.0001. We should also acknowledge that dichotomous responses were lower than in Experiment 1 (binomial, p = 0.048), though mean weighted scores did not differ from scores in that experiment (one-sample t-test, t(39) = 1.465, p = 0.151).
16 Binomial, p < 0.001.
17 One-sample t-test, t(40) = 4.812, p < 0.0001.
18 Binomial, both ps < 0.015.
19 One-sample t-test, t(61) = 2.49, p = 0.016.
20 Fisher's exact test, p = 0.748.
21 For the dichotomous knowledge question: Fisher's, p < 0.00001. For the weighted knowledge ascription: ANOVA, F(1) = 26.4, p < 0.000001, ηp2 = 0.21.
22 For the dichotomous knowledge question: Fisher's, p = 0.124. For the weighted knowledge ascription: ANOVA, F(1) = 3.37, p = 0.07, ηp2 = 0.032. We note that the p-value on the ANOVA is trending and probably would turn significant with a larger sample size. Proponents of the chance account might take some comfort in this.
23 Fisher's, p < 0.000001.
24 There are other ways to interpret such a denial. For example, these participants might be interpreting "chance" as "genuine chance" or "meaningful chance" or "chance that should be taken into account" for planning purposes. We won't pursue the matter here.
25 Binomial, p = 0.044.
26 One-sample t-test, t(71) = 2.06, p = 0.043.
27 Assuming, of course, that the belief is also true and, perhaps, justified.
28 For the dichotomous knowledge question: χ2(df = 2, N = 133) = 58.37, p < 0.000001, Cramér's V = 0.662. For the weighted knowledge score: ANOVA, F(2) = 52.96, p < 0.000001, ηp2 = 0.449.


29 For the dichotomous question: binomial, p < 0.000001. For the weighted knowledge score: t(44) = 8.97, p < 0.000001.
30 For the dichotomous question: Fisher's, p = 0.186. For the weighted knowledge score: ANOVA, F(1) = 2.25, p = 0.137.
31 For the dichotomous question: binomial, p < 0.00001. For the weighted knowledge score: one-sample t-test, t(43) = −6.86, p < 0.000001.
32 Fisher's, p = 0.035, one-tailed.
33 ANOVA, F(1) = 3.973, p = 0.047.
34 For the dichotomous question: binomial, p = 0.006. For the weighted knowledge score: one-sample t-test, t(120) = 2.937, p = 0.004.
35 Binomial, p < 0.00001.
36 Fisher's, p = 0.041, one-tailed, Cramér's V = 0.133.
37 ANOVA, F(1) = 4.078, p = 0.045, ηp2 = 0.02.
38 Binomial, p = 0.004, test proportion = 0.2.
39 One-sample t-test, t(98) = 3.56, p = 0.001, test value = −5.7.
40 We propose several additional factors in forthcoming work.
41 For helpful feedback and discussion, we thank James Beebe, Peter Blouw, Wesley Buckwalter, Charles Millar, Merreck Levene, and Angelo Turri. This research was supported by the Social Sciences and Humanities Research Council of Canada and an Ontario Early Researcher Award.

References

Ajzen, I. (1977), "Intuitive theories of events and the effects of base-rate information on prediction." Journal of Personality and Social Psychology, 35, 303–14.
Beebe, J. R. and Buckwalter, W. (2010), "The epistemic side-effect effect." Mind & Language, 25, 474–98.
Buckwalter, W. (2012), "Non-traditional factors in judgments about knowledge." Philosophy Compass, 7, 278–89.
Buckwalter, W. (unpublished ms.), "Expert intuition fail." University of Waterloo.
Cohen, S. (1988), "How to be a fallibilist." Philosophical Perspectives, 2, 91–123.
DeRose, K. (1996), "Knowledge, assertion and lotteries." Australasian Journal of Philosophy, 74, 568–80.
Gettier, E. (1963), "Is justified true belief knowledge?" Analysis, 23, 121–3.
Harman, G. (1968), "Knowledge, inference, and explanation." American Philosophical Quarterly, 5, 164–73.
Hawthorne, J. (2004), Knowledge and Lotteries. Oxford: Oxford University Press.


Hinds, P. J. (1999), "The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance." Journal of Experimental Psychology: Applied, 5, 205–21.
Kahneman, D. (2011), Thinking, Fast and Slow. Toronto: Doubleday.
Kyburg, H. E., Jr. (1961), Probability and the Logic of Rational Belief. Middletown: Wesleyan University Press.
Lewis, D. (1996), "Elusive knowledge." Australasian Journal of Philosophy, 74, 549–67.
Myers-Schulz, B. and Schwitzgebel, E. (2013), "Knowing that p without believing that p." Noûs, 47, 371–84.
Nelkin, D. (2000), "The lottery paradox, knowledge, and rationality." Philosophical Review, 109, 373–409.
Starmans, C. and Friedman, O. (2012), "The folk conception of knowledge." Cognition, 124, 272–83.
Sutton, J. (2007), Without Justification. Cambridge, MA: MIT Press.
Teigen, K. and Keren, G. (2003), "Surprises: Low probabilities or high contrasts?" Cognition, 87, 55–71.
Turri, J. (2012a), "In Gettier's wake," in S. Hetherington (ed.), Epistemology: The Key Thinkers. London: Continuum, pp. 214–29.
—. (2012b), "Is knowledge justified true belief?" Synthese, 184, 247–59.
Van Lancker-Sidtis, D. and Rallon, G. (2004), "Tracking the incidence of formulaic expressions in everyday speech: Methods for classification and verification." Language & Communication, 24, 207–40.
Vogel, J. (1990), "Are there counterexamples to the closure principle?", in M. D. Roth and G. Ross (eds), Doubting. Dordrecht: Kluwer Academic Publishers, pp. 13–27.
Weinberg, J. M., Nichols, S. and Stich, S. (2001), "Normativity and epistemic intuitions." Philosophical Topics, 29(1&2), 429–60.
Williamson, T. (2000), Knowledge and Its Limits. Oxford: Oxford University Press.

3

Contrasting Cases1

Nat Hansen

This chapter concerns the philosophical significance of a choice about how to design the context-shifting experiments used by contextualists and anti-intellectualists: Should contexts be judged jointly, with contrast, or separately, without contrast? Findings in experimental psychology suggest (1) that certain contextual features are difficult to evaluate when considered separately, and there are reasons to think that one feature that interests contextualists and anti-intellectualists—stakes or importance—is such a difficult-to-evaluate attribute, and (2) that joint evaluation of contexts can yield judgments that are more reflective and rational in certain respects. With those two points in mind, a question is raised about what source of evidence provides better support for philosophical theories of how contextual features affect knowledge ascriptions and evidence: Should we prefer evidence consisting of “ordinary” judgments, or more reflective, perhaps more rational judgments? That question is answered in relation to different accounts of what contextualist and anti-intellectualist theories aim to explain, and it is concluded that evidence from contexts evaluated jointly should be an important source of evidence for such theories, a conclusion that is at odds with the methodology of some recent studies in experimental epistemology.

1 Background: Experiments and context

The empirical foundation of the debate over the nature and extent of context sensitivity in natural language rests in large part on data generated primarily by experiments of a certain kind: context-shifting experiments.2 Context-shifting
experiments are devised to isolate the effects of some particular feature of context on particular kinds of judgments about specified features of the context. So, for example, a context-shifting experiment might vary what’s at stake for participants in a conversational context, or whether some possibility of error has been mentioned, and elicit metalinguistic judgments concerning some semantic or pragmatic property of the use of a target expression when those features are varied: what some particular use of a sentence says; whether it says something true or false (or neither); how acceptable the use of the expression in each context is, and so on.3 As long as there aren’t more plausible nonlinguistic explanations of those judgments, they are evidence of underlying semantic and pragmatic phenomena that linguistic theories aim to explain (Ludlow 2011, ch. 3). Alternatively, instead of eliciting judgments about linguistic features of the context (e.g., whether what is said is true or acceptable), one might elicit judgments about some nonlinguistic aspect of the context, such as whether some character in the story knows something, or how confident she should be that something is the case.4 Some of the experimental philosophers who have investigated the claims of anti-intellectualism—the view that whether one counts as knowing a proposition, or the quality of one’s evidence in favor of the proposition, partly depends on the “stakes” or practical costs of getting it wrong—employ this kind of context-shifting experiment (May et al. 2010; Phelan 2013). The goal of context-shifting experiments is to set up conditions so that the effects (if there are any) of changing specific features of the relevant context (the independent variable) on judgments (the dependent variable) can be observed. Contextualists and their opponents then go on to try to explain those observed effects using their preferred theoretical resources: indexicality, free enrichment, occasion-sensitivity, conversational implicature, focal bias, and so on. Many context-shifting experiments have been conducted informally, from the theorist’s armchair. But with increasing frequency, formal versions of context-shifting experiments have been conducted with all the apparatus of contemporary psychology at their disposal. The turn to formal versions of context-shifting experiments is motivated on the one hand by a general skepticism about the reliability of philosophers’ intuitions and on the other
by the need to respond to such skepticism. (See Hansen and Chemla 2013, for discussion of such skepticism as well as vindications of certain armchair judgments about context-shifting experiments.) One side effect of the turn to more formal experiments is that it has drawn attention to subtle but important elements of the design of context-shifting experiments that have been largely overlooked in their informal use. As an illustration of the features of a context-shifting experiment that are brought into relief when they are adopted for use in formal experiments, consider the highlighted features of the following much-discussed context-shifting experiment introduced by Keith DeRose (1992, 2009):

Bank Case A. My wife and I are driving home on a Friday afternoon. We plan to stop at the bank on the way home to deposit our paychecks. But as we drive past the bank, we notice that the lines inside are very long, as they often are on Friday afternoons. Although we generally like to deposit our paychecks as soon as possible, it is not especially important in this case that they be deposited right away, so I suggest that we drive straight home and deposit our paychecks on Saturday morning. My wife says, “Maybe the bank won’t be open tomorrow. Lots of banks are closed on Saturdays.” I reply, “No, I know it’ll be open. I was just there two weeks ago on Saturday. It’s open until noon.” [The bank is open on Saturday.]

Bank Case B. My wife and I drive past the bank on a Friday afternoon, as in Case A, and notice the long lines. I again suggest that we deposit our paychecks on Saturday morning, explaining that I was at the bank on Saturday morning only two weeks ago and discovered that it was open until noon. But in this case, we have just written a very large and very important check. If our paychecks are not deposited into our checking account before Monday morning, the important check we wrote will bounce, leaving us in a very bad situation. And, of course, the bank is not open on Sunday. My wife reminds me of these facts. She then says, “Do you know the bank will be open tomorrow?” Remaining as confident as I was before that the bank will be open then, still, I reply, “Well, no, I don’t know. I’d better go in and make sure.” [The bank is open on Saturday.]5

The metalinguistic judgments DeRose expects us to make in response to the “bank” context-shifting experiment—truth value judgments about the sentences in boldface—are supposed to provide evidence of the context sensitivity of the word “know.”
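Abstractly, a context-shifting experiment of this kind is a small factorial design: one or more contextual features are varied while the target sentence is held fixed. The following schematic sketch is purely illustrative; the condition labels are invented and simplified, not DeRose's actual materials.

```python
# Schematic of a context-shifting experiment as data (purely illustrative;
# the labels are invented and simplified, not DeRose's actual materials).
conditions = [
    {"stakes": "low",  "error_mentioned": False},  # a Bank Case A-style context
    {"stakes": "high", "error_mentioned": True},   # a Bank Case B-style context
]
target = "I know the bank will be open tomorrow."

# Each condition pairs a described context with the same target sentence;
# participants' metalinguistic judgments about the target (true? acceptable?)
# are the dependent variable, and the varied features the independent ones.
for condition in conditions:
    print(condition, "-> judge:", target)
```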


But there are two asymmetries between the two contexts DeRose describes that make it difficult to isolate the effect that changes in the context of utterance have on metalinguistic judgments about the target sentences. First, in addition to varying specific features of the contexts of utterance, DeRose also varies the sentences that are supposed to be evaluated in each context (those that I have marked in boldface). He varies the polarity of the sentences (“I know . . .” versus “I don’t know . . .”), whether there is anaphoric reference to the bank (“it”) and what linguistic material is elided (“I know it’ll be open [tomorrow]” versus “I don’t know [the bank will be open tomorrow]”), and whether the discourse marker “Well, . . .” is present.6 Varying all of those linguistic elements makes it harder to defend the idea that it is the change in the context of utterance that is affecting our judgments about the uses of the sentences, rather than the changes DeRose makes in the sentences that are used (or some combination of both factors). Second, the italicized sentences are where the character in the story who claims to know the bank will be open tomorrow states evidence in support of the proposition that the bank will be open tomorrow. But those statements differ subtly in how they are worded, occur in different places in the story, and the statement in Case A is in direct discourse, while the statement in Case B is in indirect discourse. The statement of evidence is arguably more salient in Case A, where DeRose’s judgment is that he knows that the bank will be open, while it is less salient in Case B, where DeRose’s judgment is that he does not know the bank will be open. It is possible that simply locating that statement in different places in the story affects our judgment of whether or not the character’s statement that he knows the bank will be open is true. This is not to argue that these factors do affect our judgments in these cases, only that they make it more difficult to isolate the effect that changing the context has on our judgments. Anyone interested in identifying those effects should revise their context-shifting experiments accordingly, so that as little as possible is varied between contexts except for the relevant features of the context of utterance (in DeRose’s investigation of “know,” those features are the stakes and whether a possibility of error is mentioned).7 Even once the unnecessary asymmetries between the contexts being evaluated are eliminated, there remain questions about how subtle features of experimental design affect judgments. For example, there is evidence that
the order in which scenarios are presented (Schwitzgebel and Cushman 2012), whether the sentences participants are asked to judge are positive or negative (Hansen and Chemla 2013), and whether participants only see contexts separately (without contrast) or jointly (with contrast) (Phelan 2013) can significantly affect judgments about them. In this chapter, I will consider this final feature of the design of context-shifting experiments—whether to employ separate or joint evaluation of contexts—in detail. I will first describe reasons to think that separate evaluation (involved in experiments with a between-subject design) is the better design for context-shifting experiments because it more closely resembles the structure of ordinary judgments (which do not involve explicit comparisons between contexts). I will then draw on findings in experimental psychology to argue that joint evaluation of contexts can yield judgments that are more “rational” in certain respects. With those two arguments in place, it is then possible to raise a question about which experimental design generates better evidence for contextualist and anti-intellectualist theories: Should the evidence consist of “ordinary” judgments, or more reflective, perhaps more “rational” judgments? How one answers that question depends on what one understands the explanatory project of contextualist and anti-intellectualist theories to be. In the final section of the chapter, I’ll describe two different ways of understanding those explanatory projects and how they bear on the question of what kinds of experiments provide the best evidence for such theories.

2 DeRose on joint versus separate evaluation of contexts

DeRose says that when his contextualist scenarios (like the bank scenario discussed above) are considered separately, the intuitions that they generate are “fairly strong” (DeRose 2005, p. 175/2009, p. 49), “fairly clear” (DeRose 2005, p. 193), or “quite strong” (DeRose 1999, p. 196; 2009, p. 55, n. 7).8 But he worries that if the two contexts that make up a context-shifting experiment are considered jointly, we may become less certain of our intuitions about the two contexts:

Of course, we may begin to doubt the intuitions above when we consider [the contexts] together, wondering whether the claim to know in the first
case and the admission that I don’t know in the second can really both be true (DeRose 2002, p. 195, n. 6/2009, p. 55, n. 7).

One interesting feature of DeRose’s remarks is that he doesn’t say whether he finds joint or separate evaluation of contexts (if either) preferable. His practice favors joint evaluation: The informal presentation of DeRose’s context-shifting experiments (indeed, of all the informal context-shifting experiments in the contextualist debate) requires judgments about contexts that are presented jointly.9 But I get the feeling that DeRose would prefer that the contexts be considered individually, since that would, by his own account, produce intuitions that are more strongly aligned with his predictions. And DeRose is committed to a view about what constitutes the best evidence for contextualist theories which lends support to the practice of using context-shifting experiments that present contexts separately:

The best grounds for accepting contextualism come from how knowledge-attributing (and knowledge-denying) sentences are used in ordinary, non-philosophical talk: What ordinary speakers will count as “knowledge” in some non-philosophical contexts they will deny is such in others. This type of basis in ordinary language not only provides the best grounds we have for accepting contextualism concerning knowledge attributions, but, I believe, is evidence of the very best type one can have for concluding that any piece of ordinary language has context-sensitive truth-conditions (DeRose 2005, p. 172/2009, pp. 47–8).

Given that DeRose thinks that the “best grounds for accepting contextualism come from how . . . sentences are used in ordinary, non-philosophical talk,” and given that, as Daniel Kahneman puts it, “We normally experience life in the between-subject mode, in which contrasting alternatives are absent” (Kahneman 2011, p. 354), it seems plausible that DeRose should think that context-shifting experiments that present contexts separately generate better grounds for contextualism than context-shifting experiments that present contexts jointly.10 Further support for this idea can be found in recent experimental philosophy, where it has been explicitly argued that evidence gathered from context-shifting experiments that evaluate contexts separately is preferable to evidence gathered from joint evaluation of contexts.


3 Experimenting with separate and joint evaluation

Phelan (2013) conducted a series of experiments that revealed significant effects of a feature of context invoked in certain context-shifting experiments, namely practical importance, or what is at stake, in contexts evaluated jointly. But those effects disappeared when each of the contexts making up the context-shifting experiment was considered separately, in a “non-juxtaposed” experimental design. Phelan’s finding of no significant difference between responses to contexts when those contexts are evaluated separately lines up with other recent experimental results concerning anti-intellectualism about knowledge (Feltz and Zarpentine 2010) and contextualism about knowledge ascriptions (Buckwalter 2010), which relied exclusively on separate evaluation of contexts. In this section, I will describe Phelan’s findings. Later, I will argue that while Phelan’s findings may suggest a problem for using contrasting cases in the design of context-shifting experiments, it isn’t at all obvious whether that problem is genuine.11

Phelan takes as his target the “anti-intellectualist” view that the practical importance, or “cost,” or “stakes,” of being right or wrong about a proposition has an effect on one’s evidence supporting the proposition (p. 3).12 Anti-intellectualism about evidence is motivated in part by judgments about context-shifting experiments in which only the practical importance (or “stakes,” or “costs”) of being right about a proposition is varied between contexts. For example (given certain assumptions13), the anti-intellectualist view targeted by Phelan would predict that judgments about how confident the character Kate is in the following two contexts should vary in the following way: In the Unimportant context, Kate should be more confident that she is on Main Street than she is in the Important context. (The material in square brackets in the contexts that follow is not present in the version given to participants. Italicized material varies in the two contexts; the paragraph that follows the italicized text is the same in both contexts.)

[Unimportant (Passerby)]: Kate is ambling down the street, out on a walk for no particular reason and with no particular place to go.


[Important (Passerby)]: Kate needs to get to Main Street by noon: her life depends on it. She comes to an intersection and asks a passerby the name of the street. “Main street,” the passerby says. Kate looks at her watch, and it reads 11:45 a.m. Kate’s eyesight is perfectly normal, and she sees her watch clearly. Kate’s hearing is perfectly normal, and she hears the passerby quite well. She has no special reason to believe that the passerby is inaccurate. She also has no special reason to believe that her watch is inaccurate. Kate could gather further evidence that she is on Main Street (she could, for instance, find a map), but she doesn’t do so, since, on the basis of what the passerby tells her, she already thinks that she is on Main Street.

Phelan goes about attempting to verify the prediction by asking participants in his experiment to rate, on a 7-point Likert scale (anchored at 1 with “not confident” and at 7 with “very confident”), how confident the character Kate should be that she is on Main Street. He found no significant difference between judgments about Kate’s confidence in the two contexts when each participant was asked to judge only one of the two contexts.14 But, interestingly, Phelan found that changing the stakes had a significant effect on judgments of confidence in “juxtaposed cases,” when participants were allowed to jointly evaluate both the Unimportant and Important contexts.15 Phelan then ran two additional context-shifting experiments testing for the effects of changing stakes, but which differed from the scenario described above in terms of the reliability of the information source that supplies Kate with the information that she’s on Main Street. In the second version, it is a pair of “drunks” who tell Kate that she is on Main Street, while in the third version, Kate gets her information about what street she’s on from a street sign. In each experiment, there was a significant difference in responses to the important and unimportant contexts when participants saw them “juxtaposed,” but that difference disappeared when they saw them separately. As Phelan points out, his findings are interesting because the context-shifting experiments that involve joint evaluation of contexts more closely mirror the standard, informal setup of context-shifting experiments. Those reading a philosophy paper, for example, form their judgments while having multiple contexts simultaneously in view.16 One might conclude that philosophers who unreflectively employ informal context-shifting experiments with joint evaluation of contexts are mistakenly offering theories that
aim to explain what turns out to be merely an artifact of their particular experimental design, rather than a fact about judgments made in ordinary circumstances.17
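To make the separate/joint contrast concrete at the level of analysis, here is a minimal sketch with made-up 7-point ratings and an off-the-shelf t-test standing in for whatever analysis a given study actually uses; it is not a reproduction of Phelan's data or methods.

```python
# Minimal sketch of how separate vs. joint evaluation changes the analysis
# (made-up 7-point confidence ratings; not Phelan's data or his analysis).
from scipy import stats

unimportant = [6, 5, 7, 6, 5, 6, 7, 5]   # hypothetical ratings, Unimportant context
important   = [5, 5, 6, 5, 4, 5, 6, 5]   # hypothetical ratings, Important context

# Separate evaluation (between-subject): different participants per context,
# so the two samples are independent.
t_between, p_between = stats.ttest_ind(unimportant, important)

# Joint evaluation (within-subject): each participant rates both contexts,
# so responses are paired by participant.
t_within, p_within = stats.ttest_rel(unimportant, important)

print(f"between-subject: t = {t_between:.2f}, p = {p_between:.3f}")
print(f"within-subject:  t = {t_within:.2f}, p = {p_within:.3f}")
```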

4 Why is contrast a problem?

Here is a schematic representation of the central results of Phelan’s experimental study:

● Changing stakes do not have a significant effect on judgments of confidence about contexts when participants see those contexts separately, without contrast.
● Changing stakes do have a significant effect on judgments of confidence about contexts when participants see those contexts jointly, with contrast.
Phelan infers that it is problematic for philosophers to cite the effect of changing stakes on judgments of confidence seen in jointly considered contrasting cases in support of a theory like anti-intellectualism about evidence. But that inference is only reasonable given a commitment to the idea that effects that only show up in “juxtaposed” contrasting cases do not reveal genuine effects of stakes on judgments of confidence. Why accept that commitment? Phelan considers two arguments that defend the importance of effects that show up only in contexts considered jointly, and he criticizes and rejects both. I’ll briefly sketch both arguments and his responses before developing a third argument in favor of embracing effects that show up only in contexts considered jointly, one that avoids Phelan’s criticisms. First, one might argue that the effect of changing stakes on judgments of confidence emerges only in contexts considered jointly because only then are stakes salient. When contexts are evaluated separately, the stakes are not a particularly prominent feature of the context and so do not end up affecting judgments of confidence.18 Second, one might argue that when evaluating contexts separately, participants are uncertain how to respond, and so make judgments that “land, more or less arbitrarily, somewhere in the middle of the scale” (p. 11).19 But when evaluating contexts jointly, they have “more guidance,”
and so better represent the role that stakes play in affecting judgments of confidence.20 Phelan responds to both of these arguments by comparing responses to the non-juxtaposed contexts in the three versions of his context-shifting experiment that differ in terms of the reliability of the information source that provides Kate the information that she is on Main Street. He observes that, even in contexts considered separately, the mean responses of how confident Kate should be that she is on Main Street track the reliability of the source of her information that she is on Main Street: “[T]he mean value of participants’ answers for the non-juxtaposed cases involving the highly reliable street sign (5.7) was higher than that for cases involving the moderately reliable passerby (5.02), which was higher than that for the unreliable drunks (4.56)” (p. 12). Phelan found that there was a significant effect of the reliability of the information source on responses in “non-juxtaposed” cases, but no significant effect of importance. He then takes that finding to support the denial of the consequent in the following conditional:

[I]f participants’ responses to a single case do not properly reflect the extent to which stakes matter, then they should also not properly reflect the extent to which other, equally salient, factors matter (p. 12).

Because both the antecedent and consequent of the conditional involve negations, it is easier to see what’s going on here if you take the experiment to affirm the antecedent of the conditional’s contrapositive:

If participants’ responses to a single case properly reflect the extent to which factors that are equally salient to stakes matter, then they should also properly reflect the extent to which stakes matter.
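For readers who want the logical step spelled out, a quick mechanical check shows that the conditional and its contrapositive are equivalent; the formalization below is mine, not Phelan's wording.

```python
# Truth-table check that Phelan's conditional (not-P -> not-Q) is logically
# equivalent to its contrapositive (Q -> P). Here P stands for "single-case
# responses properly reflect stakes" and Q for "single-case responses
# properly reflect other, equally salient factors" (my formalization).
def implies(a, b):
    return (not a) or b

for P in (False, True):
    for Q in (False, True):
        assert implies(not P, not Q) == implies(Q, P)
print("conditional and contrapositive agree on all truth assignments")
```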

Participants in Phelan’s experiments had significantly different responses about how confident a character should be when she received information about what street she was on from sources of varying reliability (a drunk, a normal passerby, and a street sign), and they did so in contexts presented separately. If reliability of the information source in a context is as salient as what is at stake, then Phelan has good reason to affirm the antecedent of the (rewritten) conditional and conclude that participants’ responses to a single case properly reflect the extent to which stakes matter. Put another way, without some reason to think that participants’ responses to stakes and
reliability of information source differ systematically, “it would be ad hoc to claim that they do not . . . notice, or do not properly respond to, the stakes in the single cases” (p. 13). A key part of Phelan’s argument is the assumption that the reliability of information sources is just as salient as what is at stake. If there is reason to reject that assumption, then his argument against the idea that judgments about contexts presented separately do not properly reflect the extent to which stakes matter is not convincing. I will present some reasons to reject that assumption in the following section.

5 Further case studies on separate and joint evaluation

Hsee et al. (1999, pp. 583–4) discuss several experiments in which switching from separate to joint evaluation corresponds not just with a significant difference in judgments, but with a reversal in the judgments of participants. So, for example, when participants in an experiment (conducted in Hsee 1998) were asked to judge how much they would be willing to pay for each of the two sets of dinnerware in Table 3.1, they judged Set J to be more valuable when the sets were presented jointly. But when participants only saw one or the other set of tableware and were asked to judge how much they would be willing to pay for it, judgments were reversed: Participants were willing to pay more for Set S than for Set J (Hsee 1998; Hsee et al. 1999; Kahneman 2011). Hsee et al. (1999, p. 584) note that even though Set J contains all the pieces in Set S plus six additional intact cups and one more intact saucer, participants were willing to pay more for Set S when the sets were considered separately, “although it was the inferior option.”

Table 3.1 Judging the value of sets of tableware

                     Set J (includes 40 pcs)       Set S (includes 24 pcs)
Dinner plates        8, in good condition          8, in good condition
Soup/salad bowls     8, in good condition          8, in good condition
Dessert plates       8, in good condition          8, in good condition
Cups                 8, 2 of which are broken      —
Saucers              8, 7 of which are broken      —


Or consider another experiment from Hsee (1998), which “asked students to imagine that they were relaxing on a beach by Lake Michigan and were in the mood for some ice cream” (Hsee et al. 1999, p. 583). Like the tableware experiment, some participants were asked to judge how much they were willing to pay for each of two ice cream servings offered by two vendors presented jointly, while others were asked to judge how much they were willing to pay for one or the other serving option, presented separately (see Table 3.2). Both serving options were accompanied by a drawing depicting the serving. Hsee et al. (1999, p. 583) report the findings of the earlier study as follows:

Note that, objectively speaking, Vendor J’s serving dominated Vendor S’s, because it had more ice cream (and also offered a larger cup). However, J’s serving was underfilled, and S’s serving was overfilled. The results revealed a JE/SE [Joint Evaluation/Separate Evaluation] reversal: In JE [Joint Evaluation], people were willing to pay more for Vendor J’s serving, but in SE [Separate Evaluation], they were willing to pay more for Vendor S’s serving.

What accounts for this reversal (and many others like it) between separate and joint evaluation of cases? The answer given in Hsee et al. (1999, p. 578) turns on the fact that “some attributes . . . are easy to evaluate independently, whereas other attributes . . . are more difficult to evaluate independently.” For example, whether a particular set of tableware has broken pieces or whether an ice cream cup is overfilled is easy to evaluate independently, while the significance of the total number of pieces in a set of tableware, or “the desirability of a given amount of ice cream,” is more difficult to evaluate independently. Whether an attribute is easy or difficult to evaluate, according to Hsee et al., “depends on the type and the amount of information the evaluators have about the attribute.”

Table 3.2 Choosing ice cream

Vendor J                             Vendor S
10 oz. cup with 8 oz. ice cream      5 oz. cup with 7 oz. ice cream
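One way to see how a difference in evaluability could generate the JE/SE reversal is with a toy model; the scoring rule below is a deliberate simplification of mine, not a model proposed by Hsee et al.

```python
# Toy model (a deliberate simplification, not Hsee et al.'s model) of how
# evaluability can reverse judgments between separate and joint evaluation.
def willingness_to_pay(amount_oz, cup_oz, joint):
    filling = amount_oz / cup_oz      # easy attribute: over/underfilled look
    if joint:
        return amount_oz              # comparison makes the amount evaluable
    return 2.0 * filling              # evaluated separately, filling dominates

vendor_j = (8, 10)   # 8 oz of ice cream in a 10 oz cup (underfilled)
vendor_s = (7, 5)    # 7 oz of ice cream in a 5 oz cup (overfilled)

for joint in (False, True):
    mode = "joint" if joint else "separate"
    wtp_j = willingness_to_pay(*vendor_j, joint)
    wtp_s = willingness_to_pay(*vendor_s, joint)
    print(mode, "-> J:", wtp_j, "S:", wtp_s)
# separate: S scores higher (2.8 vs 1.6); joint: J scores higher (8 vs 7),
# reproducing the reversal pattern described in the text.
```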


Relevant information includes which value for the attribute would be evaluatively neutral, what the best and worst values for the attribute would be, and “any other information that helps the evaluator map a given value of the attribute onto the evaluation scale” (p. 578). An extremely difficult attribute to evaluate would be one where the judge has no information about the upper and lower values the attribute can have, or what the average value of the attribute would be. So, for example, suppose you were asked to judge how suitable a candidate is for entry into a philosophy B.A. program based solely on her score of 15 on her French baccalauréat général.21 Unfortunately, you don’t know what a good or bad score on the bac would be, or even what the average is. You only know that higher scores are better. Suppose also that you don’t get to compare the candidate with any others—she’s the only French applicant to the program. In this situation, any judgment would be a stab in the dark—there are no grounds on which to give the candidate either a positive or a negative evaluation. Your job is easier if you know what the average, neutral value for the attribute is, even if you don’t know what the highest and lowest values for the attribute would be. Given a particular score, you can then easily judge whether it falls above or below the average, and correspondingly give it a positive or negative evaluation. So suppose you know that the average score on the bac is 11. Now you can evaluate the student’s score of 15 positively, but you have no way to judge how positively it should be evaluated. Still easier is a situation in which you know not only the average score, but also scores on the high and low end of what is possible:

In the baccalauréat général, ten out of twenty is a pass . . . 16 is a très bien (summa cum laude), a big bouquet of starred As in the British system. Cambridge expects 17 from a French bachelier (Harding 2012).

Now you are in a position to make a much more nuanced evaluation of the applicant’s score. It’s quite good—not fantastic, but good enough for this program (it’s not Cambridge, after all).
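The three situations just described (no information, an average only, and a full range) can be summarized in a small sketch; the function is my illustration, with the numbers taken from the bac example above.

```python
# Sketch (my illustration) of how reference information changes the
# evaluability of a score, using the bac example: score 15, average 11,
# pass mark 10, and 17 as a rough top score.
def evaluate(score, average=None, low=None, high=None):
    if average is None:
        return None                    # no information: a stab in the dark
    if low is None or high is None:
        # average only: the direction of the evaluation, but not its degree
        return "positive" if score > average else "negative"
    # full range: the score can be placed on the evaluation scale
    return (score - low) / (high - low)

print(evaluate(15))                                # None: no grounds to judge
print(evaluate(15, average=11))                    # "positive", but how positive?
print(evaluate(15, average=11, low=10, high=17))   # ~0.71: quite good
```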

With a more concrete sense of the kind of information that makes an attribute easy or difficult to evaluate, we can then ask whether there is any reason to think that what’s at stake in a context is more difficult to evaluate than the reliability of an information source. I think the answer is that it is more difficult to evaluate what’s at stake. First of all, the reliability of an
information source has a clear upper and lower bound: a source can be 100 percent reliable, or completely unreliable. Given a particular information source (a drunk, an ordinary passerby, a street sign), it is possible to make an informed (if rough) judgment about where that information source falls on the (upper- and lower-bounded) scale of reliability, even without comparing it to the reliability of other information sources. In contrast, there is no clear upper bound to what can be at stake in a context. It seems that there is a lower bound: Nothing might turn on whether a proposition turns out to be true or false. That seems to be an element of the “Unimportant” context Phelan describes. But, on the other end of the scale, what’s the most important thing that could turn on whether or not a proposition is true or false? Certainly whether someone lives or dies is important, but there’s always something more important (two people’s lives, a million, the fate of the country, the planet, the universe, all possible universes . . .). Since there’s no clear upper bound, there’s also no clear sense of what something of average importance would be. So when a participant in a survey is asked to make a judgment about a single context in which what’s at stake is mentioned, that attribute counts as difficult to evaluate, in contrast with the reliability of an information source, which is (comparatively) easy to evaluate.22

Phelan wants to defend the idea that responses to contexts considered separately provide better evidence for anti-intellectualism than cases considered jointly. He responds to the idea that joint evaluation might make subjects better equipped to evaluate what’s at stake in a context as follows (this is my reconstruction of his response):

1. If participants’ responses to a single case do not properly reflect the extent to which stakes matter, then they should also not properly reflect the extent to which other, equally salient, factors matter (p. 18).
2. The reliability of an information source is as salient as what is at stake in a context.
3. Participants’ responses to a single case do properly reflect the reliability of a relevant information source.

Conclusion: Participants’ responses to a single case do properly reflect the extent to which stakes matter.


The upshot of the discussion of what makes an attribute easy or difficult to evaluate in this section is that premise (2) in Phelan’s argument is false, assuming that the ease or difficulty of evaluating an attribute is a suitable construal of Phelan’s notion of “salience.” The reliability of an information source is easier to evaluate than what is at stake. That explains why the effect of changing the reliability of the relevant information source shows up in separate evaluation, while the effects of changing stakes only show up in joint evaluation.23 So Phelan’s argument that responses to contexts considered separately do properly reflect the extent to which stakes matter (in contrast with responses elicited in contexts considered jointly) should be resisted. But that’s only to say that there isn’t yet a convincing argument that separate evaluation should be favored over joint evaluation—so far, it’s still an open question whether data gathered using separate or joint evaluation is better evidence for contextualism and anti-intellectualism.

6 Which type of evaluation generates better evidence for contextualism and anti-intellectualism?

Phelan observed that changing stakes only seemed to have an effect on judgments about confidence when contexts were evaluated jointly. He then argued that judgments about contexts evaluated separately do genuinely reflect the extent to which what’s at stake affects judgments about confidence. In the last section I challenged that argument. Now, in this section, I will consider another argument that tries to show that effects that show up in contexts considered separately are better evidence for contextualist and anti-intellectualist theories than effects that show up only in contexts evaluated jointly. Here is my reconstruction of the argument, which is implicit in DeRose’s remarks concerning “the best grounds for accepting contextualism” and his attitude toward contexts considered separately and jointly (introduced in Section 2, above):

1. The best grounds for accepting contextualism come from how knowledge-attributing (and knowledge-denying) sentences are used in ordinary, non-philosophical talk (DeRose 2005, p. 172/2009, p. 47).
2. Contexts evaluated separately (and not contexts evaluated jointly) accurately represent how subjects use ordinary, non-philosophical talk.
3. So data gathered from contexts considered separately (and not contexts considered jointly) provides the best grounds for accepting contextualism.

DeRose does not explicitly commit himself to premise 2, but as discussed above, I think there is reason to think he implicitly accepts it. Embracing this argument would mean that the proper design of context-shifting experiments (both informal and formal) should involve separate evaluation of contexts, and not joint evaluation. I now want to challenge premise (1) in (my reconstruction of) DeRose’s argument by giving reasons to think that, for certain purposes, data generated by joint evaluation of contexts should be at least on the same footing as (if not considered superior to) data generated by separate evaluation of contexts. The essential move in my argument can be summarized by the following remark from Kahneman (2011, p. 361):

. . . rationality is generally served by broader and more comprehensive frames, and joint evaluation is obviously broader than single evaluation.24

Subjects tend to make better, more informed, more “rational” judgments about contexts when they are given more than one context to evaluate. This idea was present in the earlier discussion of judgments about the value of the two sets of tableware and the different ice cream options: When considered side by side, ice cream option J is obviously preferable, and participants select it, but when considered separately, subjects do not choose the dominant option; they choose the “objectively inferior option” (Hsee et al. 1999, p. 588). That is a clear illustration of how being able to evaluate options jointly can lead to improved judgments.25 Another illustration of how joint evaluation can produce improved judgments is given in Kahneman and Tversky (1996) in relation to the “conjunction fallacy.” The “conjunction fallacy” is the tendency of subjects, in certain conditions, to judge that p&q is more probable than p alone. So, for example, consider the following vignette and response options (Kahneman and Tversky 1996, p. 587):

Linda is in her early thirties. She is single, outspoken, and very bright. As a student she majored in philosophy and was deeply concerned with issues of discrimination and social justice. Suppose there are 1,000 women who fit this description. How many of them are
(a) high school teachers?
(b) bank tellers? or
(c) bank tellers and active feminists?
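Before turning to the results, the conjunction rule itself can be made concrete with a toy calculation; the probabilities below are invented purely for illustration.

```python
# Toy check of the conjunction rule with invented probabilities: however
# the numbers are set, P(teller and feminist) can never exceed P(teller).
p_teller = 0.05                  # hypothetical P(Linda is a bank teller)
p_feminist_given_teller = 0.60   # hypothetical P(active feminist | bank teller)

p_conjunction = p_teller * p_feminist_given_teller   # = 0.03
assert p_conjunction <= p_teller   # option (c) cannot be more probable than (b)
print(p_teller, p_conjunction)
```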

Kahneman and Tversky report that when participants were allowed to see options (a), (b), and (c), 64 percent conformed to the conjunction rule, which holds that conjunctions must be less probable (or equally probable) than either conjunct. But in an experiment with a between-subject design (i.e., one where subjects consider the relevant responses separately), when participants saw only either options (a) and (b) or (a) and (c), “the estimates for feminist bank tellers (median category: ‘more than 50’) were significantly higher than the estimates for bank tellers (median category: ‘13-20,’ p < 0.01 by a Mann-Whitney test)” (p. 587). That is, in the between-subject design, when participants were asked to evaluate the probability of (b) and (c) separately, they tended to violate the conjunction rule, while in the within-subject design, when they were allowed to see both options jointly, they tended to adhere to the rule. So there is an argument that supports the idea that we should favor data generated by contexts considered jointly over data generated by contexts considered separately. And we’re now in a position to be able to challenge DeRose’s assumption that

The best grounds for accepting contextualism come from how knowledge-attributing (and knowledge-denying) sentences are used in ordinary, non-philosophical talk (DeRose 2005, p. 172/2009, p. 47).

There is now a competing conception of what might be considered “better” grounds for accepting contextualism, namely more informed judgments, based on joint evaluation of contexts. Pinillos et al. (2011, p. 127) put the idea this way: “In general, giving subjects further relevant information will allow them to make a more informed judgment. In short, it will put them in a better epistemic situation.”

7 Conclusion: Two explanatory projects

One explanatory project that contextualists and anti-intellectualists might be engaged in is a branch of cognitive science. In the case of contextualism, this project is closely related to the explanatory projects of empirical semantics and pragmatics: The goal is to build up a linguistic theory that explains and
predicts certain linguistic phenomena. Evidence of those phenomena can be uncovered by eliciting judgments in linguistic experiments, looking at linguistic corpora, and recording and transcribing linguistic use “in the wild.” While the immediate goal of this project is to explain a domain of specifically linguistic phenomena, evidence for and against competing theories also comes from how well theories mesh with neighboring areas of empirical investigation. The ultimate goal is a satisfactory explanation of “the total speech act in the total speech situation”—how linguistic capacities interact with other forms of cognition to produce the richly textured conversational understanding we enjoy. This explanatory project is essentially focused on language and linguistic activity. I think it is uncontroversial that evidence collected from both separate and joint evaluation of contexts is relevant to this explanatory project. Those engaged in this type of project want to know, among other things, why linguistic judgments differ in separate and joint evaluation (when they do), and to know that, we obviously need both kinds of evidence.26

The second explanatory project is not essentially focused on linguistic or psychological explanation. It seeks answers to metaphysical questions: What is knowledge? What is evidence? We might approach those metaphysical questions by way of answers to linguistic questions: How do we use the word “know”? Or by way of questions about judgments involving the relevant concepts: How do people make judgments about how confident someone should be? These routes to the nature of knowledge or evidence depend on controversial assumptions about the relation between our linguistic behavior with “know” or our judgments about confidence and the nature of knowledge and evidence. I won’t engage here in disputes over the best way to understand that relation.27 Instead, I only want to suggest that insofar as one is engaged in the project of getting at the nature of knowledge and evidence via linguistic or psychological investigations, it makes sense to be interested in the best judgments that subjects make about knowledge ascriptions or how confident subjects should be, and not exclusively in “ordinary” judgments, subject as they are to known forms of bias and distortion. If subjects’ judgments are taken to be a mirror of reality, that mirror should be as polished as possible. So, insofar as contextualists are interested in getting at the nature of knowledge, or anti-intellectualists are interested in getting at the nature
of evidence, in addition to being engaged in an aspect of the (extremely worthwhile) project of empirical linguistics and psychology, they should drop the commitment to the idea that the best grounds for contextualism are offered by ordinary uses of knowledge-ascribing (and knowledge-denying) sentences in ordinary talk. Better grounds for contextualism and anti-intellectualism, understood as theories concerning the nature of knowledge and evidence, are how speakers use knowledge-ascribing and knowledge-denying sentences, or make judgments about confidence, in situations where all the necessary work has been done to eliminate avoidable sources of bias. Employing context-shifting experiments that ask for joint evaluation of contexts is a step toward generating that kind of improved evidence. In summary, whether contextualists and anti-intellectualists take themselves to be engaged in the cognitive scientific or the metaphysical explanatory project (or both), they should be interested in—and cannot dismiss as mere experimental artifacts—responses to contexts evaluated jointly. Moreover, experimental results that show no significant effect of changing stakes on judgments when those contexts are evaluated separately (e.g., Buckwalter 2010; Buckwalter and Schaffer 2013; Feltz and Zarpentine 2010; Phelan 2013) don’t pose a serious challenge to anti-intellectualism, since there is reason to think that what’s at stake in a context is a difficult-to-evaluate attribute, the effects of which emerge most clearly in joint evaluation of contexts.

Notes

1 Thanks to Zed Adams, Jonas Åkerman, James Beebe, Gunnar Björnsson, Mikkel Gerken, Chauncey Maher, and Eliot Michaelson for helpful comments. Special thanks to Mark Phelan for comments and discussion.
2 “Context shifting experiments” are a part of (and the name is derived from) what Cappelen and Lepore (2005, p. 10) call “Context Shifting Arguments.” A Context-Shifting Argument takes the data generated by a context-shifting experiment as a premise.
3 For a discussion of metalinguistic judgments, see Birdsong (1989) and Schütze (1996, Ch. 3).
4 Hazlett (2010, pp. 497–8) distinguishes “two competing methods of theorizing in epistemology—one based on intuitions about knowledge, and the other based
on intuitions about language.” DeRose argues that only metalinguistic context-shifting experiments yield data that can confirm or disconfirm predictions made by his particular variety of contextualism. For his argument, see DeRose (2009, p. 49, n. 2) and (2011, pp. 84–5). Sosa (2000, p. 1) characterizes contextualism as engaging in “metalinguistic ascent,” whereby it “replaces a given question with a related but different question. About words that formulate one’s original question, the contextualist asks when those words are correctly applicable.” Sosa goes on to say that there are questions, like the nature of justification, that the epistemologist can discuss “with no metalinguistic detour” (p. 6).

5 I have added boldface to pick out the sentences we’re supposed to evaluate, and I have italicized the sentences where the character in the stories who claims to know or denies that he knows gives evidence in support of the proposition that the bank will be open tomorrow.
6 For a discussion of the pragmatic significance of the discourse marker “well,” see Jucker (1993). Thanks to Emma Borg for bringing this paper to my attention.
7 More recent context-shifting experiments avoid these asymmetries. See, for example, Sripada and Stanley (2012) and the context-shifting experiment discussed below, taken from Phelan (2013).
8 See also DeRose (2009, p. 2).
9 It would be awkward (though not impossible) to craft a paper in which readers only saw one or the other context by itself.
10 For other examples of the claim that everyday life resembles a between-subject experiment, see Kahneman (2000, p. 682) and Shafir (1998, p. 72).
11 An early, unpublished (but often cited) version of Phelan’s study (Neta and Phelan ms) contains the claim that their studies “obviously suggest a problem for the philosophical strategy of [using] contrasting cases to elicit intuitions in support of one position or another” (p. 24).
12 Phelan discusses two subtly different versions of this view, “Anti-intellectualism about Evidence,” given in Stanley (2005, 2007).
13 In order for anti-intellectualism about evidence to make testable predictions about ordinary judgments, Phelan introduces what he calls the “Bridge from Rational Confidence to Evidence (BRCE): People’s implicit commitments about an agent’s evidence set or quality of evidence are reflected in their explicit intuitive judgments about how confident that agent ought to be in various propositions supported by that evidence” (p. 7). The BRCE allows Phelan to draw conclusions about people’s commitments about evidence from their judgments about how confident subjects ought to be.


14 The usual caveats about drawing conclusions from null results apply here.
15 Phelan reports that the mean responses to the important and unimportant contexts were 4.5 and 5.32, respectively, with p < 0.001. Emmanuel Chemla and I (Hansen and Chemla 2013) uncovered a similar result with truth value judgments about knowledge ascriptions using several different context-shifting experiments based on DeRose’s bank scenario. We found a significant effect of changing contexts on truth value judgments about bank-style scenarios only when participants had the chance to make judgments about multiple contexts. In our experiment, unlike Phelan’s, participants never saw two contexts simultaneously. Instead, over the course of the experiment, participants made judgments about knowledge ascriptions in response to 16 bank-style contexts. Hsee et al. (1999, p. 576, n. 1) say the kind of evaluation mode we used “involve[s] a JE [Joint Evaluation] flavor because individuals evaluating a later option may recall the previous option and make a comparison.”
16 Stanley’s (2005) bank context-shifting experiment involves considering five related contexts.
17 As mentioned above, Neta and Phelan (ms) draw just such a conclusion from observations about the role played by joint evaluation in judgments about the effect of stakes on confidence.
18 Sripada and Stanley (2012) make an argument along these lines, defending anti-intellectualism against experimental results indicating that stakes do not affect judgments about knowledge based only on separate evaluation of contexts.
19 DeRose (2011, p. 94) hilariously calls this kind of response the “WTF?! neutral response.”
20 Ludlow (2011, p. 75) gives an example of how joint evaluation can improve subjects’ understanding of an experimental task: “As reported in Spencer (1973), Hill (1961) notes that sentences drawn from Syntactic Structures drew mixed results from experimental subjects. ‘The child seems sleeping’ was accepted by 4 of the 10 subjects until it was paired with ‘The child seems to be sleeping’ at which point all 10 subjects vote negatively. Establishing the contrast helped the subjects to see what the task demand was.”
21 This example is based on an experiment conducted in Hsee et al. (1999), concerning evaluations of a foreign applicant to a university who has taken an “Academic Potential Exam” in her home country.
22 Hsee et al. (1999, p. 580) observe that the fact that an attribute is difficult to evaluate does not mean that subjects do not understand what the attribute means:


“For example, everybody knows what money is and how much a dollar is worth, but the monetary attribute of an option can be difficult to evaluate if the decision maker does not know the evaluability information for that attribute in the given context. Suppose, for instance, that a person on a trip to a foreign country has learned that a particular hotel room costs $50 a night and needs to judge the desirability of this price. If the person is not familiar with the hotel prices of that country, it will be difficult for him to evaluate whether $50 is a good or bad price.”

23 Hsee et al. (1999) conducted an experiment that tested for effects of different types of evaluability information that subjects might have, corresponding to the three situations described above: no information, information about average scores, and best- and worst-score information. Their flat (no significant difference between scores) result for the no-information situation parallels Phelan’s result for evaluations of contexts involving different stakes considered separately, whereas the significant differences they observed between evaluations of different scores in the situation where participants had information about best and worst scores parallel Phelan’s result for separate evaluation of contexts involving sources of information of varying reliability.
24 In Kahneman’s Nobel Prize lecture, he makes a claim that can seem like it’s in tension with this idea. He says: “. . . intuitive judgments and preferences are best studied in between-subjects designs . . . The difficulties of [within-subjects] designs were noted long ago by Kahneman and Tversky (1982), who pointed out that ‘within-subjects designs are associated with significant problems of interpretation in several areas of psychological research (Poulton 1975)’” (Kahneman 2003, pp. 473–4). But the apparent tension is resolved when it is pointed out that “intuitive judgments” for Kahneman are rapid and automatic, and contrast with “deliberate thought processes,” which are slow and involve reflection. Separate evaluation may be the right way to study intuitive judgments in Kahneman’s sense, but the question under consideration in this section is whether it is better to employ “intuitive judgments” or “deliberate thought processes” as evidence for contextualism and anti-intellectualism. It is possible to both think that “deliberate thought processes” are more rational than “intuitive judgments,” and therefore provide better evidence, and also that separate evaluation is the best way to study “intuitive judgments.” For further discussion of the distinction between “intuitive” and “deliberate” (or type-1 and type-2 processes) in relation to the contextualist debate, see Gerken (2012). Thanks to Mikkel Gerken for pointing out the passage in Kahneman.


25 Additional reflection on this idea can be found in Pinillos et al. (2011). Pinillos et al. conducted a study of the Knobe Effect, which, unlike Knobe’s original study, allowed joint evaluation of scenarios, and found that participants were “less likely to give the asymmetric ‘Knobe’ response” (p. 129). Discussing this result, Pinillos et al. say “we believe that presenting agents with both vignettes (and letting them see the range of multiple choice answers) pushes them to think more carefully before giving the final judgment. If we compare this with the original Knobe experiments (where subjects were given only one vignette followed by just two answer options), it is plausible that subjects there were less careful in their reasoning” (p. 133).
26 For example, Kahneman and Tversky (1996, p. 587) say that “the between-subjects design is appropriate when we want to understand ‘pure’ heuristic reasoning; the within-subjects design is appropriate when we wish to understand how conflicts between rules and heuristics are resolved,” and Stanovich (2011, pp. 124–5) discusses the way that within- and between-subject designs may interact differently with individual differences in rational thinking dispositions.
27 There are many views about the relation between linguistic facts about “know” and the nature of knowledge. Ludlow (2005, p. 13) claims that “any investigation into the nature of knowledge which did not conform to some significant degree with the semantics of the term ‘knows’ would simply be missing the point . . . epistemological theories might be rejected if they are in serious conflict with the lexical semantics of ‘knows.’” And DeRose (2009, p. 19) says that “It’s essential to a credible epistemology, as well as to a responsible account of the semantics of the relevant epistemologically important sentences, that what’s proposed about knowledge and one’s claims about the semantics of ‘know(s)’ work plausibly together. . . .” In contrast, Sosa (2000, p. 3) argues that epistemic contextualism as “a thesis in linguistics or in philosophy of language” is plausible, but its interest as a theory of knowledge “is limited in certain ways” (p. 8), and for an argument in favor of a “divorce for the linguistic theory of knowledge attributions and traditional epistemology,” see Hazlett (2010, p. 500)—though see Stokke (2013) for a criticism of the reasons Hazlett offers in favor of the divorce.

References

Birdsong, D. (1989), Metalinguistic Performance and Interlinguistic Competence. Berlin: Springer-Verlag.
Buckwalter, W. (2010), “Knowledge isn’t closed on Saturdays.” Review of Philosophy and Psychology, 1, 395–406.
Buckwalter, W. and Schaffer, J. (2013), “Knowledge, stakes and mistakes.” Noûs.
Cappelen, H. and Lepore, E. (2005), Insensitive Semantics: A Defense of Semantic Minimalism and Speech Act Pluralism. Oxford: Blackwell.
DeRose, K. (1992), “Contextualism and knowledge attributions.” Philosophy and Phenomenological Research, 52, 913–29.
—. (1999), “Contextualism: An explanation and defense,” in J. Greco and E. Sosa (eds), The Blackwell Guide to Epistemology. Oxford: Blackwell.
—. (2002), “Assertion, knowledge, and context.” The Philosophical Review, 111, 167–203.
—. (2005), “The ordinary language basis for contextualism, and the new invariantism.” The Philosophical Quarterly, 55, 172–98.
—. (2009), The Case for Contextualism. Oxford: Oxford University Press.
—. (2011), “Contextualism, contrastivism, and x-phi surveys.” Philosophical Studies, 156, 81–110.
Feltz, A. and Zarpentine, C. (2010), “Do you know more when it matters less?” Philosophical Psychology, 23, 683–706.
Gerken, M. (2012), “On the cognitive bases of knowledge ascriptions,” in J. Brown and M. Gerken (eds), Knowledge Ascriptions. Oxford: Oxford University Press, pp. 140–70.
Hansen, N. and Chemla, E. (2013), “Experimenting on contextualism.” Mind & Language, 28(3), 286–321.
Harding, J. (2012), “Short cuts.” London Review of Books, 34, 38.
Hazlett, A. (2010), “The myth of factive verbs.” Philosophy and Phenomenological Research, 80, 497–522.
Hill, A. (1961), “Grammaticality.” Word, 17, 1–10.
Hsee, C. K. (1998), “Less is better: When low-value options are valued more highly than high-value options.” Journal of Behavioral Decision Making, 11, 107–21.
Hsee, C. K., Loewenstein, G. F., Blount, S. and Bazerman, M. H. (1999), “Preference reversals between joint and separate evaluations of options: A review and theoretical analysis.” Psychological Bulletin, 125, 576–90.
Jucker, A. H. (1993), “The discourse marker well: A relevance-theoretical account.” Journal of Pragmatics, 19, 435–53.
Kahneman, D. (2000), “A psychological point of view: Violations of rational rules as a diagnostic of mental processes.” Behavioral and Brain Sciences, 23, 681–83.
—. (2003), “Maps of bounded rationality,” in T. Frangsmyr (ed.), Les Prix Nobel. The Nobel Prizes 2002. Stockholm: Nobel Foundation.
—. (2011), Thinking, Fast and Slow. New York: Farrar Straus & Giroux.
Kahneman, D. and Tversky, A. (1982), “On the study of statistical intuitions,” in D. Kahneman, P. Slovic and A. Tversky (eds), Judgment Under Uncertainty: Heuristics and Biases. New York: Cambridge University Press, pp. 493–508.

Contrasting Cases

95

—. (1996), “On the reality of cognitive illusions.” Psychological Review, 103, 582–91. Ludlow, P. (2005), “Contextualism and the new linguistic turn in epistemology,” in G. Preyer and G. Peter (eds), Contextualism in Philosophy: Knowledge, Meaning and Truth. Oxford: Oxford University Press, pp. 11–50. —. (2011), The Philosophy of Generative Linguistics. Oxford: Oxford University Press. May, J., Sinnott-Armstrong, W., Hull, J. G. and Zimmerman, A. (2010), “Practical interests, relevant alternatives, and knowledge attributions: An empirical study.” Review of Philosophy and Psychology, 1, 265–73. Neta, R. and Phelan, M. (ms), “Evidence that stakes don’t matter for evidence.” Phelan, M. (2013), “Evidence that stakes don’t matter for evidence.” Philosophical Psychology. Pinillos, N. Á., Smith, N., Nair, G. S., Marchetto, P. and Mun, C. (2011), “Philosophy’s new challenge: Experiments and intentional action.” Mind & Language, 26, 115–39. Poulton, E. (1975), “Range effects in experiments with people.” American Journal of Psychology, 88, 3–32. Schütze, C. T. (1996), The Empirical Base of Linguistics. Chicago: The University of Chicago Press. Schwitzgebel, E. and Cushman, F. (2012), “Expertise in moral reasoning? Order effects on moral judgments in professional philosophers and non-philosophers.” Mind & Language, 27, 135–53. Shafir, E. (1998), “Philosophical intuitions and cognitive mechanisms,” in M. R. DePaul and W. Ramsey (eds), Rethinking Intuition: The Psychology of Intuition and Its Role in Philosophical Inquiry. Lanham, MD: Rowman and Littlefield, 59–83. Sosa, E. (2000), “Skepticism and contextualism.” Philosophical Issues, 10, 1–18. Spencer, N. (1973), “Differences between linguists and nonlinguists in intuitions of grammaticality-acceptability.” Journal of Psycholinguistic Research, 2, 83–98. Sripada, C. S. and Stanley, J. (2012), “Empirical tests of interest-relative invariantism.” Episteme, 9(1), 3–26. Stanley, J. (2005), Knowledge and Practical Interests. Oxford: Oxford University Press. —. (2007), “Précis of Knowledge and Practical Interests.” Philosophy and Phenomenological Research, 75, 168–72. Stanovich, K. (2011), Rationality and the Reflective Mind. Oxford: Oxford University Press. Stokke, A. (2013), “Protagonist projection.” Mind & Language, 28(2), 204–32.

4

Salience and Epistemic Egocentrism: An Empirical Study

Joshua Alexander, Chad Gonnerman, and John Waterman1

1 Introduction

Nagel (2010) has recently proposed a fascinating account of the decreased tendency to attribute knowledge in conversational contexts in which unrealized possibilities of error have been mentioned. Her account appeals to epistemic egocentrism, or what is sometimes called the curse of knowledge, an egocentric bias to attribute our own mental states to other people (and sometimes to our own future and past selves). When asked to consider a story in which someone is making a judgment, and told that it’s possible that her judgment is wrong, we treat our concerns as hers and penalize her for failing to appropriately respond to those concerns. This account of the relationship between our willingness to attribute knowledge and what possibilities have been made salient in a given conversational context may be particularly appealing to invariantists. It provides them with a way to explain why, although standards for knowledge attribution don’t depend on what possibilities have been made salient in a given conversational context, we are nevertheless less willing to attribute knowledge in conversational contexts that include mention of the possibility that the subject might be wrong than in contexts in which no such possibilities are raised. Our aim in this chapter is to investigate the empirical merits of Nagel’s hypothesis about the psychology involved in knowledge attribution. We begin by briefly reviewing the epistemological and psychological background. After setting the stage, we present four new studies showing that our willingness to attribute knowledge is sensitive to what possibilities have been made salient in a given conversational context, that this sensitivity can be, at least in part, explained in terms of epistemic egocentrism, and that increased motivation doesn’t seem to drive down our tendency to mistakenly project our own mental states onto others.2 We conclude by previewing three additional studies involving individual differences, social distance, and a specific kind of interventional debiasing strategy.

2 Philosophical and psychological background

One of the most hotly debated questions in contemporary epistemology is whether standards for knowledge attribution depend on what possibilities have been made salient in a given conversational context.3 Contextualists contend that they do, arguing that whether it is right to say that someone knows something depends at least in part on whether our conversational context includes any mention of the possibility that she might be wrong (DeRose 1992; Cohen 1999). When a conversational context includes mention of the possibility that the subject might be wrong, the standards for knowledge attribution rise, or at least tend to rise; the verb “to know” comes to refer to a more stringent relation, and subjects who could once have been truly described as knowing something might no longer be truly described that way. Invariantists contend that they do not, arguing that what possibilities are mentioned in a given conversational context does not affect whether someone knows something; in their view, the verb “to know” always denotes the same relation (Hawthorne 2004; Williamson 2005; Nagel 2010). What makes this debate especially interesting is that both sides agree that our willingness to attribute knowledge seems to depend at least in part on what possibilities have been made salient in a given conversational context: we seem to be more willing to say that someone knows something in conversational contexts that don’t include any mention of the possibility that she might be wrong than in contexts that do. Consider, for example, the following two vignettes (Nagel 2010):

Plain Story

John A. Doe is in a furniture store. He is looking at a bright red table under normal lighting conditions. He believes the table is red.

Q: Does he know that the table is red?

More Detailed Story

John B. Doe is in a furniture store. He is looking at a bright red table under normal lighting conditions. He believes the table is red. However, a white table under red lighting conditions would look exactly the same to him, and he has not checked whether the lighting is normal, or whether there might be a red spotlight shining on the table.

Q: Does he know that the table is red?

Contextualists and invariantists agree that we are more willing to say that John Doe knows that the table is red in the plain story than we are to say that John Doe knows that the table is red in the more detailed story because they agree that we become less willing to attribute knowledge in conversational contexts that include any mention of possible error. This creates a challenge for invariantists, however. If invariantists are right, and standards for knowledge attribution don’t depend on what possibilities have been made salient in a given conversational context, then they need some way of explaining why we are nevertheless more willing to attribute knowledge in conversational contexts that don’t include any mention of the possibility that the subject might be wrong than in contexts that do. Nagel (2010) has recently offered a fascinating psychological explanation of the relationship between our willingness to attribute knowledge and what possibilities have been made salient in a given conversational context.4 Nagel’s account is grounded in the idea that we have a difficult time representing perspectives more naïve than our own and, in particular, we struggle to suppress privileged information when evaluating other people’s judgments (Nickerson 1999; Royzman et al. 2003; Birch and Bloom 2004, 2007). This psychological bias, known as epistemic egocentrism or the curse of knowledge, explains why we seem to be less willing to attribute knowledge in conversational contexts that include any mention of the possibility of error: we evaluate other people’s judgments as though they share our privileged information and subsequently penalize them for failing to respond to that information in the way we think they should.5 In other words, and in terms of the two vignettes just described, we are less willing to say that John Doe knows that the table is red in the more detailed story than we are in the plain story precisely because we treat him in the detailed story as sharing our concerns and failing to act accordingly. Nagel’s argument depends on specific empirical claims about the psychology involved in knowledge attribution, claims that can be tested, and our plan in the next section is to examine the empirical merits of Nagel’s account.

3 Our studies

3.1 Study 1: Salience effects

Contextualists and invariantists agree that we are less willing to attribute knowledge in conversational contexts that include mention of unrealized possibilities of error, but disagree about how to explain this. As we might put it, contextualists and invariantists agree that salience matters, but disagree about why. Somewhat surprisingly, the results of several early empirical studies suggested that salience doesn’t matter. These studies seemed to show that people are just as willing to say that someone knows something when the possibility of being wrong has been made salient as they are to say that someone knows something when that possibility goes unmentioned. The studies focused on several different versions of Keith DeRose’s (1992, 2009) famous bank cases, which involve a husband and wife stopping by a bank on their way home from work on a Friday afternoon intending to deposit their paychecks. When they notice that the lines inside are quite long, the husband suggests coming back the following day to make the deposits, noting to his wife that he knows that the bank will be open on Saturday because he was there two Saturdays ago. The relevant difference between the two cases, at least for our purposes, is that, in the second case, the wife mentions the possibility that her husband might be wrong, noting that banks sometimes change their hours. Contextualists and invariantists agree that folk knowledge attributions will track this difference, that people will be less inclined to say that the husband knows that the bank will be open on Saturday when the possibility of being wrong has been made salient than when that possibility goes unmentioned. In contrast, what the early studies found was that people are just as willing to say that the husband knows that the bank will be open on Saturday when the possibility that he might be wrong has been made salient as they are to say that he knows that the bank will be open on Saturday when that possibility goes unmentioned (Buckwalter 2010; Feltz and Zarpentine 2010; May et al. 2010). While these studies give us reason to worry that salience might not matter, several recent objections warn against drawing this conclusion too quickly. One worry with the early studies is that they failed to make sufficiently salient the possibility that the bank will be closed on Saturday: merely mentioning a possibility does not necessarily make that possibility salient, particularly when that possibility seems strange or improbable (Schaffer and Knobe 2012). And, in fact, it does seem that people are less inclined to say that the husband knows that the bank will be open on Saturday when more care is given to making that possibility salient—for instance, by embedding it in the context of a personal anecdote (Schaffer and Knobe 2012; Buckwalter 2014). A second worry with the early studies is that several of them involved asking participants to make knowledge attributions rather than to evaluate knowledge attributions, and that when participants were asked to evaluate knowledge attributions they were asked to do so in situations where it was natural to deny knowledge (DeRose 2011). And, when adjustments are made to address these worries, it again seems that people are less inclined to say that the husband knows that the bank will be open on Saturday when the possibility of being wrong has been made salient than when that possibility goes unmentioned (Buckwalter 2014).6 In light of this unsettled experimental landscape, another look at salience seemed to us to be in order. To do this, we constructed a study using Nagel’s plain story and more detailed story. Participants (N = 40, Age M = 30, Female = 28 percent) were recruited via Amazon Mechanical Turk.7 Participants received one of the vignettes, and were then asked to indicate the extent to which they agreed or disagreed with the claim that John knows that the table is red. Answers were assessed using a 6-point Likert scale with 1 = strongly disagree and 6 = strongly agree. Contrary to the results of the early studies, but consistent with the results of more recent studies of the influence of salience on folk knowledge attribution, we found that participants were significantly more willing to attribute knowledge in the simple story (M = 5.50, SD = 1.14) than in the more detailed story (M = 3.78, SD = 1.40). A planned comparison showed a statistically significant difference (t = 4.29, df = 38, p < 0.001). Cohen’s d = 1.36, which indicates that the difference between the means is larger than one standard deviation, a large effect according to Cohen’s guidelines.8 These results can be visualized as follows:

[Bar chart omitted; y-axis: mean strength of agreement (1–6); x-axis: experimental condition (simple vs. elaborate vignette); error bars: 95% CI.]

Figure 4.1 Relationship between knowledge attribution and mention of possible error.

By focusing on vignettes where the possibility of error is made sufficiently salient, these results contribute to the recent trend of studies that have found a salience effect. But while these results suggest that salience does influence folk knowledge attributions, neither this study nor the ones that preceded it suggest why salience influences folk knowledge attributions. Our next step was to focus on this question.
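The reported effect size can be checked by hand. Cohen’s d is the difference between the condition means divided by the pooled standard deviation; assuming roughly equal group sizes (the exact split of the 40 participants is not reported here), the values above give

\[
d = \frac{M_{\text{simple}} - M_{\text{detailed}}}{s_{\text{pooled}}} \approx \frac{5.50 - 3.78}{\sqrt{(1.14^2 + 1.40^2)/2}} \approx \frac{1.72}{1.28} \approx 1.35,
\]

which matches the reported d = 1.36 up to rounding and the actual cell sizes.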


3.2 Study 2: Epistemic egocentrism and the curse of knowledge, Round 1

Epistemic egocentrism involves misrepresenting other people’s mental states. For instance, we might mistakenly treat our concerns as theirs, and penalize them for failing to appropriately respond to these concerns. This may help to explain why we are less willing to say that John Doe knows that the table is red in the more detailed story than we are in the plain story. In the more detailed story, we treat him as sharing our concerns about the possibility of abnormal lighting conditions and subsequently penalize him for failing to check whether the lighting is normal. Epistemic egocentrism involves two important empirical predictions. First, it predicts that there should not be a significant difference between how we assess cases where the relevant information is privileged and how we assess cases where it is not. That is, epistemic egocentrism predicts that there should be relatively little difference between our assessment of narrator cases, cases where the relevant information is shared only with the reader, and our assessment of subject cases, cases where the relevant information is shared with both reader and subject.9 Second, epistemic egocentrism predicts that there should be relatively little difference between how we assess entertain cases, cases where the subject is portrayed as entertaining the possibility that she is wrong, and how we assess neutral cases, cases that leave open whether or not the subject is entertaining the possibility that she is wrong. In order to test these two predictions, we constructed the following four vignettes, staying as close as possible to Nagel’s more detailed case:

Narrator Defeater (Neutral)

John and Mary are in a furniture store. John is looking at a bright red table under normal lighting conditions. He believes the table is red. However, a white table under red lighting would look exactly the same. John has not checked whether the lighting is normal, or whether there might be a red spotlight shining on the table.

Narrator Defeater (Entertain)

John and Mary are in a furniture store. John is looking at a bright red table under normal lighting conditions. He believes the table is red. However, a white table under red lighting would look exactly the same. John thinks about this, but does not check whether the lighting is normal, or whether there might be a red spotlight shining on the table.

Subject Defeater (Neutral)

John and Mary are in a furniture store. John is looking at a bright red table under normal lighting conditions. He believes the table is red. Mary points out, however, that a white table under red lighting would look exactly the same. John has not checked whether the lighting is normal, or whether there might be a red spotlight shining on the table.

Subject Defeater (Entertain)

John and Mary are in a furniture store. John is looking at a bright red table under normal lighting conditions. He believes the table is red. Mary points out, however, that a white table under red lighting would look exactly the same. John thinks about this, but does not check whether the lighting is normal, or whether there might be a red spotlight shining on the table.

Participants (N = 187, Age M = 27, Female = 26 percent) received one of these vignettes, and were then asked to indicate the extent to which they agreed or disagreed with the claim that John knows that the table is red. Answers were assessed using a 6-point Likert scale with 1 = strongly disagree and 6 = strongly agree. As expected, we found that there was no statistically significant difference between how participants assessed narrator cases and how they assessed subject cases, nor was there a statistically significant difference between how participants assessed entertain cases and how they assessed neutral cases. The means and standard deviations for the four conditions were Narrator Defeater Neutral (n = 58, M = 3.93, SD = 1.70), Narrator Defeater Entertain (n = 45, M = 3.78, SD = 1.49), Subject Defeater Neutral (n = 48, M = 3.50, SD = 1.50), and Subject Defeater Entertain (n = 36, M = 3.56, SD = 1.70). The results of a 2 × 2 ANOVA showed no interaction between the various conditions (F = 0.783, p = 0.505).10 A subsequent pairwise comparison failed to find a statistically significant difference between entertain and neutral conditions in either subject cases (t = 0.159, df = 82, p = 0.874) or narrator cases (t = 0.479, df = 101, p = 0.633).11 These results can be visualized as follows:

[Bar chart omitted; y-axis: mean strength of agreement (1–6); x-axis: experimental condition (subject-defeater neutral, subject-defeater entertain, narrator-defeater neutral, narrator-defeater entertain); error bars: 95% CI.]

Figure 4.2 Relationship between knowledge attribution and both the availability and consideration of the possibility of error.

These results give us some reason to think that something like epistemic egocentrism is driving our evaluation of other people’s judgments, at least in some conversational contexts that include mention of unrealized possibilities of error. Since we are treating our concerns as theirs, it doesn’t matter whether they are portrayed as sharing these concerns. It doesn’t even matter whether they are portrayed as being aware of them at all. All that seems to matter is that they haven’t done enough to appropriately respond to these concerns. And, this is just what we’d expect if epistemic egocentrism were playing a role in our evaluations of other people’s judgments.
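As an illustration of this kind of analysis (a sketch, not the authors’ actual code or data), a 2 × 2 between-subjects design like this one can be analyzed in Python with statsmodels. The factor names below are our own labels, and the ratings are simulated with the cell sizes and means reported above, so the output is illustrative only:

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)

# Simulate 6-point Likert ratings for the four cells of the
# 2 (defeater: narrator/subject) x 2 (framing: neutral/entertain) design,
# using the cell sizes and means reported in the chapter.
cells = [("narrator", "neutral", 58, 3.93),
         ("narrator", "entertain", 45, 3.78),
         ("subject", "neutral", 48, 3.50),
         ("subject", "entertain", 36, 3.56)]
rows = []
for defeater, framing, n, mean in cells:
    ratings = np.clip(np.rint(rng.normal(mean, 1.6, size=n)), 1, 6)
    rows.extend((defeater, framing, float(r)) for r in ratings)
data = pd.DataFrame(rows, columns=["defeater", "framing", "rating"])

# Two-way between-subjects ANOVA including the defeater x framing interaction.
model = ols("rating ~ C(defeater) * C(framing)", data=data).fit()
print(anova_lm(model, typ=2))

The C(defeater):C(framing) row of the output is the interaction test corresponding to the F statistic discussed above.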

3.3 Study 3: Epistemic egocentrism and the curse of knowledge, Round 2

It turns out that there is another way to test this. If we are projecting our concerns onto others and penalizing them for failing to appropriately respond to these concerns, then we should find a negative correlation between the projection of our concerns onto others and our willingness to attribute knowledge to them, at least in situations where they fail to appropriately respond to these concerns. In order to test this prediction, we gave participants (N = 93, Age M = 28, Female = 32 percent) Nagel’s more detailed case, and then asked them to indicate the extent to which they agreed or disagreed with the claim that John knows that the table is red and the claim that John is considering the possibility that he’s looking at a white table under a red spotlight. As before, answers were assessed using a 6-point Likert scale with 1 = strongly disagree and 6 = strongly agree. We found the expected negative correlation (r = −0.211, p = 0.042).12 The more likely participants were to say that John shared their concerns, the less likely they were to attribute knowledge to him, and vice versa. These results can be visualized as follows:

[Scatterplot omitted; x-axis: mean strength of agreement with the knowledge claim (1–6); y-axis: projection of consideration (1–6); fitted line with R² linear = 0.045.]

Figure 4.3 Negative correlation between projection measure and knowledge attribution.

As the scatterplot helps to show, the inverse correlation that we found is a modest one. With r equaling −0.211, we can only say that about 4 percent of the variability in responses to the knowledge probe is directly predictable from variability in responses to the considering probe, leaving the other 96 percent to other factors. To be sure, having remaining factors is common for a correlational study. Even a large correlation coefficient by Cohen’s standards (r = 0.50) would leave up to 75 percent to other factors. Having said that, our r is not quite that large; it qualifies as a small correlation according to Cohen’s standards, which suggests that there are other factors worth identifying here. At this point, it is difficult to determine precisely what is going on here. One possibility is that this modest correlation is precisely what we should expect, on Nagel’s hypothesis.13 After all, according to it, when we project our concerns onto subjects like John Doe, our projections are due to the operation of an unconscious cognitive bias. So we might well expect our participants to have rather limited access not only to the cause of their projection but perhaps to their projections as well, resulting in the modest correlation observed in our study. If this is right, then the key to improving our predictive powers, while working with Nagel’s hypothesis, would be to find a better way of measuring the largely unconscious projections. But there are other possibilities to consider. It could be that participants do in fact have reasonably decent access to their projections (though probably not the causes of their projections). If so, then to fully predict the variability we observed in the knowledge attributions of our participants, we would have to turn to factors other than those having to do with projective tendencies. The question then becomes, what are those other factors? Insofar as the dialectic between invariantists and contextualists is concerned, one rather strange possibility is that perhaps in conversational contexts that include mention of an unrealized possibility of error, participants do raise the standards for knowledge attribution, but not in a uniform way across all participants. That might be a welcome result for contextualists. For, when it comes to explaining salience effects, it seems that a partial victory for them would be victory enough. Of course, the results reported here don’t settle the matter. But perhaps we can say this much: they do suggest that epistemic egocentrism is at least playing a role in our willingness to attribute knowledge in at least some cases that include mention of an unrealized possibility of error.
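The variance-explained figures in the preceding paragraph are just the square of the correlation coefficient:

\[
r^2 = (-0.211)^2 \approx 0.045, \qquad (0.50)^2 = 0.25,
\]

so the observed correlation accounts for roughly 4 percent of the variability, and even a “large” correlation of 0.50 would account for only 25 percent.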

3.4 Study 4: Salience, epistemic egocentrism, and motivation

At this point, it might be natural to think that we just need to try harder to avoid mistakenly projecting our own mental states onto others. And some recent work in the social and cognitive sciences supports this kind of optimism, suggesting that certain kinds of epistemic egocentrism can be reduced with sufficient practice (Wu and Keysar 2007; Cohen and Gunz 2002), effort (Epley et al. 2004), or motivation (Epley et al. 2004). Yet, there is also reason to worry that our bias against adopting other perspectives cannot be canceled completely. Hindsight bias and outcome bias, for example, turn out to be exceptionally resilient (Camerer et al. 1989; Krueger and Clement 1994; Pohl and Hell 1996), and motivation alone does not ensure that we leave our own perspectives behind entirely (Epley and Caruso 2009). Against this background, we wanted to see whether sufficient motivation might help reduce epistemic egocentrism in the kinds of cases we’ve been discussing. Although motivation is often measured by means of financial incentive, we decided to measure it in terms of people’s need for cognition (Cacioppo and Petty 1982; for additional work on need for cognition [NFC] and knowledge attributions, see Weinberg et al. 2012).14 People’s NFC corresponds to their intrinsic motivation to give a great deal of care and attention to cognitive tasks, and people with a high NFC have been shown to be less likely to be influenced by certain cognitive heuristics and biases (Priester and Petty 1995; Cacioppo et al. 1996; Smith and Petty 1996).15 Participants (N = 126, Age M = 30, Female = 34 percent) received either Nagel’s plain story or her more detailed story, were asked to indicate the extent to which they agreed or disagreed with the claim that John knows that the table is red (again, answers were assessed using a 6-point Likert scale with 1 = strongly disagree and 6 = strongly agree), and were then asked to complete the NFC survey together with several additional demographic questions. The mean NFC score for our participants was 63.3 (SD = 13.4). We defined NFC groupings by one-half standard deviation above and below that mean; participants with NFC scores ranging from 18 to 57 were designated as “low NFC” (n = 44), those with NFC scores ranging from 58 to 69 were designated as “mid-NFC” (n = 48), and those with NFC scores ranging from 70 to 90 were designated as “high NFC” (n = 34). As before, participants seemed much more willing to attribute knowledge in the simple story than in the more detailed story. In the latter, participants seemed willing to penalize John Doe for failing to rule out the possibility that the lighting might be abnormal, a concern that was shared only with them as readers of the vignette. The means and standard deviations for the two conditions were Simple Story (n = 68, M = 5.19, SD = 0.78) and More Detailed Story (n = 58, M = 3.69, SD = 1.54). A planned comparison showed a statistically significant difference (t = 7.073, df = 124, p < 0.001, d = 1.26). What is particularly interesting in the current context is that increased motivation doesn’t seem to drive down our tendency to mistakenly project our own mental states onto others in these kinds of cases: highly motivated people seem just as likely as highly unmotivated people to project their concerns onto John Doe.16 The means and standard deviations for the different conditions, breaking participants into three groups (high NFC, mid-NFC, and low NFC), are as follows:

High NFC: simple story (n = 22, M = 5.18, SD = 0.85), more detailed story (n = 22, M = 3.91, SD = 1.41); a planned comparison showed a statistically significant difference (t = 3.621, df = 42, p < 0.001, d = 1.09).

Mid-NFC: simple story (n = 27, M = 4.96, SD = 0.71), more detailed story (n = 21, M = 3.29, SD = 1.52); a planned comparison showed a statistically significant difference (t = 5.079, df = 46, p < 0.001, d = 1.48).

Low NFC: simple story (n = 19, M = 5.53, SD = 0.70), more detailed story (n = 15, M = 3.93, SD = 1.71); a planned comparison showed a statistically significant difference (t = 3.393, df = 17.674, p = 0.003, d = 1.28).

A two-way ANOVA showed that there was no significant interaction between NFC groupings and responses (p = 0.70).17 These results can be visualized as follows:

[Bar chart omitted; y-axis: mean strength of agreement (1–6); x-axis: NFC group (low, mid, high) by condition (simple vs. elaborate vignette); error bars: 95% CI.]

Figure 4.4 Relationship between knowledge attribution, mention of possible error, and need for cognition.
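The NFC group boundaries follow from the sample statistics reported above, taking one-half standard deviation on either side of the mean:

\[
63.3 - 0.5 \times 13.4 \approx 56.6, \qquad 63.3 + 0.5 \times 13.4 = 70.0,
\]

which, rounded to whole scale scores, gives the low (18–57), mid (58–69), and high (70–90) bands.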

4 Conclusion

We have seen that our willingness to attribute knowledge can be sensitive to what possibilities have been made salient in a given conversational context, that this sensitivity can, at least in part, be explained in terms of epistemic egocentrism, and that increased motivation doesn’t seem to drive down our tendency to mistakenly project our own mental states onto others in the cases examined here. There is more to learn, and we think that three additional avenues of research will be particularly important going forward. Recent work in the social and cognitive sciences has shown that there are individual differences in our ability to set aside what we know (Kuhn 1991; Stanovich and West 1998; Musch and Wagner 2007), and that social distance influences the tendency to project our own mental states onto other people (Robinson et al. 1995; Epley et al. 2004; Ames 2004). Studies have also shown that at least some kinds of epistemic egocentrism—hindsight bias, for example—can be overcome using a specific kind of interventional debiasing strategy known as the “consider the opposite” strategy (Lord et al. 1984; Arkes et al. 1988; Arkes 1991; Larrick 2004). As we come to better understand the role that epistemic egocentrism plays in knowledge attribution, especially in conversational contexts involving privileged information, it will be important to study whether knowledge attributions track individual differences in susceptibility to the curse of knowledge, what role social distance plays in our ability to reason about what others know, and whether the influence of epistemic egocentrism on knowledge attribution can be mitigated by asking people to consider reasons why their initial judgments might have been wrong. These are projects for another day.

Notes

1 Authorship is equal; authors are listed in alphabetical order. We would like to thank James Beebe, Susan Birch, Steven Gross, Kareem Khalifa, Jennifer Nagel, and Jonathan Weinberg for helpful comments; Ronald Mallon, Shaun Nichols, and the National Endowment for the Humanities for organizing and funding the 2012 Experimental Philosophy Institute where this project was conceived; and the Indiana Statistical Consulting Center for their help in analyzing the results of Study 3.

2 We should note that the terms “project” and “projection” can be used in a number of different ways. In the Freudian tradition, paradigmatic cases of projection involve ascribing negative attributes to others without the projector being aware that he possesses these attributes (Gilovich 1991, p. 113), a form of what David Holmes (1968) calls “similarity projection.” For others, projection has more to do with a particular account of third-person mindreading: it arises when the simulation processes of the attributor inappropriately fail to suppress a mental state of her own (e.g., Goldman 2006, p. 165). In this chapter, we are going to use the terms in a more general way; for us, projection will involve attributing one’s own mental state to others, whether or not one is aware of having the state in question and however the psychology responsible for the attribution occurs.

3 There is a related controversy about whether knowledge attribution is sensitive to the personal costs of being wrong, and why. Subject-sensitive and interest-relative invariantists argue that knowledge is sensitive to what is at stake for the person whose epistemic situation is being described (Hawthorne 2004; Stanley 2005; Hawthorne and Stanley 2008; Fantl and McGrath 2010), and contextualists argue that knowledge is sensitive to what is at stake in the conversational context in which knowledge is being either attributed or denied (DeRose 1999).

4 Hawthorne (2004) and Williamson (2005) provide a different psychological explanation, arguing that the relationship between our willingness to attribute knowledge and what possibilities have been made salient in a given conversational context can be explained in terms of a well-known psychological bias called the availability heuristic. In addition to providing her own psychological account, Nagel (2010) argues that there are empirical and conceptual problems with an explanation grounded in the availability heuristic. We aren’t interested in taking sides in this debate, and plan to focus only on the empirical merits of Nagel’s alternative account. For more information about the availability heuristic, see Tversky and Kahneman (1973).

5 Epistemic egocentrism and the curse of knowledge are related to other well-known psychological biases, including the hindsight bias and the outcome bias. For additional discussion, see Hawkins and Hastie (1990) and Baron and Hershey (1988).

6 When discussing empirical results that suggest that perhaps folk knowledge attributions aren’t sensitive to what is at stake for the person whose epistemic situation is being described in a given vignette, Brian Weatherson (2011) notes that interest-relative invariantists don’t think that subject stakes always matter; instead, they simply think that subject stakes sometimes matter. It seems that a similar point could be made about salience, and would provide a way for invariantists to explain away the first wave of empirical results. Another way to explain away these results is by suggesting that some defeaters are properly ignored, a point made quite nicely by Austin (1990).

7 Five additional participants were excluded from the analysis: two for failing to answer the target question, and three for incorrectly answering a comprehension question. The exclusion criteria were decided before collecting data, and including their data doesn’t affect the significance of the result.

8 Because the distribution of responses to the plain story seems to violate the assumption of normality, with skewness = −3.25 (SE = 0.49), kurtosis = 11.91 (SE = 0.95), and p < 0.001 for a Shapiro–Wilk test, a follow-up nonparametric Mann–Whitney U test was performed. According to it, once again, participants were more willing to attribute knowledge in the simple story than in the detailed story: U = 50.5, p < 0.001. The Glass rank biserial correlation = 0.74, a “large” effect in Cohen’s (1988) classification.

9 There shouldn’t be a significant difference with relatively small Ns. With really large Ns, however, matters may be different, if only because being told that John Doe is considering the possibility might result in a slightly stronger signal, compared with that resulting from automatic, largely unconscious mindreading processes. A similar comment holds for the next prediction.

10 Twenty-five subjects were excluded for failing the comprehension question, or failing to answer it at all, and thirty-five additional subjects were excluded for not following basic directions. This is a large number of exclusions; however, the exclusion criteria were decided before the collection and analysis of data. Including these participants does not change the result (F = 1.23, p = 0.301).

11 A subsequent comparison between the original plain story and the four narrator- and subject-defeater cases confirmed a significant effect of salience (F(4, 209) = 7.28, p < 0.001) on knowledge attribution. Post hoc Tukey tests showed a significant difference between the simple case and all narrator- and subject-defeater cases at the p < 0.001 level. Because the simple vignette seems to violate the assumption of normality (see footnote 8), a nonparametric test was performed, which confirmed the results (H(4) = 30.6, p < 0.001). Mann–Whitney U tests were used to follow up this finding. The simple and subject-defeater neutral cases (U = 145, p < 0.001) were significantly different; the rank biserial correlation = 1.10, which indicates a “large” effect. The simple and subject-defeater entertain cases (U = 127, p < 0.001) were significantly different; the rank biserial correlation = 1.27, which indicates a “large” effect. The simple and narrator-defeater neutral cases (U = 289, p < 0.001) were significantly different; the rank biserial correlation = 0.75, which indicates a “large” effect. The simple and narrator-defeater entertain cases (U = 158, p < 0.001) were significantly different; the rank biserial correlation = 1.01, which indicates a “large” effect.

12 Eleven participants were excluded from the analysis: one for failing to answer the target question, another for failing the comprehension question, and nine others for ignoring basic instructions. Exclusion criteria were decided before any analyses were run. Including the data does not change the significance of the result (r = −0.264, p = 0.007).

13 It is perhaps important to emphasize that the standard line on epistemic egocentrism is that we are unaware of the fact that the projection has occurred (Baron and Hershey 1988; Camerer et al. 1989).

14 A person’s NFC is determined on the basis of her responses to a survey with 18 self-report items like “I find satisfaction in deliberating hard and for long hours” or “I like tasks that require little thought once I’ve learned them.” The 18-statement survey is a shortened version of the survey originally used by Cacioppo and Petty; however, studies have shown that the shortened version is just as accurate a measure of a person’s NFC as the longer version. See Cacioppo et al. (1984).

15 Since a person’s NFC is supposed to represent her intrinsic motivation to engage in effortful thinking, she will find such an activity rewarding even if no external rewards are offered. In fact, Thompson et al. (1993) found that people with a high NFC will actually give less effort and care to cognitive tasks when they are presented with external rewards than when they are simply asked to engage in the cognitive tasks for their own sake.

16 Some caution is probably needed here. If Nagel’s story is only a small part of the real story, and some other factors are also playing a significant role in generating these kinds of salience effects, then we might expect to find decreased tendencies to attribute knowledge in Nagel’s more detailed case even when people aren’t mistakenly projecting their own concerns onto others.

17 While the two-way ANOVA didn’t reveal a main difference across conditions, it very nearly found a main difference across NFC groupings (p = 0.059). When we collapse the conditions, the different NFC groupings seemed (somewhat) to be more or less permissive with their knowledge attributions: low NFC (M = 4.82), mid-NFC (M = 4.23), and high NFC (M = 4.23).

References

Ames, D. (2004), “Inside the mind reader’s tool kit: projection and stereotyping in mental state inference.” Journal of Personality and Social Psychology, 87, 340–53.
Arkes, H. (1991), “Costs and benefits of judgment errors: implications for debiasing.” Psychological Bulletin, 110, 486–98.
Arkes, H., Faust, D., Guilmette, T., and Hart, K. (1988), “Eliminating the hindsight bias.” Journal of Applied Psychology, 73, 305–7.
Austin, J. L. (1990), Philosophical Papers (3rd edn). Oxford: Oxford University Press.
Baron, J. and Hershey, J. (1988), “Outcome bias in decision evaluation.” Journal of Personality and Social Psychology, 54, 569–79.
Birch, S. and Bloom, P. (2004), “Understanding children’s and adults’ limitations in mental state reasoning.” Trends in Cognitive Sciences, 8, 255–60.
—. (2007), “The curse of knowledge in reasoning about false beliefs.” Psychological Science, 18, 382–6.
Buckwalter, W. (2010), “Knowledge isn’t closed on Saturday.” Review of Philosophy and Psychology, 1, 395–406.
—. (2014), “The mystery of stakes and error in ascriber intuitions,” in J. Beebe (ed.), Advances in Experimental Epistemology. London: Continuum Press.
Cacioppo, J. and Petty, R. (1982), “The need for cognition.” Journal of Personality and Social Psychology, 42, 116–31.
Cacioppo, J., Petty, R., and Kao, C. (1984), “The efficient assessment of need for cognition.” Journal of Personality Assessment, 48, 306–7.
Cacioppo, J., Petty, R., Feinstein, J. and Jarvis, B. (1996), “Dispositional differences in cognitive motivation: the life and times of individuals varying in need for cognition.” Psychological Bulletin, 119, 197–253.
Camerer, C., Loewenstein, G., and Weber, M. (1989), “The curse of knowledge in economic settings.” Journal of Political Economy, 97, 1232–54.
Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences (2nd edn). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, S. (1999), “Contextualism, skepticism, and the structure of reasons.” Philosophical Perspectives, 13, 57–89.
Cohen, D. and Gunz, A. (2002), “As seen by the other . . .: Perspectives on the self in the memories and emotional perceptions of Easterners and Westerners.” Psychological Science, 13, 55–9.
DeRose, K. (1992), “Contextualism and knowledge attributions.” Philosophy and Phenomenological Research, 52, 913–29.
—. (1999), “Contextualism: an explanation and defense,” in J. Greco and E. Sosa (eds), The Blackwell Guide to Epistemology. Oxford: Blackwell, pp. 187–205.
—. (2009), The Case for Contextualism. Oxford: Oxford University Press.
—. (2011), “Contextualism, contrastivism, and x-phi surveys.” Philosophical Studies, 156, 81–110.
Epley, N. and Caruso, E. (2009), “Perspective taking: Misstepping into others’ shoes,” in K. Markman, W. Klein, and J. Suhr (eds), Handbook of Imagination and Mental Simulation. New York: Psychology Press, pp. 295–309.
Epley, N., Keysar, B., Van Boven, L., and Gilovich, T. (2004), “Perspective taking as egocentric anchoring and adjustment.” Journal of Personality and Social Psychology, 87, 327–39.
Fantl, J. and McGrath, M. (2010), Knowledge in an Uncertain World. Oxford: Oxford University Press.
Feltz, A. and Zarpentine, C. (2010), “Do you know more when it matters less?” Philosophical Psychology, 23, 683–706.
Gilovich, T. (1991), How We Know What Isn’t So. New York: The Free Press.
Goldman, A. I. (2006), Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford: Oxford University Press.
Hawkins, S. and Hastie, R. (1990), “Hindsight.” Psychological Bulletin, 107, 311–27.
Hawthorne, J. (2004), Knowledge and Lotteries. Oxford: Oxford University Press.
Hawthorne, J. and Stanley, J. (2008), “Knowledge and action.” Journal of Philosophy, 105, 571–90.
Holmes, D. S. (1968), “Dimensions of projection.” Psychological Bulletin, 69, 248–68.
Krueger, J. and Clement, R. (1994), “The truly false consensus effect.” Journal of Personality and Social Psychology, 67, 596–610.
Kuhn, D. (1991), The Skills of Argument. Cambridge: Cambridge University Press.
Larrick, R. (2004), “Debiasing,” in D. Koehler and N. Harvey (eds), Blackwell Handbook of Judgment and Decision Making. Oxford: Blackwell Publishing, pp. 316–38.
Lord, C., Lepper, M. and Preston, E. (1984), “Considering the opposite: a corrective strategy for social judgment.” Journal of Personality and Social Psychology, 47, 1231–43.
May, J., Sinnott-Armstrong, W., Hull, J., and Zimmerman, A. (2010), “Practical interests, relevant alternatives, and knowledge attributions: an empirical study.” Review of Philosophy and Psychology, 1, 265–73.
Musch, J. and Wagner, T. (2007), “Did everybody know it all along? A review of individual differences in hindsight bias.” Social Cognition, 25, 64–82.
Nagel, J. (2010), “Knowledge ascriptions and the psychological consequences of thinking about error.” The Philosophical Quarterly, 60, 286–306.
Nickerson, R. (1999), “How we know – and sometimes misjudge – what others know.” Psychological Bulletin, 125, 737–59.
Pohl, R. and Hell, W. (1996), “No reduction of hindsight bias after complete information and repeated testing.” Organizational Behavior and Human Decision Processes, 67, 49–58.
Priester, J. and Petty, R. (1995), “Source attributions and persuasion: perceived honesty as a determinant of message scrutiny.” Personality and Social Psychology Bulletin, 21, 637–54.
Robinson, R., Keltner, D., Ward, A., and Ross, L. (1995), “Actual versus assumed differences in construal: ‘naïve realism’ in intergroup perception and conflict.” Journal of Personality and Social Psychology, 68, 404–17.
Royzman, E., Cassady, K., and Baron, J. (2003), “I know, you know: epistemic egocentrism in children and adults.” Review of General Psychology, 7, 38–65.
Schaffer, J. and Knobe, J. (2012), “Contrastive knowledge surveyed.” Noûs, 46, 675–708.
Smith, S. and Petty, R. (1996), “Message framing and persuasion: a message processing analysis.” Personality and Social Psychology Bulletin, 22, 257–68.
Stanley, J. (2005), Knowledge and Practical Interests. Oxford: Oxford University Press.
Stanovich, K. and West, R. (1998), “Individual differences in rational thought.” Journal of Experimental Psychology: General, 127, 161–88.
Thompson, E., Chaiken, S., and Hazelwood, D. (1993), “Need for cognition and desire for control as moderators of extrinsic reward effects: a person × situation approach to the study of intrinsic motivation.” Journal of Personality and Social Psychology, 64, 987–99.
Tversky, A. and Kahneman, D. (1973), “Availability: a heuristic for judging frequency and probability.” Cognitive Psychology, 5, 207–32.
Weatherson, B. (2011), “Defending interest-relative invariantism.” Logos and Episteme, 2, 591–609.
Weinberg, J. M., Alexander, J., Gonnerman, C., and Reuter, S. (2012), “Restrictionism and reflection: challenge deflected, or simply redirected?” The Monist, 95, 200–22.
Williamson, T. (2005), “Contextualism, subject-sensitive invariantism, and knowledge of knowledge.” The Philosophical Quarterly, 55, 213–35.
Wu, S. and Keysar, B. (2007), “The effect of culture on perspective taking.” Psychological Science, 18, 600–6.

5

Semantic Integration as a Method for Investigating Concepts

Derek Powell, Zachary Horne, and N. Ángel Pinillos

1 Introduction

The last 10 years in philosophy have been marked by widespread interest in the psychology of philosophy. Much of this work has been carried out by experimental philosophers, who aim to better understand the contours of philosophical concepts and intuitions by importing the methods of the empirical sciences. Their hope is that a better understanding of the psychology of philosophical concepts such as KNOWS, PERSONHOOD, FREE WILL, and many others, will allow them to better assess philosophical arguments which utilize such notions. Experimental philosophers have amassed many interesting results, but compelling concerns have been raised about the survey-based experimental methods that they typically employ. Here we argue, on the basis of these concerns and our own, that the possibility of experimental artifacts is good reason to adopt a new experimental paradigm that we call semantic integration. This methodology uses a memory task as an implicit measure of the degree to which different situations instantiate concepts. This measure avoids the methodological challenges researchers must address if they are to continue to use surveys. The plan of the chapter is as follows: First, we consider some challenges associated with survey methodology (Section 2) and then describe how semantic integration tasks can be used to implicitly examine people’s concepts (Section 3). Next we argue that, by investigating concepts implicitly, semantic integration offers important advantages over more explicit survey methods (Section 4). Finally, we discuss caveats regarding semantic integration methods (Section 5), variations on these methods, and briefly consider how they might be used alongside survey-based research (Section 6).

2 The methods of experimental philosophy

Experimental philosophers investigate philosophical concepts by presenting participants with short passages and then asking them to make judgments about what they read. These passages, which are often derived from philosophical thought experiments, are designed to test whether certain features are parameters for instantiating a philosophical concept. Studies using this survey methodology have improved substantially when compared to early research that lacked proper control conditions, but the methodology is still limited in important ways. In this section we review some of the challenges faced by researchers who use surveys. We discuss issues raised by Simon Cullen (2010), as well as other limitations of survey methodology.

2.1 Pragmatic cues in experimental materials

In a recent critique, Cullen (2010) argues that researchers conducting surveys need to take into account both the semantic and the pragmatic features of their experimental materials. Grice (1975) observed that when people attempt to comprehend some utterance of natural language, they do not attempt to comprehend the exact meaning of the words as spoken or as written on the page. Rather, they attempt to comprehend the speaker’s meaning. Cullen argues that the participants in an experiment behave similarly, attempting to comprehend the experimenter’s meaning, a consideration that many experimental philosophers have ignored. According to Grice (1975), people make assumptions about the requirements for rational communication. These assumptions, often referred to as Gricean norms, allow listeners or readers to grasp what a speaker means to convey or what they think a speaker means to convey. People assume that speakers are “cooperative communicators”—that their utterances are true, orderly, relevant, and nonredundant. Typically, speakers are themselves aware
that their interlocutors make these assumptions and so they exploit these assumptions to help conversational participants understand them. For example, sometimes the best way of making sense of someone’s communications, given that they are following Gricean norms, is by inferring that they mean something that goes beyond what they said or stated with their utterance. If Speak asks, “Has the number two bus come by yet?” a listener, Hear, can rightly infer that there is a number two bus, that its route passes by this location, and that Speak is hoping to catch the bus. Of course, none of these facts are explicitly stated in Speak’s question. Hear can infer these things because she assumes that Speak is following Gricean norms. For example, Speak would not be following the norm of relevance if he were not planning to get on the bus. Moreover, Speak can count on Hear to infer these things about his utterance because Speak knows that Hear will assume he is following the Gricean norms. Roughly, the propositions that the speaker means to convey (but go beyond what is said) and are inferable in a conversation applying the Gricean norms are called “conversational implicatures” (as opposed to propositions that are conventionally associated with the words used). It is widely accepted that the deployment and computing of conversational implicatures is pervasive in human communication. For this reason, Cullen (2010) argues that if experimental philosophers ignore conversational implicatures, then their instructions, stimuli, response options, and other experimental materials may not convey the meanings they intend. For an illustration of how conversational implicatures can affect survey results, consider research on “base-rate neglect.” Base-rate neglect is the tendency for people to ignore relevant statistical base rates when judging the probabilities of events and to instead rely on simpler heuristics (for a review, see Nisbett and Ross 1980). In one study, Kahneman and Tversky (1973) presented people with a description of a fictional college student and asked them to estimate the probability that the student majored in various fields. If the descriptions included traits that seem stereotypical of an engineering student (e.g., introverted, enjoys solving problems), then people estimated that the probability that the student was an engineer was high. What was interesting was that people made similar probability estimates even when they were told that only a small percentage of students study engineering. Kahneman and Tversky concluded that people ignore base-rate information when making
their probability estimates and instead employ a representativeness heuristic: since the student resembles an engineer, they judge that it is probable he is one, and they ignore the base-rate information which would suggest that any individual student is most likely not an engineer. However, more recent research suggests that base-rate neglect may be due, at least in part, to conversational processes rather than to decision processes. If participants assume that experimenters are cooperative communicators, then they assume that the information they’ve been given is the most relevant to the task at hand. This may lead them to place a greater weight on the descriptions given than they would have otherwise. Schwarz et al. (1991) examined this by manipulating the guarantee of relevance. Participants in one condition were told that the descriptive information presented to them had been compiled by psychologists (as in the original experiments of Kahneman and Tversky 1973), and in another condition, they were told that the same description had been compiled by a computer that randomly sampled from a database of information. Whereas communication from another person comes with an implied guarantee of relevance, computer-generated text does not. As predicted, researchers found that participants were significantly less influenced by computer-generated descriptions than by human-generated descriptions (Schwarz et al. 1991). Even relatively subtle pragmatic cues can have important effects on people’s responses to survey questions. For instance, people seem to place greater weight on the last source of evidence they are shown: Krosnick and colleagues (1990) found that base rates had a larger effect on participants’ judgments when they were the last piece of information participants read before making their response. The guarantees of relevance and nonredundancy imply that if experimenters present an apparently sufficient source of evidence (e.g., base-rate information), and then present another source of evidence (e.g., an individual description), then this second source should be interpreted as nonredundant and highly relevant to the task at hand. Cullen (2010) demonstrated that pragmatic cues can likewise affect people’s responses to philosophical thought experiments. However, he argues that researchers can overcome these challenges if they are sensitive to the context in which participants interpret their experimental materials, and the norms that govern these interpretations. Following Schwarz (1994), he argues that
experimenters and participants are engaged in a conversation governed by the norms of cooperative communication (Cullen 2010; Grice 1975; Schwarz 1994). Since participants abide by these norms, and expect researchers to abide by them as well, experimental materials must be constructed with pragmatic cues in mind. We agree that addressing the pragmatic features of experimental materials would improve the conclusions that can be drawn from surveys. However, overcoming these challenges might prove difficult. In practice, researchers still need to determine exactly how materials and questions ought to be phrased, and what implicatures they ought to contain. To make matters more difficult, this would need to be determined for each concept that experimental philosophers intend to examine. To illustrate the difficulty of designing appropriate questions and materials, consider the challenges faced by researchers studying causal learning: An important construct in research on causal learning is causal strength, defined as the probability that some cause produces an effect (Cheng 1997). Although people often make judgments about causal strength, researchers can ask participants to report such a judgment in any number of ways, and it is not obvious which way is optimal. In one experiment, Buehner et al. (2003) asked their participants to make a causal strength rating on a scale from 0 (X does not cause Y at all) to 100 (X causes Y every time). They found that participants’ judgments tended to cluster into two groups: one group of participants made judgments consistent with Cheng’s (1997) probabilistic definition of causal strength, whereas other participants made judgments consistent with competing associative models. As causal learning is often taken to be a relatively basic cognitive mechanism (Cheng 1997), it would be remarkable if some people learned causal relationships via wholly different cognitive mechanisms. However, Buehner and colleagues investigated whether ambiguities in the question they used to probe participants’ judgments were responsible for the divergent pattern of responses. Indeed, they noted that the causal strength question they initially used can be interpreted as applying in one of two different contexts: (1) the experimental learning context where the effect is also produced by other background causes or (2) a counterfactual context where only the cause of interest is present. The clustering of participants’ different responses was explained by these different interpretations: under the first interpretation, the best response is consistent with an associative model,
whereas under the second interpretation, the best response is consistent with the probabilistic definition of causal strength. Through further research, Buehner et al. (2003) discovered that the least ambiguous way to probe participants’ causal strength estimates was to phrase their questions counterfactually and in terms of frequencies. For example: “Imagine 100 healthy people started smoking cigarettes, how many do you think would develop cancer?” This wording makes the context clear, and participants do not need to make any inferences beyond what is stated. The upshot is that resolving ambiguities and constraining participants’ interpretations of questions and materials is feasible, but can require systematic investigation for each concept at issue.
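To make the ambiguity concrete, the two quantities the divergent readings plausibly track can be computed side by side. The following is a minimal Python sketch with hypothetical contingency values; the associative reading is identified here with the raw covariation contrast (often called ΔP), and the counterfactual reading with Cheng’s (1997) causal power.

```python
# Toy comparison of two readings of a causal strength question.
# The contingency values below are hypothetical, chosen only for illustration.

def delta_p(p_e_given_c, p_e_given_not_c):
    """Associative contrast: raw covariation between cause and effect."""
    return p_e_given_c - p_e_given_not_c

def causal_power(p_e_given_c, p_e_given_not_c):
    """Cheng's (1997) generative causal power: the probability that the cause
    itself produces the effect, factoring out background causes."""
    return delta_p(p_e_given_c, p_e_given_not_c) / (1.0 - p_e_given_not_c)

# Suppose the effect occurs on 75% of cause-present trials and on 50% of
# cause-absent trials (background causes are also at work).
p_c, p_nc = 0.75, 0.50
print(delta_p(p_c, p_nc))       # 0.25 -> best answer under reading (1)
print(causal_power(p_c, p_nc))  # 0.50 -> best answer under reading (2)

# The counterfactual frequency question ("Imagine 100 healthy people started
# smoking cigarettes; how many would develop cancer?") asks directly for the
# second quantity: causal_power * 100 = 50 of the 100.
```

On this way of putting it, the frequency phrasing removes the ambiguity because it fixes the context in which the cause is to be evaluated.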

2.2 Demand characteristics

The results of surveys can also be affected by demand characteristics (Orne 1962). Crudely put, demand characteristics are artificial features of an experimental task that lead participants to perform some task other than the one the researchers intended.1 Demand characteristics can occur when participants are apprehensive about being evaluated (Weber and Cook 1972). Apprehension can lead participants to respond in ways they perceive as either socially desirable or “correct,” irrespective of their actual attitudes or intuitions. Demand characteristics can also occur when participants assume the role of a faithful participant, eschewing all pragmatic cues and following instructions exactly to the letter (Weber and Cook 1972).

Survey materials in experimental philosophy studies are particularly likely to exhibit demand characteristics because experimental philosophers often present naive participants with bizarre thought experiments. Although the uniqueness of thought experiments is harmless in professional philosophy, there is evidence that survey participants are more likely to assume a faithful role, ignoring pragmatic and contextual cues, when experimental materials are particularly unrealistic (Weber and Cook 1972). In other words, if materials are convoluted or strange, then participants are more likely to ignore the contextual cues they contain, or to interpret them under different assumptions. Additionally, if participants are apprehensive about being evaluated, then they are more likely to try to guess at a desirable or
“correct” response. When the passages they are asked to read are bizarre thought experiments, they may engage in a kind of amateur philosophizing, diverging from the aims of experimental philosophers. If demand characteristics cannot be ruled out, then it is unclear how to interpret the results of surveys.

3 Semantic integration

In this section, we propose a new methodology for investigating concepts that we call semantic integration. First, we introduce research on memory and language processing that inspired the semantic integration methodology. Then, we describe the components of a semantic integration task, and two experiments in which we employ the method.

Semantic integration uses memory tasks as an implicit measure of how concepts are activated by different situations. As we discuss, this method has important advantages over survey-based research: it minimizes the influence of pragmatic cues and greatly reduces the possibility of demand characteristics. It also provides a more direct way of examining concepts. That is, semantic integration provides a measure of conceptual activation that is not influenced by downstream judgment or decision processes. In contrast, participants’ responses to survey questions in experiments typically constitute their judgments about whether a particular concept applies in a given situation. These judgments may be the products both of people’s concepts and of downstream decision processes.

3.1 Memory and language processing research

People tend to think of errors in memory as errors of omission—they acknowledge that we sometimes forget things that have happened to us, but assume that we can only form memories for events that we have experienced. Yet, psychologists have amassed significant evidence that people sometimes remember events that never actually occurred (for a review, see Schacter 1995), indicating that memory is not entirely dependent on external inputs. Bartlett (1932) is often credited with reporting the first experimental evidence for the formation of false memories. In his research, he had participants read a
story and then recall it several times at successive delays. His analyses were informal, but he reported that memories grew increasingly distorted after each recall. Since Bartlett, researchers have found evidence for the formation of false memories in list-learning paradigms (Deese 1959; Roediger and McDermott 1995), as well as in retention of sentences (Bransford and Franks 1971), longer prose passages (Sulin and Dooling 1974), image sequences (Loftus et al. 1978), and videos (Loftus and Palmer 1974). These phenomena are more than just curiosities; researchers have leveraged false memory to investigate the nature of our mental representations as well as our language comprehension processes.

Psychological research indicates that people’s memories are influenced by semantic processing, and that memory is better for semantic information than for specific episodes or verbatim utterances (Anderson et al. 1994; Anderson and Ortony 1975; Deese 1959; Loess 1967; Roediger and McDermott 1995; Sachs 1967). Even in simple experimental contexts (e.g., learning lists of words), experiences are processed and given semantic representations. In one study, Roediger and McDermott (1995; also see Deese 1959) asked participants to memorize lists composed of different words that were semantically related to a single target word. When participants were later asked to recall the words they had been presented with, they were often just as likely to falsely recall the target word, which had never been presented, as any of the other words that actually appeared in the list. For example, when presented with a list made up of words like “glass,” “pane,” and “shade,” people are likely to recall the target word “window,” even if the word never appeared in the list. To introduce some terminology, the words in the list semantically activate the word “window”—which is to say that they cause people to form or retrieve stored mental representations associated with the word.

Researchers have leveraged the relationship between false memories and semantic activation to examine language processing (e.g., Bransford and Franks 1971; Flagg 1976; Gentner 1981). In particular, prior research investigated how semantic information is combined to form meaningful structured representations, or discourse meanings. This process, sometimes called semantic integration (Bransford and Franks 1972), enables people to comprehend complex ideas communicated through connected discourse. Early research by Sachs (1967) found that memory for the meanings of sentences is more robust than memory for their specific wordings. He asked participants to read
passages and then tested their recognition for sentences either immediately or after they had read different amounts of intervening material. Some of the tested sentences had actually appeared in the text, but others were altered semantically or syntactically. When the meanings of the sentences were changed, participants made few errors; even after substantial distraction, participants rarely reported memory for sentences that had not appeared in the passage. However, when the changes were syntactic (e.g., a shift from active to passive voice), participants often reported recognizing the new sentences. After distraction, their recognition performance was near chance. Sachs concluded that during language processing, the original form of presented material is stored temporarily, only long enough to be comprehended, whereas the material’s meaning is encoded into long-term memory.

If semantic information is integrated during language processing and it is the meaning of a passage that is encoded into memory, then memory ought to exhibit productivity. That is, it should be possible for exposure to several basic, interrelated sentences to produce false memory for a sentence that expresses the integrated representation. A number of studies have confirmed this prediction, indicating that people integrate simple sentences to form representations for more complex sentences during language comprehension (Bransford and Franks 1971; Cofer 1973; Flagg 1976). Additionally, people have been found to integrate information from text passages read during an experiment with their general background knowledge, leading to false recall for additional information that was not experimentally presented (Owens et al. 1979; Sulin and Dooling 1974; Thorndyke 1976).

To explain these findings, Gentner (1981) examined a model of language processing in which sentences are considered both individually and in the broader context in which they appear. Her model states that when a sentence is read within the context of a larger passage, the discourse meaning that a reader forms may incorporate information not contained in the original sentence. She focused her investigation on the integration of verb meanings in context. Following research in linguistics (e.g., Chafe 1970), artificial intelligence (e.g., Schank 1972, 1973), and psychology (e.g., Miller and Johnson-Laird 1976; Stillings 1975), Gentner hypothesized that complex verb meanings can be represented by networks of subpredicates that express semantic relationships. Crudely put, a verb’s subpredicates are simpler verbs
that function as components of the more complex verb’s meaning. To illustrate, consider the relationship between the verb “give” and the more specific verb “pay.” On Gentner’s analysis, to “give” some item is to take some action that transfers ownership of that item to a recipient. “Paying” is a more specific form of giving, in which the giver owes the recipient. Thus, a representation of “gave” would include subpredicates like “caused,” “changed,” and “possession,” and a representation of “paid” would add the subpredicate “owed.”

Gentner tested this hypothesis by asking her participants to read paragraph-long stories that each included a sentence with the verb of interest—the critical sentence. For instance, one story contained the critical sentence, “Max finally gave Sam the money.” In the experimental condition, additional context explained that Max owed Sam money, whereas the control condition lacked this context. After reading one version of the story, participants performed a recall task in which they were shown the critical sentence with the word “gave” removed, and they were asked to fill in the word that had appeared in the story. In support of Gentner’s predictions, participants who had been provided with the additional context were more likely to falsely recall the verb “paid” than participants in the control condition.
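Schematically, the subpredicate picture can be rendered in a few lines of Python. This is a toy sketch: the set-based encoding and the integrate helper are illustrative constructions, not Gentner’s model or code.

```python
# Verbs represented as bundles of subpredicates (after Gentner's analysis of
# "gave" and "paid"; the encoding itself is a simplification for illustration).
VERBS = {
    "gave": {"caused", "changed", "possession"},
    "paid": {"caused", "changed", "possession", "owed"},
}

def integrate(critical_verb, context_subpredicates):
    """Return the most specific verb whose subpredicates are all licensed by
    the critical verb together with the surrounding context."""
    activated = VERBS[critical_verb] | context_subpredicates
    candidates = [v for v, subs in VERBS.items() if subs <= activated]
    return max(candidates, key=lambda v: len(VERBS[v]))

print(integrate("gave", set()))      # control story -> "gave" is recalled
print(integrate("gave", {"owed"}))   # context adds OWED -> "paid" is recalled
```

On this way of modeling the result, false recall of “paid” is just what integration predicts once the story supplies the one subpredicate that “gave” lacks.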

3.2 Using semantic integration to investigate philosophical concepts

Whereas Gentner (1981) used a false recall paradigm to examine how verbs with known meanings are integrated during language processing, we propose that the same methods can be used to investigate the meanings of philosophically significant concepts. On a traditional view, many philosophical concepts are complex mental entities constituted by simple concepts. The simple concepts jointly provide a “definition” of the complex concept. This means that the constituent concepts express properties which provide necessary and (jointly) sufficient conditions for the instantiation of the complex concept. In terms of semantic integration research, the traditional view predicts that a concept C will have subpredicates which are the constituents of C.2 For example, a view which says that KNOWLEDGE is a complex concept constituted by JUSTIFIED TRUE BELIEF (e.g., Ayer 1956) will predict that these constituent concepts, expressing necessary conditions,
will be subpredicates for KNOWLEDGE. Researchers can test whether including these subpredicates in a passage leads to false recall for words picking out KNOWLEDGE, offering evidence that these subpredicates were integrated to produce the concept. When this integration occurs, researchers can infer that the concepts JUSTIFIED TRUE BELIEF are constituents of KNOWLEDGE. This application of semantic integration is straightforward because we are able to propose a jointly sufficient set of constituents for the concept. However, there may be other situations where this is not possible. For instance, some constituents of a complex concept might be unknown. Alternatively, some concepts may be simple, or may be sensitive to nonconstituent parameters, or may be context-sensitive.

Fortunately, Gentner (1981) showed that concepts can play the role of subpredicates for a target concept even when they are not jointly sufficient or necessary for instantiating the target concept. For example, she showed that people falsely recall “painting” when they integrate “working” and “workers are carrying brushes, whitewash, and rollers.” Strictly speaking, these features are not jointly sufficient for painting. The workers might have carried the whitewash but ended up working on something unrelated to painting. Of course, we expect participants in semantic integration tasks to understand the story in a plausible way. As a result, there is no requirement that the items which are integrated to yield a concept actually form a jointly sufficient set for that concept in some strong metaphysical sense. Nor is there a requirement that the items playing the role of subpredicates in integration correspond to necessary conditions for the concept at issue. Recall Gentner’s example: “Carrying brushes, whitewash, and rollers” is not a necessary condition for painting (think of spray painting). All that is required is that the context makes it more or less likely that the target concept is instantiated.

This is good news for four reasons: First, complex concepts can be examined even if some of their constituents are unknown. If we are interested in studying a complex concept, we can examine participants’ integration of a set of concepts that merely approximate its true constituent concepts. For example, suppose that a concept C has constituents X1, X2, and X3 and we want to test whether X1 is a constituent. We can test for false recall of the lexicalization of C in the presence of X1 and X2 without invoking X3 or by approximating X3.

Second, some philosophers have argued that many philosophically interesting concepts are simple (e.g., Fodor 1998; Williamson 2002). Yet, these concepts may still have interesting necessary conditions. For example, Williamson (2002) holds that although KNOWLEDGE is simple, the concept still has philosophically important necessary conditions. Many philosophers hold that a necessary condition for S knowing P is that P be true. Semantic integration can be used to examine whether the necessary condition is something that lay people accept.

Third, some philosophers think that some concepts are sensitive to certain parameters and that this sensitivity is accessible to lay people. For example, Knobe (2010) holds that competent folk mental state attributions are sensitive to the moral valence of the content attributed, and some epistemologists have claimed that competent folk knowledge ascriptions are sensitive to practical interests (Pinillos 2012; Stanley and Sripada 2012) and moral properties (Beebe and Buckwalter 2010). In these cases, parameters like moral valence and practical interests do not necessarily constitute interesting necessary conditions for the concept. Yet, semantic integration is still apt for testing these parameters. We can do this by developing vignettes that include a critical sentence whose truth, together with a parameter, is thought to yield the target concept. If people consider the parameter to be relevant to the target concept, then the presence of the parameter ought to lead to greater false recall for words that lexicalize the concept (e.g., Henne and Pinillos in preparation; Waskan et al. submitted).

Fourth, semantic integration can be used to study the semantic contours of a word even if the word does not express a unique concept. Certain words may be context-sensitive, expressing different concepts depending on the conversational setting. In the previous cases discussed, the concepts X1, X2, . . ., Xn are integrated into a concept C, yielding false recall of a lexicalization of C. If a word is context-sensitive, then its use may express different concepts C1, C2, . . ., Cn depending on the discourse context. A set of concepts X1, X2, . . ., Xn might integrate to produce C1, but not C4. Whether the concepts X1, X2, . . ., Xn are integrated and produce recollection for a lexicalization of C depends on the discourse context. Many philosophers accept that “knows” is context-sensitive, sometimes expressing different concepts KNOWS1, KNOWS2, . . ., KNOWSn corresponding
to the different standards associated with “knowledge” (DeRose 1995; Cohen 1986; Lewis 1996). If a conversation takes place in a casual setting, the standards for knowledge might be lower than in a conversation taking place in a philosophy classroom. In the former context, the use of “knows” might express the concept KNOWS1, while in the latter it may express the concept KNOWS4. In examining the context sensitivity of a word, researchers can present information in the vignette to establish the discourse context and set the target concept. Being in a casual setting may communicate to the reader that the epistemic standards of the discourse context are low. Under those low standards, the concepts LOW JUSTIFICATION and P is TRUE may yield KNOWS1, the concept that “knowledge” expresses in that context. Suppose the target sentence is “S believes that P.” In this case, participants would be likely to falsely recall “knows” as having appeared in the sentence. Alternatively, if the discourse context associated with being in a philosophy classroom establishes a high epistemic standard, then the concepts LOW JUSTIFICATION and P is TRUE might not yield KNOWS4, and participants would be unlikely to falsely recall “knows.” If this account is on the right track, then researchers can also exploit semantic integration to examine context sensitivity.

What these four points reveal is that the viability of the semantic integration method does not depend on any particular understanding of concepts. On the contrary, the method is applicable under a wide variety of assumptions about concepts. This versatility makes the method especially useful for philosophers who themselves might disagree about the very nature of concepts.
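The context-sensitivity prediction above can also be made concrete. The following toy Python sketch is only an illustration: the numeric standards and the threshold rule are hypothetical simplifications, not something the contextualist view is committed to.

```python
# Hypothetical epistemic standards: the minimum justification each discourse
# context demands before "knows" (in its local sense) applies.
STANDARDS = {"casual": 0.3, "philosophy classroom": 0.9}

def integrates_to_knows(justification, p_is_true, context):
    """Return True when a story's cues should activate the concept that
    'knows' expresses in this discourse context."""
    return p_is_true and justification >= STANDARDS[context]

low_justification = 0.4  # S has weak justification for a true proposition P

print(integrates_to_knows(low_justification, True, "casual"))
# True -> false recall of "knows" in place of "believes" should be likely
print(integrates_to_knows(low_justification, True, "philosophy classroom"))
# False -> under the higher standard, false recall should be rare
```

If observed false recall rates tracked the discourse context in this way, that would be evidence of the hypothesized context sensitivity.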

3.3 Two experiments using semantic integration

In the remainder of this section, we discuss two experiments we conducted that demonstrate how semantic integration can be used to investigate philosophically significant concepts. In this research we focus on the concept KNOWLEDGE, but recently other researchers have adopted our methods in order to examine EXPLANATION (Waskan et al. submitted) and CAUSATION (Henne and Pinillos in preparation).

There are three main components in a semantic integration study. The first component is the passage containing the contextual information hypothesized
to semantically activate the target concept. In order to construct passages that yield false recall of KNOWLEDGE, we altered contextual information in different versions of a main story, controlling for word count, sentence length, and overall structure. In a preliminary study, we constructed two versions of a story about a detective (Jack Dempsey) who forms the belief that a suspect (a teenager named Will) is guilty. In the experimental condition, the detective’s belief is justified by legitimate evidence and his belief is true (the suspect is in fact guilty). In the control condition, the detective cannot find any evidence and participants are not told whether the suspect is guilty, but the detective forms the belief anyway.

In each of these stories, we included a critical sentence containing a critical verb. Recall that when sufficient contextual information licenses using a more specific verb, people will falsely recall the more specific verb as having appeared in the passage. The critical verb must be consistent with the concept under investigation, but must not entail it. In our knowledge experiment, we chose “thought” as our critical verb; thinking that P is consistent with knowing that P, but does not entail knowing that P. We predicted that, when read in the right context, a sentence containing the word “thought” would lead to false recall of the word “knew,” and that this would occur more frequently in the experimental condition, where the appropriate context is supplied, than in the control condition.

Critical sentence: “Whatever the ultimate verdict would be, Dempsey thought Will was guilty.”

An additional consideration when choosing critical and target words is the frequency with which each word occurs in English. Generally, it has been found that recall performance is better for high-frequency than for low-frequency words, and that the opposite is true for recognition performance (Kintsch 1970). That said, there is some evidence that low-frequency words might benefit at recall when they are presented together with high-frequency words (Duncan 1974; Gregg 1976), as will likely be the case in semantic integration experiments. A good practice is to ensure that critical and target words are matched for frequency of occurrence as closely as possible. “Thought” and “knew” are reasonably well matched as the 179th and 300th most common English words, respectively (Wolfram|Alpha 2013a, 2013b).
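For researchers constructing new materials, frequency matching can also be checked programmatically. The snippet below is a sketch assuming the third-party wordfreq Python package (not used in the original studies, which relied on the Wolfram|Alpha lookups cited above).

```python
# pip install wordfreq
from wordfreq import zipf_frequency

# Zipf scale: log10 of occurrences per billion words. Candidate critical and
# target words that fall within roughly half a point of one another can be
# considered reasonably well matched for frequency.
for word in ["thought", "knew", "believed"]:
    print(word, zipf_frequency(word, "en"))
```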

The second component of a semantic integration study is a distractor task. In principle, this distractor task could consist of almost anything. The purpose of the distractor is simply to diminish the effect of episodic memory in the recall task. Importantly, however, distractors should not contain either the critical verb or the target word. After reading the distractor, participants advance to the third part of the experiment, the recall task. There they are shown several sentences from the story, each with one word removed. Their task is to recall the word that originally filled the blank. In our experiment, we were interested in their recall performance for the critical sentence. During the recall task, participants were shown this sentence with the word “thought” replaced with a blank, as shown below:

Recall Task: “Whatever the ultimate verdict would be, Dempsey _______ Will was guilty.”

Participants typed in the word that they recalled as having appeared in the original story. Consistent with our predictions, participants were more likely to recall “knew” as having appeared in the sentence when the detective’s belief was justified and true (Powell et al. submitted). Clearly, this finding does not demonstrate anything particularly interesting about KNOWLEDGE, but it does demonstrate that semantic integration can be used to examine philosophical concepts.

Next, we investigated Gettier cases, a more substantive issue in philosophy. We adapted our detective story and added another character named Beth, Will’s soon-to-be ex-girlfriend, who has it in for Will and interferes with Dempsey’s investigation. We created three versions of the story: a false-belief version, a Gettier version, and a justified true-belief version. In the false-belief condition, Will is innocent, but Beth framed him by committing the crime and planting evidence. In the Gettier condition, Will committed the crime and disposed of all the evidence, but Beth makes sure Will gets caught by planting evidence for Dempsey to find. In the justified true-belief condition, Will committed the crime and left behind evidence. Seeing his mistake, Beth does nothing and waits for Dempsey to arrest Will.

We found participants were more likely to falsely recall “knew” as having appeared in the critical sentence in the justified true-belief and Gettier conditions than in the false-belief condition. However, we also found no difference in recall between the Gettier and justified true-belief conditions. That is, a case of Gettiered justified true belief activated participants’ knowledge concept to the same degree that a non-Gettiered justified true belief did. This suggests that our participants did not distinguish between Gettiered and non-Gettiered justified true belief (Powell et al. submitted). Though our findings may be surprising to some philosophers, they are consistent with results reported by Starmans and Friedman (2012), who concluded that lay people’s conception of knowledge greatly resembles the traditional definition: JUSTIFIED TRUE BELIEF. Still, more research is needed to explore the contours of lay people’s concept of knowledge.
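By way of illustration, the comparison of conditions in a study like this comes down to a contingency table of recall responses. The sketch below uses made-up counts, not the data of Powell et al., and a standard chi-square test from SciPy.

```python
from scipy.stats import chi2_contingency

# Rows: false-belief, Gettier, justified true-belief conditions.
# Columns: [falsely recalled "knew", recalled something else].
# All counts here are hypothetical.
counts = [
    [ 4, 36],   # false belief
    [14, 26],   # Gettiered justified true belief
    [15, 25],   # non-Gettiered justified true belief
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# A pairwise follow-up (e.g., Gettier vs. justified true belief) runs the same
# test on the relevant 2x2 subtable; with counts like these it would not
# approach significance, mirroring the null difference reported above.
```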

4 Pragmatic considerations and demand characteristics

Semantic integration tasks offer two important advantages over more explicit survey methods. For one, semantic integration tasks avoid the concerns raised by Cullen (2010) over pragmatic cues. Researchers using survey methods need to account for pragmatic cues in the stimuli that they present to participants as well as in their instructions, questions, and response options. In a semantic integration experiment, participants are told they are performing a memory task, and nothing in the instructions, response prompts, or options indicates otherwise. While these materials are not devoid of pragmatic cues, pragmatic factors in this context are considerably less problematic and considerably better understood. Psychologists have studied memory since Ebbinghaus (1885/1964), and have developed reliable methods for testing people’s recollection of presented material. While it is clear that stimuli may still contain pragmatic cues and conversational implicatures, this fact is not in any way unique to semantic integration. After all, survey methods face these same concerns. Moreover, if one were skeptical about an experimental paradigm for this reason, one would also have to be skeptical about research on causal reasoning, decision-making, psycholinguistics, or nearly any line of research that involves presenting text to participants. The pressing concern
is that pragmatic cues in instructions will lead participants to approach the experimental task incorrectly, or to interpret their response options in a manner inconsistent with the researcher’s intentions. Semantic integration tasks avoid these difficulties.

Second, semantic integration tasks largely preclude demand characteristics. Even if participants are apprehensive about being evaluated, their apprehension is unlikely to lead researchers to any erroneous conclusions. Evaluation apprehension should motivate participants to perform the task well, and since there is no reliable way for participants to produce “desirable” answers except by probing their own memory, there is little risk of evaluation apprehension leading to spurious findings. In addition, because the memory task is both intelligible and experimentally realistic, participants are less likely to take on the role of the faithful participant (Weber and Cook 1972). Even if some participants do ignore experimenters’ conversational implicatures, this is unlikely to affect their performance, as the instructions of a memory task can be made comprehensible without many contextual cues.

5 Caveats

The interpretation of findings from semantic integration tasks depends on resolving three questions:

1. How are concepts structured?
2. What mental process leads to integration of semantic information?
3. Does “impure” semantic integration complicate matters?

5.1 The structure of concepts

If semantic integration directly measures the semantic activation of people’s concepts, then one might wonder about the nature and structure of these concepts. As discussed, Gentner (1981) hypothesized that verb concepts are represented as structured collections of subpredicates. On the basis of this view, she made and confirmed very specific predictions about how representations would be combined during the processing of connected discourse, lending support to this theory. Still, psychologists have attempted
to describe concepts using a number of representational formats (e.g., Posner and Keele 1968; Medin and Schaffer 1978). This may prompt some to doubt that Gentner’s model of concepts is accurate, or to worry that, even if it accurately describes the representations of certain concepts, different types of concepts may be represented in other ways (e.g., natural kind terms, prototype or exemplar models, distributed representations, etc.). Although these possibilities may complicate the interpretations of semantic integration experiments, researchers who use semantic integration can remain agnostic about the “true” psychological theory of concepts. In fact, the method rests on two basic assumptions: (1) semantic concepts are mentally represented in some fashion and (2) memory for the meaning of a passage is more robust than memory for its exact wording. The first claim is a fundamental assumption of modern psychology and one which we will not defend. The second is supported by a large body of research on memory, some of which we discussed in Section 3 (e.g., Bransford and Franks 1971; Brewer 1977; Barclay 1973; Cofer 1973; Flagg 1976; Sachs 1967).

5.2 Mental processes and semantic integration

Thus far we have reasoned as if integration occurs during comprehension and encoding, but another possibility is that integration actually occurs at recall. That is, during encoding people store the meanings of individual propositions separately. Then, at recall, they integrate these meanings by a process of inference to form a reconstruction of the memory for an individual sentence or proposition. Supposing this is true, it is worth noting that semantic integration still overcomes concerns about demand characteristics and pragmatic cues. However, it can no longer be said to provide a direct measure of semantic activation. Rather, in this case the responses that participants give to recall prompts are just as dependent on inferential processes as their responses to surveys are. Fortunately, Gentner (1981) tested this possibility by inserting contextual information both before and after the critical sentence in a passage. She found that false recall for critical items was greater when the inserted material came before the critical sentence, supporting the interpretation that meanings are integrated online during discourse comprehension rather than after the fact during recall, and hence that semantic integration isolates conceptual activation from downstream inferential processing.

5.3 Impure semantic integration

“Pure cases of semantic integration” (Gentner 1981, p. 371) occur when the subpredicate structures of n propositions are directly combined to produce some unified structure. For example, Gentner describes “gave” and “owed” resulting in recall for the verb “paid.” However, she also provides evidence that semantic integration can occur when the context does not directly specify any of the subpredicates in the new semantic structure. As discussed earlier, she found that people recall “painting” in place of “working” when they are told that the workers are “carrying brushes, whitewash, and rollers.” For integration to occur, people need to infer that the workers are using these materials, and thus, that they are painting.

Ideally, when researchers are examining complex concepts, they can make inferences about the subpredicate structure of a concept based on people’s integration performance. Yet, it would clearly be an error to infer on the basis of Gentner’s findings that “carrying brushes, whitewash, and rollers” is really a component of the subpredicate structure of “painting.” As we noted earlier, spray painting does not require any of these materials. Earlier we identified the possibility of “impure” semantic integration as advantageous, allowing researchers to apply semantic integration methods in many different situations. However, this also means that researchers should exercise caution when making inferences about the subpredicate structures of putatively complex concepts on the basis of integration performance.

6 Alternate experimental designs and surveys

6.1 Similar experimental paradigms

In this chapter we described an experimental method modeled after Gentner’s (1981) work on the semantic integration of verb meanings, and demonstrated its use for examining people’s concept of KNOWLEDGE. It bears noting that there are a number of other related experimental paradigms that have been used to examine semantic integration in discourse comprehension (e.g., Bransford and Franks 1971; Brewer 1977; Barclay 1973; Cofer 1973; Flagg 1976; Sulin and Dooling 1974; Thorndyke 1976; Owens et al. 1979), and that some of these paradigms might also be employed by experimental
philosophers. However, Gentner’s (1981) paradigm has several qualities that are desirable for experimental philosophers, even relative to other semantic integration tasks. First, the use of a free recall task makes its results more compelling than tasks that rely on recognition judgments. Participants’ responses to recognition tasks can be influenced both by true recollection and by mere feelings of familiarity (Tulving 1985). In contrast, explicit recall of the word “knew” provides unambiguous evidence for the semantic activation of the concept KNOWLEDGE. Second, this paradigm focuses responses onto a single specific word of interest, whereas other semantic integration paradigms often ask participants to evaluate larger semantic units, such as phrases or sentences (e.g., Bransford and Franks 1971; Sulin and Dooling 1974). Specifying a target verb can reduce ambiguity in investigations of individual concepts. Thus, where possible, the semantic integration tasks we describe here are a superior method for examining the parameters involved with instantiating people’s concepts.

Of course, not all concepts of interest will necessarily have a verb form (“knew”) with nearby synonyms (“thought,” “believed”). Where this is not the case, other semantic integration tasks may be more appropriate. The disadvantages associated with semantic integration tasks measuring recognition for sentences or phrases (e.g., Bransford and Franks 1971; Owens et al. 1979) are not insurmountable. In particular, memory researchers have developed procedures, like the remember-know procedure (Tulving 1985), that can help distinguish between genuine recollection and familiarity. With sufficient care, phrases or sentences can be crafted to unambiguously express whatever concept may be of interest to researchers (e.g., Waskan et al. submitted).

6.2 Surveys and semantic integration

The methodological advantages of semantic integration stem from the implicit nature of the task. However, this also marks semantic integration tasks as importantly different from the explicit measures collected during survey tasks. Different research questions might warrant the use of either surveys or semantic integration. Many experimental philosophers hope to assess philosophical arguments by examining the psychology of the concepts those arguments employ. We have argued that,
in general, semantic integration tasks are well suited for accomplishing this goal. They provide an implicit measure of conceptual activation, making them ideal for capturing people’s immediate, intuitive reactions. However, some philosophical concepts may also be applied to situations by more effortful cognitive processes. In these cases, explicit survey questions that elicit conscious consideration may be better suited, provided such questions can be adequately constructed. Additionally, surveys may be more appropriate where experimental philosophers are interested in people’s judgments. For instance, some researchers may not be interested in KNOWLEDGE per se, but in knowledge ascription behavior. In this instance, semantic integration tasks are inappropriate and surveys would be preferable.

7 Conclusion

In this chapter, we discussed the ways in which pragmatic cues and demand characteristics can affect the results of surveys. In light of these problems, we argued that experimental philosophers should adopt a new experimental paradigm that we call semantic integration. Our experimental investigations of KNOWLEDGE demonstrate how this method can be used to examine philosophical concepts. Semantic integration can be applied to investigate complex concepts in a manner consistent with the aims of traditional conceptual analysis, and used to examine other parameters relevant to the instantiation of concepts. This method avoids concerns about pragmatic cues and demand characteristics because participants’ conceptual activation is measured implicitly through a memory task. For these reasons, semantic integration represents an important methodological advance in experimental philosophy.

Appendix

Example vignette: Justified true belief condition

Gary Hawkins was a counselor who treated troubled youths with long histories of abuse. He was having an especially hard time getting through
to two of his clients, a pair of 14-year-olds named Will and Beth, who both seemed to dislike him. Most of Gary’s clients grew up poor and were at-risk youths.

One morning, Gary was out for a jog in Millennium Park on the east side of Chicago. Gary’s path ran under Columbus Drive, and when he entered the unlit tunnel, his eyes were unadjusted to the dark. Suddenly, Gary felt a terrible pain at the back of his head and he fell to the ground. He hadn’t seen the attacker waiting in the tunnel with a weapon in their hand. The attacker continued to hit Gary with the weapon, bruising his ribs and arms. Then the attacker ran off, and Gary lay in the tunnel, dazed. Another jogger discovered Gary about half an hour later and called the police.

Detective Jack Dempsey was assigned to the case. Dempsey was a veteran detective who loved police work, so he hurried to the hospital to interview Gary as soon as his doctors would allow it. Unfortunately, Gary was useless as a witness. He hadn’t seen the attack coming, and the blow to the head had left his memory hazy. Next, Dempsey started to question Gary’s clients, and Will really rubbed him the wrong way. Dempsey was immediately suspicious of him.

Dempsey wasn’t the only one who disliked Will. Beth and Will were dating, and she suspected he was going to leave her. She wanted a way to get even with Will, and Will had told her a couple of weeks before that he was planning to attack Gary in Millennium Park.

Dempsey started his investigation and found several pieces of evidence that pointed to Will. First, another officer found Will’s baseball bat near the scene of the crime. Then, Dempsey got a warrant and searched Will’s phone, where he found texts bragging about beating Gary up.

Actually, Beth wanted to get payback for Will leaving her. She hoped Will would be caught for his crime. It sure looked like he was going to be. Will wasn’t careful to cover his tracks after he attacked Gary. He left his baseball bat at the crime scene, and then he sent texts from his phone bragging about the attack.

After finishing his investigation, Dempsey wrote up his report for the district attorney based upon the evidence he had collected, including Beth’s testimony. He worked on his other cases until Will’s case went to trial. Whatever the ultimate verdict would be, Dempsey thought Will was guilty.

Dempsey tried not to worry about work and just look forward to the weekend. His daughter was visiting colleges, and they were flying to New York together to visit NYU. Dempsey had never visited New York before, and he really needed a vacation. It would be a good chance for a break, although he kept warning his daughter that Chicago’s pizza was vastly superior.

Notes

1 Under this definition, there is some overlap with the concerns raised previously. For instance, pragmatic cues in base-rate experiments may have led participants to focus only on the descriptions they were given and to suppress information about base rates. Since the experimenters were interested in how participants would use all the information they were given to produce the most accurate judgment they could, participants who interpreted the instructions in this way clearly did not perform the intended task.

2 Thus far we have treated subpredicates as linguistic items, but this is not necessary. One could also think of them as mental representations or concepts.

References

Anderson, R. C. and Ortony, A. (1975), “On putting apples into bottles—A problem of polysemy.” Cognitive Psychology, 7, 167–80.
Anderson, M. C., Bjork, R. A. and Bjork, E. L. (1994), “Remembering can cause forgetting: Retrieval dynamics in long-term memory.” Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–87.
Ayer, A. J. (1956), The Problem of Knowledge. London: Macmillan.
Barclay, J. (1973), “The role of comprehension in remembering sentences.” Cognitive Psychology, 4, 229–54.
Bartlett, F. C. (1932), Remembering: A Study in Experimental and Social Psychology. Cambridge, England: Cambridge University Press.
Beebe, J. and Buckwalter, W. (2010), “The epistemic side-effect effect.” Mind and Language, 25, 474–98.
Bransford, J. D. and Franks, J. J. (1971), “The abstraction of linguistic ideas.” Cognitive Psychology, 2, 331–50.
—. (1972), “The abstraction of linguistic ideas: A review.” Cognition, 1, 211–49.

Brewer, W. F. (1977), “Memory for the pragmatic implications of sentences.” Memory & Cognition, 5, 673–78.
Buehner, M. J., Cheng, P. W. and Clifford, D. (2003), “From covariation to causation: A test of the assumption of causal power.” Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1119–40.
Chafe, W. L. (1970), Meaning and the Structure of Language. Chicago: University of Chicago Press.
Cheng, P. W. (1997), “From covariation to causation: A causal power theory.” Psychological Review, 104, 367–405.
Cofer, C. N. (1973), “Constructive processes in memory: The semantic and integrative aspects of memory may reach far beyond the usual notions of the retention of dates, names, and events.” American Scientist, 61, 537–43.
Cohen, S. (1986), “Knowledge and context.” The Journal of Philosophy, 83, 574–83.
Cullen, S. (2010), “Survey-driven romanticism.” Review of Philosophy and Psychology, 1, 275–96.
Deese, J. (1959), “On the prediction of occurrence of particular verbal intrusions in immediate recall.” Journal of Experimental Psychology, 58, 17–22.
DeRose, K. (1995), “Solving the skeptical problem.” The Philosophical Review, 104, 1–52.
Duncan, C. P. (1974), “Retrieval of low-frequency words from mixed lists.” Bulletin of the Psychonomic Society, 4, 137–8.
Ebbinghaus, H. E. (1964), Memory: A Contribution to Experimental Psychology (H. A. Ruger and C. E. Bussenius, Trans.). New York: Dover. (Original work published 1885).
Flagg, P. (1976), “Semantic integration in sentence memory?” Journal of Verbal Learning and Verbal Behavior, 15, 491–504.
Fodor, J. (1998), Concepts: Where Cognitive Science Went Wrong. New York: Clarendon Press.
Gentner, D. (1981), “Integrating verb meanings into context.” Discourse Processes, 4, 349–75.
Gregg, V. (1976), “Word frequency, recognition, and recall,” in J. Brown (ed.), Recall and Recognition. New York: Wiley, pp. 183–216.
Grice, H. P. (1975), “Logic and conversation,” in P. Cole and J. L. Morgan (eds), Syntax and Semantics: Vol. 3: Speech Acts. New York: Academic Press, pp. 41–58.
Henne, P. and Pinillos, N. (in preparation), “Cause by Omission and Norms.”
Kahneman, D. and Tversky, A. (1973), “On the psychology of prediction.” Psychological Review, 80, 237–51.
Kintsch, W. (1970), Learning, Memory, and Conceptual Processes. New York: Wiley.
Knobe, J. (2010), “The person as moralist account and its alternatives.” Behavioral and Brain Sciences, 33, 353–65.

Krosnick, J. A., Li, F. and Lehman, D. R. (1990), “Conversational conventions, order of information acquisition, and the effect of base rates and individuating information on social judgment.” Journal of Personality and Social Psychology, 59, 1140–52.
Lewis, D. (1996), “Elusive knowledge.” Australasian Journal of Philosophy, 74, 549–67.
Loess, H. (1967), “Short-term memory, word class, and sequence of items.” Journal of Experimental Psychology, 74, 556–61.
Loftus, E. F. and Palmer, J. C. (1974), “Reconstruction of automobile destruction: An example of the interaction between language and memory.” Journal of Verbal Learning and Verbal Behavior, 13, 585–9.
Loftus, E. F., Miller, D. G. and Burns, H. J. (1978), “Semantic integration of verbal information into a visual memory.” Journal of Experimental Psychology: Human Learning and Memory, 4, 19–31.
Medin, D. L. and Schaffer, M. M. (1978), “Context theory of classification learning.” Psychological Review, 85, 207–38.
Miller, G. A. and Johnson-Laird, P. N. (1976), Language and Perception. Cambridge, MA: Belknap Press.
Nisbett, R. E. and Ross, L. (1980), Human Inference: Strategies and Shortcomings of Social Judgment. New York: Prentice Hall.
Orne, M. (1962), “On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications.” American Psychologist, 17, 776–83.
Owens, J., Bower, G. and Black, J. (1979), “The ‘soap opera’ effect in story recall.” Memory & Cognition, 7, 185–91.
Pinillos, N. (2012), “Knowledge, experiments and practical interests,” in J. Brown and M. Gerken (eds), Knowledge Ascriptions. Oxford: Oxford University Press.
Posner, M. I. and Keele, S. W. (1968), “On the genesis of abstract ideas.” Journal of Experimental Psychology, 77, 353–63.
Powell, D., Horne, Z., Pinillos, A. and Holyoak, K. J. (2013), “Justified true belief triggers false recall of ‘knowing.’” Proceedings of the 35th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Roediger, H. L. and McDermott, K. B. (1995), “Creating false memories: Remembering words not presented in lists.” Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–14.
Sachs, J. (1967), “Recognition memory for syntactic and semantic aspects of connected discourse.” Perception & Psychophysics, 2, 437–42.
Schacter, D. L. (1995), “Memory distortion: History and current status,” in J. T. Coyle (ed.), Memory Distortion: How Minds, Brains, and Societies Reconstruct the Past. Cambridge, MA: Harvard University Press, pp. 1–43.

Schank, R. C. (1972), “Conceptual dependency: A theory of natural language understanding.” Cognitive Psychology, 3, 552–631.
—. (1973), “Identification of conceptualizations underlying natural language,” in R. C. Schank and K. M. Colby (eds), Computer Models of Thought and Language. San Francisco, CA: W. H. Freeman & Co.
Schwarz, N. (1994), “Judgment in a social context: Biases, shortcomings, and the logic of conversation.” Advances in Experimental Social Psychology, 26, 123–62.
Schwarz, N., Strack, F., Hilton, D. J. and Naderer, G. (1991), “Judgmental biases and the logic of conversation: The contextual relevance of irrelevant information.” Social Cognition, 9, 67–84.
Stanley, J. and Sripada, C. (2012), “Empirical tests of interest relative invariantism.” Episteme, 9, 3–26.
Starmans, C. and Friedman, O. (2012), “The folk conception of knowledge.” Cognition, 124, 272–83.
Stillings, N. A. (1975), “Meaning rules and systems of inference for verbs of transfer and possession.” Journal of Verbal Learning and Verbal Behavior, 14, 453–70.
Sulin, R. A. and Dooling, D. J. (1974), “Intrusion of a thematic idea in retention of prose.” Journal of Experimental Psychology, 103, 255–62.
Thorndyke, P. (1976), “The role of inferences in discourse comprehension.” Journal of Verbal Learning and Verbal Behavior, 15, 437–46.
Tulving, E. (1985), “Memory and consciousness.” Canadian Psychology/Psychologie Canadienne, 26, 1–12.
Waskan, J., Clevenger, J., Harmon, I., Horne, Z. and Spino, J. (2013), “Explanatory anti-psychologism overturned by lay and scientific case classifications.” Synthese, 1–23.
Weber, S. and Cook, T. (1972), “Subject effects in laboratory research: An examination of subject roles, demand characteristics, and valid inference.” Psychological Bulletin, 77, 273–95.
Williamson, T. (2002), Knowledge and Its Limits. Oxford: Oxford University Press.
Wolfram Alpha LLC. (2013a), Wolfram|Alpha. http://www.wolframalpha.com/input/?i“knew”&a*C.knew-_*Word- (accessed March 3, 2013).
—. (2013b), Wolfram|Alpha. http://www.wolframalpha.com/input/?ithought&a*C.thought-_*Word- (accessed March 3, 2013).

6 The Mystery of Stakes and Error in Ascriber Intuitions

Wesley Buckwalter

Research in experimental epistemology has revealed a great, yet unsolved mystery: why do ordinary evaluations of knowledge-ascribing sentences involving stakes and error appear to diverge so systematically from the predictions professional epistemologists make about them? Two recent solutions to this mystery by DeRose (2011) and Pinillos (2012) argue that these differences arise due to specific problems with the designs of past experimental studies. This chapter presents two new experiments to directly test these responses. Results vindicate previous findings by suggesting that (i) the solution to the mystery is not likely to be based on the empirical features these theorists identify, and (ii) that the salience of ascriber error continues to make the difference in folk ratings of third-person knowledge-ascribing sentences.

Imagine that two spouses are waiting to deposit a check at the bank on a Friday evening and the lines inside are very long. One says to the other, “I know the bank will be open tomorrow,” and suggests they return on Saturday morning instead. This mundane knowledge assertion seems true. However, now suppose instead that their check was actually for a large sum of money meant to cover a series of impending overdue bills, making its immediate deposit extremely important for their financial futures. It even occurs to them that the bank might have altered its opening hours since their last Saturday visit. One says to the other, “I don’t know the bank will be open tomorrow.” In the latter circumstance, this knowledge denial seems true. Many philosophers have claimed that these contrary intuitions are best explained by the idea
that nonepistemic factors like practical stakes or the salience of error can play an important role in everyday evaluations of knowledge-ascribing and knowledge-denying sentences. The precise way of accounting for intuitions in these famous bank vignette pairs (see DeRose 1992) has played a prominent role in contemporary epistemology. One recent trend is to cite the content of our intuitions in bank and bank-style cases to support philosophical analyses of knowledge based on the way we actually make and assess knowledge ascriptions. Thus the ability to explain how the same knowledge denial and knowledge assertion can both be true in these two different contexts has inspired support for some of our best new competing theories of knowledge. And while debate continues among contemporary philosophers, most agree that this ability to account for bank intuitions—or the purported fact that people are more likely to ascribe knowledge when stakes and error are low than when they are high—counts as one significant theoretical advantage.

Parallel to this theoretical work in epistemology, experimental philosophers of the last few years have also begun to investigate the factors that influence ordinary language practices involving knowledge sentences (for a review, see Buckwalter 2012). They’ve done this by running experiments in the social psychological tradition, using stimulus materials closely resembling bank case vignettes. In doing so, these researchers have sought to understand the conditions and the mechanisms at work when we make third-person attributions of knowledge. However, the results these experimental philosophers have uncovered have inspired a great, and yet unsolved, mystery. Current data suggest that people’s answers in these kinds of cases systematically diverge from the predictions of trained epistemologists. With few exceptions, the evidence to date indicates that when participants in an experiment are presented with bank-case-style stimuli, the factor of stakes plays only a marginal role in their knowledge ascriptions to others.1 This is puzzling since on the one hand, most professionals have agreed that stakes play an important role in how we assess knowledge-ascribing sentences. But on the other hand, experimentalists have been unable to reproduce commensurate findings.

So, could it be that these experiments are somehow deficient in detecting the mechanisms that buttress our ordinary language practices involving knowledge, and if so, how? Or alternatively, are theoretical
intuition pumps in the hands of epistemic experts less accurate measures of everyday knowledge ascriptions, and if so, why? To answer these questions is to solve the mystery of ascriber intuitions.

Recently, two promising solutions to the mystery have been proposed by DeRose (2011) and Pinillos (2011, 2012) along the former lines, by questioning the designs of past experimental studies. They have suggested that folk intuitions only appear to radically diverge from those of professional epistemologists regarding subject stakes and attributor error due to various problems with the particular experiments that have been used to examine them. In response to these claims, this chapter presents two new experiments designed to directly test the solutions put forward by DeRose and Pinillos. After adjustments are made for their worries, the results vindicate previous findings by suggesting that (i) the solution to the mystery is likely not based on the particular empirical features these theorists have identified, and (ii) that the salience of ascriber error continues to make the difference in folk ratings of third-person knowledge-ascribing sentences.

In order to uncover the true culprits in our caper of stakes and error in ascriber intuitions, the chapter proceeds as follows. Section 1 reviews the evidence traditional philosophers and experimental epistemologists have generated involving third-person mental state attributions of knowledge. Sections 2–3 introduce the solutions proposed by DeRose and Pinillos, and respond with experiments designed to test them. Section 4 discusses possible resolutions to the mystery of ascriber intuitions based on these new data, and advances two further hypotheses about its origins. Lastly, Sections 5–6 conclude by revisiting the implications such results have for two leading theories of knowledge supported by ordinary intuitions.

1 Professional intuitions and experimental data

It is often taken for granted in the contextualist literature that people’s intuitions in bank cases fluctuate between low cases, where the stakes or error possibilities of the case are minimal, and high cases, where stakes and error are critical. Such intuitions have inspired some epistemologists to develop analyses of knowledge by focusing precisely on ordinary ascribing behaviors.

The result is that in addition to traditional factors like evidence or justification, recent discussions of knowledge have also begun to include more practical considerations that seem to be at work in everyday judgments.2

One strategy to capture these purported intuitions between bank cases is to focus on the pragmatic conditions that are directly relevant to the subject of the bank vignettes. Theorists such as Hawthorne (2004), Fantl and McGrath (2002, 2010), and Stanley (2005) advance accounts of knowledge by observing that not only the truth-relevant factors, but also the practical factors of the main character's situation seem to have a large effect on our bank-case intuitions. Specifically, the claim is that ascriptions of knowledge are sensitive to the practical interests of a subject, the importance that he or she is right, and the personal costs involved with being wrong. While each of these theories is subtly different, for the sake of simplicity we will refer to the common metaphysical view as interest-relative invariantism (IRI): the view that, roughly, whether or not a true belief counts as knowledge depends in part on what is at stake for the subject.

A different strategy to capture bank intuitions is to focus on the use of the word "knows" and the factors that influence the truth of knowledge-ascribing sentences. Standard epistemic contextualists (DeRose 1992, 1999, 2005, 2009; Cohen 1988, 1999, 2004) claim that in everyday usage, the very same knowledge sentence often seems to be true in specific sorts of contexts but false in others. While different contextualists are free to debate the factors that influence these truth conditions, the two factors most often discussed are the stakes of the attributor when ascribing knowledge to a third person, and also, whether or not the possibilities for error have been made salient for the attributor from context to context. The resulting semantic thesis regarding instances of "knows that p" is that the truth conditions of knowledge-ascribing sentences can be different from one conversational context to another, based on those details of the attributor's situation.

One thing that interest-relative invariantists and advocates of contextualism have in common is that they both claim to capture ordinary language practices involving knowledge-ascribing sentences. To find out what these ordinary practices actually are, evidence is usually collected as follows.

We are invited to consider vignettes, or pairs of fictional scenarios, in which all the features of the cases are fixed, and where we are asked to make a decision involving what the protagonist knows. Then between the cases, we vary the factors of stakes or error, to see if our intuitions about the character's knowledge change. When we do this, if the result is that our intuitions change from case to case, this suggests that these factors play a role in how we assess knowledge sentences.

The results of such thought experiments in the philosophical literature have indicated that most philosophers agree that stakes play an important role in ordinary knowledge ascription.3 And beyond the careful manipulations presided over by today's leading epistemologists in these specific cases, there is something generally intuitive about the idea that the costs of one's beliefs weigh heavily on our assessments of doxastic states.4 If this is the case, then it should be a straightforward task to empirically pinpoint stakes effects in our behaviors.

And indeed, there are now several results in the social psychological literature surrounding this general issue of stakes and error, independently of research on knowledge specifically. Mayseless and Kruglanski (1987), for instance, show that people's subjective probability estimates are affected by the desire to avoid judgmental mistakes, in view of their perceived costliness. Fischer et al. (2008) have shown that people make decisions involving loss with significantly less subjective decision certainty (that is, loss-decisions are more difficult to make than decisions involving gains), and that this effect of loss-framing systematically increases people's desire to search for confirmatory information like evidence. Similarly regarding error, hypotheses of selective exposure (see Smith et al. 2008) have long held that people tend to avoid dissonant information but seek out consonant information, suggesting that the salience of error might affect our capacity and motivation for processing evidence in the face of critical information.

However, in these discussions it is very important to distinguish the epistemic ascriber, or the person who claims someone has knowledge, from the epistemic subject, or the person of whom these claims are made. What the social psychology literature suggests is that regarding the latter, subjects in high-stakes situations, or cases of high potential for error, will have significantly less confidence in their judgments.5 What these results do not directly speak to is the former: how such factors might affect third-person knowledge attributions, or our evaluations of other people's first-person knowledge assertions.

This further question of third-person attribution is exactly what experimental philosophers have attempted to investigate.6 Despite the intuitions of epistemologists, as well as the results from social psychology, these data in experimental philosophy have largely turned up negative results for stakes. As Schaffer and Knobe (2012) say when summarizing these findings, "This research suggests . . . that—contrary to what virtually all of the participants in the contextualism debate have supposed—neither stakes nor salience impacts the intuitions of ordinary speakers."

The majority of these findings involve the familiar bank stimuli discussed above. While each experiment is slightly different, they all involve manipulating stakes by varying the importance of a protagonist's financial circumstances (for instance, whether or not they have an impending bill coming due) and the error factor by varying whether a character proposes some kind of relevant alternative to the knowledge claim asserted (for instance, that banks can sometimes change their hours without notice).

Four independent studies have investigated the way stakes and error influence folk knowledge practices in these particular cases. Buckwalter (2010) found that in cases where a subject in this situation makes the knowledge claim, "I know the bank will be open on Saturday," participants found it to be true no matter the stakes or error of the case. Feltz and Zarpentine (2010) found that whether epistemic subjects made knowledge assertions in low-stakes cases or knowledge denials in high-stakes cases had no effect on people's judgments about the truth or falsity of the knowledge statements. On the other hand, when participants were specifically asked to ascribe or deny rather than evaluate stated knowledge sentences, May et al. (2010) were able to detect a small effect of stakes (though mean judgments suggested that participants still agreed that subjects had knowledge despite this difference). Lastly, Schaffer and Knobe (2012) were able to detect a significant effect for error by making the error possibilities more salient through vivid and personal anecdotes (and we will revisit this idea in Section 2.2). Of course, while all of these studies involve the same general bank context, they each have their differences (for instance, they ask slightly different questions or use distinctive stimulus materials). Overall, however, people tended to either strongly ascribe knowledge or judge knowledge statements to be true despite the stakes manipulation of these various bank experiments.

Only one study (Schaffer and Knobe 2012) was able to detect an effect for error. So at first glance, experimental data in philosophy appear to be at odds with evidence from three different sources: general intuitive seemings one might have about the role of stakes, the results contextualists and interest-relative invariantists have uncovered in select cases, and also, some neighboring results in social psychology concerning the impact of stakes on subjects’ degrees of confidence.7 But how could this be?

2 Bank experiments and epistemic contextualism

Bank data generated from empirical studies conducted by experimental philosophers and from the intuitions of traditional epistemologists are at odds, and the game's afoot to explain why. We now turn to the first candidate solution given by DeRose (2011), according to which three important deficiencies in the experimental designs of past studies are said to be responsible.

2.1 DeRose challenges the data

In "Contextualism, Contrastivism, and X-Phi Surveys," DeRose raises several crucial objections against previous bank experiments, as well as the implications these data might have for standard epistemic contextualism. DeRose (2011) points out that the studies all fail to incorporate all of the crucial elements needed to successfully test the empirical predictions of contextualism. The result, claims DeRose, is that "it turns out the intuitive support for contextualism doesn't really face much of a wave of empirical trouble," and that "there are some severe problems with taking the results of any of these studies (whatever their aims) as undermining the intuitive basis for contextualism" (83).

According to DeRose, the first problem with earlier research is that some studies (May et al. 2010; Schaffer and Knobe 2012) use stimulus materials in which the epistemic subjects did not make an explicit knowledge statement within the confines of the actual vignette. Instead, these researchers asked participants to rate whether or not they thought subjects had knowledge in the various cases.

The worry is that in cases where participants are asked to ascribe knowledge, rather than evaluate a knowledge statement, they will be more likely to consider the particular features of their own context (the experimental setting) rather than the epistemic subjects' context (as stipulated in the vignette). So, if further research were to explicitly test the predictions of contextualism, a better experiment would have participants evaluate the truth of knowledge statements made by a speaker in the actual vignette (as opposed to simply being asked whether a given character in that vignette has knowledge).

A second worry raised by DeRose is that previous experiments (Buckwalter 2010) fail to test all the crucial factors said to influence ordinary truth judgments of knowledge sentences. While it is true that some contextualists predict differences based on attributor stakes or the possibility for error, others suggest that it is the interaction of these two factors that affects attributors' conversational contexts. And as DeRose says in an earlier paper, "The best case pairs will differ with respect to as many of the features that plausibly affect the epistemic standards, and especially those features which most clearly appear to affect epistemic standards as possible" (2009, 54). In the name of simpler experimental designs, however, bank cases are often presented by comparing the effect of stakes or error individually. By using vignettes that isolate these two factors, the concern is that the test pairs will not have included all of the relevant factors contextualists claim influence epistemic standards. So a better test would include cases with combinations of both high stakes and salience (as opposed to just one or the other).

Lastly, DeRose points out that the best test cases for contextualism should capture how ordinary speakers actually use the knowledge claims in question. Typically, people in situations with low epistemic standards will tend to ascribe knowledge, and people in high-standards situations will deny knowledge. But some previous experiments asked participants only about the truth conditions of assertions (Buckwalter 2010; May et al. 2010). The worry is that by excluding people's evaluations of knowledge denials, we will have accidentally manipulated some property of knowledge assertion, and not stakes or error more generally. Here, one might appeal to what David Lewis called the "rule of accommodation" (Lewis 1979). According to this rule, if someone says something false in a given conversational context, we will seek to change the features of that context (via any number of specific context-altering factors) such that the new context makes the content of that utterance true.

Supposing that "knows" is a context-sensitive term subject to Lewis' rule, it's possible that participants in the bank experiments are more likely to agree that a high knowledge denial is true than they are to agree that a high knowledge attribution is false simply because of accommodation, and not because they sincerely judge high conversational contexts irrelevant when evaluating knowledge sentences. So in order to test this claim, participants in future experiments would need to receive cases involving both knowledge assertions and denials across the various conversational contexts presented.8

DeRose claims that while previous bank experiments may have included just one of these various components, only cases meeting all of these challenges will serve as accurate measures of the intuitive basis for contextualism. In the meantime, the absence of any one of these features could well serve as the explanation of the mystery of ascriber intuitions. So it looks as though experimental philosophers need to go back to the lab in order to more accurately test whether the predictions made by contextualists about stakes and salience in bank cases have ordinary intuitions on their side.

2.2 Meeting DeRose's challenges

The worries by DeRose discussed in the last section serve as important challenges to the interpretation of previous bank results. So one straightforward strategy is to move forward by including these challenges as factors in new experimental studies. The following experiment was performed to further test contextualism in precisely this way, by importing DeRose's three worries into a new research design point by point. This is accomplished by independently varying three critical factors within the basic bank-style vignette: (i) the speech act of the character in the vignette could either be one of asserting that she has knowledge or denying that she has knowledge, (ii) the stakes could either be high or low, and (iii) the error possibilities could either be salient or nonsalient. And given the findings of Schaffer and Knobe (2012), this manipulation of the salient error possibilities differed from other previous studies by using a concrete and vivid example of error.

This resulted in a study with a 2 (stakes) × 2 (error) × 2 (speech act) between-subjects design in which each participant was randomly assigned to one of eight possible conditions.

The possible combinations of the resulting eight conditions are shown below.9

Hannah and her sister Sarah are driving home on a Friday afternoon. They plan to stop at the bank on the way home to deposit their paychecks. As they drive past the bank, they notice that the lines inside are very long, as they often are on Friday afternoons.

Stakes
Low. Since they do not have an impending bill coming due, and have plenty of money in their accounts, it is not important that they deposit their paychecks by Saturday.
High. Since they have an impending bill coming due, and have very little money in their accounts, it is very important that they deposit their paychecks by Saturday.

Hannah says, "I was just at this bank two weeks ago on a Saturday morning, and it was open till noon. Let's leave and deposit our paychecks tomorrow morning."

Error
Low. Sarah replies, "So the bank will be open tomorrow?"
High. Sarah replies, "Well, businesses do change their hours sometimes. Just imagine how frustrating it would be driving here tomorrow and finding the door locked."

Speech act
Assertion. Hannah says, "I was just here, I know that the bank will be open on Saturday."
Denial. Hannah says, "Maybe you're right, I don't know that the bank will be open on Saturday."
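To make the factorial structure concrete, the following sketch (in Python, my own illustration rather than anything from the study's actual materials or software) shows how crossing the three two-level factors yields the eight between-subjects conditions, one of which is randomly assigned to each participant:

    import itertools
    import random

    # The three manipulated factors, each with two levels (labels are mine,
    # matching the vignette variants above).
    factors = {
        "stakes": ["low", "high"],
        "error": ["low", "high"],
        "speech_act": ["assertion", "denial"],
    }

    # Crossing the factors yields the eight between-subjects conditions.
    conditions = [dict(zip(factors, combo))
                  for combo in itertools.product(*factors.values())]
    assert len(conditions) == 8

    # Each participant is randomly assigned exactly one condition.
    participant_condition = random.choice(conditions)
    print(participant_condition)
    # e.g. {'stakes': 'high', 'error': 'low', 'speech_act': 'denial'}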

After seeing one of the possible bank case combinations, and receiving a pair of comprehension checks, participants (N = 215, 32 percent male) were then asked the following question:10

Assume that as it turns out, the bank really was open for business on Saturday. When Hannah said, "I (know / don't know) that the bank will be open on Saturday," is what she said true or false?

Answers were assessed on a five-point scale anchored with truth-value terms (e.g., 1 = false, 3 = in between, 5 = true). Mean truth-value judgments in the bank cases are represented in Figures 6.1 and 6.2.

Figure 6.1 Mean truth judgments for knowledge denials grouped by error (±SE, scales ran 1–5).

Figure 6.2 Mean truth judgments for knowledge assertions grouped by error (±SE, scales ran 1–5).

The study yielded three key results.11 First, there was a main effect for speech act whereby people thought that knowledge sentences involving an assertion, no matter the error or stakes of the case, were more likely true than knowledge sentences involving denial.12 However, despite this main effect, it's also true that participants in the experiment just generally judged everything true across the board. Notice that in the figures above, responses have not been recoded from their original values. But since a "5" in knowledge assertion conditions is logically equivalent to a "1" in knowledge denial conditions, we can clearly see how high truth-value judgments in Figures 6.1 and 6.2 indicate that participants gave very different responses between speech act types. This suggests that while people find the knowledge assertion true in low contexts and the knowledge denial true in high contexts, as contextualists predict in bank cases, intuitions were largely driven by accommodation.

Second, we find exactly the impact of error possibilities that standard epistemic contextualists predict. When the possibilities for error were made salient (in a concrete and vivid way), participants were more inclined to say that the assertion of knowledge was false and that the denial of knowledge was true.13 This effect is shown in Figure 6.3 by collapsing across all the various levels of high- and low-stakes cases administered:

Figure 6.3 Mean truth judgments for low- and high-error conditions grouped by assertion and denial (±SE, scales ran 1–5).
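For readers who want the shape of the analysis behind these figures, the following Python sketch illustrates both the recoding point made above (a "5" for an assertion and a "1" for a denial both amount to crediting Hannah with knowledge) and a three-way between-subjects ANOVA of the kind described in note 11. The data are randomly generated placeholders and the column names are mine; the chapter does not specify the software actually used.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(0)
    n = 215  # matches the reported sample size; the responses below are fake

    # Hypothetical long-format data: one row per participant.
    df = pd.DataFrame({
        "stakes": rng.choice(["low", "high"], size=n),
        "error": rng.choice(["low", "high"], size=n),
        "speech_act": rng.choice(["assertion", "denial"], size=n),
        "rating": rng.integers(1, 6, size=n),  # 1 = false ... 5 = true
    })

    # Recode denial ratings as 6 - rating to put both speech acts on a
    # common "knowledge ascribed" scale.
    df["knowledge_ascribed"] = np.where(
        df["speech_act"] == "denial", 6 - df["rating"], df["rating"])
    print(df.groupby("speech_act")["knowledge_ascribed"].mean())

    # Three-way between-subjects ANOVA on the raw truth-value judgments,
    # mirroring the analysis described in note 11.
    model = smf.ols("rating ~ C(stakes) * C(error) * C(speech_act)",
                    data=df).fit()
    print(anova_lm(model, typ=2))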

Lastly, despite obtaining the predicted effect of attributor error, the predicted effect of subject stakes was not found.14 There was no general tendency for people in the high-stakes bank conditions to be more inclined to think that assertions of knowledge were false or that denials of knowledge were true.15 Even after correcting for all of the previous worries in bank case stimuli, an effect for stakes was not detected in people's evaluations of these particular third-person knowledge-ascribing sentences.

So here's the state of play. Previous experiments have demonstrated that there is little reason to think stakes or error play meaningful roles in folk assessments of knowledge ascriptions in bank cases. Then, DeRose proposed three objections to the extant ascription data that might be responsible for the incongruent results between philosophers and ordinary people. When incorporating those exact worries into the current test (along with the vivid error possibilities advocated by Schaffer and Knobe 2012), we found that the actual problem with previous tests had to do with the error manipulation. That is, the data seem to show that when error is vividly presented to participants in cases of both attributions and denials, there is an effect on third-person ascription.16 So the key message from this further experiment on the bank cases seems to be that the possibility of error made salient to the attributor does have an impact on the evaluation of truth conditions of sentences that attribute or deny knowledge, but that subject stakes do not.

Of course, it is often difficult to confidently interpret null stakes results. Generally speaking, there could be numerous different reasons why a given experiment does not find a hypothesized effect. While one reason could be that no such effect actually exists, any number of experimental confounds could also be responsible. One could also object that people simply weren't paying attention, or that they were not holding fixed the important epistemic features of the cases (like the amount of evidence possessed by an epistemic subject, for instance), and that this is why an effect of stakes on knowledge was not found on this particular occasion.

While these are important worries to consider, it is not clear that they serve as viable explanatory hypotheses of the data in hand in the current bank study. Importantly, the space of candidate explanations of the null results for stakes in the present experiment is constrained by the interaction effect of error possibilities and speech act type. Unlike the previous studies that have turned up negative or null results for both stakes and error, this study found the predicted effect for the latter but not the former in a single experiment across the very same stimulus materials. So a plausible explanation for the absence of the impact of stakes would still need to retain the ability to explain this result for error.

Therefore it seems unlikely that one could rely on accusations about epistemic attention or shifting evidence to explain the lack of impact stakes had on knowledge judgments, while participants simultaneously behaved exactly as epistemologists were predicting when it comes to error. In this regard, we can have even more confidence than before that participants are considering these cases as epistemologists intend. It's just that, in these particular cases, they do not arrive at the same epistemic intuitions as the experts regarding stakes, though they do for error.

3 Evidence-seeking experiments and IRI

While there now might be some doubt about the specific features of the cases above regarding contextualism and salience of error, these data seem to very clearly question the ability of interest-relative invariantist positions to capture ordinary intuitions regarding subject stakes in bank cases. We now turn to the second candidate solution, given by Pinillos (2012), suggesting that problems specifically to do with the stakes manipulations of previous bank-case experiments can account for the mystery of divergent ascriber intuitions.

3.1 Pinillos challenges the data

Pinillos (2012) offers some compelling new experimental evidence suggesting that, contrary to the bank data consensus in experimental philosophy, subject stakes do in fact influence third-person mental state attributions of knowledge. Specifically, Pinillos uses an experimental paradigm measuring the amount of evidence participants require an epistemic subject to collect before they ascribe knowledge to that subject. The stimulus involves a college student who is proofreading his assignment for typos. In a low-stakes condition, it is not particularly important that the assignment has no errors, while in a high-stakes condition, the student faces disastrous consequences should even one error be discovered by his professor:

though Peter is a pretty good speller, he has a dictionary with him that he can use to check and make sure there are no typos. But very little is at stake. The teacher is just asking for a rough draft and it won’t matter if there are a few typos. Nonetheless Peter would like to have no typos at all. Typo High Stakes. John, a good college student, has just finished writing a two-page paper for an English class. The paper is due tomorrow. Even though John is a pretty good speller, he has a dictionary with him that he can use to check and make sure there are no typos. There is a lot at stake. The teacher is a stickler and guarantees that no one will get an A for the paper if it has a typo. He demands perfection. John, however, finds himself in an unusual circumstance. He needs an A for this paper to get an A in the class. And he needs an A in the class to keep his scholarship. Without the scholarship, he can’t stay in school. Leaving college would be devastating for John and his family who have sacrificed a lot to help John through school. So it turns out that it is extremely important for John that there are no typos in this paper. And he is well aware of this.

After seeing one of these conditions, participants were asked, "How many times do you think [Peter / John] has to proofread his paper before he knows that there are no typos?" They were then told to fill in the blank with the number they thought was appropriate.

The study showed that when participants were presented with either the low-stakes or the high-stakes condition, people thought that the student needed to collect more evidence in order to know there were no typos when the stakes of the case were high (median = 5) than when they were low (median = 2). The finding suggests that since people require more evidence before ascribing knowledge to epistemic subjects in this way, folk attributions of knowledge must be sensitive to stakes.

Furthermore, Pinillos suggests that this experiment may give us a unique perspective on what went wrong in bank cases. The worry, claims Pinillos, is that there really is no way to track whether participants are holding fixed crucial details of the cases, such as how much evidence the epistemic subject has between conditions. The thought is that evidence-seeking experiments are better equipped in this regard, since it is precisely the amount of evidence the subject should have which is measured. By detecting fluctuations in the amount of evidence required, these experiments show differences in knowledge judgments by stakes, casting doubt on the divergent intuitions previous experimental philosophers have detected in bank cases.

So regarding theories like IRI, it looks as though experimental philosophers also need to return to the lab in order to find out exactly how stakes are influencing judgments about evidence-seeking behaviors when ascribing knowledge.
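To see what the reported pattern amounts to numerically, here is a small Python sketch. The response counts are invented for illustration (chosen so the medians match the reported 2 and 5), and the rank-based test is my own suggestion for skewed count data of this kind; the chapter reports only the medians, not the statistical test Pinillos used.

    import numpy as np
    from scipy.stats import mannwhitneyu

    # Hypothetical free-response counts: how many proofreads before the
    # subject "knows" there are no typos. Values are illustrative only.
    low_stakes = np.array([1, 1, 2, 2, 2, 2, 2, 3, 3, 4])    # median = 2
    high_stakes = np.array([3, 4, 4, 5, 5, 5, 5, 6, 7, 10])  # median = 5

    print(np.median(low_stakes), np.median(high_stakes))

    # Free-response counts like these are typically right-skewed, so a
    # rank-based comparison is one natural check that the two
    # distributions differ.
    stat, p = mannwhitneyu(low_stakes, high_stakes, alternative="two-sided")
    print(stat, p)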

3.2 Meeting Pinillos' challenges

One latent worry in the evidence-seeking design is that the differences detected in the amount of evidence collected between low- and high-stakes subjects in this particular experiment could arise not because third-person mental state attributions of knowledge are or aren't intrinsically sensitive to stakes, but rather because high-stakes subjects are expected to collect more evidence than low-stakes subjects to actually have an outright belief on the issue at all. So a further study was run to answer the question of whether stakes specifically affect mental state ascriptions of knowledge, or alternatively, whether these differences are instead an effect for some other mental state, like belief.17

In this study, 100 participants were given a manipulation as close as possible to the one used in Pinillos' study involving subject stakes, but the kind of mental state ascription attributed to the subject was also varied.18 This resulted in a 2 × 2 between-subjects experimental design, independently varying the practical stakes (either high or low) and the mental state (either belief or knowledge) of the subject.19 After seeing one of the same stimuli given above in Pinillos' original experiment, participants were then asked, "How many times do you think Peter has to proofread his paper before he [believes/knows] that there are no typos?" and to "Please insert the number you think is appropriate in the space below." Results are represented in Figure 6.4 by mean scores of the amount of evidence needed in each case.

Figure 6.4 Typo case results grouped by stakes and mental state (±SE).

We find that as before, stakes had a huge impact on ascriber intuitions. However, while the experiment showed a significant difference in people's judgments between low- and high-stakes contexts, the specific mental state they were asked about within these contexts did not.20 In other words, participants gave roughly the same answers when asked how much evidence needed to be collected before the epistemic subject had knowledge as when asked how much evidence needed to be collected before the subject had a belief that a certain result would obtain.
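The pattern plotted in Figure 6.4 is just the set of cell means with standard errors for the four stakes-by-mental-state conditions; note 20 reports the accompanying two-way ANOVA. The sketch below computes such a summary on randomly generated placeholder data (the column names are mine); on the real data, the diagnostic pattern was a stakes difference appearing for "believes" and "knows" alike.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 100  # matches the reported sample size; the responses below are fake

    # Hypothetical free responses: proofreads required before the subject
    # "believes"/"knows" that there are no typos.
    df = pd.DataFrame({
        "stakes": rng.choice(["low", "high"], size=n),
        "mental_state": rng.choice(["believes", "knows"], size=n),
        "evidence": rng.integers(1, 12, size=n),
    })

    # Cell means and standard errors of the kind plotted in Figure 6.4.
    cells = df.groupby(["stakes", "mental_state"])["evidence"].agg(["mean", "sem"])
    print(cells)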

In the dispute between intellectualists and interest-relative invariantists, advocates of IRI usually hold that subject stakes are themselves supposed to bear directly on the criteria for whether a subject's true belief constitutes knowledge. Yet statistically indistinguishable scores between subjects asked whether an epistemic subject knows something is the case and those asked whether he believes something to be true suggest that the effect shown for stakes does not tell us about the specific criteria people are actually using when deciding if a subject's belief constitutes knowledge. Instead, the fact that participants found that high-stakes subjects are expected to collect more evidence than low-stakes subjects in order to count as believing suggests that subjects in high-stakes conditions are expected to collect more evidence just to make up their minds at all. It's not that both subjects' preexisting beliefs are transformed into knowledge with a greater amount of evidence in high-stakes cases than low-stakes cases, but rather that epistemic subjects need a disproportionate amount of evidence when forming the requisite belief. This suggests that this particular evidence-seeking experiment is a fine-grained enough measure to have detected that subject stakes have some kind of an effect on participants' general responses in the current experiment, but not fine-grained enough to show that the relevant effect reveals something specific about knowledge in particular.21

For better evidence supporting the empirical predictions made by IRI, data would need to demonstrate that the stakes of the case are what matter for a subject's true belief to count as knowledge.22 To be clear, nothing in this response to Pinillos rules out the possibility that stakes do actually play such a role in people's judgments, or that such evidence could be collected in the future. Instead, this experiment was designed to show that we are not warranted in inferring support for IRI from these particular data regarding stakes. Further research is necessary before supporters of IRI can reasonably make the inference that knowledge ascription is particularly sensitive to stakes in the relevant way. And antecedent empirical reasons for the conclusion that stakes do not play this role should inspire caution before overturning previous results to the contrary.

4 Toward solving the mystery

We began with a great mystery. How can we make sense of the systematic differences between professional intuition and folk judgments in bank cases? Data indicate it is unlikely the mystery can be entirely solved by appealing to the specific problems that DeRose and Pinillos have identified with previous experiments. However, neither does the evidence suggest that philosophers using more traditional methods were completely mistaken about actual knowledge practices. Joining with past research, further data on bank-case intuitions suggest that while error is a factor that influences ordinary third-person mental state ascriptions of knowledge, stakes are not.

The latest experiments show that knowledge ascriptions in the bank cases fluctuate when vivid error possibilities are made salient. Data from evidence-seeking experiments on the matter of stakes are shown to be inconclusive, and without further testing, do not yet undermine the growing consensus in experimental philosophy that subject stakes play but marginal roles in attributor judgments.

Given these data, it seems that the solution to the mystery of ascriber intuitions is twofold. Something went wrong when previous experimental epistemologists claimed that the salience of ascriber error does not affect people's knowledge judgments, and something went wrong when professional epistemologists claimed that their intuitions about the importance of subject stakes actually reflect ordinary people's evaluations of knowledge-ascribing sentences.

people’s knowledge judgments, and something went wrong when professional epistemologists claimed that their intuitions about the importance of subject stakes actually reflect ordinary people’s evaluations of knowledge-ascribing sentences. While the empirical data are a good start to solving this mystery, an interesting further question remains why experimentalists and traditional philosophers made the mistakes that they did. Before going on to speak of the philosophical ramifications of the data in these cases, we will pause to hypothesize about why or how this mystery developed. One hypothesis to explain why previous bank-case studies were unable to detect the intuitional variance epistemic contextualism predicts between contexts of low- and high-ascriber error seems relatively straightforward. In past experiments on bank cases, the factor of error was manipulated by only minimally mentioning the possibility of error. In Buckwalter (2010), for instance, the high-error bank-case interlocutor challenges her epistemic subject by speculating on only one general way in which she could be wrong (e.g., “Banks are typically closed on Saturday. Maybe this bank won’t be open tomorrow either.”). Similarly, May et al. (2010) take a similar tack when constructing a case of high error by having the epistemic subject point out that generally speaking, banks do change their hours. However, current empirical evidence suggests that the effect was detected in the present experiment simply by making the particular error manipulation in the bank vignettes more vivid: “Just imagine how frustrating it would be driving here tomorrow and finding the door locked.” Following Schaffer and Knobe, who showed a salience of error effect for knowledge attribution, present results demonstrate a similar effect when participants are asked to make truth judgments of knowledge ascriptions.23 The first culprit then, that helps explain the advent of the mystery of ascriber intuitions, is a specific problem with earlier experimental research. The problem was that the error possibilities were not made salient enough in bank cases to detect the difference in intuition between low- and high conversational contexts.24 Regarding the disagreement about the factor of stakes, however, it still remains incredibly puzzling why the intuitions found in these experimental studies continue to diverge so systematically from both the intuitions of trained epistemologists, as well as the predictions that results concerning first-person confidence might have made for third-person

Further experimental evidence continues to support the hypothesis that knowledge is sensitive to error, while continuing to question the joint hypotheses that practical stakes, and the link between subjects' degrees of confidence and stakes, have anything but marginal impacts on ascribers' intuitions in bank cases.

One hypothesis to explain this difference is that ordinary people and trained epistemologists approach these thought experiments in different ways (see, e.g., Phelan forthcoming). On the one hand, participants of an experiment usually experience one particular case, and are then asked to report an immediate intuition about whether or not the epistemic subject has knowledge. By contrast, trained philosophers often proceed by considering pairs of bank vignettes together and then engage in a kind of reflection about whether the relevant differences in context have any epistemic importance. The philosophers then go on to make predictions about the judgments ordinary people will make in these cases on the basis of that evaluation. And these two different types of approaches to epistemic judgments may be shaped by two very different kinds of psychological processes. The former seems to be an implicit, system-one intuition-generating capacity that enables us to form epistemic intuitions about the particular cases with which we are presented. The latter is a system-two process, involving a more abstract set of theoretical beliefs about epistemic principles, as well as predictions about how others might conform to those principles.25

So could these different decision-making approaches between these two different groups account for the mystery in bank cases? Indeed, it's possible that the predictions made by philosophers are subject to a kind of distinction bias often discussed in behavioral economics (Chatterjee et al. 2009; Hsee and Zhang 2004, 2010). The basic idea is that when presented with several vignettes differing by stakes—in what Hsee and Zhang call "joint evaluation mode"—professional philosophers identify this feature of the vignettes and then make choices and form predictions based on their training and knowledge of the abstract epistemic principles involved. But conversely, the processes underlying ordinary people's judgments when they are presented with singular cases—in "single evaluation mode"—are based on preferences related to actually experiencing those particular cases. And since it seems likely that the preferences that philosophers use in the former mode of evaluation will be different from those of nonphilosophers in the latter, these different modes of evaluation may encourage philosophers to overpredict the impact of stakes in people's actual knowledge judgments in bank cases.

So regarding the mystery of ascriber intuitions, perhaps the second culprit is the bias that arises from combining formal philosophical training with making predictions when cases are presented under joint evaluation rather than experienced. If true, this hypothesis may help explain not only how judgments made by expert epistemologists gave rise to the importance of stakes, but also how the role of stakes in ordinary bank-case judgments was mispredicted or exaggerated.26

5 Implications and philosophical importance

Though we have made some empirical progress in resolving the mystery of ascriber intuitions, the mystery's denouement raises perhaps an even more complicated philosophical question: how does this explanation about stakes and error in bank cases bear on epistemic contextualism and IRI?

Beginning with contextualism, many philosophers have argued that as a semantic theory, it makes particular linguistic commitments about word usage. Hawthorne (2004) and Stanley (2005) argue, for instance, that the relevant kind of context-sensitivity of "knows" is objectionable because our usages of that particular word frequently deviate from usages of other common indexicals said to be context-sensitive. Going even further, Brown (2013) argues that since contextualism provides a linguistic model for "knows," and given that such models provided by leading theories of contextualism are committed to certain kinds of context-sensitivity, contrary behavioral data about folk knowledge practices regarding such sensitivities would actually threaten to undermine the view entirely.

If these arguments are correct, then getting the right results in bank-case experiments seems crucial. As it stands, one outcome of research up to this point is that semantic theories of standard epistemic contextualism are supported by experimental data showing an effect for at least one specific kind of context-sensitivity. In particular, the data show that views that wish to include a correct theory of people's ordinary language practices regarding sensitivity to conversational contexts should focus not on the practical stakes, but rather on the error possibilities made salient to the attributor.

In such cases, contextualism would not be undermined by the current experimental results. The clear upshot is not only that contextualism has been shown to be compatible with the relevant knowledge behaviors, but also that such empirical evidence can be used to forge more detailed versions that specify with greater accuracy the relevant linguistic model claimed for the word "know."27

Unlike the data relevant to contextualism showing folk sensitivity to error possibilities, however, current results continue to suggest that third-person attributions were insensitive to stakes. And such findings seem to be clearly at odds with the premise that IRI best explains bank intuitions. In response to this tension, Brown (2013) argues that such experimental data only threaten to undermine one popular way of arguing for IRI, but not the position itself. Following Brown, we might note that there is no necessary dependence of the truth of metaphysical theses regarding things like temporal parts, the nature of substance (and likewise the determinants of knowledge) on concordant folk judgments or practices regarding their central theoretical entities. As a metaphysical theory of knowledge, the truth of IRI does not turn on, and is not committed to, ordinary intuitions about knowledge per se. Therefore, despite the results of the current studies, interest-relative invariantists are free to continue to include premises about the epistemic roles of subject stakes for the metaphysical determinants of knowledge at the cost of ordinary language.28

What the empirical evidence does seem to suggest, so far at least, is that IRI may not provide the best explanation for our epistemic behaviors regarding stakes in bank cases and beyond. In other words, the data from Pinillos have not been enough to convince us that ordinary assessments of these particular types of knowledge-ascribing sentences count in favor of the metaphysical view that knowledge is stakes-sensitive. And while a metaphysical view need not enjoy folk agreement to be true, a safe bet is that the conclusions of such a thesis are, generally speaking, more likely true when supported by true premises rather than false ones. So this may encourage future supporters of IRI to develop and embrace alternative arguments for their view based on something other than folk practices regarding stakes (again see, e.g., Brown 2013; Fantl and McGrath 2010). Of course, there's always the possibility that future experiments will discover the long-lost case that does display persistent stakes effects.

And such cases will have to be evaluated on their own merits as they arise. But at the very least, the current data generated in response to DeRose and Pinillos, joined with the difficulty several independent researchers have faced in detecting anything but negligible stakes effects, begin to question whether building an epistemology around folk stakes sensitivity is a very good idea.29

6 Conclusion

Experiments continue to suggest that accommodation and the salience of ascriber error, but not subject stakes, make the difference in the ordinary evaluation of third-person knowledge sentences. But research exploring the ways in which people actually evaluate knowledge sentences can still be a benefit to the more traditional research in the field, serving to help supplement, and not supplant, such methods when appropriate.

One of the main goals of this chapter has been to show that the empirical investigation of the mystery of ascriber intuitions can help contextualists and interest-relative invariantists become the best versions of themselves. By suggesting which specific features of an attributor's context ordinarily affect the standards for knowledge, this research begins to allow a more accurate estimate of the linguistic model of the word "knows." The result is that one way to be a better contextualist is to develop versions of the theoretical view in which the context-sensitivity of "knows" varies by accommodation and error. Similarly, experiments also continue to question the evidence supporting the claim that ordinary third-person knowledge judgments are sensitive to subject stakes. Such results may undermine one particular way of arguing for the thesis that knowledge is stakes-sensitive. Yet they may also suggest that one way to be a better interest-relative invariantist might be to accept versions of the view that do not rely on premises concerning ordinary knowledge practices that people may not have, or may not prevalently have. In both cases, empirical research in epistemology is additive to the theoretical work and methods in the field, opening new directions for the partnership between future theoretical and experimental work on contextualism and IRI.

Acknowledgments

Special thanks to James Beebe, Keith DeRose, Mikkel Gerken, Josh Knobe, Josh May, Jennifer Nagel, N. Ángel Pinillos, Jonathan Schaffer, Jason Stanley, Jonathan Weinberg, and other blog members who participated in lengthy discussions on the Certain Doubts blog, for helpful comments and suggestions. I am grateful to Josh Knobe, Jesse Prinz, and Stephen Stich for insightful comments on previous drafts, and continued support.

Notes

1 Since the writing of this chapter, new work by Sripada and Stanley (2012) has claimed to detect a stakes effect in unrelated cases. However, a critical discussion of these recent findings will be saved for a later occasion (see Buckwalter and Schaffer forthcoming).

2 Indeed there is a real debate as to whether evidence must be understood in a truth-conducive way (see, for instance, Fantl and McGrath 2010).

3 It is important to note that this discussion references the received view in the epistemic literature on these intuitions, and not experimental evidence directly measuring philosophers' actual judgments. So it's possible that factors like publication bias against those without stakes intuitions could be playing a role in artificially inflating the near consensus about bank cases.

4 Not all philosophers report having stakes-sensitive intuitions (see Schaffer 2006). This may point to the existence of important individual differences in bank cases and beyond.

5 Another thing that social psychology seems to suggest is that any impact of stakes on knowledge goes through an effect on credence. But if this is the case, then this would show that stakes are not necessarily an independent fourth factor in knowledge along with justified true belief, but merely causally connected to belief (see Weatherson 2005).

6 Phelan (forthcoming) has also shown that subject stakes have a marginal impact on people's judgments about evidence. Beyond just looking at stakes and error, researchers have also shown that moral judgment can play a large role in people's willingness to ascribe knowledge (Beebe and Buckwalter 2010; Beebe and Jensen 2012; Buckwalter forthcoming). Presumably, a theory of knowledge that wished to do justice to folk intuitions would also need to account for these epistemic judgments.

7 Marginal results for stakes on ascribers' intuitions in bank cases may also call into question the assumption that the impact stakes have on subjects' degrees of confidence means that stakes will have an impact on ascribers' intuitions.

8 Though we might wonder, if the purported effect was driven primarily by accommodation, what would knowledge have to do with it?

9 This study used the internet-based commercial research tools Mturk and Qualtrics. Online samples were restricted to participants located in the United States.

10 Thirty participants were removed from this study for failure to pass two very basic comprehension check questions.

11 Means and standard deviations for denial conditions: Low Error/Low Stakes (M = 3.48, SD = 1.62), Low Error/High Stakes (M = 4.15, SD = 1.05), High Error/Low Stakes (M = 4.27, SD = 1.08), High Error/High Stakes (M = 3.92, SD = 1.12). Means and standard deviations for assertion conditions: Low Error/Low Stakes (M = 4.70, SD = 0.56), Low Error/High Stakes (M = 4.48, SD = 0.59), High Error/Low Stakes (M = 4.05, SD = 1.30), High Error/High Stakes (M = 4.33, SD = 0.73). In the results reported in the text, a three-way between-subjects analysis of variance was conducted to evaluate the effect of error, stakes, and speech-act type on participants' truth-value judgments in the bank cases.

12 Main effect for the factor of speech act, F(1, 177) = 8.6, p < 0.01.

13 A significant interaction effect was found between factors of speech act and error, F(1, 177) = 4.62, p < 0.05.

14 No significant interaction effect was found between speech act and stakes, F(1, 177) = 0.40, p = 0.53.

15 Indeed, the only significant effect of stakes was an incredibly complex interaction. In cases with salient error possibilities, people were less inclined to say that the denial was true when the stakes were high (M = 3.92, SD = 1.12), whereas in cases without salient error possibilities, people were more inclined to say that the denial was true when the stakes were high (M = 4.15, SD = 1.05). In other words, there was an effect such that high stakes had opposite impacts on denials depending on whether error possibilities were made salient, F(1, 177) = 6.00, p < 0.05.

16 Relative to Lewis (1996), this may suggest that simply mentioning error possibilities is not enough to make them salient in the relevant, epistemic context-altering way.

17 For similar stakes results for other verbs besides "know" and "believe," see Buckwalter and Schaffer forthcoming.

18 Ten participants were removed from this study for failure to pass a very basic comprehension check.

19 These materials are borrowed directly from Pinillos (2012).

20 Typo Low-Stakes Belief (M = 2.71, SD = 1.27), Typo Low-Stakes Knowledge (M = 2.61, SD = 0.89), Typo High-Stakes Belief (M = 6.59, SD = 5.05), Typo High-Stakes Knowledge (M = 5.12, SD = 3.42). A two-way between-subjects analysis of variance was conducted to evaluate the effect of mental state and stakes on participants' free responses regarding evidence. A significant main effect was obtained for stakes, F(1, 86) = 23.1, p < 0.01. However, no main effect was found for mental state, F(1, 86) = 1.40, p = 0.24, and no interaction between these two factors was detected, F(1, 86) = 1.05, p = 0.31.

21 The supporter of IRI might respond to this objection by claiming that if knowledge is a norm of belief, then identical results between mental states would be compatible with the impact of stakes on people's criteria for knowledge ascription. This is certainly a possibility, just one that remains to be proven experimentally.

22 Indeed, another worry is that a possible ambiguity exists whereby a more natural reading of the question asked in these experiments is something like, "how many times should the subject proofread his paper in this situation?"

23 The effect shown here for error is smaller than what was shown in work by Schaffer and Knobe (2012).

24 Feltz and Zarpentine investigate a range of life-or-death cases outside of bank contexts where subject stakes are quite vivid, but are also unable to detect stakes effects.

25 Phelan (forthcoming) tests a similar hypothesis by asking participants to judge which factors should affect an epistemic subject's confidence in her beliefs. While participants profess to the general principle that stakes should affect confidence judgments (about evidence at least), they fail to allow the costs of being wrong to influence the actual judgments made in these cases when they experience them.

26 If this solution is correct, then these differences seem to highlight the need to institute more careful controls when utilizing the evidence-by-intuition method in philosophy. Such methods may be just as susceptible to criticisms one might make of any research program in psychology regarding experimental design confounds or biases (see, e.g., order effects in trolley problem intuitions among professional philosophers by Schwitzgebel and Cushman 2012).

27 Specifically, the present evidence may tell against "pragmatist" versions of contextualism, according to which practical matters partly determine the truth of knowledge ascriptions.

28 Though it is important to note that this is nonetheless a considerable theoretical blow, since IRI was not on the board before it was claimed that the alleged intuitions needed to be accounted for.

29 See, for instance, Weinberg's notion of "philosophical effect size," whereby the simple detection of a psychological effect may not always be sufficient for supporting certain roles in philosophical argument without meeting a series of further conditions (2011).

References

Beebe, J. R. and Buckwalter, W. (2010), "The epistemic side-effect effect." Mind & Language, 25, 474–98.

Beebe, J. R. and Jensen, M. (2012), "Surprising connections between knowledge and action: The robustness of the epistemic side-effect effect." Philosophical Psychology, 25, 689–715.

Brown, J. (2013), "Experimental philosophy, contextualism and SSI." Philosophy and Phenomenological Research, 86(2), 233–61.

Buckwalter, W. (2010), "Knowledge isn't closed on Saturdays." Review of Philosophy and Psychology, 1, 395–406.

—. (2012), "Non-traditional factors in judgments about knowledge." Philosophy Compass, 7, 278–89.

—. (forthcoming), "Gettier made ESEE." Philosophical Psychology.

Buckwalter, W. and Schaffer, J. (forthcoming), "Knowledge, stakes, and mistakes." Noûs.

Chatterjee, S., Heath, T. B. and Min, J. (2009), "The susceptibility of mental accounting principles to evaluation mode effects." Journal of Behavioral Decision Making, 22, 120–37.

Cohen, S. (1988), "How to be a fallibilist." Philosophical Perspectives, 2, 91–123.

—. (1999), "Contextualism, skepticism, and the structure of reasons." Philosophical Perspectives, 13, 57–89.

—. (2004), “Knowledge, assertion, and practical reasoning.” Philosophical Issues, 14, 482–91. DeRose, K. (1992), “Contextualism and knowledge attributions.” Philosophy and Phenomenological Research, 52, 913–29. —. (1999), “Contextualism: An explanation and defense,” in J. Greco and E. Sosa (eds), The Blackwell Guide to Epistemology. Oxford: Basil Blackwell, pp. 182–205. —. (2005), “The ordinary language basis for contextualism and the new invariantism.” Philosophical Quarterly, 55, 172–98. —. (2009), The Case for Contextualism. Oxford: Oxford University Press. —. (2011), “Contextualism, contrastivism, and x-phi surveys.” Philosophical Studies, 156, 81–110. Fantl, J. and McGrath, M. (2002), “Evidence, pragmatics, and justification.” The Philosophical Review, 111, 67–94. —. (2010), Knowledge in an Uncertain World. Oxford: Oxford University Press. Feltz, A. and Zarpentine, C. (2010), “Do you know more when it matters less?” Philosophical Psychology, 23, 683–706. Fischer, P., Jonas, E., Frey, D. and Kastenmüller, A. (2008), “Selective exposure and decision framing: The impact of gain and loss framing on confirmatory information search after decisions.” Journal of Experimental Social Psychology, 44, 312–20. Hawthorne, J. (2004), Knowledge and Lotteries. Oxford: Oxford University Press. Hsee, C. K. and Zhang, J. (2004), “Distinction bias: misprediction and mischoice due to joint evaluation.” Journal of Personality and Social Psychology, 86, 680–95. —. (2010), “General evaluability theory.” Perspectives on Psychological Science, 5, 343–55. Lewis, D. (1979), “Scorekeeping in a language game.” Journal of Philosophical Logic, 8, 339–59. —. (1996), “Elusive knowledge.” Australasian Journal of Philosophy, 74, 549–67. May, J., Sinnott-Armstrong, W., Hull, J. G. and Zimmerman, A. (2010), “Practical interests, relevant alternatives, and knowledge attributions: An empirical study.” Review of Philosophy and Psychology, 1, 265–73. Mayseless, O. and Kruglanski, A. W. (1987), “What makes you so sure? Effects of epistemic motivations on judgmental confidence.” Organizational Behavior and Human Decision Processes, 39, 162–83. Phelan, M. (forthcoming), “Evidence that stakes don’t matter for evidence.” Philosophical Psychology. Pinillos, N. Á. (2011), “Some recent work in experimental epistemology.” Philosophy Compass, 10, 675–88.

The Mystery of Stakes and Error in Ascriber Intuitions

173

—. (2012), "Knowledge, experiments and practical interests," in J. Brown and M. Gerken (eds), Knowledge Ascriptions. Oxford: Oxford University Press.
Schaffer, J. (2006), "The irrelevance of the subject: Against subject-sensitive invariantism." Philosophical Studies, 127, 87–107.
Schaffer, J. and Knobe, J. (2012), "Contrastive knowledge surveyed." Noûs, 46(4), 675–708.
Schwitzgebel, E. and Cushman, F. A. (2012), "Expertise in moral reasoning? Order effects on moral judgment in professional philosophers and non-philosophers." Mind and Language, 27(2), 135–53.
Smith, S. M., Fabrigar, L. R. and Norris, M. E. (2008), "Reflecting on six decades of selective exposure research: Progress, challenges, and opportunities." Social and Personality Psychology Compass, 2, 464–93.
Sripada, C. and Stanley, J. (2012), "Empirical tests of interest-relative invariantism." Episteme, 9(1), 3–26.
Stanley, J. (2005), Knowledge and Practical Interests. Oxford: Oxford University Press.
Weatherson, B. (2005), "Can we do without pragmatic encroachment?" Philosophical Perspectives, 19, 417–43.
Weinberg, J. (2011), "Out of the armchair, and beyond the clipboard: Prospects for the second decade of experimental philosophy." Invited speaker, The 103rd Annual Meeting of the Southern Society for Philosophy and Psychology, March 11, 2011.

7

Is Justification Necessary for Knowledge?

David Sackris and James R. Beebe

Justification has long been considered a necessary condition for knowledge, and theories that deny the necessity of justification have been dismissed as nonstarters. In this chapter, we challenge this long-standing view by showing that many of the arguments offered in support of it fall short and by providing empirical evidence that individuals are often willing to attribute knowledge when epistemic justification is lacking.

In the early 1990s, Sartwell (1991, 1992) attempted to call into question the traditional view that justification is a necessary condition for knowledge. Unlike some epistemic externalists who suggested that the justification condition be replaced with reliable indication, sensitivity, or some other externalist condition, Sartwell contended that no replacement was necessary. Sartwell's claims were initially met with incredulous stares and were soon largely ignored as their novelty diminished. More recently, other philosophers have taken aim at some of the other purportedly necessary conditions for knowledge. Hazlett (2010, 2012), for example, has pointed to the widespread willingness of individuals to attribute knowledge in the absence of truth, arguing that the ordinary concept of knowledge may not be factive after all. Myers-Schulz and Schwitzgebel (2013) and Beebe (2013) have gathered empirical data that display folk willingness to attribute knowledge even in the absence of occurrent or dispositional belief. In this chapter, we seek to reopen the question of whether justification is a necessary condition for knowledge by taking a critical look at some of the philosophical arguments offered in favor of its necessity and by reporting the results of empirical studies that show participants are willing to attribute knowledge when there is insufficient evidence in favor of the belief in question.

In Section 1, we revisit Sartwell’s reasons for claiming that justification is a criterion for knowledge but not a necessary condition. In Section 2, we respond to objections against Sartwell’s view that are offered by Kvanvig (2003) and Lycan (1994). In Section 3, we report the results of empirical tests of some of Sartwell’s central claims. We hope that the resulting blend of philosophical argument and empirical results leads philosophers to take more seriously the suggestion that the ordinary concept of knowledge may not include justification.

1 Sartwell's argument

Sartwell begins his attack on the epistemological dogma that knowledge is at least justified true belief by arguing that the obvious importance of having a justification for one's beliefs does not need to be interpreted as showing that justification is a component of knowledge. Rather, he suggests, it might simply be that justification is the most important criterion for knowledge. Asking for justification, after all, is often the best way to determine whether or not someone has a true belief. Because of the link between epistemic justification and truth, knowing that someone fails to have a good reason for believing a proposition is often what we rely upon most in determining that the belief cannot be trusted. Williamson (2000) makes an analogous point when he argues that the fact that knowledge entails justification does not show that justification is a constituent of knowledge. Unlike Williamson, however, Sartwell also argues that justification is not always required in order to correctly attribute knowledge. He notes that we are often willing to ascribe knowledge in instances of very weak or even absent justification, where, if justification were implicitly part of knowledge, we would have to deny that knowledge was present. Sartwell offers the example of a man who correctly believes his son is innocent of a crime in the face of overwhelming evidence against him, basing his belief solely upon the fact that the young man is his son. Sartwell claims that, in practice, we would likely say that he knows his son is innocent, despite the fact that the evidence he possesses does not support an attitude of belief. Sartwell considers several cases along these lines where an agent's belief is eventually vindicated and claims the most natural thing to say is that the agent "knew it all along."

In Section 3, we report the results of asking ordinary participants whether the agents in several cases like these had knowledge. In line with Sartwell's predictions, participants were found to be inclined to say the agents "knew it all along" in contexts where they had no justification or, indeed, where the evidence or justification they possessed pointed to the falsity of their beliefs. Sartwell (1991, 157–8) also considers typical counterexamples offered against his view. Critics often claim that his view implausibly counts as knowledge cases where someone (i) picks a winning horse by closing his eyes and placing his finger at random on a racing form, (ii) dreams that the Pythagorean theorem is true and comes to believe that it is true on that basis, or (iii) forms a true belief on the basis of some delusion. Sartwell argues that in order for these cases to succeed as counterexamples, they need to be examples of true belief but that they are often not plausibly construed as involving belief. Luckily guessing that p does not require believing that p. When picking a winning horse at random, you may hope your guess is correct, but you should not believe that it is. In Section 3, we describe the results of presenting three "lucky guess" vignettes to participants, the majority of whom judged the agents described therein to lack belief. In regard to the case of someone forming true beliefs on the basis of dreams or delusions, Sartwell argues that we need to consider what other supporting beliefs the agent possesses and the extent to which the agent fully understands the content of the belief in question. Sartwell (1991, 159) contends that if the agent has both a solid understanding of the belief and a genuine belief that it is true (which he claims entails "some degree of serious commitment to the claim"), then it should be counted as an instance of knowledge. As we report in Section 3, the intuitions of ordinary participants are modestly in accord with Sartwell's claims about cases like this.

2 Objections to Sartwell

2.1 Kvanvig's objections

Although the main objection against the view that justification is not necessary for knowledge is its alleged counterintuitiveness, some philosophers have offered additional arguments against the view.

For example, Kvanvig (2003) believes that Sartwell fails to adequately deflect the challenge posed by some of the counterexamples he considers against his position. When Sartwell asks what we should say about a mental patient who believes that 2 + 2 = 4 on the basis of what she thinks the voices in her head have told her, Sartwell admits that, according to his view, we must ascribe knowledge to her. However, Kvanvig (2003, 6) complains:

[B]ut all we get [from Sartwell] by way of argument for such a denial [of what the common view in philosophy maintains] is a remark that "it is natural in a case such as this one to say that we all know that 2 + 2 = 4; it is 'common knowledge'; in a typical case it would be perverse to ask of any one person how she knows it." None of these claims is a sufficient reply to the counterexample, however. It may be natural to say that everyone knows simple arithmetical truths, but it is false. It is natural to say it because the counterexamples are so rare, not because they do not exist.

The problem with Kvanvig's criticism of Sartwell, however, is that Kvanvig fails to consider Sartwell's actual response to the apparent challenge posed by the mental patient. Sartwell (1992, 163) distinguishes two reasons for asking "How do you know?". When we ask this question, we may wish to determine if a person really does know the claim in question and does not merely believe it, or we "may be trying to ascertain the believer's overall rationality." That is, we may be trying to determine her overall trustworthiness as an informant, which will affect our further assessment of her claims. If we ask someone how she knows that 2 + 2 = 4, this does not necessarily mean that we are seeking to deny her knowledge. We may instead be trying to ascertain what she considers good grounds. When the mental patient replies that she believes this because the voices in her head told her so, we may determine that her belief is not well grounded and that she will be a generally unreliable informant without necessarily denying that she has knowledge. In other words, Sartwell thinks we can impugn the mental patient's method of justification without denying that she knows. Kvanvig ignores this component of Sartwell's response to the case. Furthermore, in Section 3, we report the results of a study in which ordinary participants display a willingness to ascribe knowledge to such a mental patient.

In addition to asking whether the mental patient knows, we also asked participants if it would be true for the patient to say "I knew that 2 + 2 = 4 when I was delusional" after she had recovered from her delusion. Individuals were moderately inclined to ascribe knowledge in both instances.

In spite of the fact that Kvanvig seeks to refute Sartwell's position, much of what Kvanvig goes on to say about the nature of inquiry is actually quite amenable to it. For example, Kvanvig (2003, 54) argues that knowledge is not any more valuable than its parts:

The goal of inquiry, however, is nothing other than getting to the truth and avoiding error, so any property of belief that is valuable from a purely intellectual point of view had better find some connection between that property and truth. So if justification is a valuable property of belief, it cannot be because it has value in and of itself, independently of any relationship to truth.

Sartwell agrees that the goal of inquiry can be specified in terms of true belief without bringing justification into the picture. If our epistemic end is fully achieved when we obtain true belief, Sartwell recommends understanding knowledge as being fully achieved as well. If, as Kvanvig argues, it is not clear how knowledge could be more valuable than true belief and if justification cannot add any value to true belief, perhaps this is a reason for thinking that knowledge simply is true belief. Kvanvig also criticizes Sartwell's view that a criterion, or means for achieving some goal, cannot also be a constituent of that goal, arguing that this view is patently false. In maintaining that there are clearly some goals where the means to the goal is constitutive of that goal, Kvanvig gives the example of running a successful campaign as something that is both a means to and a constituent of being elected senator. In a second example, he notes that if one has the goal of having a million dollars, acquiring one hundred dollars is both a means to and a necessary constituent of that goal. Pierre Le Morvan (2002, 161–2) offers a similar objection, noting that, for Mill, pleasure is not only a means to but is also constitutive of happiness. We grant that these examples refute Sartwell's unnecessarily strong claims about the relation of criteria or means to ultimate goals. However, none of these examples provides any reason for thinking that knowledge is sufficiently like these goals, and neither Le Morvan nor Kvanvig offers any additional reason for thinking that it is.

Consider the fact that an incumbent senator can be reelected without running a campaign and that a relatively obscure individual who raises his profile in the state as a result of a senate campaign might consider the campaign a success even if he is not elected. Thus, running a successful campaign may be both a means to and a constituent of being elected senator, but it might be neither one. In regard to Kvanvig's million dollar example, we need to ask whether acquiring knowledge is sufficiently like acquiring a million dollars for us to think that a criterion for knowing must also constitute what it is to know. Kvanvig claims that acquiring one hundred dollars is both a means to and a necessary constituent of acquiring one million dollars. Note that becoming a millionaire is an accumulative goal—the goal is simply an accumulation of its means. However, knowledge does not seem to be a goal of this kind. Knowledge is not simply the accumulation of the means by which it is obtained. It might be correct to say that the more evidence one accumulates for p, the closer one comes to knowing that p. But even on the traditional epistemological view of knowledge, having an abundance of evidence that p is not the same as knowing that p. To the extent that knowledge is disanalogous to the senate campaign and million dollar examples, these examples serve as poor models for what it takes to know something.

2.2 Lycan's objection

Lycan (1994, 1) begins his critical discussion of Sartwell with the following, understated remarks:

Crispin Sartwell has recently defended the anti-Socratic and outrageous claim that knowledge is, as a matter of philosophical analysis, simply true belief. (Call that claim "TB.") Sartwell has tried to discredit the obvious presumed counterexamples to TB, and he has also offered an ingenious positive argument in its support. I am unpersuaded by the argument, but in this note I shall merely deduce an ugly consequence from TB taken together with a few harmless assumptions, a consequence I take to be uncontroversially false.

Lycan's focus is more on Sartwell's claim that truth and belief are sufficient for knowledge than upon Sartwell's claim that justification is not necessary.1 And although the latter claim is the focus of the present chapter, we will briefly consider Lycan's objection before moving on.

The "ugly" and "uncontroversially false" consequence that Lycan deduces from Sartwell's view is that it could be possible (i) for Sartwell to believe that knowledge is merely true belief, (ii) for Sartwell to believe that he believes that knowledge is merely true belief, and (iii) for both of Sartwell's beliefs to be true. What, you may wonder, is so damning about this possibility? Lycan (1994, 2) explains: "it is unlikely that anyone knows any highly controversial philosophical claim to be true, and it is unlikely that Sartwell is so arrogant as to believe he knows [what knowledge is] in particular." We find it difficult to believe that the possibility that Sartwell knows what knowledge is counts as an "ugly" and "uncontroversially false" consequence of his view. We would have thought it would have been more damning if his view entailed that it could not be known. Lycan seems not to appreciate the fact (i) that most every philosophical position allows for its own knowability, (ii) that philosophers continually make claims about and hence represent themselves as knowing highly controversial philosophical theses, and thus (iii) that there is nothing special about Sartwell in this regard. Furthermore, imagine that one day Sartwell dies and arrives at the pearly gates and that the first question he asks is whether or not he was right about the nature of knowledge. If the answer he receives is "Yes," we can easily imagine him saying "I knew it!" and this being a correct thing for him to say. In fact, in Section 3 we report results from a study where we presented participants with a case of this kind—pearly gates included—and found that participants agreed this would be the correct thing to say. Perhaps the real worry Lycan is trying to pinpoint is not the stated absurdity of taking oneself to know anything in philosophy but rather the fact that Sartwell's view allows knowing anything (philosophical or otherwise) to be far too easy. In order for S to know that p, how much epistemic effort is required of S? Very little, if any. The only thing S needs to do is to believe that p. S's belief also needs to be true, but bringing about the truth of p is (except in exceptional circumstances) not a task that falls to S. Rather, that "task" falls to reality. The objection, then, may be that knowledge requires more of a subject than Sartwell's account demands. Sartwell can agree that epistemic life requires effort and that epistemic justification can often be difficult to come by; however, he can contend that this does not show that justification must be part of knowledge.

It must also be kept in mind that the thesis of the nonnecessity of justification is compatible with justification being required almost all the time. This means that Sartwell need not be interpreted as diminishing the important role that epistemic justification plays in our epistemic lives. Relatedly, Sartwell can agree that the norms of assertion license an assertion of p or a claim to know p only when one has a sufficient amount of justification for these claims. This means that on Sartwell's view someone could know p and yet not be justified in asserting that p. Given the murkiness surrounding the issue of norms of assertion, however, it is far from clear that this should count as an objection against his view.

3 Empirical studies

As we noted at several points above, many of Sartwell's key claims about the intuitively correct verdict concerning potential counterexamples to his position are eminently testable, and we have indeed tested them. In the present section, we report the results of our studies. One of the most common counterexamples offered against the thesis of the nonnecessity of justification is a case where someone picks a winning horse by closing his eyes and placing his finger at random on a racing form. Because Sartwell (1991, 157–8) claimed that cases like this are ones where it does not seem that a belief is present, we asked experimental participants whether or not the protagonists in the following three vignettes had beliefs about the relevant propositions:

RACETRACK: Jack decides to spend the day at the race track with his friends, although he does not know much about horses. He merely wishes to have a good time and hopefully to win a little money. In order to decide which horse to bet on, Jack simply closes his eyes and places his finger on the racing form. Whichever horse his finger lands on, he then places the minimum bet on that horse. This time, Jack's finger lands on the horse named "Buy A Nose." Jack then dutifully places the minimum bet on Buy A Nose and moves towards the race track to observe the upcoming race. To his delight, Buy A Nose ends up winning the race.

Q1: Please indicate the extent to which you agree or disagree with the following claim: "At the time when Jack placed his bet, he believed that Buy A Nose would win the race."

BASKETBALL: Susan doesn't know anything about college basketball but decides to fill out a college basketball bracket in order to participate in a competition being held at her office.2 She makes predictions completely at random about which teams will defeat other teams in order to fill out the bracket, not knowing anything about the teams or even where most of them are located. She then dutifully turns in her bracket to compete in the office competition. To her delight, Susan ends up winning the competition.

Q1: Please indicate the extent to which you agree or disagree with the following claim: "At the time when Susan turned in her bracket, she believed that she would win the office competition."

ACADEMY AWARDS: Mike doesn't know anything about the nominees for this year's Academy Awards, but he decides to fill out a questionnaire that asks him to predict who will win each of the prizes in order to participate in a competition being held at his office. Mike makes predictions completely at random about which stars will win using a list of nominated actors and actresses he was given. He then dutifully turns in his questionnaire to compete in the office competition. To his delight, Mike ends up winning the competition.

Q1: Please indicate the extent to which you agree or disagree with the following claim: "At the time when Mike turned in his questionnaire, he believed that he would win the office competition."

In each of the studies described in this section, when participants were asked to indicate agreement or disagreement with a belief or knowledge ascription, they reported their answers using a 7-point scale marked with the labels "Completely Disagree," "Mostly Disagree," "Slightly Disagree," "Neither Agree nor Disagree," "Slightly Agree," "Mostly Agree," and "Completely Agree." In a between-subject design, 98 undergraduate students (average age = 22, 64 percent female, 74 percent Anglo-American) from the northeastern United States completed online questionnaires hosted at vovici.com in exchange for extra credit in an introductory course. Results are represented in Figure 7.1. As Sartwell predicted, participants displayed a disinclination to attribute belief in these cases. Averaging across all three cases, 62.2 percent of participants gave answers that fell below the neutral midpoint.3 The foregoing cases are brought forward as examples where truth and belief are supposed to be present without justification.

Figure 7.1 Mean belief attributions in the Racetrack (3.21), Basketball (2.67), and Academy Awards (2.69) conditions. An "*," "**," or "***" indicates that the mean differs significantly from the neutral midpoint at either the 0.05, the 0.01 or the 0.001 level. Error bars represent 95 percent confidence intervals in all figures.

The intuitively correct thing to say about them is that the protagonists do not have knowledge about the relevant propositions,4 and this is supposed to cast doubt upon the thesis of the nonnecessity of justification. However, as we can see, they are not taken to be instances of belief at all. Consequently, they fail to serve as effective counterexamples to the nonnecessity thesis.

A second class of purported counterexamples to the nonnecessity of justification thesis concerns cases where a true belief is obtained in an epistemically unworthy manner as the result of cognitive malfunction or some other improper grounding. Each of the following vignettes is based upon examples discussed by Sartwell:

CLINTON1: Sunil is an exchange student who has recently become highly delusional. He claims that demons are talking to him inside his head and that they tell him all sorts of things. Sunil believes everything the demons tell him. One of the things the demons tell him is that Hillary Clinton is the current U.S. Secretary of State. Sunil has never followed American politics very closely, but he comes to believe that Hillary Clinton is Secretary of State on this basis. It turns out, of course, Hillary Clinton really is the current U.S. Secretary of State.

Q1: Please indicate the extent to which you agree or disagree with the following claim: "Sunil knows that Hillary Clinton is the current U.S. Secretary of State."

CLINTON2: Sunil is an exchange student who has recently become highly delusional. He claims that demons are talking to him inside his head and that they tell him all sorts of things. Sunil believes everything the demons tell him. One of the things the demons tell him is that Hillary Clinton is the current U.S. Secretary of State. Sunil has never followed American politics very closely, but he comes to believe that Hillary Clinton is Secretary of State on this basis. It turns out, of course, Hillary Clinton really is the U.S. Secretary of State. After Sunil eventually recovers from his state of delusion, he begins to learn about American politics. While reading about Hillary Clinton's current role as Secretary of State, he thinks to himself "I first acquired knowledge of this fact back when I was delusional."

Q1: Please indicate the extent to which you agree or disagree with the following claim: "When Sunil was delusional, he knew that Hillary Clinton was the current U.S. Secretary of State."

SQUARE ROOT1: Jordan, a college-aged student, has become highly delusional. He claims that demons are talking to him inside his head and that they tell him all sorts of things. Jordan believes everything the demons tell him. One of the things that the demons tell him is that 125 is the square root of 15,625, and he comes to believe that 125 is the square root of 15,625 on this basis. It turns out that 125 really is the square root of 15,625.

Q1: Please indicate the extent to which you agree or disagree with the following claim: "Jordan knows that 125 is the square root of 15,625."

SQUARE ROOT2: Jordan, a college-aged student, has become highly delusional. He claims that demons are talking to him inside his head and that they tell him all sorts of things. Jordan believes everything the demons tell him. One of the things that the demons tell him is that 125 is the square root of 15,625, and he comes to believe that 125 is the square root of 15,625 on this basis. After Jordan eventually recovers from his state of delusion, he begins to work on some math problems. Using a calculator, he finds that the square root of 15,625 is 125. Jordan then thinks to himself "I first acquired knowledge of this fact back when I was delusional."

Q1: Please indicate the extent to which you agree or disagree with the following claim: "When Jordan was delusional, he knew that 125 was the square root of 15,625."

THEOREM: Brian is a 10-year-old boy who has just begun to study geometry. One night he goes to sleep and dreams that the square of the hypotenuse of a right triangle is equal to the sum of the squares of its other two sides. On the basis of this dream, he comes to believe the Pythagorean Theorem. A few days later in school his teacher introduces the Pythagorean Theorem for the first time in class. Brian thinks to himself "I already knew that the square of the hypotenuse of a right triangle is equal to the sum of the squares of its other two sides."

Q1: Please indicate the extent to which you agree or disagree with the following claim: "Brian already knew that the Pythagorean Theorem was true."

In the first four cases above, the protagonist experiences a psychotic episode in which he hears voices telling him either that a traditionally a priori proposition is true or that a traditionally a posteriori proposition is true. In each case, the protagonist believes what the voices say, and the belief turns out to be correct. In Clinton1 and Square Root1, we had participants consider the protagonists' beliefs while they were still suffering from their delusions, whereas in Clinton2 and Square Root2 we portrayed the protagonists after they had recovered and were reflecting back upon their delusional state. In the fifth case, we had the protagonist form a belief on the basis of a dreaming episode, which—like hearing voices in one's head—is widely taken to be an epistemically inappropriate basis for belief. In a between-subject design, 189 undergraduate students (average age = 21, 64 percent female, 75 percent Anglo-American) from the northeastern United States completed online questionnaires hosted at vovici.com in exchange for extra credit in an introductory course. Results are depicted in Figure 7.2. In two of the conditions (Clinton2 and Square Root1) participants' mean knowledge attributions fell significantly above the neutral midpoint.5 However, averaging across all cases, only 34.3 percent of participants gave responses that fell below the midpoint, whereas 54.5 percent gave answers above the midpoint. If the ordinary concept of knowledge requires that beliefs be epistemically well founded, it seems that a sizable portion of philosophically untrained individuals are handling the concept rather poorly. The third and final set of cases we tested involved protagonists whose evidence went against their beliefs but who had true beliefs nonetheless.

Figure 7.2 Mean knowledge attributions in the Clinton1 (4.05), Clinton2 (4.85), Square Root1 (4.81), Square Root2 (4.33), and Theorem (3.76) conditions.

In a between-subject design, 352 participants (average age = 28, 61 percent female, 77 percent Anglo-American) from the United States were presented with one of the following vignettes and one of the two questions that appear after each vignette:

JOHN: John's daughter has been accused of murder. Even though she lacks a strong alibi and the police have compelling evidence against her, John feels she must be innocent. After several very stressful weeks, the actual murderer finally comes forward and confesses.

Q1: Please indicate the extent to which you agree or disagree with the following claim: "John knew all along that his daughter was innocent."

Q2: In light of the information available to John BEFORE the actual murderer came forward and confessed, how likely was it that John's daughter was innocent?

SANDRA: The team of doctors responsible for treating Sandra's cancer told Sandra's husband, Mickey, that there was virtually no chance she would be able to beat the cancer and survive for more than a few months. In spite of what the doctors told him, Mickey was convinced that she would beat the cancer. In the end, Mickey's wife survived the cancer and remained cancer free for more than 35 years.

Q1: Please indicate the extent to which you agree or disagree with the following claim: "Mickey knew all along that his wife would survive the cancer."

Q2: In light of the information available to Mickey BEFORE Sandra survived the cancer and remained cancer free for 35 years, how likely was it that Sandra would survive the cancer?

BOB1: Bob is a scientist who has devoted his entire career to defending the view that prolonged cell phone use causes brain tumors. No other scientist, however, has accepted Bob's theory. In fact, his papers are continually rejected for publication, and funding organizations always reject his requests for grant money. One day Bob dies and arrives at the entrance to heaven. The first question Bob asks upon arrival in heaven is whether or not he was right about the relationship between cell phone use and brain tumors. He learns that his widely disparaged theory is correct. Bob exclaims "I knew that prolonged cell phone use caused brain tumors!"

Q1: Please indicate the extent to which you agree or disagree with the following claim: "Bob knew all along that prolonged cell phone use caused brain tumors."

Q2: In light of the information available to Bob BEFORE Bob dies and goes to heaven, how likely was it that Bob's theory was correct?

Thus, we asked some participants whether the protagonists had knowledge, and we probed other participants about the strength of the protagonists' evidence. We also employed a second version of the scenario involving Bob (viz., Bob2), where the statement "He learns that his widely disparaged theory is correct" was followed by "even though the experiments he tried to use to prove his theory were flawed." This was added to make the evidence against Bob's belief even stronger than in Bob1. The same questions that appear after Bob1 were used with Bob2. Participants responded to each of the Q1 questions using the same 7-point scale used in the above experiments (ranging from "Completely Disagree" to "Completely Agree"). Participants responded to the Q2 questions using a 7-point scale that was labeled with "Highly Unlikely," "Moderately Unlikely," "Somewhat Unlikely," "Neither Likely nor Unlikely," "Somewhat Likely," "Moderately Likely," and "Highly Likely." Results are depicted in Figure 7.3.

Figure 7.3 Mean knowledge attributions ("Knew all along") and likelihood ratings ("How likely") in the John (4.43, 3.62), Sandra (4.50, 2.94), Bob1 (5.20, 4.17), and Bob2 (4.82, 3.79) conditions. An "*," "**," or "***" with a bracket indicates a statistically significant difference between pairs of conditions at either the 0.05, the 0.01 or the 0.001 level.

Mean knowledge attributions in the two Bob conditions fell significantly above the neutral midpoint, while the mean likelihood rating in the Sandra case fell significantly below the midpoint.6 A set of independent samples t-tests confirms that each of the four mean knowledge ratings differs significantly from its associated mean likelihood rating.7 Thus, participants were more inclined to attribute knowledge to the protagonists in these four cases than they were to attribute evidence that made the protagonists' beliefs more likely to be true than not. Although the scale used for rating the extent of participants' agreement or disagreement with a knowledge attribution employed different verbal anchors than the scale used for rating their assessments of how likely certain outcomes were, we believe that a comparison between the two sets of participant responses is instructive. Both scales, for example, included a neutral midpoint, with deviations in two directions from this point.

Because participants were significantly more inclined to think that the protagonists in the John, Sandra, Bob1, and Bob2 vignettes have knowledge than they were to think that the protagonists had decent evidence for their beliefs (where evidence is construed in accord with philosophical tradition as having a probabilistic connection to the truth), we contend that these results tell against the necessity of justification thesis—at least to the extent that it purports to model folk knowledge attributions.
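For concreteness, here is a minimal sketch (our illustration, not the authors' analysis script) of the two kinds of test just described: one-sample t-tests comparing mean ratings to the neutral midpoint, and independent-samples t-tests comparing knowledge ratings to likelihood ratings. The ratings below are invented stand-ins; the real summary statistics appear in the Notes.

```python
# A minimal sketch of the analyses described above. The ratings are
# invented stand-ins for illustration, not the actual study data.
import numpy as np
from scipy import stats

MIDPOINT = 4  # "Neither Agree nor Disagree" on the 7-point scale

rng = np.random.default_rng(1)
knew_all_along = rng.integers(1, 8, size=40)  # hypothetical Q1 ratings (1-7)
how_likely = rng.integers(1, 8, size=47)      # hypothetical Q2 ratings (1-7)

# One-sample t-test: does the mean rating differ from the neutral midpoint?
t_mid, p_mid = stats.ttest_1samp(knew_all_along, MIDPOINT)

# Independent-samples t-test: do knowledge ratings differ from likelihood ratings?
t_ind, p_ind = stats.ttest_ind(knew_all_along, how_likely)

print(f"vs. midpoint: t({len(knew_all_along) - 1}) = {t_mid:.3f}, p = {p_mid:.3f}")
print(f"knowledge vs. likelihood: t = {t_ind:.3f}, p = {p_ind:.3f}")
```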

4 Conclusion

It is important to distinguish the following two claims:

1. Justification is not a necessary condition for knowledge.
2. Knowledge is merely true belief.

The second claim is stronger than the first in several respects. For our purposes, the most important is that (2) implies that justification is never required for knowledge. If truth and belief are present, knowledge is guaranteed to be so as well. By contrast, however, (1) is compatible with justification being necessary some of the time or even most of the time. It simply denies that justification is always required for knowledge. This means that a single, compelling counterexample in which an agent's unjustified true belief fails to seem like a case of knowledge would do damage only to the second claim. Whereas Sartwell defends both claims in his writings, our main goal has been to reconsider the case against the former. We do not think there is any simple and direct argument from "Many philosophically untrained individuals are willing to attribute knowledge in the absence of solid evidence or justification" to the truth of (1). Consequently, we do not take our data to have established (1). However, we contend that our results undermine arguments against (1) that are based upon armchair appeals to what are assumed to be widely shared intuitions about the necessity of justification. According to our interpretation of the history of appeals to epistemic intuitions, (i) epistemologists used to appeal to the intuitions of both philosophical experts and the philosophically untrained in order to support their favored accounts of knowledge,

until (ii) experimental philosophers came along and showed that the intuitions of the masses were often surprisingly different from what had been expected, after which time (iii) epistemologists claimed they had never been interested in folk intuitions in the first place. We think that one of the great benefits of experimental philosophy has been the motivation it has provided for philosophers to consider what (if anything) philosophical expertise consists in and how it can be detected and measured. Our results leave open the possibility that (1) is false, and that the epistemic intuitions of those with genuine philosophical expertise would support this fact. However, the vast majority of epistemologists who both maintain that (1) is false and reject the reliance upon folk intuitions in philosophical theory formation want to fashion theories of our ordinary, shared concept of knowledge. To the extent that they take themselves to be analyzing a concept possessed by the average person on the street and not a technical notion known only to specialists, our results provide a challenge to the long-standing view that justification is necessary for knowledge. We believe that the folk conception of knowledge is more contextually variable and multifaceted than most philosophical accounts of knowledge have assumed. We hope that the present set of arguments and studies contributes to a better understanding of its richness and complexity.

Notes

1 Cf. Section 4 for further discussion of this distinction and the difference it makes to the present chapter.
2 In the United States, "filling out a bracket" means predicting which teams will win which matches in a tournament.
3 One-sample t-tests revealed that each mean fell significantly below the neutral midpoint. Racetrack: t(32) = –2.802, p < 0.01, r = 0.44 (medium effect size). Basketball: t(32) = –5.204, p < 0.001, r = 0.68 (large effect size). Academy Awards: t(31) = –4.777, p < 0.001, r = 0.65 (large effect size).
4 Using an independent set of participants, we confirmed this common supposition. The mean knowledge attributions in the three cases were near the floor: 1.35, 1.43, and 1.48, respectively (on a scale from 1 to 7).
5 Clinton1: t(39) = 0.149, p > 0.05. Clinton2: t(32) = 3.076, p < 0.01, r = 0.48 (medium effect size). Square Root1: t(35) = 2.756, p < 0.01, r = 0.42 (medium effect size). Square Root2: t(39) = 1.131, p > 0.05. Theorem: t(32) = –0.796, p > 0.05.
6 John Knew All Along: t(39) = 1.410, p > 0.05. John How Likely: t(46) = –1.845, p = 0.071. Sandra Knew All Along: t(39) = 1.900, p = 0.065. Sandra How Likely: t(47) = –4.223, p < 0.001, r = 0.52 (large effect size). Bob1 Knew All Along: t(39) = 4.778, p < 0.001, r = 0.61 (large effect size). Bob1 How Likely: t(47) = 0.893, p > 0.05. Bob2 Knew All Along: t(39) = 2.594, p < 0.05, r = 0.38 (medium effect size). Bob2 How Likely: t(47) = –0.896, p > 0.05.
7 John: t(85) = 2.259, p < 0.05, r = 0.24 (small effect size). Sandra: t(86) = 4.274, p < 0.001, r = 0.42 (medium effect size). Bob1: t(86) = 3.364, p < 0.001, r = 0.34 (medium effect size). Bob2: t(86) = 2.676, p < 0.01, r = 0.28 (small effect size).
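As a gloss we have added (it is not part of the original notes): the r values reported above follow from the corresponding t statistics and degrees of freedom by the standard conversion

\[
r = \sqrt{\frac{t^2}{t^2 + df}},
\]

so that, for example, the Racetrack result gives \( r = \sqrt{(-2.802)^2 / ((-2.802)^2 + 32)} \approx 0.44 \).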

References

Beebe, J. R. (2013), "A Knobe effect for belief ascriptions." The Review of Philosophy and Psychology, 4, 235–58.
Hazlett, A. (2010), "The myth of factive verbs." Philosophy and Phenomenological Research, 80, 497–522.
—. (2012), "Factive presupposition and the truth condition on knowledge." Acta Analytica, 27, 461–78.
Kvanvig, J. (2003), The Value of Knowledge and the Pursuit of Understanding. New York: Cambridge University Press.
Le Morvan, P. (2002), "Is mere true belief knowledge?" Erkenntnis, 56, 151–68.
Lycan, W. (1994), "Sartwell's minimalist analysis of knowing." Philosophical Studies, 73, 1–3.
Myers-Schulz, B. and Schwitzgebel, E. (2013), "Knowing that p without believing that p." Noûs, 47, 371–84.
Sartwell, C. (1991), "Knowledge is merely true belief." American Philosophical Quarterly, 28, 157–65.
—. (1992), "Why knowledge is merely true belief." Journal of Philosophy, 89, 167–80.
Williamson, T. (2000), Knowledge and Its Limits. New York: Oxford University Press.

8

The Promise of Experimental Philosophy and the Inference to Signal1

Jonathan M. Weinberg

My objective in this short chapter is to clarify and try to bring to the forefront of discussion a particular question of experimental philosophy's philosophical significance: how is "positive program" experimental philosophy going to be able to make novel, substantive contributions to mainstream philosophy? Let us put aside, for the moment, the methodological concerns of the "negative program" experimentalists; it is clear enough how that program at least is meant to have philosophical significance, by raising concerns about some contemporary philosophical practices. Moreover, there should be no question about how x-phi can contribute to projects that are unproblematically continuous with scientific psychology, such as the nature of folk psychology,2 or the effects—or noneffects—of philosophical training on ethical behavior.3 And to the extent that one is interested in the particular psychological project of mapping the contours of "folk philosophy," such as what factors do or do not drive folk attributions of agency,4 then there is again no question about how experimental philosophy can contribute to such a project. And I also think it would be silly to try to argue that such projects are not "really" philosophical projects. That experimental philosophy is philosophy just is a settled question at this point, despite some garbled noisemaking from a handful of critics who have largely failed to engage with more than a handful of early papers. In that sense, I am taking there to be no issue here about the "philosophical significance" of x-phi (as Sommers has already argued so compellingly in his (2011); see also Alexander (2010)).

But that still leaves open the pressing question of how x-phi can make more direct contributions to traditional, mainstream sorts of projects in philosophy, addressing first-order questions in such areas as ethics, metaphysics, or epistemology. Suppose we're interested, not in folk epistemology, but in the nature of knowledge itself, or not in a specification of the drivers of ethical or unethical behavior, but in what is or isn't morally right and wrong—can x-phi offer any direct help in answering such questions?

Experimental philosophy is sometimes lampooned as a kind of "philosophy by popularity poll," or as operating with a numbskullish inference from "most of the folk intuit that P" to "P." This has never had but the tiniest shred of truth, and even where it did, the experimental work in question was responding directly to claims in the armchair literature as to which intuitions were or were not widely held by the folk. Despite the spuriousness of that charge, though, it still does invite the question: what sort of inferences would experimental philosophers have us run, using their results as premises, if not something like an argument ad populum interrogatum?5

I want to suggest one possible answer to that question, and in doing so, to maybe tear down a bit the distinction between the positive and negative programs. My starting point—one I have always shared with such critical interlocutors as Timothy Williamson and Ernest Sosa—is to take our ordinary capacities to make judgments6 about knowledge, or right and wrong, or causation, etc., as having a default and defeasible reliability. But as is strongly suggested by work in negative-program x-phi, as well as of course entire research programs in psychology such as heuristics and biases, that baseline reliability is also very noisy. We know that, even if our judgments about philosophical matters are correct more often than not, even much more often, there still are likely a large number of errors that will be produced by those judgments. To make matters worse, we do not seem to have already in our philosophical toolkit very many good ways to sort the good judgments from the bad. Many of the noise-making factors will operate subtly and unconsciously, and yield no indicating sign of trouble either in our ordinary practices or under armchair introspection and reflection. And that is where x-phi can come in—in helping us sort truth-tracking drivers of philosophical judgment from nontruth-tracking ones.7

In thinking about this very real threat of noise tangled up in our philosophical judgments, we have to distinguish two different notions of signal.

Just because there is a psychologically significant effect to be found in our judgments does not mean it is a philosophically significant one, and it is that kind of question I am concerned with here.8 How do we tell when an x-phi study is tracking some bit of philosophical truth, and not merely amplifying a psychologically real divergence of our judgments from philosophical reality? Such a question cannot simply be handed over to the social sciences. It is a question, rather, about what sorts of further philosophical inferences can be drawn from some body of scientific observations that one might have gathered. Let me suggest that x-phi should answer such questions via what we might call the inference to signal:

1. The human capacity to make judgments in domain D and circumstances C delivers the verdict that P.
2. For any potential distorting factor which we may have good prima facie reason to suspect could be producing the verdict that P, we can rule out that it is in fact responsible for that verdict.

Therefore, P.

The inference to signal is only probabilistic, and it is also defeasible in a number of different ways. For example, for some philosophical subdomains, such as philosophy of physics, we clearly would not extend that degree of self-trust as a default. And any particular deliverance of our intuitive capacities will in principle be overridable by sufficient countervailing evidence. But, given those caveats, it is a cogent form of argument. Far from threatening to instantiate the ad populum fallacy, this inference simply relies on the minimal degree of Reidian self-trust that there is legitimate signal to be found in our philosophical cognition, while deploying further experimental resources to amplify that signal as needed for the demanding purposes of philosophical research.

Now, I am not claiming that we always need premises in exactly this form in order to be justified in believing P in virtue of our having judged P. I don't want to have to make too many epistemological commitments here—I think the picture I am painting should work in some form, mutatis mutandis, for pretty much any view about dogmatism, default justification, and so on—but I would be happy to endorse at the very least something like the following:

in the total absence of any evidence that one's cognition in some area is systematically threatened by noise, one bears no obligation to demonstrate that any particular deliverance of one's cognition is sufficiently noise-free.

In such cases, where there are no reasons for suspicion in play at all, premise 2 could be seen as vacuously true. Nonetheless, a great many philosophical propositions of interest to us operate in domains that are puzzling and highly contested. Many such propositions concern scenarios that are rather abstruse, or fall far outside the range of what our ordinary cognition is well shaped for, or have been designed specifically in order to pull apart characteristics that usually appear together in our actual experiences. Moreover, if the verdict of a thought-experiment is to do real argumentative work in refuting some candidate philosophical theory, then most likely, there will be significant considerations already in play against that verdict—namely, whatever considerations there might be in play that speak in favor of the theory to be refuted. And, of course, work in negative program x-phi presents a still-growing stockpile of possible quirks and kinks in human cognition that plausibly could introduce noise into our philosophical verdicts. Thus, many propositions of philosophical interest face too many causes for doubt already in play, for any simple inference to succeed from premise 1 all by itself to P. Psychological defeaters must be considered and ruled out, and that is the job of premise 2.

A key promise of positive x-phi is found in its plausible ambition to determine when premises like those do or do not obtain. The first premise requires evidence that the deliverance that P is sufficiently widespread across the judging population, and not just a local tic. The second requires evidence that the deliverance is not just a widely shared product of some quirk of cognition, such as a funny influence of the order in which cases have been considered, or environmental conditions, or even font selection. It is a fact about the epistemology of the armchair that these sorts of claims are simply not ones that our armchair-available resources are sufficiently trustworthy about. This is not a deep fact about the armchair—it is not based on anything like a theory of the a priori, or the metaphysics of actuality and counterfactuality, or the structure of our concepts—but it is a methodologically very important fact nonetheless.9

To illustrate how the inference to signal might work, let us consider a situation in which we are debating whether some factor F should be incorporated into our theory of a philosophical domain D.

I take it this is a common form of first-order philosophical disputes, and would include debates over stakes sensitivity of knowledge or internalism/externalism about justification in epistemology, historical conditions in the theory of content, and consequences of our actions in ethics. And suppose we have at least some evidence, either from the armchair or from some piece of experimental work, that F plays at least some role in judgments in D, and in the circumstances C in which the judgments have so far been elicited. Let me suggest a set of three conditions that, when jointly met, give us good reason to infer that the truths in D really are sensitive to F, that is, that there properly should be an F-parameter in our theory of D:

(i) The sensitivity to F is demographically robust;
(ii) The sensitivity to F is contextually robust; and
(iii) The sensitivity to F is of a philosophically substantive effect size.

Demographic and contextual robustness are pretty straightforward. If F is a legitimate contributor to truths in D, then we should generally expect to find that F-sensitive D-judgments are not problematically local. That "generally" does do some work there, since for at least some domains we might have a positive reason to identify some privileged demographics: for example, we really should expect trained philosophers and mathematicians, say, to have more trustworthy judgments about validity than many other populations.10 But in the absence of any very good reasons to expect a given population to be exceptionally good (or bad) regarding D, cultural or other demographic variation in F-sensitivity is a mark against F as a philosophically legitimate D-factor. But lack of such variation would, conversely, be some evidence that tracking F is part of how we successfully track facts regarding D. For F to be contextually robust, likewise, is for F's contributions to D-judgments to be generally unresponsive to things like order effects, environmental manipulations, putatively irrelevant content manipulations, and so on. In short, varying the circumstances C should not unduly eliminate the sensitivity to F. If F contributes to D-judgments under some circumstances, but not others, then that is a reason to worry that F-sensitivity when observed really is just a bit of local noise, but where the locality is one of circumstance, not demographics. Again, that "generally" must be taken seriously.

For example, if some other set of manipulations drives subjects' judgments to ceiling or to floor, then in the presence of that manipulation, D-judgments may become fairly insensitive to F, without its being F's fault, as it were. Or, for that matter, under sufficiently demanding cognitive circumstances, one's D-judgments may be reduced to random guessing, and thus insensitive to F. Although those tokens of "generally" may complicate how easily these criteria of philosophical substantivity may be applied, nonetheless, there is no need for these decisions to be made in a hand-wavy, gut-hunchy way. For we can examine how other, unproblematically D-relevant or D-irrelevant factors do or do not cease to contribute to D-judgments under these various circumstances. For example, suppose we are considering attributions of knowledge to some target agent. A context in which subjects' attributions of knowledge became insensitive to the truth or falsity of the agent's beliefs is simply a context for which we should not care if they also become insensitive to the presence or absence of F.11 Or, regarding the matter of demographic variation, we can use the factivity of knowledge again to provide a baseline level of noisiness in knowledge attributions; if a factor F's influence over knowledge attribution varies somewhat across groups, but only as much as factivity does, then that ought not be scored as a mark against F's epistemological legitimacy.

Issues of demographic and contextual robustness are somewhat on the radar of practitioners of positive-program experimental philosophy, in part due to the efforts of negative-program practitioners, and especially due to the high salience of research like that of Henrich et al. (2010). And some researchers have undertaken to check on the cultural variability or invariability of various aspects of philosophical judgments.12 But these are issues that must be pursued more directly and rigorously if we are to license inferences from x-phi results to first-order philosophical conclusions of the sort discussed above.

The third criterion I want to put into play, however, is one that really has not been given very much attention yet in the x-phi literature. And that is the question of philosophically meaningful effect size. If our minimally Reidian starting point is correct, then we should expect that, in general, signal should be louder than noise. Alternatively, if the myriad nontruth-tracking factors that can nudge our cognition hither and thither have a bigger say in what our intuitions are than the philosophical truths do, then we really would be pretty much epistemically at sea.

Given our Reidian presumption, then, we should be able to look at effect sizes as a further important tool for separating wheat from chaff: when F is signal, its influence on our judgments should be bigger than when it is noise. To even begin using that as a rough tool, though, we need to get a handle on effect sizes in x-phi. Now, most studies these days do report effect sizes, certainly at a minimum in gross terms, such as, for example, a 0.8 difference in means on Likert scores across two conditions. And very often (though perhaps not yet often enough), researchers are aware that this doesn't necessarily tell us very much on its own, and so they appeal to standardized measures of effect size, like Cohen's d, usually with an invocation of whether such an effect size would count as conventionally "small," "medium," or "large."13 But, at best, such measures only tell us something about the size of the observed effect as compared to the background amount of variation on the task being studied. And though this is useful as some measure of whether the effect is somewhat psychologically substantive, it is of little use in telling us whether some observed effect is philosophically substantive. For the typical x-phi design controls for most of what we already take to be philosophically meaningful influences on judgment. Most knowledge attribution studies only compare cases in which it is stipulated that the agent's belief is true, for example. And so the d scores on such an epistemology task may be disconnected from the amount of variation in knowledge attribution at large. In particular, we would expect such d scores to end up being deceptively elevated, since the standard deviation in the reported evaluations of a focused set of closely matched scenarios considered in a typical x-phi study will be rather less than one would expect the standard deviation for a much broader set of scenarios to be.
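To see the worry in miniature, here is a sketch of our own (on invented data, not an analysis from any study discussed here) showing how the same half-point difference in means yields a conventionally "large" d among closely matched scenarios but a "small" d against broader variation:

```python
# Sketch: the same half-point mean difference yields a "large" Cohen's d when
# within-condition spread is small (closely matched scenarios) and a "small" d
# against the spread of knowledge attributions at large. Data are invented.
import numpy as np

rng = np.random.default_rng(0)

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Closely matched scenarios: little within-condition variation.
matched_lo = rng.normal(4.0, 0.6, size=200)
matched_hi = rng.normal(4.5, 0.6, size=200)

# The same mean difference amid the variation of a much broader set of cases.
broad_lo = rng.normal(4.0, 1.8, size=200)
broad_hi = rng.normal(4.5, 1.8, size=200)

print(cohens_d(matched_hi, matched_lo))  # roughly 0.8: conventionally "large"
print(cohens_d(broad_hi, broad_lo))      # roughly 0.3: conventionally "small"
```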

background variation level for the whole broader class of judgments. That is, one would look at a large set of knowledge attributions, designed to be representative of such attributions on the whole, and see what the variance looks like in that much larger set. But we know that this will most likely cover the entire spectrum of possible responses: some cases will get strong clustering around attributions of knowledge, others strong clustering around nonattribution, with a whole bunch of cases in between to various degrees. That is, the standard deviation of knowledge attributions in general is going to be pretty big, and, more to the point, it will simply be a function of the total spread of whatever scale one is using, and as such it cannot convey any new information that could help us in evaluating effect sizes. If one is using a 7-point Likert scale, then all 7 points on that scale will be well represented, and the whole question just collapses back into guesstimating how many points on such a scale are meaningful or not.14

If I am right, then it's still going to be valuable to report measures like Cohen's d, but primarily as a negative check: if an effect size comes in on the small side, then we probably have good reason to think it is not philosophical signal. But we cannot run the inference the other way, from a large standardized effect-size measure to even prima facie philosophical significance.

Instead of trying to squeeze more juice out of the statistical measures of effect size, we ought instead to search for ways to provide philosophically meaningful measures. For example, one could consider providing a modulus to subjects, to assign more "real" interpretations to their individual responses. For a knowledge attribution task, one could instruct subjects that a 7 rating meant that the agent knows the target proposition as well as a typical person knows the sum of two and two, or what their own name is; a 6 means that the agent knows the target proposition as well as a typical person knows what the capital of their home country is; and so on, down to a 1 meaning that the agent doesn't know any better than if they were totally guessing.

I think on the whole that we should be looking to explore many different ways of gauging philosophically meaningful effect sizes, but one strategy I would especially like to see explored would be to create scales in which the units are themselves meaningful in the target domain. In the Mohs scale, for example, the ordinal ranking of the hardness of minerals is a matter of which ones will scratch which others. To take a very different example, the Scoville
scale of gustatory heat is a measure of just how diluted a given spicy oil needs to be before it no longer registers as hot at all to a panel of tasters.15 It's not that scales like these automatically and transparently answer every question one might ask regarding effect sizes (there is, alas, no Scoville number that is the cutoff between "deliciously fiery" and "too darn hot"), but they go a long way toward providing meaningful measures in their home domains.

One way to do this in x-phi would be to construct cases in which a parameter that we take to be unproblematically correlated with the philosophical truths in question can be systematically varied. There's a lovely example in the behavioral economics literature, where Bertrand et al. (2010) manipulated various features of a credit card advertisement, to see what sorts of impacts those factors could have on whether or not people actually applied for the credit card, using real credit card applications. And one key parameter that they could vary was the actual interest rate on the credit card, which is obviously an economically meaningful variable. They write:

Consumer sensitivity to the price of the loan offer will provide a useful way to scale the magnitude of any advertising content effects . . . Our main result on price is that the probability of applying before the deadline . . . increased by 3/10 of a percentage point for every 100 basis point reduction in the monthly interest rate. (Bertrand et al. 2010, p. 290)
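In effect, that measured price sensitivity supplies an exchange rate between the two units: 0.3 percentage points of application probability per 100 basis points of interest. A minimal sketch of the conversion this licenses follows; the function name is mine, purely for illustration, and the only empirical input is the sensitivity quoted above.

```python
# Price sensitivity quoted above (Bertrand et al. 2010): applications rose
# by 0.3 percentage points for every 100 basis point reduction in the
# monthly interest rate.
PP_PER_100_BP = 0.3

def basis_point_equivalent(effect_in_pp):
    """Convert an advertising-content effect, measured in percentage points
    of application probability, into its interest-rate equivalent, in basis
    points."""
    return effect_in_pp / PP_PER_100_BP * 100

# On the quoted sensitivity, a content effect worth 0.6 percentage points
# of application probability is equivalent to a 200 basis point rate cut:
print(basis_point_equivalent(0.6))  # -> 200.0
```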

And they could thus report, for example, not just that including a photo of a woman in the ad increased the likelihood of someone's applying for the credit card, but moreover that this effect had an economically meaningful size: it was equivalent to dropping the interest rate by 200 basis points! (ibid., p. 268)16

So, we could construct question schemata in which we varied factors like quality of evidence, for knowledge attribution; number of lives saved or killed, for ethics judgments; or degree of physical restraint, for free will; and so on. And then we could use those results to create an index: such-and-such a manipulation of factor F is equivalent to (e.g.) saving 15 more lives. That could be pretty meaningful, regardless of the d score; an equivalence of saving 0.2 lives, not so much.

Now, consider the following schema:

Alonzo is trying to figure out whether it is going to rain tomorrow. He checks a weather website that he is fond of, and the website says that it is
going to rain tomorrow. Now, this website is generally [X]% accurate, and in fact, in the last 100 times that Alonzo has used the website, it has been right [X] times. Based on what he reads on the website, Alonzo believes that it is going to rain. And in this instance, it turns out he is going to be right: it is in fact going to rain.

Please indicate the extent to which you agree with the following statement: "Alonzo knows that it is going to rain."
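By way of a minimal, purely illustrative sketch, here is how the schema might be instantiated across a set of evidence levels; the particular levels anticipate Table 8.1 below, and the variable names and harness are mine, not part of any actual study materials.

```python
# Illustrative only: fill the [X] slots of the schema above at several
# evidence levels, pairing each vignette with the same agreement prompt.
SCHEMA = (
    "Alonzo is trying to figure out whether it is going to rain tomorrow. "
    "He checks a weather website that he is fond of, and the website says "
    "that it is going to rain tomorrow. Now, this website is generally "
    "{x}% accurate, and in fact, in the last 100 times that Alonzo has "
    "used the website, it has been right {x} times. Based on what he reads "
    "on the website, Alonzo believes that it is going to rain. And in this "
    "instance, it turns out he is going to be right: it is in fact going "
    "to rain."
)
PROMPT = ("Please indicate the extent to which you agree with the "
          "following statement: \"Alonzo knows that it is going to rain.\"")

EVIDENCE_LEVELS = [100, 99, 95, 90, 85, 80, 75, 50, 40, 20]

for x in EVIDENCE_LEVELS:
    print(SCHEMA.format(x=x))
    print(PROMPT)
    print()
```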

We can thus vary the value of X, to steadily decrease the quality of Alonzo's evidence, and see how the attributions of knowledge thereby decrease. We could also manipulate the truth of the belief, if we wanted, as another variant. I unfortunately do not yet have any reportable results in developing such a scale. Some preliminary piloting (aided by Nina Strohminger) produces something roughly like the following, but I must emphasize that this is entirely meant as an illustration of the role such a scale could in principle serve. If we were to have such an index in place, we would be able to evaluate various reported results in the literature, mapping them onto the differences in their "X" values. Suppose, then, that we had such an index in hand, one that looked something like Table 8.1.

Table 8.1 A fictional measure of influence on knowledge attribution

X      Estimated 5-point Likert score    Estimated 7-point Likert score
100    4.8                               6.7
99     4.5                               6.3
95     4.2                               5.9
90     4.0                               5.6
85     3.7                               5.2
80     3.4                               4.8
75     3.0                               4.2
50     2.7                               3.8
40     2.3                               3.2
20     2.1                               3.0
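To see the bookkeeping such an index would support, here is a minimal sketch that inverts the fictional 7-point column of Table 8.1; the linear interpolation between tabulated points, and the names likert_to_x and X_SCORES, are my own simplifying assumptions, not part of the piloted data.

```python
import numpy as np

# Fictional values from Table 8.1: estimated mean 7-point Likert ratings,
# listed in ascending order, against the evidence-quality levels X (%)
# that produce them.
LIKERT_7 = [3.0, 3.2, 3.8, 4.2, 4.8, 5.2, 5.6, 5.9, 6.3, 6.7]
X_SCORES = [20, 40, 50, 75, 80, 85, 90, 95, 99, 100]

def likert_to_x(score):
    """Estimate the X score whose typical rating matches an observed
    condition mean, interpolating linearly between tabulated points."""
    return float(np.interp(score, LIKERT_7, X_SCORES))

# A pair of reported condition means can then be restated as a difference
# in X scores rather than only as a d value. For example, means of 6.3
# and 4.8 on the 7-point scale map onto X scores of 99 and 80 percent:
print(likert_to_x(6.3), likert_to_x(4.8))  # -> 99.0 80.0
```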

What verdict would my (fictional, for now) index tell us about various extant results in the experimental epistemology literature? The Beebe and Buckwalter (2010) results, finding a Knobe-like effect on knowledge attribution, would go from an X-score of 99 percent to one of 80 percent, which is not terribly dramatic, and thus would be better read as evidence that the Knobe effect is here a source of noise, not signal, concerning what knowledge really is or isn't. (I believe this to be consistent with their own discussion.) But compare that with the Schaffer and Knobe (2012) contrastivist cases, in which the difference between the Likert scores of their cases amounts to an X-score difference of 90 percent versus 20 percent, and is thus clearly a much better prima facie case for an effect size that might track philosophical signal.

One recent exchange in the literature features an explicit disagreement concerning exactly the epistemological significance of an observed influence of a contested factor on knowledge attribution. Sripada and Stanley (2012) find an effect in a pair of closely matched cases that vary the stakes for an agent who needs to determine whether there are pine nuts in a particular dish. These authors are proponents of interest-relative invariantism, and of particular relevance here is that this theory asserts that stakes really matter to whether or not an individual case of believing is also a case of knowing. Thus they are invested in giving the observed differences a substantive interpretation, and claim that

Overall, the sizes of the stakes effects observed in this study were modest—typically around 1 point on a seven-point scale. But this perhaps should not be surprising. [Interest-relative invariantism] claims that knowledge and other epistemic properties are sensitive to stakes, not that they are entirely dependent on stakes and stakes alone. (16)

But this phrasing omits exactly the possibility under consideration here: that stakes might have a psychologically real, but philosophically spurious, effect on knowledge attributions. And rival labs have, unsurprisingly, challenged that interpretation. In particular, Buckwalter and Schaffer do not agree that stakes should make a legitimate difference in whether or not knowledge is present or absent. They provide an alternative spin, that

to our eyes Sripada & Stanley's effects looks most comparable to the very small effect reported by [earlier researchers]. The reader should keep
in mind that the cases tested . . . were actually put forward by advocates of . . . stakes sensitivity as the best cases for eliciting a stakes effect. One might be reluctant to build a radical epistemology around so modest a result.

But note both the reliance on a disappointingly squishy mode of argument here ("to our eyes . . .") and the note of hesitance in the conclusion of this particular argument. (They do go on to offer more forceful objections, I should add.) This underscores exactly the current weakness, in the methodology of x-phi, regarding the question of philosophically significant effect sizes. The promise of something like this "X index" is to preempt such hand-waving or eyeballing by rival sets of authors. For applying our toy index to their observed differences yields a measure of something like a difference between an X score of 20 percent and an X score of 25 percent. Given the very large difference in stakes between their cases, between a nut allergy that will make the agent's mouth go "a little dry" and a nut allergy that will kill her outright, such an epistemologically trivial effect size would give us a more objective reason to think that the stakes effects those authors observed are indeed just a very interesting piece of noise. (Let me also be clear that it can be very philosophically important that some effect fails to achieve a philosophically significant effect size. The thesis of stakes-insensitivity is just as philosophically important as the thesis of stakes-sensitivity, after all, but it of course predicts no philosophically significant effect of stakes on knowledge attributions.)17

Now, my point here is not to side with Buckwalter and Schaffer over Stanley and Sripada: this "X index" is just plain not yet up to that job, and I hasten to add that such an index, once properly developed, might well render a verdict in the opposite direction. Rather, my aim is to point out not only that we need to do better than the kind of hand-waving these authors engage in, but also that we can do better. The negative program in experimental philosophy has suggested that there is an awful lot of chaff out there mixed in with the philosophical wheat of our capacity for philosophical judgment. As the positive program develops active ways of threshing in response to that challenge, deploying the inference to signal by searching more attentively and systematically for demographically and contextually robust results of potentially philosophically significant size, we will all be in a much better
position, with a whole loaf of experimental philosophy rather than seemingly incompatible half-loaves.

Notes

1 I am very grateful for helpful comments from Joshua Alexander, James Beebe, and Ron Mallon.
2 For example, Knobe (2010).
3 For example, Schwitzgebel and Rust (2009).
4 For example, Roskies and Nichols (2008).
5 I would welcome a better way of rendering "argument to surveyed people."
6 Or intuitions, or verdicts, or whatever one's preferred lingo is here; they can be used interchangeably, for present purposes.
7 I take this to be a central point of my (2007).
8 See also Mallon (2007) and Alexander et al. (2010) on a closely related question of philosophical versus psychological competence.
9 It also may be that the inference to signal is both more than is strictly required to secure at least some positive justificatory status for one's intuitive deliverances, and at the same time also a promising avenue for achieving a more justified status for the propositions in question. Remember that most positive-program experimental philosophers happily endorse a "many roads to Allah" metaphilosophy, in which x-phi supplements and works alongside more traditional methods, without thereby displacing them.
10 See Pinillos et al. (2011) for an interesting version of this argument regarding how capable of reflection different subject populations are; though see also Weinberg et al. (2012) for some concerns about how easily we can assume that more reflective subjects will also thereby be more accurate ones.
11 Though if the folk were ultimately to dissent very widely from factivity, that might instead provide reason to worry that knowledge is not actually factive. I will set aside that concern for present purposes.
12 For example, Banerjee et al. (2011) and Sarkissian et al. (2010); also, Feltz and Cokely (2009) on demographic variation not of ethnicity, but of personality type.
13 See Cohen (1988), including both his justifications for the proposed conventions and some stern caveats against deferring to them too uncritically.
14 This is not a problem that would necessarily arise for other sorts of tasks, such as Pinillos' proofreading projection task in his (2012).
15 See Scoville (1912).
16 But only, it seems, for male subjects—female subjects were generally immune to all of the content-based manipulations by the researchers, including the photo manipulation.
17 My thanks to Ron Mallon for raising this concern.

References

Alexander, J. (2010), Experimental Philosophy: An Introduction. Malden, MA: Polity Press.
Alexander, J., Mallon, R. and Weinberg, J. (2010), "Accentuate the negative." Review of Philosophy and Psychology, 1, 297–314.
Banerjee, K., Huebner, B. and Hauser, M. (2011), "Intuitive moral judgments are robust across demographic variation in gender, education, politics, and religion: A large-scale web-based study." Journal of Cognition and Culture, 37, 151–87.
Beebe, J. and Buckwalter, W. (2010), "The epistemic side-effect effect." Mind & Language, 25, 474–98.
Bertrand, M., Karlan, D., Mullainathan, S., Shafir, E. and Zinman, J. (2010), "What's advertising content worth? Evidence from a consumer credit marketing field experiment." The Quarterly Journal of Economics, 125, 263–306.
Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum.
Feltz, A. and Cokely, E. (2009), "Do judgments about freedom and responsibility depend on who you are? Personality differences in intuitions about compatibilism and incompatibilism." Consciousness and Cognition, 18, 342–50.
Henrich, J., Heine, S. J. and Norenzayan, A. (2010), "The weirdest people in the world." Behavioral and Brain Sciences, 33, 61–83.
Knobe, J. (2010), "Person as scientist, person as moralist." Behavioral and Brain Sciences, 33, 315–65.
Mallon, R. (2007), "Reviving Rawls inside and out," in W. Sinnott-Armstrong (ed.), Moral Psychology, Volume 2: The Cognitive Science of Morality: Intuition and Diversity. Cambridge, MA: MIT Press, pp. 145–55.
Pinillos, A. (2012), "Knowledge, experiments, and practical interest," in J. Brown and M. Gerken (eds), Knowledge Ascriptions. Oxford: Oxford University Press, pp. 192–219.
Pinillos, N. Á., Smith, N., Nair, G. S., Marchetto, P. and Mun, C. (2011), "Philosophy's new challenge: Experiments and intentional action." Mind & Language, 26, 115–39.
Roskies, A. L. and Nichols, S. (2008), "Bringing moral responsibility down to earth." Journal of Philosophy, 105, 371–88.
Sarkissian, H., Chatterjee, A., De Brigard, F., Knobe, J., Nichols, S. and Sirker, S. (2010), "Is belief in free will a cultural universal?" Mind & Language, 25, 346–58.
Schaffer, J. and Knobe, J. (2012), "Contrastive knowledge surveyed." Noûs, 46, 675–708.
Schwitzgebel, E. and Rust, J. (2009), "The moral behavior of ethicists: Peer opinion." Mind, 118, 1043–59.
Scoville, W. (1912), "Note on capsicums." Journal of the American Pharmaceutical Association, 1, 453–54.
Sommers, T. (2011), "In memoriam: The x-phi debate." The Philosopher's Magazine, 52, 89–93.
Sripada, C. and Stanley, J. (2012), "Empirical tests of interest-relative invariantism." Episteme, 9, 3–26.
Weinberg, J. M., Alexander, J., Gonnerman, C. and Reuter, S. (2012), "Restrictionism and reflection: Challenge deflected, or simply redirected?" The Monist, 95, 201–22.

Index

Alexander, J. 108, 193
anti-intellectualism about knowledge 9–38, 71–2, 75, 77, 79, 84–5, 87–9
assertion, norms of 182
Ayer, A. 128
bank cases 14–15, 73–5, 100–1, 145–58, 162–7
Beebe, J. 2, 5, 130, 175, 203
belief 4–5, 29–36, 160–2, 175, 177, 179–80, 182–4
Brown, J. 165–7
Buckwalter, W. 3–5, 14, 29–38, 77, 89, 101, 130, 146, 150, 152, 163, 203–4
chance 47–58, 60, 62, 64
Cohen, S. 47, 98, 130, 148
concepts 135–6, 191
conditions on knowledge 5, 175, 186–9
contextualism 3–5, 64, 76–7, 85–7, 89, 98–101, 107, 130–1, 148, 151–3, 158, 163, 165–7
cross-cultural differences 2, 10–11, 197–8
Cullen, S. 1, 120–3, 134–5
demand characteristics 124–5, 134–5
DeRose, K. 3, 46–7, 73–6, 85–7, 98, 100–1, 130, 145–8, 151–3, 157, 162, 167
effect size 6, 17, 199–204
error possibilities 3–5, 20–3, 72–5, 97, 99–100, 102, 105, 107–8, 110, 150, 153, 156–8, 162–3, 166
experts 11–13, 46–7, 163–4, 190–1
externalism, epistemic 175, 197
fallibilism 29
Fantl, J. 3, 148, 166–7
Feltz, A. 3, 17, 77, 89, 101, 150
gender differences 10–11, 197–8
Gettier cases 1–2, 46–7, 133–4
Goldman, A. 10–11
Grice 36–8, 120–3
Harman, G. 48, 55
Hawthorne, J. 3, 5, 46–7, 59, 98, 148, 165
Hazlett, A. 175
heuristics and biases 5, 56, 149, 194
    anchoring and adjustment 34–8
    base-rate neglect 121–4
    conjunction fallacy 86–7
    curse of knowledge (see epistemic egocentrism)
    distinction bias 164–5
    epistemic egocentrism 2–3, 97–111
    hindsight bias 108, 111
    judgment reversal 81–3
    outcome bias 108
    representativeness heuristic 121–2
    theoretical bias 10
Hsee, C. 81–3, 86, 164
implicature 13, 24, 72, 121, 134–5 see also Grice
implicit measures 2, 119, 125, 138–9
internalism, epistemic 175, 197
intuitions 10, 72–3
invariantism 3–5, 97–101, 107
    interest-relative 3–5, 11–12, 17–18, 20, 22, 24–5, 29–31, 148, 151, 158, 161, 166–7, 203
justification, epistemic 5, 47–50, 58, 64–5, 128–34, 175–84, 190–1, 195
justified true belief 1, 65, 128–9, 133–4, 176
Kahneman, D. 34, 56, 76, 81, 86, 121–2
Knobe, J. 1, 101, 130, 150–3, 157, 163, 203
knowledge-action principles 25–9
knowledge-first epistemology 64–5, 130, 176
Kvanvig, J. 176–80
Lewis, D. 47, 130, 152–3
lotteries 5–6, 45–66
Lycan, W. 176, 180–2
May, J. 3, 72, 101, 150–2, 163
McGrath, M. 3, 148, 166–7
methodology
    experimental 1–2, 4–6, 11–12, 71–2, 75–7, 79, 119–25, 137–9, 193–4, 203–4
    philosophical 9–11, 71, 76, 85, 87, 146, 149, 163, 165, 190–1, 193–201, 203–4
moral valence 2, 130, 203
Myers-Schulz, B. 5, 175
Nagel, J. 1–3, 97, 99–101, 103, 106–8
need for cognition 108–10
negative program 193–4, 196, 204
Nichols, S. 1–3, 47
Nisbett, R. 121
null results 4–5, 13–16, 78–81, 157–8
Phelan, M. 3–4, 72, 75, 77–81, 84–5, 89, 164
Pinillos, À. 4, 15–17, 30–2, 35–6, 87, 130–1, 145, 147, 158–62, 167
positive program 193, 198, 204
practical interests see stakes
Reid, T. 195, 198
Rose, D. 5, 46
Ross, L. 121
Sartwell, C. 175–86, 190
Schaffer, J. 4–5, 10–11, 29–38, 89, 101, 150–3, 157, 163, 203–4
Schwitzgebel, E. 5, 75, 175
semantic integration 2, 119–20, 125–39
skepticism 2–3, 13, 46–50, 54–5, 59–64, 72–3
Sosa, E. 194
Sripada, C. 15–16, 30, 130, 203–4
stakes 3–5, 12, 14–25, 27, 29–31, 33–7, 71–2, 77–81, 83–5, 130, 145–67, 197, 203–4
Stanley, J. 3, 15–16, 30, 130, 148, 165, 203–4
Starmans, C. 1, 47, 65, 134
Stich, S. 1–3, 47
testimony 46, 48, 51, 54–6, 64, 77–81
Turri, J. 5–6, 46–7
Tversky, A. 34, 86–7, 121–2
type II error 16
value, epistemic 179–80
Vogel, J. 45–7
Weatherson, B. 12
Weinberg, J. 1–3, 6, 47, 108
Williamson, T. 31, 47, 65, 98, 130, 176, 194
Zarpentine, C. 3, 17, 77, 89, 101, 150