Experimental Philosophy of Language: Perspectives, Methods, and Prospects 3031289072, 9783031289071

This book presents the current state of experimental philosophy of language, drawing attention to corpus methods. The vo


English Pages 295 [296] Year 2023

Table of contents :
Acknowledgments
Contents
1 Introduction: 20 Years of Experimental Philosophy of Language
1.1 Experimental Philosophy
1.2 Experimental Philosophy of Language
1.3 The Purpose of This Volume
References
Part I The Experimental Philosophy of Language Methodology
2 A Bibliometric Analysis of Experimental Philosophy of Language
2.1 Introduction
2.2 Experimental Philosophy of Language
2.3 Methodology
2.3.1 Data Collection
2.3.2 Bibliometric Techniques
2.4 Results and Discussion
2.4.1 Main Information
2.4.2 Sources
2.4.3 Authors and Countries
2.4.4 Documents
2.5 Conclusions
References
3 Experimental Philosophy and Ordinary Language Philosophy
3.1 Introduction: Experimental Philosophy and Ordinary Language Philosophy
3.2 Argument from Cross-Linguistic Diversity
3.3 Argument from Intra-Linguistic Divergence
3.4 The Supplementary Picture of X-Phi
3.5 X-Phi Defends OLP
3.6 OLP as a Target of Negative X-Phi
3.7 OLP Helps X-Phi
3.8 What Does X-Phi Do?
3.9 Concluding Remarks
References
4 Does Scientific Conceptual Analysis Provide Better Justification than Armchair Conceptual Analysis?
4.1 Introduction
4.2 Armchair Conceptual Analysis and Scientific Conceptual Analysis
4.3 The Argument from Uniformity of Agreement
4.3.1 Argument from Uniformity of Agreement
4.3.2 Argument from Uniformity of Agreement – Complex Version
4.4 The Expertise Defence
4.4.1 The Expertise Defence
4.5 The Study
4.5.1 Design of the Study
4.5.2 Results and Discussion
4.6 Conclusion
Appendix
References
5 Distributional Theories of Meaning: Experimental Philosophy of Language
5.1 Introduction
5.2 Distributional Semantics and the Distributional Hypothesis
5.3 Constructing DSMs: An Overview
5.4 The Success of the Approach
5.5 Objections to DSMs as Theories of Meaning
5.5.1 No Understanding
5.5.2 No Compositionality
5.5.3 No Cognitive Plausibility
5.5.4 No Granularity of Meaning
5.6 DSMs as Holistic Theories of Meaning
5.7 Conclusion
References
Part II Experimental Philosophy of Language and Corpus Methods
6 Are Moral Predicates Subjective? A Corpus Study
6.1 Introduction
6.2 Moral Predicates and Subjectivity: A Snapshot of a Long-Standing Philosophical Debate and the More Recent Empirical Turn
6.2.1 The Vexed Issue of Moral Subjectivity
6.2.2 Tracking Subjectivity Semantically
6.3 The Corpus Study
6.3.1 Corpus Used and Raw Data Collection Method
6.3.2 Initial Results and Data Filtering
6.3.3 Observations on the Corpus Data
6.3.4 Discussion
6.4 Conclusion
References
7 Linguistic Corpora and Ordinary Language: On the Dispute Between Ryle and Austin About the Use of ‘Voluntary’, ‘Involuntary’, ‘Voluntarily’, and ‘Involuntarily’
7.1 Introduction
7.2 Ryle on ‘Voluntary’ and ‘Involuntary’
7.3 Further Claims: Austin’s ‘Special Circumstances’ and Beyond the Received View of Ryle
7.4 Categorizing Uses
7.5 Conclusion
References
8 Light in Assessing Color Quality: An Arabic-Spanish Cross-Linguistic Study
8.1 Introduction
8.2 Indexical Theories on Color Terms
8.3 What's with the Light?
8.4 Methods and Materials
8.5 Analyses
8.6 Discussion
8.7 Conclusions
References
Part III Politically-Engaged Experimental Philosophy of Language
9 Experimentally-Informed Philosophy of Hate Speech
9.1 Introduction
9.2 The Expressive Nature of Slurs
9.3 The Effects of Slurs
9.4 Reporting Slurs
9.5 Reclaiming Slurs
9.6 Conclusion
References
10 Slurs in the Rio de la Plata
10.1 Introduction
10.2 Pilot Study
10.2.1 Design of the Pilot Study
10.2.1.1 Participants
10.2.1.2 Design
10.2.2 Results
10.2.3 Discussion
10.3 A Corpus Analysis of Slurs and Swearwords
10.3.1 Slurs and Swearwords Are Not Pure Expressives
10.3.1.1 Differences Between Pure Expressive Adjectives and Descriptive Adjectives
10.3.1.2 Slurs and Swearwords
10.3.1.3 A Note on Some Non-syntactic Behaviour of Slurs
10.4 The Experiment
10.4.1 Design of the Study
10.4.1.1 Participants
10.4.1.2 Design
10.4.2 Results
10.4.3 Discussion
10.5 A Remark on Epithets
10.6 Final Remarks
Appendix: The Syntactic and Semantic Representation of Slurs and Swearwords
References
11 Who Has a Free Speech Problem? Motivated Censorship Across the Ideological Divide
11.1 Introduction
11.1.1 Background
11.1.2 Moral Conviction and the Contours of Speaker Identity Norms
11.1.3 Speaker Identity Norms as Effects of Inverse Planning
11.2 Methods
11.2.1 Power Analysis
11.2.2 Participants
11.2.3 Materials
11.2.4 Procedure
11.2.5 Measures
11.3 Results
11.3.1 Part 1: Offensiveness
11.3.2 Part 2: Attributions of Intent
11.3.3 Part 3: Mediation Analyses
11.3.4 Part 4: Active Versus Inactive Debates
11.4 Discussion
11.4.1 Symmetry and Asymmetry in Offensive Speech Norms
11.4.2 Cognitive Mechanism: The Role of Intent
11.5 Conclusion
References
Part IV Experimental Philosophy of Language and Psychology
12 How Understanding Shapes Reasoning: Experimental Argument Analysis with Methods from Psycholinguistics and Computational Linguistics
12.1 Experimental Argument Analysis: Motivation and Key Ideas
12.2 Example: A Philosophical Argument
12.3 Example: A Comprehension Bias
12.4 Methods from Psycholinguistics
12.5 Methods from Computational Linguistics
12.6 Conclusion
References
13 From Infants to Great Apes: False Belief Attribution and Primitivism About Truth
13.1 Introduction
13.2 Alethic Primitivism
13.2.1 Conceptual vs. Metaphysical Primitivism
13.2.2 Asay's Conceptual Primitivism
13.3 False-Belief Data
13.3.1 False Belief Attribution in Young Children
13.3.2 False Belief Attribution in Infants
13.3.3 False Belief Attribution in Non-human Primates
13.4 Alethic Primitivism and the False-Belief Data
13.4.1 FundamentalityA
13.4.2 Explanatory IndispensabilityA
13.4.3 OmnipresenceA
13.4.4 AbilityA
13.5 Conclusion
References
Author Index
Subject Index




Logic, Argumentation & Reasoning 33

David Bordonaba-Plou Editor

Experimental Philosophy of Language: Perspectives, Methods, and Prospects

Logic, Argumentation & Reasoning Interdisciplinary Perspectives from the Humanities and Social Sciences Volume 33

Series Editor
Shahid Rahman, University of Lille, CNRS-UMR 8163: STL, France

Managing Editor
Juan Redmond, Instituto de Filosofia, University of Valparaíso, Valparaíso, Chile

Editorial Board Members
Frans H. van Eemeren, Amsterdam, Noord-Holland, The Netherlands
Zoe McConaughey, Lille, UMR 8163, Lille, France
Tony Street, Faculty of Divinity, Cambridge, UK
John Woods, Dept of Philosophy, Buchanan Bldg, University of British Columbia, Vancouver, BC, Canada
Gabriel Galvez-Behar, Lille, UMR 8529, Lille, France
Leone Gazziero, Lille, France
André Laks, Princeton/Panamericana, Paris, France
Ruth Webb, University of Lille, CNRS-UMR 8163: STL, France
Jacques Dubucs, Paris Cedex 05, France
Karine Chemla, CNRS, Lab Sphere UMR 7219, Case 7093, Université Paris Diderot, Paris Cedex 13, France
Sven Ove Hansson, Division of Philosophy, Royal Institute of Technology (KTH), Stockholm, Stockholms Län, Sweden
Yann Coello, Lille, France
Eric Gregoire, Lille, France
Henry Prakken, Dept of Information & Computing Sci, Utrecht University, Utrecht, Utrecht, The Netherlands
François Recanati, Institut Jean-Nicod, École Normale Supérieure, Paris, France
Gerhard Heinzmann, Laboratoire de Philosophie et d’Histoire, Universite de Lorraine, Nancy Cedex, France
Sonja Smets, ILLC, Amsterdam, The Netherlands
Göran Sundholm, 'S-Gravenhage, Zuid-Holland, The Netherlands
Michel Crubellier, University of Lille, CNRS-UMR 8163: STL, France
Dov Gabbay, Dept. of Informatics, King’s College London, London, UK
Tero Tulenheimo, Turku, Finland
Jean-Gabriel Contamin, Lille, France
Franck Fischer, Newark, USA
Josh Ober, Dept of Pol Sci, West Encina Hall 100, Stanford University, Stanford, CA, USA
Marc Pichard, Lille, France

Logic, Argumentation & Reasoning (LAR) explores links between the Humanities and Social Sciences, with theories (including decision and action theory) drawn from the cognitive sciences, economics, sociology, law, logic, and the philosophy of science. Its main ambitions are to develop a theoretical framework that will encourage and enable interaction between disciplines, and to integrate the Humanities and Social Sciences around their main contributions to public life, using informed debate, lucid decision-making, and action based on reflection.

• Argumentation models and studies
• Communication, language and techniques of argumentation
• Reception of arguments, persuasion and the impact of power
• Diachronic transformations of argumentative practices

LAR is developed in partnership with the Maison Européenne des Sciences de l’Homme et de la Société (MESHS) at Nord - Pas de Calais and the UMR-STL: 8163 (CNRS). This book series is indexed in SCOPUS. Proposals should include:

• A short synopsis of the work, or the introduction chapter
• The proposed Table of Contents
• The CV of the lead author(s)
• If available: one sample chapter

We aim to make a first decision within 1 month of submission. In case of a positive first decision, the work will be provisionally contracted; the final decision about publication will depend upon the result of an anonymous peer review of the complete manuscript. The complete work is usually peer-reviewed within 3 months of submission. LAR discourages the submission of manuscripts containing reprints of previously published material, and/or manuscripts that are less than 150 pages / 85,000 words. For inquiries and proposal submissions, authors may contact the editor-in-chief, Shahid Rahman at: [email protected], or the managing editor, Juan Redmond, at: [email protected]


Editor David Bordonaba-Plou Departamento de Lógica y Filosofía Teórica Universidad Complutense de Madrid Madrid, Spain

ISSN 2214-9120 ISSN 2214-9139 (electronic) Logic, Argumentation & Reasoning ISBN 978-3-031-28907-1 ISBN 978-3-031-28908-8 (eBook) https://doi.org/10.1007/978-3-031-28908-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Acknowledgments

I am very grateful to all the authors participating in the volume: Javier Osorio-Mancilla, Masaharu Mizumoto, Hristo Valchev, Jumbly Grindrod, Isidora Stojanovic, Louise McNally, Michael Zahorec, Robert Bishop, Nat Hansen, John Schwenkler, Justin Sytsma, Laila M. Jreis-Navarro, Bianca Cepollaro, Ana Clara Polakof, Manuel Almagro-Holgado, Ivar A. Rodríguez, Neftalí Villanueva, Eugen Fischer, Aurélie Herbelot, Joseph Ulatowski, and Jeremy Wyatt. Without their outstanding contributions, this volume would not have been possible. I would also like to thank all those who took part in reviewing the chapters for their invaluable feedback. Their thoughtful comments have greatly improved the quality of the contributions. The Logic, Argumentation & Reasoning series of Springer has produced this volume. I am very grateful to series editor Shahid Rahman and co-editor Juan Redmond for their confidence in the project. Finally, I would like to thank Christi Lue and Prasad Gurunadham for their advice and assistance throughout the process. This volume aims to be an overview of the experimental philosophy of language, reflecting its methodology, perspectives, and themes, as well as some of the possible lines of development of the discipline in the future. It is thus an important step in the consolidation of the discipline.


Contents

1 Introduction: 20 Years of Experimental Philosophy of Language
   David Bordonaba-Plou

Part I The Experimental Philosophy of Language Methodology

2 A Bibliometric Analysis of Experimental Philosophy of Language
   Javier Osorio-Mancilla
3 Experimental Philosophy and Ordinary Language Philosophy
   Masaharu Mizumoto
4 Does Scientific Conceptual Analysis Provide Better Justification than Armchair Conceptual Analysis?
   Hristo Valchev
5 Distributional Theories of Meaning: Experimental Philosophy of Language
   Jumbly Grindrod

Part II Experimental Philosophy of Language and Corpus Methods

6 Are Moral Predicates Subjective? A Corpus Study
   Isidora Stojanovic and Louise McNally
7 Linguistic Corpora and Ordinary Language: On the Dispute Between Ryle and Austin About the Use of ‘Voluntary’, ‘Involuntary’, ‘Voluntarily’, and ‘Involuntarily’
   Michael Zahorec, Robert Bishop, Nat Hansen, John Schwenkler, and Justin Sytsma
8 Light in Assessing Color Quality: An Arabic-Spanish Cross-Linguistic Study
   David Bordonaba-Plou and Laila M. Jreis-Navarro

Part III Politically-Engaged Experimental Philosophy of Language

9 Experimentally-Informed Philosophy of Hate Speech
   Bianca Cepollaro
10 Slurs in the Rio de la Plata
   Ana Clara Polakof
11 Who Has a Free Speech Problem? Motivated Censorship Across the Ideological Divide
   Manuel Almagro, Ivar R. Hannikainen, and Neftalí Villanueva

Part IV Experimental Philosophy of Language and Psychology

12 How Understanding Shapes Reasoning: Experimental Argument Analysis with Methods from Psycholinguistics and Computational Linguistics
   Eugen Fischer and Aurélie Herbelot
13 From Infants to Great Apes: False Belief Attribution and Primitivism About Truth
   Joseph Ulatowski and Jeremy Wyatt

Author Index
Subject Index

Chapter 1

Introduction: 20 Years of Experimental Philosophy of Language

David Bordonaba-Plou

Abstract Experimental philosophy of language, as a subdiscipline of experimental philosophy, shares its most important defining characteristic: conducting empirical studies to solve traditional problems in the philosophy of language. Much of the attention in the field has been directed to theories of reference because of the influential 2004 article by Edouard Machery, Ron Mallon, Shaun Nichols, and Stephen Stich, Semantics, Cross-Cultural Style. After almost 20 years of research, it is time to take stock. This introduction has two goals. First, to represent the discipline’s past and current state, and highlight which have been the topics of study addressed by the discipline in addition to the theories of reference. Second, to draw attention to corpus methods in the experimental philosophy of language, a methodology that, although not the most widespread today, is gaining more and more adherents.

1.1 Experimental Philosophy

Experimental philosophy (see Kauppinen, 2007; Knobe, 2007; Nadelhoffer & Nahmias, 2007; Rose & Danks, 2013; Sytsma & Livengood, 2015; Sytsma, 2017) tries to answer the problems traditionally addressed by philosophers by carrying out empirical studies. Experimental philosophy emerged at the beginning of the twenty-first century, partly in response to philosophers’ use of intuitions and, specifically, as a reaction to the most widespread methodology in analytic philosophy, known as the “method of cases” (Machery et al., 2004, p. B8; Mallon et al., 2009, p. 338). Early adherents of experimental philosophy argued that the method of cases is unreliable because we cannot draw general conclusions about phenomena from examples where the only intuitions considered are those of the author. In the end, there will always be a philosopher who holds a contrary opinion about the case in

D. Bordonaba-Plou, Departamento de Lógica y Filosofía Teórica, Universidad Complutense de Madrid, Madrid, Spain. E-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_1


question or who finds a counterexample to the case presented. We therefore have no way of determining whether our intuitions are valid. As Knobe and Nichols (2008) stress, the hallmark of experimental philosophers is a reliance on empirical methods. However, experimental philosophy is not a monolithic discipline; it has changed and diversified over time. In addition to the topics they deal with, two essential differences mark experimental philosophers out: firstly, the type of experimental methods they use and, secondly, how narrowly or broadly they conceive the discipline.

Regarding the first difference, two empirical methods are generally applied in experimental philosophy: questionnaires and corpus methods. Questionnaires, a methodology from the cognitive sciences, were virtually the only method in the discipline’s early days. During this first period, the influence of cognitive sciences and psychology was palpable in the methodology, the topics covered and the very conception of the discipline. Joshua Knobe and Shaun Nichols define the activity of the experimental philosopher in their famous manifesto: “experimental philosophers proceed by conducting experimental investigations of the psychological processes underlying people’s intuitions about central philosophical issues” (Knobe & Nichols, 2008, p. 3). Similarly, Nadelhoffer and Nahmias (2007, p. 123) state that “[e]xperimental philosophy is the name for a recent movement whose participants use the methods of experimental psychology to probe the way people make judgments that bear on debates in philosophy.”1 Rose and Danks (2013, p. 514) call this way of characterizing experimental philosophy “the narrow conception of experimental philosophy.” For them, philosophers should resist this manner of characterizing experimental philosophy because, among other reasons, it identifies experimental philosophy with the study of folk intuitions, and this does not do justice to the topical diversity of the studies carried out within the discipline, for example, “experimental work on cognitive representations of causal structure in the world” (Rose & Danks, 2013, p. 514).

In addition to these reasons, it should be said that, over time, experimental philosophers began to use methodologies of a different type. Although questionnaires are still used in most studies, some authors (Bluhm, 2013, 2016; Hansen, 2015; Hansen & Chemla, 2015; Caton, 2020; Tallant & Andow 2020) have upheld the virtues of applying a new methodology: corpus methods. Unlike surveys, corpus methods do not appeal to intuitions but rather consist of applying various statistical techniques to analyze naturally occurring uses of language in a corpus.

Regarding the second difference, there are two senses of experimental philosophy: a narrow and a broad sense. According to Alexander et al. (2010), two programs can be distinguished in experimental philosophy: the negative and the positive program. The negative program criticizes the traditional method of analytic

1 Other mentions, however, are much more recent. For example, Knobe and Nichols (2017) define research in experimental philosophy as consisting in two key features: “a. the kinds of questions and theoretical frameworks traditionally associated with philosophy; b. the kinds of experimental methods traditionally associated with psychology and cognitive science.”


philosophy, i.e., the assumption that using intuitions as evidence is a useful way to obtain justified beliefs. The methodology traditionally used in analytic philosophy consists in developing an analysis of paradigmatic cases where the only intuitions considered are those of the author. As Kauppinen (2007, p. 96) notes, “[t]he traditional view was that a priori reflection by a philosopher would suffice.” In some cases, the intuitions of people other than the author may be considered. However, the challenge is to specify who these people are. We can assume that the author’s intuitions are part of the set of relevant intuitions, as they often form the starting point of the investigation, but beyond this, there are no mechanisms for knowing whether or to what extent the author’s intuition can be generalized to the community of speakers. The statements used by philosophers are often relatively imprecise, which makes it clear that the claim that their intuition is shared is an assumption rather than a finding. Kauppinen (2007, p. 96) cites some of the expressions often used in this regard, “‘Intuitively, we . . . ’ or ‘We would say . . . ’ or ‘Ordinarily, we would not describe X as . . . ’ or ‘It is a platitude that . . . p’, and so on.” For its part, the positive program shares with traditional analytic philosophy the idea that we can rely on intuitions as evidence but amends the traditional view by increasing the amount of evidence. It is no longer only the philosopher’s intuitions that count, but also those of the subjects participating in the questionnaires.

Corpus linguistics, in turn, is a methodology that allows us to approach the study of language using better descriptions of it, to test quantitatively which general theories about language are correct, and to prove or disprove certain hypotheses about particular linguistic phenomena.
Some authors (Stubbs, 1993; Tognini-Bonelli, 2001; Teubert, 2004) have argued that corpus linguistics is not a mere methodology but a branch of linguistics. However, in this paper, we will follow those authors (see McEnery & Wilson, 2001; Parodi, 2008; McEnery & Hardie, 2012) who consider that corpus linguistics is a set of methods that can be applied to various branches of linguistics. Perhaps this narrow understanding of corpus linguistics is one reason why the use of corpora in fields such as critical discourse analysis is relatively new (see McEnery et al., 2015, pp. 238–239) or why this methodology is not widespread in experimental philosophy of language. During the last decade, there has been a boom in experimental philosophy work. One of the most prominent fields is the experimental philosophy of language. The following section is devoted to outlining the field by presenting some of the topics covered by the discipline.
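As a rough illustration of what analyzing naturally occurring uses of language in a corpus can look like, the following Python sketch computes a relative frequency and simple collocate counts for a target word. It is only a minimal sketch under stated assumptions: the four sentences are invented for illustration and do not come from any real corpus, and the function names (`tokenize`, `relative_frequency`, `collocates`) are hypothetical helpers, not part of any corpus tool discussed in this volume.

```python
import re
from collections import Counter

# Toy corpus: invented sentences standing in for naturally occurring text.
# A real study would load a large corpus instead (assumption for illustration).
TOY_CORPUS = [
    "She signed the contract voluntarily.",
    "He voluntarily left the meeting early.",
    "The muscle twitched involuntarily.",
    "They left early because the meeting ended.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency(corpus, target):
    """Return occurrences of `target` per 1,000 tokens of the corpus."""
    tokens = [tok for sentence in corpus for tok in tokenize(sentence)]
    return 1000 * tokens.count(target) / len(tokens)


def collocates(corpus, target, window=2):
    """Count words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), i + window + 1
            # Collect neighbours inside the window, skipping the target itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


if __name__ == "__main__":
    print(relative_frequency(TOY_CORPUS, "voluntarily"))
    print(collocates(TOY_CORPUS, "voluntarily").most_common(3))
```

Counts of this kind are descriptive only; testing a hypothesis about usage would additionally require a suitably sampled corpus and an appropriate statistical test.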

1.2 Experimental Philosophy of Language

Experimental philosophy of language, as a subdiscipline of experimental philosophy, shares its most important defining characteristic: conducting empirical studies to solve traditional problems in the philosophy of language. Much of the attention in the field has been directed to theories of reference because of the influential


2004 article by Edouard Machery, Ron Mallon, Shaun Nichols, and Stephen Stich, Semantics, Cross-Cultural Style. There was widespread agreement among analytic philosophers that Kripkean intuitions about what proper names refer to were correct. The Kripkean strategy was based on presenting a series of examples to show that the intuitions of the speakers did not match the predictions of the descriptivist theory. For example, suppose that speakers associate the following description with the name “Gödel”: “[the person] who discovered the incompleteness theorem of arithmetic.” Suppose that X, the original bearer of the name, does not meet that description. X did not actually discover the incompleteness theorem, but another subject, let us call him Y, did. Given this context, to whom does the name refer, X or Y? According to Kripke, the name refers to X. However, Machery et al. (2004) criticized this supposed universality, using an experiment. Machery’s group drew on previous studies in cultural psychology, specifically Nisbett et al. (2001) and Nisbett (2003), which found differences between Western and Eastern populations in describing events and categorizing the objects of those events. These studies defined the Western way of thinking as “analytic thought” and the Eastern way of thinking as “holistic thought” for several reasons. First, when describing a person’s behavior, the former focuses on “personal dispositions—attitudes and traits inferred from past behavior” (Nisbett, 2003, p. 113), while the latter emphasizes “situational factors” (Nisbett, 2003, p. 113), that is, contextual factors surrounding the behavior. Based on this difference, Machery’s group supposed that there could also be a difference between the two populations in terms of whether names refer to individual traits inherited through successive uses or whether they refer to general situational factors. 
The results they obtained were statistically significant for the Gödel case; the intuitions of the Western population were Kripkean, while the intuitions of the Eastern population were descriptivist. This experiment caused a great stir in the philosophy of language. It was followed by many articles questioning the consequences drawn from the experiment (see, for example, Deutsch, 2009; Martí, 2009), as well as others where the purpose was to replicate the findings of the original experiment (Sytsma & Livengood, 2011; Machery, 2012; Machery et al., 2015; Sytsma et al., 2015). It is not surprising that in the volumes on experimental philosophy of language existing to date, a large part of the contributions deal with this topic. For example, in the 2015 volume Advances in Experimental Philosophy of Language, edited by Jussi Haukioja, six of the nine chapters deal with the theories of reference. Similarly, the articles on experimental philosophy of language focus on theories of reference (see, for example, Hansen, 2015; Maynes, 2015). However, experimental philosophers of language have studied many other topics, for example: natural kind terms (Braisby et al., 1996; Häggqvist & Wikforss, 2015), the context-dependence of the verb “to know” (May et al., 2010; Schaffer & Knobe, 2012), the nature of retractions (Knobe & Yalcin, 2014; Marques, 2015; Khoo, 2015; Beddor & Egan, 2018; Dinges & Zakkou, 2020; Kneer, 2021), color terms (Hansen & Chemla, 2017; Ziólkowski, 2021), taste disagreements (Bordonaba-Plou, 2021, 2022), pejorative terms (Panzeri &


Carrus, 2016; Cepollaro et al., 2019, 2021; Bordonaba-Plou & Torices, 2021), dual character concepts (Knobe et al., 2013; Del Pinal & Reuter, 2017; Liao et al., 2020), philosophers’ claims about the frequency of “know” (Hansen et al., 2021), emotion concepts (Díaz & Reuter, 2021), evaluative language (Soria-Ruiz & Faroldi, 2022; Willemsen & Reuter, 2021), the concept of truth (Reuter & Brun, 2021), and stereotypical inferences (Fischer et al., 2021a, b).

1.3 The Purpose of This Volume

After almost 20 years of research in the experimental philosophy of language and two quantitative turns, it is time to take stock. This volume has three goals. First, it represents the discipline’s current state by including works from the field’s best experts. Second, it tries to draw attention to corpus methods in the experimental philosophy of language, an under-represented methodology in the field. Third, it aims to explore the discipline’s future. To that end, the volume focuses on new trends in the experimental philosophy of language, for example, interdisciplinary studies, cross-linguistic studies, or politically engaged experimental philosophy of language studies.

The volume is divided into four sections. Part I, The Experimental Philosophy of Language Methodology, is devoted to presenting and analyzing the methods used in the experimental philosophy of language. In A Bibliometric Analysis of Experimental Philosophy of Language, Javier Osorio-Mancilla conducts a bibliometric analysis of the experimental philosophy of language. Applying citation analysis, co-word occurrence analysis, and clustering procedures, the author identifies the most common research topics, the community-pattern trends within the field and how they have changed in recent years, the structure of research topics, the most prolific authors and works, and the degree of cooperation among countries. In Experimental Philosophy and Ordinary Language Philosophy, Masaharu Mizumoto explores the intersections between experimental philosophy and ordinary language philosophy through the distinction between the negative program and the positive program in experimental philosophy.
Specifically, the author maintains that experimental philosophy and ordinary language philosophy are not only in a supplementary relationship but in a friendly and cooperative one since both projects share the same critical attitude toward the a priori construction of universally valid theories. In this way, both fulfill a similarly essential role in contemporary philosophy: that of criticizing and improving the methodologies used by philosophers. In Does Scientific Conceptual Analysis Provide Better Justification than Armchair Conceptual Analysis?, Hristo Valchev explores whether scientific conceptual analysis or armchair conceptual analysis provides better justification. The author develops an argument –the argument from uniformity of agreement– to conclude that scientific conceptual analysis provides better justification than armchair conceptual analysis. Finally, the author presents and discusses an empirical study:

a questionnaire whose purpose is to gather evidence in favor of the argument mentioned above. In Distributional Theories of Meaning: Experimental Philosophy of Language, Jumbli Grindrod provides an overview of distributional semantics, an area of linguistics whose guiding idea is that similarity of meaning of two terms implies similarity of distribution of those terms in a corpus; in other words, that words that are semantically related will be used in similar contexts. The author then evaluates the extent to which distributional semantics can serve as a theory of meaning, paying special attention to the advantages and shortcomings of such methods.

Part II, Experimental Philosophy of Language and Corpus Methods, consists of papers that use corpus methods extensively. In Are Moral Predicates Subjective? A Corpus Study, Isidora Stojanovic and Louise McNally investigate moral predicates, paying particular attention to how they relate to objective reality and subjective experience. Through the application of corpus methods, the authors conduct an empirical study of how moral predicates behave with two subjective attitude verbs, “find” and “consider.” In Linguistic Corpora and Ordinary Language: On the Dispute between Ryle and Austin about the Use of ‘Voluntary,’ ‘Involuntary,’ ‘Voluntarily,’ and ‘Involuntarily,’ Michael Zahorec, Robert Bishop, Nat Hansen, John Schwenkler and Justin Sytsma evaluate Ryle’s and Austin’s claims about the ordinary use of ‘voluntary,’ ‘involuntary,’ ‘voluntarily,’ and ‘involuntarily.’ They combine two methods: aggregating judgments about real examples drawn from the British National Corpus, and applying a clustering algorithm that yields dendrograms representing the interconnections between the different uses of the four terms. In Light in Assessing Color Quality: An Arabic-Spanish Cross-Linguistic Study, David Bordonaba-Plou and Laila M.
Jreis-Navarro conduct a cross-linguistic study in Arabic and Spanish in which they investigate one of the dimensions of color quality, brightness. Through an analysis of real cases in Sketch Engine, they analyze the interaction between the equivalents of the term “white” in the two languages and various brightness modifiers. They then evaluate the implications of the study results for indexical contextualism.

Part III, Politically-Engaged Experimental Philosophy of Language, includes works of experimental philosophy of language with a political orientation. In Experimentally-Informed Philosophy of Hate Speech, Bianca Cepollaro offers an overview of the empirical studies of expressives, slurs, and pejorative terms that have been carried out to date. She maintains that applying empirical methods from psychology, psycholinguistics, and experimental philosophy helps to address the philosophical questions arising in the debate on slurs and pejorative terms. In Slurs in the Rio de la Plata, Ana C. Polakof investigates slurs in Rioplatense Spanish. Through a combination of corpus methods and questionnaires, the author investigates whether slurs and swearwords in Rioplatense Spanish have both descriptive and expressive content or only expressive content. Finally, the chapter investigates whether there is a correlation between the degree of descriptiveness and expressiveness they possess and the possibility that they form epithets.

In Who Has a Free Speech Problem? Motivated Censorship across the Ideological Divide?, Manuel Almagro, Ivar R. Hannikainen and Neftalí Villanueva investigate whether there are asymmetries between conservatives and liberals in their evaluations of offensive speech. Through questionnaires, the authors conclude that, contrary to commonly accepted ideas, conservatives and liberals make similar evaluations, since both tend to consider the claims of outgroup people more offensive than those of ingroup people.

Part IV, Experimental Philosophy of Language and Psychology, is dedicated to the crossroads between experimental philosophy of language and psychology. In How Understanding Shapes Reasoning: Experimental Argument Analysis with Methods from Psycholinguistics and Computational Linguistics, Eugen Fischer and Aurelie Herbelot examine how language processing can help philosophers answer questions in different areas of philosophy. Then, they offer a richer methodological overview of the experimental philosophy of language by presenting and analyzing experimental argument analysis, a discipline that studies how automatic inferences shape verbal reasoning. In From Infants to Great Apes: False Belief Attribution and Primitivism about Truth, Joe Ulatowski and Jeremy Wyatt maintain that infants and non-human primates possess the concept of truth and evaluate the extent to which TRUTH is a primitive concept. In this way, they show how philosophy, in general, and the philosophy of language, in particular, can benefit from empirical findings in psychology.

References

Alexander, J., Mallon, R., & Weinberg, J. M. (2010). Accentuate the negative. Review of Philosophy and Psychology, 1(2), 297–314.
Beddor, B., & Egan, A. (2018). Might do better: Flexible relativism and the QUD. Semantics and Pragmatics, 11(7). https://doi.org/10.3765/sp.11.7
Bluhm, R. (2013). Don’t ask, look! Linguistic corpora as a tool for conceptual analysis. In M. Hoeltje, T. Spitzley, & W. Spohn (Eds.), Was dürfen wir glauben? Was sollen wir tun? Sektionsbeiträge des achten internationalen Kongresses der Gesellschaft für Analytische Philosophie e.V. (pp. 7–15). DuEPublico.
Bluhm, R. (2016). Corpus analysis in philosophy. In M. Hinton (Ed.), Evidence, experiment and argument in linguistics and philosophy of language (pp. 91–109). Peter Lang.
Bordonaba-Plou, D. (2021). An analysis of the centrality of intuition talk in the discussion on taste disagreements. Filozofia Nauki, 29(2), 133–156.
Bordonaba-Plou, D. (2022). Disagreement is said in many ways: An experimental philosophy of language study on taste discussions. In P. Stalmaszczyk & M. Hinton (Eds.), Philosophical approaches to language and communication, volume 2 (pp. 109–130). Peter Lang.
Bordonaba-Plou, D., & Torices, J. R. (2021). Paving the road to hell: The Spanish word menas as a case study. Daimon, 84, 47–72.
Braisby, N., Franks, B., & Hampton, J. (1996). Essentialism, word use, and concepts. Cognition, 59, 247–274.
Caton, J. N. (2020). Using linguistic corpora as a philosophical tool. Metaphilosophy, 51(1), 51–70.


Cepollaro, B., Sulpizio, S., & Bianchi, C. (2019). How bad is it to report a slur? An empirical investigation. Journal of Pragmatics, 146, 32–42.
Cepollaro, B., Domaneschi, F., & Stojanovic, I. (2021). When is it ok to call someone a jerk? An experimental investigation of expressives. Synthese, 198, 9273–9292.
Del Pinal, G., & Reuter, K. (2017). Dual character concepts in social cognition: Commitments and the normative dimension of conceptual representation. Cognitive Science, 41(S3), 477–501.
Deutsch, M. (2009). Experimental philosophy and the theory of reference. Mind & Language, 24(4), 445–466.
Díaz, R., & Reuter, K. (2021). Feeling the right way: Normative influences on people’s use of emotion concepts. Mind & Language, 36(3), 451–470.
Dinges, A., & Zakkou, J. (2020). A direction effect on taste predicates. Philosophers’ Imprint, 20(27), 1–22.
Fischer, E., Engelhardt, P. E., Horvath, J., & Ohtani, H. (2021a). Experimental ordinary language philosophy: A cross-linguistic study of defeasible default inferences. Synthese, 198, 1029–1070.
Fischer, E., Engelhardt, P. E., & Sytsma, J. (2021b). Inappropriate stereotypical inferences? An adversarial collaboration in experimental ordinary language philosophy. Synthese, 198, 10127–10168.
Häggqvist, S., & Wikforss, Å. (2015). Experimental semantics: The case of natural kind terms. In J. Haukioja (Ed.), Advances in experimental philosophy of language (pp. 109–138). Bloomsbury.
Hansen, N. (2015). Experimental philosophy of language. In Oxford handbooks online. Oxford University Press.
Hansen, N., & Chemla, E. (2015). Linguistic experiments and ordinary language philosophy. Ratio, 28(4), 422–445.
Hansen, N., & Chemla, E. (2017). Color adjectives, standards, and thresholds: An experimental investigation. Linguistics and Philosophy, 40(3), 239–278.
Hansen, N., Porter, J. D., & Francis, K. (2021). A corpus study of “know”: On the verification of philosophers’ frequency claims about language. Episteme, 18(2), 242–268.
Kauppinen, A. (2007). The rise and fall of experimental philosophy. Philosophical Explorations, 10(2), 95–118.
Khoo, J. (2015). Modal disagreements. Inquiry, 5(1), 1–24.
Kneer, M. (2021). Predicates of personal taste: Empirical data. Synthese, 199, 6455–6471.
Knobe, J. (2007). Experimental philosophy. Philosophy Compass, 2(1), 81–92.
Knobe, J., & Nichols, S. (2008). An experimental philosophy manifesto. In J. Knobe & S. Nichols (Eds.), Experimental philosophy (pp. 3–14). Oxford University Press.
Knobe, J., & Nichols, S. (2017). Experimental philosophy. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Winter 2017 Edition). https://plato.stanford.edu/archives/win2017/entries/experimental-philosophy/
Knobe, J., & Yalcin, S. (2014). Epistemic modals and context: Experimental data. Semantics and Pragmatics, 7(10), 1–21.
Knobe, J., Prasada, S., & Newman, G. E. (2013). Dual character concepts and the normative dimension of conceptual representation. Cognition, 127, 242–257.
Liao, S.-y., Meskin, A., & Knobe, J. (2020). Dual character art concepts. Pacific Philosophical Quarterly, 101(1), 102–128.
Machery, E. (2012). Expertise and intuitions about reference. Theoria, 73, 37–54.
Machery, E., Mallon, R., Nichols, S., & Stich, S. P. (2004). Semantics, cross-cultural style. Cognition, 92, B1–B12.
Machery, E., Sytsma, J., & Deutsch, M. (2015). Speaker’s reference and cross-cultural semantics. In A. Bianchi (Ed.), On reference (pp. 62–76). Oxford University Press.
Mallon, R., Machery, E., Nichols, S., & Stich, S. P. (2009). Against arguments from reference. Philosophy and Phenomenological Research, 79(2), 332–356.
Marques, T. (2015). Retractions. Synthese, 195(8), 3335–3359.
Martí, G. (2009). Against semantic multi-culturalism. Analysis, 69(1), 42–48.


May, J., Sinnott-Armstrong, W., Hull, J. G., & Zimmerman, A. (2010). Practical interests, relevant alternatives, and knowledge attributions: An empirical study. Review of Philosophy and Psychology, 1(2), 265–273.
Maynes, J. (2015). Interpreting intuition: Experimental philosophy of language. Philosophical Psychology, 28(2), 260–278.
McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press.
McEnery, T., & Wilson, A. (2001). Corpus linguistics: An introduction. Edinburgh University Press.
McEnery, T., McGlashan, M., & Love, R. (2015). Press and social media reaction to ideologically inspired murder: The case of Lee Rigby. Discourse & Communication, 9(2), 237–259.
Nadelhoffer, T., & Nahmias, E. (2007). The past and future of experimental philosophy. Philosophical Explorations, 10(2), 123–149.
Nisbett, R. E. (2003). The geography of thought. How Asians and Westerners think differently . . . and why. The Free Press.
Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic versus analytic cognition. Psychological Review, 108(2), 291–310.
Panzeri, F., & Carrus, S. (2016). Slurs and negation. Phenomenology and Mind, 11, 170–180.
Parodi, G. (2008). Lingüística de Corpus: Una Introducción al Ámbito. Revista de Lingüística Teórica y Aplicada, 46(1), 93–119.
Reuter, K., & Brun, G. (2021). Empirical studies on truth and the project of re-engineering truth. Pacific Philosophical Quarterly, 103(3), 493–517.
Rose, D., & Danks, D. (2013). In defense of a broad conception of experimental philosophy. Metaphilosophy, 44(4), 512–532.
Schaffer, J., & Knobe, J. (2012). Contrastive knowledge surveyed. Noûs, 46(4), 675–708.
Soria-Ruiz, A., & Faroldi, F. L. G. (2022). Moral adjectives, judge-dependency and holistic multidimensionality. Inquiry, 65(7), 887–916.
Stubbs, M. (1993). British traditions in text analysis. From Firth to Sinclair. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In Honour of John Sinclair (pp. 1–36). John Benjamins Publishing.
Sytsma, J. (2017). Two origin stories for experimental philosophy. Teorema: Revista Internacional de Filosofia, 36(3), 23–43.
Sytsma, J., & Livengood, J. (2011). A new perspective concerning experiments on semantic intuitions. Australasian Journal of Philosophy, 89(2), 315–332.
Sytsma, J., & Livengood, J. (2015). The theory and practice of experimental philosophy. Broadview Press.
Sytsma, J., Livengood, J., Sato, R., & Oguchi, M. (2015). Reference in the land of the rising sun: A cross-cultural study on the reference of proper names. Review of Philosophy and Psychology, 6, 213–230.
Tallant, J., & Andow, J. (2020). English language and philosophy. In S. Adolphs & D. Knight (Eds.), The Routledge handbook of English language and digital humanities (pp. 440–455). Routledge.
Teubert, W. (2004). Language and corpus linguistics. In M. A. K. Halliday, W. Teubert, C. Yallop, & A. Čermáková (Eds.), Lexicology and corpus linguistics (pp. 73–112). Continuum.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. John Benjamins Publishing.
Willemsen, P., & Reuter, K. (2021). Separating the evaluative from the descriptive: An empirical study of thick concepts. Thought: A Journal of Philosophy, 10(2), 135–146.
Ziólkowski, A. (2021). The context-sensitivity of color adjectives and folk intuitions. Filozofia Nauki, 29(2), 157–188.


David Bordonaba-Plou is an Assistant Professor at the Universidad Complutense de Madrid (Spain). He holds a PhD in Philosophy from the Universidad de Granada (Spain). His research focuses on experimental philosophy of language, the role of intuitions, Digital Humanities, and political issues like analysis of parliamentary debates or polarization. Some of his most recent publications are “An Analysis of the Centrality of Intuition Talk in the Discussion on Taste Disagreements,” in Filozofia Nauki, or “Disagreement is Said in Many Ways: An Experimental Philosophy of Language Study on Taste Discussions” in Philosophical Approaches to Language and Communication, vol. 2.

Part I

The Experimental Philosophy of Language Methodology

Chapter 2

A Bibliometric Analysis of Experimental Philosophy of Language

Javier Osorio-Mancilla

Abstract Since the implementation of experimental methods in philosophy, several philosophical disciplines have produced a fruitful body of research that has recently gained more recognition. Experimental Philosophy of Language (EPL) is one of the research areas that has drawn a great deal of attention lately. This chapter performs a bibliometric analysis of the EPL field in order to show the research trends, collaboration networks and topic structure found in the literature. Techniques such as citation analysis, co-word occurrence analysis and clustering procedures are applied to identify significant research themes and publications from academic books and peer-reviewed research articles. This study uses quantitative analysis to explore the diverse practices of scholars working in the disciplinary community of EPL. The goal is to discover community-pattern trends within the field and investigate how they have changed in recent years, as well as to provide information about the most-cited journals and the publication record of academic publishers. This analysis allows us to observe the structure of research topics within the field, the most prolific authors and works and the degree of collaboration among countries. This chapter aims to provide a comprehensive overview of EPL for those interested in one of the most prolific, empirically driven trends in contemporary philosophy.

2.1 Introduction

Experimental philosophy of language (henceforth, EPL) is a novel subfield that is growing rapidly due to the proliferation of experimental methods within philosophy. Broadly speaking, it can be defined as the academic field that employs certain empirical methods to tackle a set of issues that historically belong to classical philosophy of language. Experimental philosophers of language have taken two (albeit not incompatible) paths: on the one hand, criticizing the traditional “armchair”

J. Osorio-Mancilla
Department of Logic and Philosophy of Science, Autonomous University of Madrid, Madrid, Spain
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_2


method of philosophy of language (i.e., using the philosophers’ own intuitions as evidence about linguistic phenomena); on the other hand, proposing new ways of applying experimental methods to traditional questions of philosophy of language. As the use of computer-based methods is quickly spreading in philosophy, EPL is receiving great research interest from different academic quarters. Fletcher et al. (2021) recently analyzed changes in the application of formal methods in academic work in philosophy between 2005 and 2019. One of the main results of this work is that the proportion of papers using probabilistic methods in philosophy increased significantly in a short time. Nonetheless, no similar study has been carried out in the narrower field of EPL. This chapter is devoted to filling that gap with an analysis of the subfield.

One way to evaluate research trends within an academic field is by means of bibliometric methodology. Bibliometrics is the application of statistical methods to academic documents (mainly papers and books) and other means of communication used to analyze scientific publications (Repanovici, 2011). These methods have been extensively used to measure academic progress in science and engineering and are a common computational instrument for systematically analyzing academic work (Kalantari et al., 2017). The present work analyzes research trends in recent years within EPL in order to show the research networks and topic structure that can be found in the literature. This study applies citation analyses and clustering procedures to identify the significant research themes and publications from peer-reviewed research articles indexed in academic journals over time and from books produced by academic publishers.
In addition, social network analysis is carried out to check the degree of collaboration among academics in EPL, from the most prolific institutional affiliations to national and international collaboration between researchers. The present study is a purely descriptive work that aims to carry heuristic value for those interested in the behavior of the academic community and the research trends of EPL.

The plan for this chapter is as follows. Section 2.2 briefly introduces the novel subfield of EPL as a heterogeneous application of empirical techniques to the more classic research questions of philosophy of language. Section 2.3 presents the bibliometric methodology followed in this work and the main units of bibliometric analysis. In particular, we will focus on the most relevant academic journals engaging with work on EPL, authors with high academic impact in the subfield, and collaboration patterns between academic works. Section 2.4 shows the main results of the bibliometric analysis, revealing patterns and trends at the community level.


2.2 Experimental Philosophy of Language

Experimental philosophy is an umbrella expression, a broad label to characterize work that uses conceptual analysis and empirical approaches as methodological tools for philosophers. A distinction is usually made between a negative and a positive program (Alexander et al., 2010; Hansen, 2015; Williamson, 2016). On the one hand, the negative program focuses on challenging the traditional ‘armchair’ method of philosophy. Contemporary analytic philosophers have relied on intuitions and thought experiments as an evidential basis for their theories¹ (Alfano et al., 2022; Bordonaba-Plou, 2021). Experimental philosophy is, in part, a reaction against this research trend: some philosophers (Hintikka, 1999; Stich, 1990) argue that intuitions are unreliable, that they do not have normative force, and, in sum, that they are not adequate foundations for any proper theory, so methodologies from other academic fields must be used in philosophical inquiry. On the other hand, the positive program aims to “derive support for specific philosophical theses from their experimental results” (Weinberg, 2016, p. 71). In other words, the focus has been on defending particular theses through empirical methods, not on challenging traditional philosophical approaches. In fact, the positive program considers intuitions a significant source of information for philosophical inquiry. The crucial difference is how this information is acquired (Alexander et al., 2010).

Methodologies linked with experimental philosophy are typically considered “the kinds of experimental methods traditionally associated with psychology and cognitive science” (Knobe & Nichols, 2017). Standard definitions of experimental philosophy focus on a particular set of methods and practices, particularly survey methodology. Survey methodology gathers data from a sample of individuals to derive general conclusions from the responses.
This methodology can employ quantitative, qualitative, and mixed methods as research techniques (Check & Schutt, 2012). Surveys are widely utilized in psychological, social and now philosophical research. Nevertheless, in recent times, numerous empirical methodologies not limited to surveys have also become established in philosophical practice. Let us illustrate this point with two of them: agent-based modeling and corpus linguistics.

Agent-based models (ABMs) are computational models that simulate social systems. Such models aim to explore and understand the target system’s complex macro-behavior that arises from its micro-elements’ individual behavior (Railsback & Grimm, 2019). Agent-based modeling is a standard approach in contemporary disciplines such as economics, sociology, ecology, and epidemiology. Analytical and conceptual methodologies cannot provide the situations and possibilities that the modeling approach offers; therefore, ABMs have been a fruitful tool for dealing with the complexity of social communities. They entered philosophy at the beginning of the twenty-first century with the pioneering work of Hegselmann and

¹ For a detailed criticism of this idea, see Cappelen (2012).


Krause (2002), followed by many others (Borg et al., 2017, 2018; Grim et al., 2013; Grimm, 2009; O’Connor & Weatherall, 2018; Šešelja, 2021; Šešelja et al., 2020; Zollman, 2007, 2010, 2013). The use of computer simulations has given rise to a substantial experimentally driven research path in the philosophy of science and social epistemology.

Another empirical method widely used in contemporary philosophy is corpus linguistics. Corpus linguistics is a computer-based field that uses sets of text documents (or corpora) suited to the research question at hand (McEnery & Hardie, 2011). A corpus is typically too large to be analyzed manually; hence corpus linguistics techniques are well suited to searching, reading, and analyzing it. With a proper corpus, one can search for particular terms or expressions and their linguistic co-occurrences, which, along with certain statistical techniques, allow one to draw conclusions from them. One of the advantages of corpus linguistics is that it is presumably not affected by ‘experimenter bias’, unlike many survey experiments (Strickland & Suben, 2013). The use and defense of corpus linguistics as a philosophical tool is widespread in the literature (Bluhm, 2013, 2016; Bordonaba-Plou, 2021; Caton, 2020; Hinton, 2021).

These two examples of empirical, computer-aided methodologies suggest that we should understand experimental philosophy as a more extensive project within contemporary philosophy. The methodologies now used in experimental philosophy are as diverse as its research topics. One of the most fertile applications of experimental methods in philosophical research is the subfield of Experimental Philosophy of Language (Hansen, 2015; Haukioja, 2015).
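The corpus workflow described above, searching for a term and tallying the words that co-occur with it, can be illustrated with a minimal Python sketch. This is not taken from any of the cited studies: the toy corpus, the function names `tokenize` and `collocates`, and the window size are assumptions made for illustration, and the statistical association measures that a real corpus study would add on top of the raw counts are left out.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(texts: list[str], node: str, window: int = 3) -> Counter:
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            left = tokens[max(0, i - window): i]
            right = tokens[i + 1: i + 1 + window]
            counts.update(left + right)
    return counts


# Hypothetical three-sentence corpus, for illustration only.
toy_corpus = [
    "I know the answer",
    "Do you know the answer",
    "They know nothing",
]
counts = collocates(toy_corpus, "know", window=2)
print(counts.most_common(3))
```

Raw co-occurrence counts of this kind are only the starting point; which association statistic to apply to them depends on the research question.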
EPL can be roughly defined as the application of empirical methods to topics of interest to philosophers of language, such as “the meaning of particular kinds of expressions (names, determiners, natural kind terms or adjectives), pragmatic phenomena (implicature, presupposition, metaphor, the semantics-pragmatics boundary) and methodological issues” (Hansen, 2015). In sum, EPL is a subfield that provides experimental insights on classical issues in philosophy of language.

As in experimental philosophy in general, a negative and a positive program can be delineated within EPL. On the one hand, work like Edouard Machery, Ron Mallon, Shaun Nichols, and Steve Stich’s “Semantics, cross-cultural style” criticizes “the standard methodology in the philosophy of language, a methodology that simply tests theories of reference against philosophers’ referential intuitions” (Devitt, 2015, p. 33). On the other hand, there is work (such as Jylkkä et al., 2009; Nichols et al., 2015) that is more interested in using experiments to analyze specific linguistic phenomena, like the meaning of determiners or adjectives, than in concerns about conventional philosophical methods. In fact, the spirit of the positive program in EPL is to supplement traditional philosophy of language with new methods and research techniques (Hansen, 2015).

One of the consequences of the increasing use of diverse experimental methods in philosophy is that the boundaries between disciplines become fuzzier. Following Hansen (2015), contemporary linguistics using empirical methods is, in most cases, indistinguishable from work in EPL. EPL literature, thus, consists of relatively homogeneous content with a heterogeneous methodology. As we can note, EPL is


not constrained by the use of surveys as its primary methodological tool; corpus linguistics is a major empirical avenue within the subfield as well. Interestingly, Haukioja (2015, p. 2) notes that many scholars seem to believe that the available survey research does not in fact provide any meaningful information regarding semantics, but that different methodologies perhaps will. In this chapter, we will search for work that investigates topics of philosophy of language with methodologies ranging from surveys to corpus linguistics, as well as work that focuses on methodological issues within EPL. This means that work on general experimental philosophy, although highly relevant for the discipline, will not be included.

2.3 Methodology

2.3.1 Data Collection

Data scraping proceeded in several phases whose methodological needs varied according to the specific objectives of each stage. Following Aria and Cuccurullo (2017), data collection in bibliometrics can be divided into three sub-stages: data retrieval, data loading and data cleaning. The relevant data for a study of this nature is the metadata associated with academic papers and books obtained from various academic platforms. The metadata of an academic document is the information related to the publication, i.e., title words, abstracts, author names, keywords, references, etc. These pieces of information allow us to find relationships, patterns, structures and dynamics of authors, papers and journals that would be very difficult to infer by simply reading the literature.

Data Retrieval

A combination of automatic and manual literature search methods was used to select the relevant articles. Among the several bibliographic databases where academic work is stored, the platforms mainly used were Web of Science (WoS), Scopus and PhilPapers. WoS and Scopus, among other databases, allow searching with specific keywords and Boolean operators. In this particular case, we used the following search queries: “Experimental philosophy of language”; (“Experimental philosophy” AND “Semantics”) OR (“Experimental philosophy” AND “Pragmatics”) OR (“Experimental philosophy” AND “Language”) OR (“Experimental philosophy” AND “Reference”). We limited the subject area to Humanities and Social Sciences.
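To make the logic of the Boolean queries concrete, the following minimal Python sketch applies the same conditions to a free-text string such as a title or abstract. It is an illustration only: the databases evaluate such queries with their own field restrictions and matching rules, and the function names `_contains` and `matches_query`, as well as the whole-phrase, case-insensitive matching, are assumptions rather than a description of how WoS or Scopus work.

```python
import re

# Phrases taken from the search queries quoted in the text.
EXACT_PHRASE = "experimental philosophy of language"
BASE_PHRASE = "experimental philosophy"
COMPANION_TERMS = ("semantics", "pragmatics", "language", "reference")


def _contains(text: str, phrase: str) -> bool:
    """Case-insensitive whole-phrase match (an assumed matching rule)."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_query(text: str) -> bool:
    """Return True if `text` satisfies either of the two queries.

    Query 1: the exact phrase "Experimental philosophy of language".
    Query 2: "Experimental philosophy" AND at least one companion term.
    """
    if _contains(text, EXACT_PHRASE):
        return True
    if not _contains(text, BASE_PHRASE):
        return False
    return any(_contains(text, term) for term in COMPANION_TERMS)


# Hypothetical titles, for illustration only.
print(matches_query("Experimental philosophy and the semantics of names"))  # True
print(matches_query("Experimental philosophy and moral responsibility"))    # False
```

Because the four AND clauses share "Experimental philosophy", the disjunction reduces to that phrase plus any one of the four companion terms, which is how the sketch implements it.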
Regarding PhilPapers, the most relevant category is “Experimental Philosophy of Language” and related subcategories, such as “Experimental Philosophy: Reference,” “Experimental Philosophy: Semantics,” “Experimental Philosophy of Language, Misc,” and “Experimental Philosophy: Corpus Analysis.” Since the work that sparked much of the interest in EPL was Machery, Mallon, Nichols, and Stich’s 2004 paper “Semantics, Cross-Cultural Style” (Genone, 2012; Haukioja, 2015), the time period for the bibliometric analysis was from 2004 to 2021. Other criteria were also applied: the downloaded documents had to be


published in philosophy journals or, at least, in interdisciplinary journals that included philosophy as one of their disciplines. The chosen language was English. All the documents from the different platforms were downloaded and merged into a single file to clean and sort the metadata.

Data Loading and Converting

Almost all bibliographic databases allow exporting data in a single format. Even so, there were inconsistencies in the data format: not all metadata documents had the same labels even though they were exported in the same format. We needed to convert all the metadata information into a single, standardized format to allow us to perform the necessary quantitative techniques. For ease of use, the BibTeX format was chosen. One disadvantage of the BibTeX format is that it does not allow exporting non-standard metadata, such as research funding information or acknowledgments. Nonetheless, these pieces of information are not of relevance here. The main bibliometric indicators included in the dataset are author, document title, journal, year, citation, affiliation, abstract, author keywords, references, publisher, language and document type.

Data Cleaning

As the same keywords were used in all the consulted databases, there were several further inconsistencies, such as duplicated documents and misspelled names. Regular expressions were used to correct the duplication of names with different spellings.² Duplicated papers and book chapters were also deleted from the dataset. The top 30 cited documents in the gathered literature were manually reviewed to ensure that they belonged to the EPL field in order to eliminate false-positive results.
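The cleaning steps just described, collapsing spelling variants of author names with regular expressions and dropping duplicated records, might look roughly like the Python sketch below. The chapter does not reproduce its own cleaning rules, so everything here is an assumption: the helper names `normalize_name`, `title_key` and `deduplicate`, the "SURNAME INITIALS" key, and the choice of title plus year as the duplicate criterion are all hypothetical.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Map spelling variants of an author name to one key, e.g. 'MACHERY E'."""
    # Strip diacritics so that accented and unaccented spellings collapse.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        surname, given = ascii_name.split(",", 1)
    else:
        # Assumption: 'Given Names Surname' order when there is no comma.
        parts = ascii_name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = re.sub(r"[^A-Za-z\- ]", "", surname).strip().upper()
    # Keep only the initials of the given names.
    initials = "".join(word[0] for word in re.findall(r"[A-Za-z]+", given)).upper()
    return f"{surname} {initials}".strip()


def title_key(title: str) -> str:
    """Lowercase a title and replace punctuation and repeated spaces with one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each (normalized title, year) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (title_key(record["title"]), record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records, as they might arrive from two different databases.
records = [
    {"title": "Semantics, Cross-Cultural Style", "year": 2004},
    {"title": "Semantics, cross-cultural style.", "year": 2004},
    {"title": "Against arguments from reference", "year": 2009},
]
print(len(deduplicate(records)))
print(normalize_name("Machery, Edouard"), "|", normalize_name("E. Machery"))
```

A key built from surname and initials is a blunt instrument: it merges variants of one name, but it can also merge distinct authors who share a surname and initial, which is one reason a manual review of the most-cited documents remains useful.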

2.3.2 Bibliometric Techniques
We used R as the main programming language. In particular, we used the bibliometrix package,³ developed by Massimo Aria and Corrado Cuccurullo (2017). Bibliometrix is an open-source R package for quantitative research in scientometrics and bibliometrics, mainly focused on science mapping. The central units of bibliometric analysis in this work are sources, authors, and academic documents. We will show the most relevant sources within EPL, in particular the main academic journals and publishers; the authors of the articles and books on the subject with the highest academic impact; and the academic documents, mainly articles and books published in recent years. In addition, we focus on knowledge structures (or K structures). As Aria and Cuccurullo (2017) highlight, there are three main K structures to be analyzed: conceptual, intellectual, and social.

² Regular expressions are formal expressions extensively used to process strings, i.e., sequences of characters (Mitkov, 2003).
³ https://github.com/massimoaria/bibliometrix.

2 A Bibliometric Analysis of Experimental Philosophy of Language


Regarding the conceptual structures, the bibliometrix package offers several analytical tools for bibliometric purposes, such as Multidimensional Scaling (MDS), Correspondence Analysis (CA), and Multiple Correspondence Analysis (MCA). Multidimensional scaling is “an exploratory data analysis technique that attains this aim by condensing large amounts of data into a relatively simple spatial map that relays important relationships in the most economical manner” (Jaworska & Chupetlovska-Anastasova, 2009). Correspondence analysis aims to convert a data frame into two sets of factor scores (Abdi & Williams, 2010). These factor scores give us a structural representation of the dataset, which we can use to explore possible relationships among the qualitative variables present in the data. Multiple correspondence analysis is an extension of correspondence analysis that allows us to analyze the pattern of relationships of several qualitative variables (Abdi & Valentin, 2007). In addition, the results can be summarized and displayed in a two-dimensional plot for a better understanding of the data.

In intellectual structures, the units of analysis are references, authors, and journals, and a network analysis is performed through citation and co-citation techniques. Co-citation analysis attempts to identify high-density areas in citation networks by clustering highly co-cited documents. It is one of the most common analytical techniques for mapping scientific disciplines and their literature. This tool uses the metadata to generate the relevant clusters of documents within the field:

The co-citation cluster structure is constructed as follows. From the reference lists of a set of publications published within a given period, for instance a year, documents are selected that are cited more than a specified number of times (the citation threshold). Out of these cited documents, pairs are selected that co-occur relatively frequently in the reference lists of publications in the dataset, i.e., these pairs measure up to some specified co-citation strength threshold (Braam et al., 1991, p. 234).
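The two-threshold procedure in the quotation can be sketched as follows. This is a simplified illustration, not the implementation used by bibliometrix or by Braam et al.: co-citation strength is taken here to be the raw number of reference lists in which a pair co-occurs, and the function name, the thresholds, and the document labels A to D are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_pairs(reference_lists, citation_threshold, cocitation_threshold):
    """Return co-cited pairs that pass both thresholds, with their counts."""
    # Step 1: count in how many reference lists each document appears.
    citation_counts = Counter(ref for refs in reference_lists for ref in set(refs))
    # Keep documents cited more than the citation threshold.
    frequently_cited = {ref for ref, n in citation_counts.items() if n > citation_threshold}
    # Step 2: count how often each pair of kept documents co-occurs.
    pair_counts = Counter()
    for refs in reference_lists:
        kept = sorted(set(refs) & frequently_cited)
        pair_counts.update(combinations(kept, 2))
    # Keep pairs that reach the co-citation strength threshold.
    return {pair: n for pair, n in pair_counts.items() if n >= cocitation_threshold}


# Hypothetical reference lists of four citing publications.
reference_lists = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "B", "D"],
    ["C", "D"],
]
pairs = cocitation_pairs(reference_lists, citation_threshold=1, cocitation_threshold=2)
```

In this toy input only the pair (A, B) is co-cited often enough to survive, which is the kind of high-density link that a clustering step would then group together.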

Finally, in social structures, the aim is to assess the degree of collaboration among academics in EPL, measuring a scientific collaboration network by means of social network analysis. The unit of analysis is co-authored academic work, one of the most widely known forms of scientific collaboration (Glänzel & Schubert, 2004).

2.4 Results and Discussion
This section presents the statistical results for the three main bibliometric parameters: authors, sources, and documents. In particular, we focus on the journals most devoted to the field and most cited locally, the most prolific and most cited authors and countries, the degree of collaboration between countries, the most influential and cited documents in the dataset, and the thematic evolution of EPL.


Fig. 2.1 Annual production within the EPL field, from 2004 to 2021. The compound annual growth rate (CAGR) of publications over time is 19.61%

2.4.1 Main Information
After data scraping, converting, and cleaning, the dataset contains 215 documents published from 2004 to 2021 in 80 different sources. Research papers constituted 84.65% (n = 182) of the EPL literature; book chapters, 7.44% (n = 16); books, 5.12% (n = 11); and reviews, 2.79% (n = 6). The compound annual growth rate (CAGR) was computed to check the percentage of growth over time: academic documents grew at an annual rate of 19.61% (Fig. 2.1),⁴ revealing that EPL has been growing constantly from the earliest works to the present day and enjoys significant interest from the academic community. Although it is still a young research field, the number of publications grew quickly within its first 5 years (2004–2009). Growth was fastest from 2008 to 2009. Part of the explanation may be the publication of the book Experimental Philosophy by Joshua Knobe and Shaun Nichols, one of the most widely known books in the experimental philosophy field.
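For readers who want to check these figures, the following sketch recomputes the document-type shares from the counts reported above and shows the standard CAGR formula, (last / first)^(1 / periods) − 1. The chapter does not report the yearly counts behind the 19.61% figure, so the CAGR call uses made-up numbers; the function names are hypothetical.

```python
def compound_annual_growth_rate(first_value, last_value, periods):
    """Standard CAGR: the constant yearly growth that links first to last."""
    if first_value <= 0 or periods <= 0:
        raise ValueError("first_value and periods must be positive")
    return (last_value / first_value) ** (1 / periods) - 1


def shares(counts):
    """Percentage of the total for each category, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts by document type as reported in Sect. 2.4.1.
document_types = {"research papers": 182, "book chapters": 16, "books": 11, "reviews": 6}
type_shares = shares(document_types)

# Made-up example: 2 documents in the first year, 32 four years later.
example_cagr = compound_annual_growth_rate(first_value=2, last_value=32, periods=4)
```

With the reported counts, `type_shares` reproduces the percentages in the text (84.65, 7.44, 5.12, and 2.79). The made-up CAGR example corresponds to a doubling every year.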

⁴ Some of the plots presented in this work were made with the bbplot R package (https://github.com/bbc/bbplot).


2.4.2 Sources
Regarding peer-reviewed papers, 42 journals (58.33%) published only one paper on EPL, while the remaining 30 journals (41.67%) published two or more. Regarding academic books, four publishers (33.33%) published a single book on the subject, while three publishers (66.66%) published two or more. One way to better understand the distribution and influence of the journals with the highest output on the subject is by means of Bradford’s Law. This model attempts to capture how the literature on a particular subject is distributed across journals:

If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several other groups of zones containing the same number of articles as the nucleus (Drott, 1981).
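The zoning idea in the quotation can be sketched in a few lines. This is one simple convention, not necessarily the one bibliometrix applies: journals are ranked by output and each is assigned to one of three zones according to how many papers the higher-ranked journals have already contributed. The function name and the journal counts are hypothetical.

```python
def bradford_zones(papers_per_journal):
    """Assign each journal to zone 1, 2, or 3, each covering about a third of the papers."""
    # Rank by decreasing productivity; break ties alphabetically for determinism.
    ranked = sorted(papers_per_journal.items(), key=lambda item: (-item[1], item[0]))
    total = sum(papers_per_journal.values())
    zones = {}
    cumulative = 0
    for journal, count in ranked:
        # The zone depends on the share of papers covered before this journal.
        zones[journal] = min(3, (3 * cumulative) // total + 1)
        cumulative += count
    return zones


# Hypothetical output of seven journals (18 papers in total).
papers = {"J1": 6, "J2": 4, "J3": 3, "J4": 2, "J5": 1, "J6": 1, "J7": 1}
zones = bradford_zones(papers)
```

In this toy case a single journal forms the nucleus, two journals form the second zone, and four journals form the third, which is the widening pattern Bradford’s Law describes.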

The plot’s first zone represents the journals most devoted to the given subject (Fig. 2.2). In EPL, the core zone of Bradford’s Law comprises five journals (6.94%), which published more than a third of the research papers in the dataset: Cognition (14 papers), Synthese (13 papers), Review of Philosophy and Psychology (11 papers), Mind and Language (10 papers), and Philosophical Psychology (8 papers). In a bibliometric analysis, a distinction is usually made between local and global sources (Aria & Cuccurullo, 2017). Computed from the bibliometric indicator references, local sources are those within the dataset, while global sources are those outside the dataset. In this regard, the most relevant measurement is that of local sources. The top 5 locally cited journals in the dataset are Cognition (128 citations), Philosophy and Phenomenological Research (77 citations), Analysis (73 citations), Philosophical Studies (63 citations), and Mind and Language (47 citations).

Fig. 2.2 Bradford’s Law representation in the EPL field. 5 journals (out of 62) published more than a third of the research papers of the dataset

Fig. 2.3 Academic collaboration between countries in EPL. Multiple country publication (MCP) and single country publication (SCP) measures are displayed. MCP indicates the number of documents in which there is at least one co-author from a different country

We can measure source impact by means of the different indexes proposed in the literature (Fig. 2.3). One of the most widely known and used indexes is the Hirsch index (h-index). The h-index of an author (or, in this case, a journal) is the largest number h such that h of its published articles have each been cited in other papers at least h times (Hirsch, 2005). Another index is the m-index, defined as H/n, where H is the h-index and n is the number of years since the first published paper in the journal (Bornmann et al., 2008). Lastly, we can measure the impact of the journals by the g-index, introduced by Leo Egghe as an improvement of the h-index to measure the global citation performance of a set of papers (Egghe, 2006). Regarding journal dynamics, journals have been publishing steadily. Among them, Synthese shows the fastest growth in the shortest time: from 2019 to 2021 it published more than twice as many articles as in 2019. Moreover, Cognition is the only journal that has published work on EPL since the starting year of the dataset (see Table 2.1).
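The three indexes can be computed directly from a list of per-paper citation counts. The sketch below follows the definitions given above, with the g-index taken in its usual form (the largest g such that the g most cited papers together have at least g² citations). The citation counts are made up, and the function names are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cited in enumerate(ranked, start=1) if cited >= rank)


def g_index(citations):
    """Largest g such that the g most cited papers total at least g * g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    best = 0
    for rank, cited in enumerate(ranked, start=1):
        total += cited
        if total >= rank * rank:
            best = rank
    return best


def m_index(h, years_since_first_paper):
    """h-index divided by the number of years since the first paper."""
    return h / years_since_first_paper


# Made-up citation counts for six papers in one journal.
citations = [10, 8, 5, 4, 3, 0]
h = h_index(citations)
g = g_index(citations)
m = m_index(h, years_since_first_paper=8)

# Consistency check against Table 2.1: h = 11 and m = 0.579 for Cognition
# fit n = 19 years. The value 19 is inferred here, not stated in the chapter.
cognition_m = round(m_index(11, 19), 3)
```

Because this version of the g-index never exceeds the number of papers, a journal whose g-index equals its paper count (as for several rows of Table 2.1) has reached the ceiling of the measure.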

Table 2.1 Source impact by h-index and its generalizations

Journal                                    h-index  g-index  m-index  N. of papers
Cognition                                  11       14       0.579    14
Mind and Language                          8        10       0.571    10
Philosophical Psychology                   7        8        0.500    8
Review of Philosophy and Psychology        7        10       0.538    10
Synthese                                   7        12       0.500    13
Philosophy and Phenomenological Research   6        6        0.429    6
Journal of Semantics                       5        6        0.429    6
Philosophy Compass                         5        5        0.455    5
Semantics and Pragmatics                   5        5        0.357    5
Erkenntnis                                 4        4        0.500    4

2.4.3 Authors and Countries
In addition to the sources, we can examine the statistical information on the authors with the most impact on EPL. The ten most prolific authors in the dataset are Nat Hansen with 12 documents (6.59%), Edouard Machery with 10 documents (5.49%), Eugen Fischer with 10 documents (5.49%), Emmanuel Chemla with 9 documents (4.95%), Joshua Knobe with 9 documents (4.95%), Joseph Ulatowski with 5 documents (2.75%), Jussi Haukioja with 5 documents (2.75%), Paul Engelhardt with 5 documents (2.75%), Ron Mallon with 5 documents (2.75%), and Shaun Nichols with 5 documents (2.75%).

In addition, the most prolific affiliations are mostly from the United States and the United Kingdom. This information is extracted from the bibliometric indicator affiliation and is calculated based solely on the institutional affiliation of the first author of the document. The ten most prolific affiliations are Yale University, University of East Anglia, University of Pittsburgh, University of Reading, Rutgers University, University of Genoa, Johns Hopkins University, University of Warsaw, the Norwegian University of Science and Technology, and University of Arizona.

Another bibliometric measurement that can be applied to the EPL dataset is Lotka’s law, a bibliometric law proposed to measure the frequency of publication by authors in any given field (Lotka, 1926). According to Lotka’s law, the number of authors publishing a certain number of documents is a fixed ratio to the number of authors publishing a single article (Friedman, 2015). Lotka’s law divides the authors into two main groups: “core” authors, who have published more than one document, and “occasional” authors, who have published just one document in the field. In this case, out of 269 authors, 205 (76.20%) have published just one document, whereas 64 (23.79%) have published two or more documents in the field (see Table 2.2).

Table 2.2 Lotka’s law table in the EPL field

Documents written  N. of authors  Proportion of authors
1                  205            0.762
2                  38             0.141
3                  9              0.033
4                  6              0.022
5                  6              0.022
9                  2              0.007
10                 2              0.007
12                 1              0.004

Regarding international collaboration, an analysis of multiple country publications (MCP) and single country publications (SCP) allows us to distinguish the documents written within a single country from those co-authored across countries. It is a measurement of the collaboration intensity of a country. The top 4 countries whose authors have collaborated with authors from other countries are France (80% of the documents), Norway (50% of the documents), Estonia (50% of the documents), and Finland (50% of the documents). Aside from France, this high international collaboration percentage is partly due to these countries’ low number of publications (three or fewer). In addition, the most cited countries in the EPL field are the United States (4545 citations), the United Kingdom (636 citations), France (496 citations), Finland (420 citations), and Canada (270 citations).
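The proportions in Table 2.2 and the core/occasional split can be recomputed from the author counts alone. The sketch below also compares the observed counts with the classic inverse-square form of Lotka’s law, under which the expected number of authors with n documents is the number of single-document authors divided by n². The chapter does not report a fitted exponent, so the value 2 is an assumption, and the function names are hypothetical.

```python
def author_proportions(authors_by_output):
    """Share of all authors at each productivity level, rounded to three decimals."""
    total = sum(authors_by_output.values())
    return {n: round(count / total, 3) for n, count in authors_by_output.items()}


def lotka_expected(single_document_authors, documents, exponent=2.0):
    """Expected number of authors with `documents` papers under Lotka's law."""
    return single_document_authors / documents ** exponent


# Author counts by number of documents written, from Table 2.2.
observed = {1: 205, 2: 38, 3: 9, 4: 6, 5: 6, 9: 2, 10: 2, 12: 1}

proportions = author_proportions(observed)
total_authors = sum(observed.values())
core_authors = sum(count for n, count in observed.items() if n >= 2)
expected_two = lotka_expected(observed[1], 2)
```

Under the inverse-square assumption about 51 authors would be expected to have written two documents, against the 38 observed, so the EPL data fall off somewhat faster than the classic form at that level.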

2.4.4 Documents
The collaboration index (CI) is usually computed to measure multi-author collaboration within the documents. This index is calculated as the total number of authors of multi-authored documents divided by the total number of multi-authored documents (Aria & Cuccurullo, 2017; Elango & Rajendran, 2012). In the EPL literature, more than half (66.51%) of the articles in the dataset are multi-authored documents, with a total collaboration index of 1.34. The years analyzed can be divided into three periods to check the collaboration dynamics in a more granular way. From 2004 to 2009, out of 25 documents, 16 (64%) were co-authored; from 2010 to 2015, out of 80 documents, 39 (48.75%) were co-authored; finally, from 2016 to 2021, out of 126 documents, 72 (57.14%) were co-authored. This reveals at least two important things. First, interest in EPL is growing in the contemporary philosophy community: more documents were published in the most recent period (2016–2021) than in the two earlier periods combined (2004–2015). Second, the co-authorship trend in the literature is visible and rising. It can be partially explained by the fact that the experimental nature of this work requires collaboration and coordination among researchers for a satisfactory development of the studies. In sum, collaboration in the field is growing.

We can track the most cited documents in the dataset by means of local and global citation scores. Local citations measure how many times an author included in this collection has been cited by other authors also in the collection (Aria & Cuccurullo, 2017). Global citations measure how many times an author included in this collection has been cited by other authors not in the collection. The most locally cited documents in the dataset are Edouard Machery et al.’s “Semantics, cross-cultural style” and “Linguistic and metalinguistic intuitions in the philosophy of language,” along with Max Deutsch’s “Experimental philosophy and the theory of reference” and Genoveva Martí’s “Against semantic multi-culturalism.” On the other hand, Machery et al.’s “Semantics, cross-cultural style” and Michael Devitt’s “Experimental Semantics” are the most globally cited documents in the dataset.

Additionally, multidimensional scaling is performed. Multidimensional scaling is a statistical technique for estimating values using proximity measures defined over pairs of objects (Davidson & Sireci, 2000). In other words, it is a graphic representation of a relationship between objects. In this case, the objects are the words that appear in the documents’ abstracts. Multidimensional scaling is applied as an exploratory tool to reveal patterns and relationships through the proximity of the available data. A conceptual structure map is generated (Fig. 2.4). The proximity between words is read as follows: (1) keywords are close to each other because many articles treat them together; (2) they are distant when only a small fraction of articles use them together. Also, the center of the plot represents the center of the research field (i.e., common and widely shared words/topics) (Cuccurullo et al., 2016).

Fig. 2.4 Conceptual structure map of EPL. Two main clusters with their associated words

Finally, we performed a thematic evolution analysis. Thematic evolution analysis relies on co-word analysis in a longitudinal framework (Cobo et al., 2011). The dataset is mapped employing term interaction patterns, dealing with groups of terms shared across texts. We set three cutting points and split the dataset into four time slices (from 2004 to 2008, 2009 to 2013, 2014 to 2018, and 2019 to 2021). In addition, we analyzed the cumulative occurrences of bigrams (i.e., in this context, two-word expressions) within the abstracts and keywords.⁵ Cumulative occurrences show different trends: on the one hand, an expression may remain in use over time, so that its occurrences keep growing; on the other, the number of occurrences may become stable, that is, the expression is no longer used in subsequent abstracts and keywords. In general terms, not much thematic change is observed. There is no significant variation in the use of familiar words and expressions within EPL. This indicates that nearly every topic studied at the start of the dataset is still of interest to the EPL community, which suggests thematic continuity. For example, the occurrences of expressions such as “cross-cultural variation” or “proper names” are constantly growing in the literature, which indicates that they have remained in wide use since the start year of the dataset. Additionally, slight tendencies in the use of certain expressions can be observed. For instance, “semantic intuitions” became stable in 2017, and “linguistic intuitions” became stable in 2016. Finally, the expression “linguistic corpora” starts to gain prominence from 2020 onwards because of its increased frequency of use within the dataset.

⁵ Notice that with bigrams, the absolute frequency of expressions is lower than with unigrams.

It is worth noting that this study has some limitations. First, we are looking only at English-language works; therefore, we may miss some crucial documents in other languages. Second, the study relies on a limited sample size because EPL is a developing field that is still expanding. Third, because the study is focused on the period from 2004 to 2021, some earlier and newer works are not included in the dataset. For example, the number of researchers engaging with corpus linguistics techniques seems to be growing significantly, yet papers published in 2022 and future work have not been taken into account for this analysis. This means that some statistical results may be slightly different, although the differences are not expected to be dramatic.
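The cumulative bigram counts used in the thematic evolution analysis of this section can be sketched as follows. The tokenization is deliberately naive (lowercase words, hyphenated words kept together, bigrams allowed to span sentence boundaries), the abstracts are invented, and the function names are hypothetical; the chapter’s own analysis was run in R.

```python
import re


def bigrams(text):
    """Return all pairs of adjacent words in a text."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return list(zip(words, words[1:]))


def cumulative_occurrences(abstracts_by_year, target):
    """Running total of a target bigram, year by year."""
    running = 0
    series = {}
    for year in sorted(abstracts_by_year):
        for abstract in abstracts_by_year[year]:
            running += bigrams(abstract).count(target)
        series[year] = running
    return series


# Invented abstracts, grouped by publication year.
abstracts_by_year = {
    2019: ["Semantic intuitions about proper names vary."],
    2020: ["We study proper names. Proper names are rigid."],
    2021: ["Linguistic corpora reveal usage patterns."],
}
series = cumulative_occurrences(abstracts_by_year, ("proper", "names"))
```

In the resulting series the count rises through 2020 and is flat in 2021. A flat stretch like this is what the text calls an expression becoming stable, while a curve that keeps rising marks an expression still in use.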

2.5 Conclusions
This chapter analyzes 215 documents from the EPL literature from 2004 to 2021. The methodology adheres to standard bibliometric practices, using network visualization, co-citation analysis, and clustering techniques, among other statistical tools. Automatic and manual methods were first used to obtain the metadata associated with all the available publications on the different virtual academic platforms that store books, research papers, and reviews, such as Scopus, Web of Science, and PhilPapers. The programming language used to carry out the statistical analysis was R, in particular the bibliometrix package, an open-source tool for quantitative research in scientometrics and bibliometrics. The work focuses on three main dimensions: sources, authors, and documents. Among all the available analyses, Bradford’s law was used to determine the distribution and influence of the most important journals within EPL, revealing that five journals (out of 61 in total) publish more than a third of the research papers in the discipline: Synthese, Review of Philosophy and Psychology, Cognition, Mind and Language, and Philosophical Psychology. The impact of the sources was also checked through several of the most well-known indexes in academia: h-index, g-index, and m-index. Concerning authors, the work focuses on checking


the most prolific authors in the discipline, as well as their institutional affiliations. In addition, international collaboration among academics is checked through co-authorship of academic work and visualized. Documents are also analyzed. The collaboration index is used to measure collaboration in the documents of the dataset. The data show that, since its inception, the discipline has been eminently collaborative and that collaboration continues to grow among scholars within EPL. The most cited papers have also been presented in terms of local and global citation scores. Finally, a multidimensional scaling technique allows us to represent in two dimensions the conceptual relationship between the most frequent terms within the documents. This chapter provides a comprehensive view of EPL at the academic community level. We hope this will be useful for those interested in the EPL subfield, be it scholars, publishers, or anyone interested in novel applications of empirical methodologies. Future research could overcome the limitations of this study, for instance by including other languages in the dataset or extending the time period for a more detailed analysis.

Acknowledgments
The research for this chapter was supported by the Ministerio de Economía y Competitividad (grant number FFI2017-87395-P).

References
Abdi, H., & Valentin, D. (2007). Multiple correspondence analysis. In N. J. Salkind (Ed.), Encyclopedia of measurement and statistics. Sage.
Abdi, H., & Williams, L. J. (2010). Correspondence analysis. In N. J. Salkind (Ed.), Encyclopedia of research design (pp. 267–278). Sage.
Alexander, J., Mallon, R., & Weinberg, J. M. (2010). Accentuate the negative. Review of Philosophy and Psychology, 1, 297–313.
Alfano, M., Machery, E., Plakias, A., & Loeb, D. (2022). Experimental moral philosophy. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/fall2022/entries/experimental-moral/
Aria, M., & Cuccurullo, C. (2017). Bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11, 959–975.
Bluhm, R. (2013). Don’t ask, look! Linguistic corpora as a tool for conceptual analysis. In M. Hoeltje, T. Spitzley, & W. Spohn (Eds.), Was dürfen wir glauben? Was sollen wir tun? Sektionsbeiträge des achten internationalen Kongresses der Gesellschaft für Analytische Philosophie (pp. 7–15). DuEPublico.
Bluhm, R. (2016). Corpus analysis in philosophy. In M. Hinton (Ed.), Evidence, experiment and argument in linguistics and the philosophy of language (pp. 91–109). Peter Lang.
Bordonaba-Plou, D. (2021). An analysis of the centrality of intuition talk in the discussion on taste disagreements. Filozofia Nauki, 29, 133–156.
Borg, A. M., Frey, D., Šešelja, D., & Straßer, C. (2017). Examining network effects in an argumentative agent-based model of scientific inquiry. In A. Baltag, J. Seligman, & T. Yamada (Eds.), Logic, rationality, and interaction (pp. 391–406). Springer.
Borg, A. M., Frey, D., Šešelja, D., & Straßer, C. (2018). Epistemic effects of scientific interaction: Approaching the question with an argumentative agent-based model. Historical Social Research, 43, 285–309.
Bornmann, L., Mutz, R., & Daniel, H.-D. (2008). Are there better indices for evaluation purposes than the h-index? A comparison of nine different variants of the h-index using data from biomedicine. Journal of the American Society for Information Science and Technology, 59, 830–837.
Braam, R. R., Moed, H. F., & van Raan, T. (1991). Mapping of science by combined co-citation and word analysis: II: Dynamical aspects. Journal of the American Society for Information Science, 42, 252–266.
Cappelen, H. (2012). Philosophy without intuitions. Oxford University Press.
Caton, J. N. (2020). Using linguistic corpora as a philosophical tool. Metaphilosophy, 51, 51–70.
Check, J., & Schutt, R. K. (2012). Survey research. In J. Check & R. K. Schutt (Eds.), Research methods in education. Sage Publications.
Cobo, M. J., López Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the Fuzzy Sets Theory field. Journal of Informetrics, 5, 146–166.
Cuccurullo, C., Aria, M., & Sarto, F. (2016). Foundations and trends in performance management. A twenty-five years bibliometric analysis in business and public administration domains. Scientometrics, 108, 595–611.
Davidson, M. L., & Sireci, S. G. (2000). Multidimensional scaling. In H. E. A. Tinsley & S. D. Brown (Eds.), Handbook of applied multivariate statistics and mathematical modeling (pp. 323–353). Academic.
Devitt, M. (2015). Testing theories of reference. In J. Haukioja (Ed.), Advances in experimental philosophy of language (pp. 31–65). Bloomsbury.
Drott, C. M. (1981). Bradford’s law: Theory, empiricism and the gaps between. Library Trends Summer, 30(Special Issue on Bibliometrics), 41–52.
Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69, 131–152. https://doi.org/10.1007/s11192-006-0144-7
Elango, B., & Rajendran, P. (2012). Authorship trends and collaboration pattern in the marine sciences literature: A scientometric study. International Journal of Information Dissemination and Technology, 2, 166–169.
Fletcher, S. C., Knobe, J., Wheeler, G., & Woodcock, B. A. (2021). Changing use of formal methods in philosophy: Late 2000s vs. late 2010s. Synthese, 199, 14555–14576.
Friedman, A. (2015). The power of Lotka’s law through the eyes of R. Romanian Statistical Review, 2, 69–77.
Genone, J. (2012). Theories of reference and experimental philosophy. Philosophy Compass, 7, 152–163.
Glänzel, W., & Schubert, A. (2004). Analysing scientific networks through co-authorship. In H. F. Moed, W. Glänzel, & U. Schmoch (Eds.), Handbook of quantitative science and technology research (pp. 257–276). Kluwer Academic Publishers.
Grimm, P. (2009). Threshold phenomena in epistemic networks. In AAAI fall symposium: Complex adaptive systems and the threshold effect (pp. 53–60). AAAI Press.
Grimm, P., Singer, D. J., Fisher, S., Bramson, A., Berger, W. J., Reade, C., Flocken, C., & Sales, A. (2013). Scientific networks on data landscapes: Question difficulty, epistemic success, and convergence. Episteme, 10, 441–464.
Hansen, N. (2015). Experimental philosophy of language. Oxford Handbooks Online.
Haukioja, J. (2015). Introduction. In J. Haukioja (Ed.), Advances in experimental philosophy of language (pp. 1–7). Bloomsbury.
Hegselmann, R., & Krause, U. (2002). Opinion dynamics and bounded confidence: Models, analysis, and simulation. Journal of Artificial Societies and Social Simulation, 5(3), 1–33.
Hintikka, J. (1999). The Emperor’s new intuitions. Journal of Philosophy, 96, 127–147.
Hinton, M. (2021). Corpus linguistics methods in the study of (meta)argumentation. Argumentation, 35, 435–455.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102, 16569–16572.
Jaworska, N., & Chupetlovska-Anastasova, A. (2009). A review of multidimensional scaling (MDS) and its utility in various psychological domains. Tutorials in Quantitative Methods for Psychology, 5, 1–10.
Jylkkä, J., Railo, H., & Haukioja, J. (2009). Psychological essentialism and semantic externalism: Evidence for externalism in lay speakers’ language use. Philosophical Psychology, 22, 37–60.
Kalantari, A., Kamsin, A., Kamaruddin, H. S., Ebrahim, N. A., Gani, A., Ebrahimi, A., & Shamshirband, S. (2017). A bibliometric approach to tracking big data research trends. Journal of Big Data, 4, 1–18.
Knobe, J., & Nichols, S. (2017). Experimental philosophy. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/win2017/entries/experimental-philosophy/
Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of the Washington Academy of Sciences, 16, 317–324.
McEnery, T., & Hardie, A. (2011). Corpus linguistics: Method, theory and practice. Cambridge University Press.
Mitkov, R. (2003). The Oxford handbook of computational linguistics. Oxford University Press.
Nichols, S., Pinillos, Á., & Mallon, R. (2015). Ambiguous reference. Mind, 125, 145–175.
O’Connor, C., & Weatherall, J. O. (2018). Scientific polarization. European Journal for Philosophy of Science, 8, 855–875.
Railsback, S. F., & Grimm, V. (2019). Agent-based and individual-based modeling. Princeton University Press.
Repanovici, A. (2011). Measuring the visibility of the university’s scientific production through scientometric methods: An exploratory study at the Transilvania University of Brasov, Romania. Performance Measurement and Metrics, 12, 106–117.
Šešelja, D. (2021). Some lessons from simulations of scientific disagreements. Synthese, 198, 6143–6158.
Šešelja, D., Borg, A. M., & Straßer, C. (2020). Formal models of scientific inquiry in a social context: An introduction. Journal for General Philosophy of Science, 51, 211–217.
Stich, S. (1990). The fragmentation of reason. MIT Press.
Strickland, B., & Suben, A. (2013). Experimenter philosophy: The problem of experimenter bias in experimental philosophy. Review of Philosophy and Psychology, 3, 457–467.
Weinberg, J. (2016). Going positive by going negative: On keeping x-phi relevant and dangerous. In J. Sytsma & W. Buckwalter (Eds.), A companion to experimental philosophy (pp. 71–87). Wiley Blackwell.
Williamson, T. (2016). Philosophical criticisms of experimental philosophy. In J. Sytsma & W. Buckwalter (Eds.), A companion to experimental philosophy (pp. 22–37). Wiley Blackwell.
Zollman, K. (2007). The communication structure of epistemic communities. Philosophy of Science, 74, 574–587.
Zollman, K. (2010). The epistemic benefit of transient diversity. Erkenntnis, 72, 17–35.
Zollman, K. (2013). Network epistemology: Communication in epistemic communities. Philosophy Compass, 8, 15–27.

Javier Osorio-Mancilla is a PhD candidate in the Department of Logic and Philosophy of Science at the Autonomous University of Madrid, specializing in social dynamics of science and agent-based modeling. His research focuses on disagreements and polarization in scientific communities, and on ways to adopt a data-driven approach in agent-based models of scientific inquiry.

Chapter 3

Experimental Philosophy and Ordinary Language Philosophy
Masaharu Mizumoto

Abstract This chapter tries to elucidate the complex relationship between ordinary language philosophy (OLP) and experimental philosophy (X-Phi) from the perspective of the contrast between the positive and the negative programs of X-Phi. I will first show the relevance of language to the various fields of contemporary philosophy, through what I call the Argument from Cross-Linguistic Diversity and the Argument from Intra-Linguistic Variance, together with empirical data. This will partly vindicate OLP, which is generally thought to be obsolete today. I will then examine the reasons for the demise of OLP, and show that the contemporary metaphilosophical debates over X-Phi are in fact a revival of the debates in the heyday of OLP. This will then also indicate a parallel between Wittgenstein’s negative program, called quietism, and the negative program of X-Phi, especially Stich’s. The positive program of X-Phi can no doubt contribute to science, but the question of whether it is philosophy may depend on our conception of philosophy. The negative program of X-Phi is no doubt philosophy, but the question is whether it can make any positive contribution to philosophy, let alone science. I will answer “yes” to it, by sketching a radical negative picture of philosophy in general.

3.1 Introduction: Experimental Philosophy and Ordinary Language Philosophy
In this chapter we try to sketch a friendly relationship between ordinary language philosophy and experimental philosophy from the perspective of the contrast between the positive and the negative programs of experimental philosophy. As we shall argue, the relation is not just one of mutual complementation: the two also share a meta-philosophical attitude or inclination.

M. Mizumoto
Japan Advanced Institute of Science and Technology, Nomi, Japan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_3


M. Mizumoto

The connection between experimental philosophy (hereafter X-Phi) and ordinary language philosophy (hereafter OLP) has been pointed out before. For example, Max Deutsch suspects that experimental philosophers are mostly skeptical of a priori knowledge (and assume that most other philosophers are too), and according to him that is what really motivates them to do empirical research (Deutsch, 2009, p. 460). For him, what follows from this is that experimental philosophers “end up treating [traditional] philosophers as though they are all, deep down, ordinary language philosophers. All philosophy, they suppose, is ordinary language philosophy, but dressed up in way that masks its true nature—it is ordinary language philosophy in disguise” (ibid.). Presumably, the only difference between experimental philosophers and ordinary language philosophers is whether they do empirical surveys or not. Constantine Sandis is also explicit on this and calls contemporary experimental philosophy “experimental linguistic philosophy” (ELP), because “it is primarily an investigation into what people will ordinarily say when asked certain questions (though for the most part these are not explicitly linguistic or conceptual)” (Sandis, 2010, p. 181). According to him, both ELP and OLP “focus on everyday concepts and take ordinary usage to be philosophically relevant.” Given such a connection, it seems that it is linguistic (in particular, semantic) facts that experimental philosophers are mainly concerned with. In fact, in his groundbreaking work of experimental philosophy, Josh Knobe said that even though ordinary language does not constitute a “court of final appeal”, it “does seem plausible that the examination of ordinary language might provide us with some useful guidance about difficult cases [ . . . ]” (Knobe, 2003, p. 190). Furthermore, note that the title of that paper was “Intentional action and side effects in ordinary language”.
So, to this extent at least, experimental philosophers share with ordinary language philosophers an interest in how ordinary people use the relevant terms. However, “ordinary language philosophy” is a derogatory label for many (experimental or armchair) philosophers today, except for some philosophers who explicitly defend OLP against criticisms (Hanfling, 2000; Baz, 2012) or who admit that their methodology is that of OLP (DeRose, 2005). If the current status of OLP is due to the difficulties other philosophers think it faces, then, given the above connection, it is natural to expect that X-Phi faces the same problems (other than lack of empirical evidence) that made OLP obsolete (which we shall discuss later). Perhaps for this reason, Knobe later began to say that experimental philosophy is mainly concerned with how the mind works (e.g., Knobe, 2007; Knobe & Nichols, 2008; Knobe, 2016), rather than with supplementing the armchair study of concepts empirically. Still, at least we may say that almost all studies in mainstream experimental philosophy are concerned with either

1. The relevant concept underlying the use (or the judgment about the use) of the corresponding term(s) of ordinary language by ordinary folks, or
2. The underlying cognitive process or psychological mechanism (how the mind works) of ordinary folks,


or both.1 In many cases both should be relevant, and it is in general very hard to tell them apart clearly, regardless of the intentions of the experimental philosophers who conduct the survey. Call studies devoted mainly to investigating item 1 Type-1 studies, and those devoted mainly to investigating item 2 Type-2 studies. Knobe thinks that the vast majority of X-Phi consists of Type-2 studies in this sense, rather than Type-1 studies (Knobe, 2016). Stich and other experimental philosophers of the negative program (which is mainly concerned with revealing the limits of the use of intuitions as a method in philosophical inquiry) express a worry about Type-2 studies, suggesting that they will face “an obvious challenge”: whether they are philosophy rather than psychology (cf. Stich & Tobia, 2016, n. 10). Indeed, many X-Phi studies can still be taken, at least, to be of Type-1, concerned with a particular concept, such as knowledge, knowledge-how, intentional action, personal identity, reference, free will, etc. Moreover, the first three of them do have relevant linguistic expressions, like “know”, “know how to ϕ”, and “intentionally ϕ”, and possibly so do personal identity (“the same person”) and reference (“refer to”, “mean”, “about”). It is true that even many armchair philosophers do not agree that such topics are about the relevant concepts, let alone the relevant linguistic expressions. But that is because such philosophers think that they are concerned with properties and relations in nature. According to Knobe, on the other hand, apparent Type-1 studies of a concept in fact together provide data about underlying cognitive processes, and many experimental philosophers are trying to construct general theories about such cognitive processes, which will explain our judgments concerning many other concepts, rather than trying to construct a theory about a particular concept.
It seems obvious, however, that there is also influence in the opposite direction: concepts also affect cognitive processes. It is even possible that the cognitive process itself consists of interaction between concepts, given that thought is composed of concepts (see Sect. 2 of Mizumoto, 2018b). Against this, Knobe and others may argue that cognitive processes involve non-conceptual sub-personal ones. However, although such processes underlie mental processes, since mental processes are (qua what one does) at the personal level, arguably (as long as we are mentally healthy) we are also ultimately responsible for the sub-personal processes (if not those in pathological cases) as part of what we do (embedded in the “space of reasons”). In this sense, all the cognitive processes occur at the personal level, to which (social or linguistic) norms apply.

1 Knobe and Nichols (2008) considered three goals of X-Phi in their Experimental Philosophy Manifesto: (1) investigate the psychological sources of our intuitions and determine whether or not they are warranted, (2) sort out intuitions that are universally shared from those which are not, varying with culture, language, gender, and other demographic factors, and (3) study patterns in people’s intuitions about cases to investigate how the mind works. In the present context, (1) may determine whether intuitions can be explained by the underlying relevant concept(s) or not, and if they are, the cross-linguistic studies we shall see in the following sections are of what we shall call Type-1 below, belonging to (2), whereas if they are not, such studies are of what we shall call Type-2, corresponding to (3).


This is of course a difficult issue, the adequate treatment of which would require a book-length discussion. However, those who favor Type-2 studies may simply concede this point and still claim that, even so, studies of individual concepts are not particularly interesting. Usually, of course, psychologists who are interested in cognitive processes are not concerned with languages, and at least try to keep the effect of the linguistic factor to a minimum. But it is not clear that philosophers can be similarly indifferent to language, treating it as if it were transparent. For example, in epistemology, in response to contextualism, a renewed interest in knowledge attribution is now recognized, in what is even called the “new linguistic turn” (Ludlow, 2005; Sect. 1.2 of Brown and Gerken, 2012). Presumably, this trend emerged independently of X-Phi (Brown and Gerken take X-Phi rather as a cause of the “cognitive turn”). However, we can provide here arguments for the constitutive relevance of language to philosophy in general, based on X-Phi studies, to the effect that studies of concepts are even essential for adequate Type-2 studies, as we shall see in the next two sections. Thus, the picture of the relationship between OLP and X-Phi we will sketch in this chapter is a friendly and intimate one, which is more than just a supplementary picture. Note that this kind of consideration is especially important today, in view of the recent trend of using corpus methods in X-Phi (e.g., Bluhm, 2012, 2013; Sytsma et al., 2019; Mejía-Ramos et al., 2019; Ulatowski et al., forthcoming; see Bluhm, 2016 for a brief overview), because of the obvious connection between this approach and OLP.2

In the next two sections (Sects. 3.2 and 3.3), we try to show the relevance of language to the various fields of contemporary philosophy, through what I call the Argument from Cross-Linguistic Diversity and the Argument from Intra-Linguistic Variance, together with empirical data, which will set the stage for defending OLP in the two sections that follow (Sects. 3.4 and 3.5). There, we will see some data of X-Phi in relation to OLP, which we argue supplement and defend OLP. The next two sections (Sects. 3.6 and 3.7) concern OLP both as a target of negative X-Phi and as something that helps X-Phi, and they will in turn give us a hint about the nature of X-Phi, or what X-Phi does, which will be discussed in the following section (Sect. 3.8), followed by concluding remarks (Sect. 3.9).

Before proceeding, however, let us note that our primary aim here is to sketch an alternative picture of X-Phi within the contemporary meta-philosophical debate, rather than present a specific philosophical thesis to be defended. In particular, we are not interested in refuting any other views, even if they are incompatible with the present picture. Our hope is to make this picture as plausible as possible.

2 I would like to thank an anonymous reviewer for bringing up this point. Note also that this approach covers both Type-1 and Type-2 studies, though traditionally its typical application has been to Type-1 studies.


3.2 Argument from Cross-Linguistic Diversity

What we call the argument from cross-linguistic diversity starts from the possibility of radical linguistic variance in the use of the relevant term or phrase used in philosophy. If there is such variance, it follows that philosophy relies on the contingent features of the relevant term or phrase in a particular language that captures the concept in question. But if so, ordinary language is relevant to a variety of branches of philosophical inquiry (metaphysics, epistemology, philosophy of mind, etc.). Epistemologists might hold that linguistic features of English “know” are irrelevant to knowledge itself, a relation instantiated in nature. Whether or not such naturalism is correct, or at least convincing, is not an issue we can discuss in detail here. Some Anglophone philosophers have taken knowledge to be a natural kind, but if there is radical cross-linguistic variance in the knowledge verbs of different languages, what they take to be knowledge qua a natural kind may be very different from what philosophers with a different language take to be knowledge qua a natural kind. Stich and Mizumoto (2018) discussed possible consequences of such findings, and called the thesis that no such variance will be found (the properties of “know” being shared by its counterparts in almost all languages) the Universality Thesis (hereafter UT). UT is an empirical thesis, and if it is false, we do not know which features are specific to English “know” and which are features of language-independent universal knowledge (if there is such a thing), unless we know much about the knowledge verbs of other languages. Indeed, if the disagreement is radical, it is hard to expect that any empirical investigation will settle the question of the real language-independent nature of knowledge, contrary to what naturalists would expect.
One might argue here that mere possibility is not enough to take the argument seriously, meaning that it does not pose any threat to philosophers who dismiss the relevance of ordinary language unless the suggested linguistic diversity is actual (i.e., UT is actually false). But such data have indeed been reported recently in Type-1 studies of X-Phi. For example, Mizumoto and his colleagues (Mizumoto et al., 2020a, b; Tsugita et al., 2022; Mizumoto et al., manuscript) found a systematic difference between knowledge-how captured by English “know how” and the one captured by Japanese knowing how constructions, in that English knowledge-how is mainly concerned with the possession of the relevant physical ability and the explicit belief about it, while knowledge-how in Japanese is mainly concerned with the possession of the (subjective) description of how one ought to do it.3

3 David Bordonaba-Plou pointed out that, although the most natural Spanish translation of “know how” (“Sabe esquiar”, in the case of “She knows how to ski”) seems akin to the English verb, the literal translation (“Sabe cómo esquiar”) is similar to the Japanese one, being likely to be made true by the possession of a description of how one ought to do it, while there is no literal translation of “Sabe esquiar” in English. This is a very interesting case worth further consideration, but in the case of Japanese, there is no such gap between literal and natural translations.


In one of their studies, American and Japanese participants were presented with a vignette in which a protagonist who has never skied before just learns how to ski through reading books and watching videos, with no practice. When asked whether this protagonist knows how to ski, only 19% (95% CI: 13–28%) of English speakers said yes, whereas 68% (95% CI: 58–76%) of Japanese speakers said yes. On the other hand, in response to a vignette in which the protagonist underwent an operation while asleep, and became good at singing without knowing it, as many as 79% (95% CI: 69–86%) of English speakers agreed that the protagonist knows how to sing, whereas only 17% (95% CI: 10–25%) of Japanese speakers did so. These differences are not only strongly significant (p < 0.0001, two-sided Fisher’s exact test), but their effect sizes are also huge (0.49 for the former and 0.62 for the latter).4 Thus, even though the details may be still open to dispute, there seems to be no doubt about the fact that knowing how constructions in Japanese and English capture different concepts, or different kinds of knowledge-how (thus UT about knowledge-how is false), unless one is committed to a very implausible massive error theory (we shall come back to this point later). In this connection, Alva Noë once described (in his 2005) the approach of Stanley and Williamson (2001) as good old-fashioned Oxford philosophy (GOOP), clearly using it as a pejorative term. However, both intellectualists and anti-intellectualists in Anglophone analytic philosophy have assumed that at least the typical instances of knowledge-how (knowing how to ride a bike, how to swim, etc.) require physical ability (putting aside some exceptional cases like a pianist who has lost an arm). If physical ability is neither necessary nor sufficient for Japanese knowledge-how, the debate over intellectualism vs. anti-intellectualism itself now seems a local debate over English knowledge-how.
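As a side note on the statistics reported above for the two vignettes, the following is a minimal sketch of how an effect size ϕ and a two-sided Fisher’s exact test can be computed from a 2×2 table of responses. It is not the authors’ analysis. The chapter reports only percentages, so the counts below assume 100 participants per language group, a purely hypothetical sample size chosen for illustration, and the function names `phi_coefficient` and `fisher_exact_two_sided` are introduced here. With equal group sizes ϕ depends only on the two proportions, whereas the p-value depends on the assumed sample size.

```python
from math import comb, sqrt


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * tolerance
    )
    return min(1.0, p_value)


# HYPOTHETICAL counts: 100 participants per group (assumed, not from the chapter).
# Each tuple is (English yes, English no, Japanese yes, Japanese no).
vignettes = {
    "skiing": (19, 81, 68, 32),
    "singing": (79, 21, 17, 83),
}

for name, table in vignettes.items():
    phi = abs(phi_coefficient(*table))
    p = fisher_exact_two_sided(*table)
    print(f"{name}: |phi| = {phi:.2f}, p < 0.0001: {p < 0.0001}")
```

Under these assumed counts, |ϕ| rounds to 0.49 for the skiing vignette and 0.62 for the singing vignette, in line with the reported effect sizes, and both p-values fall below 0.0001. The actual sample sizes may differ, in which case the p-values would change while ϕ would stay the same as long as the groups are of equal size.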
In meta-philosophical debates concerning the methodology of philosophy, the status of folk intuitions has been discussed intensely. Here we are concerned with the linguistic difference, arguably reflecting the difference of the underlying linguistic concepts, with different linguistic norms. But if so, the present kind of results apparently support cross-linguistic pluralism about knowledge-how, where which language we speak is an essential factor to be always made explicit in the discussion of knowledge-how in philosophy.

4 Throughout this paper, the effect size is ϕ, where the effect size is small when ϕ is 0.1, medium when 0.3, and large when 0.5.

3.3 Argument from Intra-Linguistic Divergence

Even if we found radical cross-linguistic differences, however, they may not be differences of the underlying (linguistic) concepts, but rather due to cultural-psychological differences of the linguistic communities. If so, the study is a variant of the Type-2 study, where the different cognitive processes underlying the use of relevant expressions will explain the difference of the results between linguistic communities while the same concept is shared. This, however, poses an independent problem for Knobe. One of the reasons why Knobe favors Type-2 studies over Type-1 studies is that he thinks that concepts are messy (there is only a “hodgepodge of facts” about concepts), whereas theories of cognitive processes can be “simple” in the relevant sense (Knobe, 2016, sec. 3.6). As we shall see, even ordinary language philosophers of some sort can agree with this view. However, while the use of the relevant term (or the judgments about the use) can be very subtle and messy, yet be explained by simple underlying cognitive processes, there can also be the opposite case, that is, the concept might be very simple. For example, one may even claim that the concept of knowledge underlying the use of “know” is in fact nothing more than true belief. There, all the complexity of the use of and the judgment about the use of “know” in Gettier (and many other) cases may be explained by underlying complex, “messy” cognitive processes. Even though this does not undermine Type-2 studies, it does seem to undermine one of the reasons for his preference for Type-2 over Type-1. Knobe can still claim that, if that is the case, Type-1 studies are all the more uninteresting. However, whether a particular (in this case, the simplistic) analysis of a concept is legitimate, or even whether such a concept is really too simple to be interesting, should also be a matter of disagreement among ordinary language philosophers. In this sense, different individuations of cognitive processes will lead to different analyses of a concept. If so, the individuation of a concept also requires or involves the identification of underlying cognitive processes, and Type-2 studies can be seen as serving the aim of Type-1 studies.
In any case, data of diversity in attributions of mental states by relevant expressions between two linguistic communities indicate either a difference in meaning (linguistic diversity, discussed in, among others, lexical semantics) or a difference in belief or other propositional attitudes (arising from different cognitive processes, the topic of, among others, cultural psychology), and what we face here is the problem of sorting out the effects of Type-1 and Type-2, which is a version of the notorious problem of sorting out difference in meaning from difference in belief (for this problem in the cross-linguistic context, or radical interpretation, see Davidson, 1973, p. 134, 1975, p. 158). It is then natural to be pessimistic about the prospect of a clear boundary between them. However, what we may call the argument from intra-linguistic variance establishes that there should be non-controversial cases of relevant linguistic divergence, if there are two relevant terms or phrases in a single language whose uses can radically differ from each other in some contexts. For example, there are two knowledge verbs (for expressing propositional knowledge and knowledge-how respectively) in Japanese, shitte-iru and wakatte-iru, whose uses in epistemologically interesting cases are sometimes radically different, with large effect sizes (Mizumoto, 2018a).5 See for example the difference in the judgment about the TrueTemp case, in which an agent suddenly becomes able to tell the temperature correctly, though she is not aware of it yet. To the question whether that agent knows the current temperature when she formed a true belief about it, most Japanese participants answered “Yes” when asked using “wakatte-iru,” while most denied knowledge, answering “No”, when asked using “shitte-iru”, as in the following figure (Fig. 3.1). The effect size (ϕ) of the significant difference here (p < 0.0001) is 0.64, which is, again, huge. Mizumoto also reports a case of the opposite effect, where most Japanese participants attribute shitte-iru but not wakatte-iru, in what he called the Name case, in which the agent forgets the name of her old friend and cannot recall it, and participants were asked whether the agent knows the name of her old friend. Note also that, even though the result here is that of the within-subject design survey, the same effect was found in the between-subject condition too (see Chap. 4 of Mizumoto, 2018a).

Fig. 3.1 Comparison of propositional knowledge attribution between two Japanese knowledge verbs. (From Mizumoto, 2018a)

Note, however, that the difference here is linguistic, rather than cultural-psychological, because the users of the two verbs are exactly the same people. But if the uses of the two verbs are so radically divergent in some cases, we can expect that at least the use of one of them (possibly both of them) is significantly different from the use of know in such cases (which undermines UT again). But if pluralism about knowledge is correct, we may even say here that (at least in some cases) whether someone has knowledge or not depends on language, in the sense that what counts as knowledge depends on the language that captures the concept of knowledge. For what is captured by shitte-iru and what by wakatte-iru are clearly distinct sorts of knowledge, if they are both knowledge verbs at all.6

Thus, the argument from intra-linguistic variance supplements the argument from cross-linguistic diversity. From the latter, then, it follows that armchair philosophers debating over knowledge-how and propositional knowledge, even if they think that the issue is normative (about what we ought to take knowledge-how/propositional knowledge to be), are in fact guided by the intuitions about the linguistic expressions of their own language, whether they are aware of it or not. In this sense, given that the ascription conditions vary so much between languages, such philosophers are not free from (the linguistic norms of) their own language unless they know the linguistic norms of other languages, for otherwise they are not aware of which properties (considered necessary) are in fact specific to their own language and therefore contingent, and which are essential to the concept in question. If so, however, most (if not all) of people’s concepts are linguistically constrained, since they learned such concepts through the use of relevant expressions, unless they are innate. Some philosophers hold that the very concept of knowledge is innate and universal,7 but the data we provided above would commit such a view to a massive error theory, allowing a radical gap between the concept people possess and the use of the relevant expression(s). We shall discuss such a view later, but at least in the case of propositional knowledge, those who are still reluctant to accept pluralism must be committed to the claim that either shitte-iru or wakatte-iru is not a knowledge verb (for expressing propositional knowledge), which is highly implausible given the otherwise almost identical usage of them.8

5 Note that the same effects were replicated in our subsequent online survey with a larger sample size, using participants with sufficient age variance (rather than using university undergraduates as in Mizumoto, 2018a).

3.4 The Supplementary Picture of X-Phi

Type-1 studies of X-Phi can be taken to supplement armchair conceptual analysis with empirical data, in the sense that they provide empirical data on ordinary people’s responses where armchair philosophers have simply relied on their own intuitions. If the data are consistent with some theory, they will support that theory rather than other theories. If the data are largely inconsistent with theorists’ assumptions or intuitions, that will be a typical work of the negative program of X-Phi. Even in the latter case, however, some philosophers may try to construct a new theory based on such data. Thus, either way, X-Phi supplements armchair conceptual analysis.

6 One worry here is that the difference found here is wholly due to a pragmatic, rather than semantic, effect. See, however, Mizumoto (2021).

7 Even among experimental philosophers, Turri (2018) holds such a view, based on his data and his primatological approach to epistemology. Obviously, however, such a concept cannot specify all the details discussed in contemporary epistemology, and if so, it will even support the negative program we will discuss in later sections.

8 See Mizumoto (2018a, 2021) for more on this point.


Call this picture of the role of X-Phi in contemporary philosophy, which often seems to be assumed (if implicitly) by both Type-1 experimental philosophers and many armchair philosophers, the supplementary picture of X-Phi. Against the supplementary picture, Knobe thinks that his “Knobe effect,” or the moral asymmetry of intention attribution, in which people tend to judge that a side effect was brought about intentionally when the side effect is morally bad, while they tend not to do so when the side effect is morally good (Knobe, 2003), is a typical instance of Type-2 X-Phi, not about the concept of intentional action, let alone the use of “intentionally” (Knobe, 2016). However, the linguistic factor still seems relevant to the Knobe effect, since, as Mizumoto (2018b) shows, the same moral asymmetry of judgments about intention can be produced even without any vignette, based only on felicity judgments about relevant sentences. He used various sentences stating a morally good/bad/neutral action (such as “He intentionally stole a purse”), and asked participants to judge whether each of them is natural, unnatural, or even wrong. The results showed significant differences depending on the morality of the action, where people tended to judge sentences stating an (intentional) morally good action “unnatural”. This suggests that the moral asymmetry of intentionality attribution is encoded at the linguistic level. Moreover, here again we find intra-linguistic variance.
“Wazato”, one of the Japanese counterparts of English “intentionally,” lexically encodes the moral asymmetry (significantly) more explicitly, with more than 70% of participants judging as “wrong” a sentence (using “wazato”) stating a morally good action (intentionally improving the environment), as opposed to less than 30% judging so about the sentence stating the same action using “itoteki ni”, another Japanese counterpart of “intentionally.” This fact suggests the empirical possibility that there may be languages in which moral asymmetry is completely encoded at the linguistic level, on the one hand, and languages that show no moral asymmetry whatsoever at that level, even if an equally strong Knobe effect is observed in speakers of those languages, on the other. But if the moral asymmetry is linguistically encoded, one may naturally wonder whether even armchair conceptual analysis could have revealed this moral asymmetry without any empirical study. Indeed, Gilbert Harman has already said:

The reason why we say that the sniper intentionally kills the soldier but do not say that he intentionally shoots a bulls-eye is that we think that there is something wrong with killing and nothing wrong with shooting a bulls-eye. If the sniper is part of a group of snipers engaged in a sniping contest, they will look at things differently. From their point of view, the sniper simply makes a lucky shot when he kills the soldier and cannot be said to kill him intentionally. (Harman, 1976, p. 434)

This seems to support the supplementary picture of X-Phi. If Harman cannot be considered an ordinary language philosopher, consider Ryle’s still earlier remark:

In their most ordinary employment ‘voluntary’ and ‘involuntary’ are used, with a few minor elasticities, as adjectives applying to actions which ought not to be done. We discuss whether someone’s action was voluntary or not only when the action seems to have been his fault. (Ryle, 1949, p. 69)


Since the Knobe effect is known to occur not only for the attribution of “intention,” but also for that of “deciding,” “desire,” “in favor of,” “advocating,” etc. (Knobe, 2010, p. 318), we may naturally expect that “voluntary” should be among them, and if the above observation of Ryle is correct, generalizing it may explain the Knobe effect in terms of our linguistic usage, without appealing to underlying psychological processes. Unfortunately, Mizumoto (manuscript) did not observe the analogous asymmetry in felicity judgments in the style of Mizumoto (2018b). Nevertheless, in the same study, when Mizumoto used the standard Chairman case (in which the chairman harms/helps the environment as a side effect of his action) and asked participants whether the chairman voluntarily harmed/improved the environment, a moral asymmetry no weaker than the original Knobe effect was found (ϕ = 0.62). This therefore instantiates the case suggested above, in which we have the same effect but in one case the linguistic factor is in play, while in the other it plays no role in producing the effect. This is a case of the same overall effect with different cognitive processes, due to a difference in the concepts involved. We then need further analyses of the concepts of intentional action and voluntary action, in the style of OLP, in order to identify the language-specific cognitive process and explain the overall effects. Thus, these results suggest that even studies of moral asymmetries such as the Knobe effect, taken by Knobe to be a paradigmatic Type-2 study, can be seen as Type-1, contributing to the analysis of concepts such as intentional action and voluntary action, positively or negatively, depending on whether the result is evidence for or against that analysis. Such studies are therefore still compatible with the supplementary picture of X-Phi, contributing to traditional OLP.

3.5 X-Phi Defends OLP

In fact, X-Phi, especially its negative program, rather seems to save OLP, by undermining the challenges to OLP. There are several reasons for the decline of OLP, some of which are traditionally pointed out (cf. Nerlich, 1964; Soames, 2006; Parker-Ryan, 2012): (1) lack of semantics/pragmatics distinction (cf. Grice’s conversational implicature), (2) failure to accommodate semantic externalism (cf. theories of reference by Kripke and Putnam), (3) rise of formal analysis and truth-conditional semantics (cf. Hintikka, Montague, Davidson), (4) resurgence of metaphysics (cf. type/token identity theory and philosophy of mind in general), and (5) lack of empirical evidence (cf. Mates). Among them, the first two (or three, depending on one’s view about the theory of meaning) are concerned with the difficulties of the use theory of meaning (to which we shall come back later with an explicit formulation). For now, first consider semantic externalism or the theory of direct reference, whose truth (if true at all) also turned out to be culture-/language-specific: it has been shown that the externalist intuitions about reference are not shared by East Asians (Machery et al., 2004; Beebe & Undercoffer, 2015). This suggests that whether the externalist theory of meaning is right, or whether sometimes the referent can only be found a posteriori or not, itself still depends on how people use (or understand the use of) the relevant term (proper name or natural kind term), where the later judgment about the use (especially the retraction of it) is also constitutive data of the use. For example, when we ask who or what someone was talking about by “X” (proper name or natural kind term), that may be answered by the (later) judgment (retraction) about the use of “X”, and the pattern of such use or judgment determines whether that language supports the externalist semantics or not. This may reflect a cultural-psychological difference, but that is perfectly compatible with OLP, and even gives a further motivation for the analysis of the use of relevant expression(s) in different ordinary languages.9

As for the semantics/pragmatics distinction, it is often thought that the distinction is only theoretical, and there is no boundary in linguistic reality. For example, Chomsky says that there is no way of telling pretheoretically whether a felt oddness of a linguistic expression is a matter of syntax, semantics, or pragmatics (Chomsky, 1977, p. 4). Quoting Chomsky, Schaffer says, “The semantic and pragmatic levels, after all, are the sophisticated posits of a scientific theory of language, to which naive intuitions cannot be expected to be sensitive” (Schaffer, 2004, p. 146). A standard candidate for the distinction to be found in our intuitions is the one about truth-conditions. Our intuitions about what is relevant to the truth of an utterance are robust, so that we may say that anything that contributes (hence is relevant) to the truth of an utterance is part of its semantic content. But if so, again the distinction itself may also be language-specific. Indeed, there is already a study suggesting that this possibility is real.
In another study, Mizumoto asked participants to truth-evaluate an utterance in the following vignette (Mizumoto, 2022): Suppose there is utterly no correlation between the appearance and the intelligence of a person. Tom says about Susan, “She is beautiful, but still smart!”, where Susan is smart and beautiful,

Participants were asked whether what Tom says is true or not. According to Grice, this utterance is true.10 This study was done in English and Japanese, with native speakers of each language as participants, using the truth predicates of the respective languages. Note, however, that there are again two truth predicates in Japanese, shin and hontou, and therefore in total three results were reported. The pairwise differences of these three results were all statistically significant (all p < 0.01, two-tailed Fisher’s exact test), but especially the effect size of the one between “true” and “shin” was large (φ = 0.52).

9 Similarly, whether something is a rigid designator or not is arguably determined by use as well. (Cf. Glock, 2003, Chap. 3.6). I owe this reference to Simon Vonlanthen.

10 Note that conventional implicature is counted as part of semantics, but the point here is the cross-linguistic variance of what is said, which can easily be generalized to the variance of the semantics/pragmatics boundary.
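
For readers unfamiliar with the statistics cited above, the following is a minimal sketch of how a two-tailed Fisher’s exact test and the φ effect size can be computed for a 2×2 table of truth judgments. The counts are hypothetical and were invented only for illustration; they are not the data of Mizumoto (2022). The function names are likewise assumptions of this sketch.

```python
from math import comb, sqrt


def phi_coefficient(a, b, c, d):
    """Phi effect size for the 2x2 table [[a, b], [c, d]]."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# HYPOTHETICAL counts (not the data of Mizumoto, 2022):
# each tuple is (judged "true", judged "not true") for one group.
english_true = (45, 5)
japanese_shin = (20, 30)

phi = phi_coefficient(*english_true, *japanese_shin)
p_value = fisher_exact_two_tailed(*english_true, *japanese_shin)
print(f"phi = {phi:.2f}, p = {p_value:.2g}")
```

On the usual rule of thumb, a φ of around 0.5 or above counts as a large effect, which is the sense in which the difference between “true” and “shin” is described as large.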

3 Experimental Philosophy and Ordinary Language Philosophy


Fig. 3.2 Comparison of English and Japanese correctness predicates. (From Mizumoto, 2022)

Also, in the same study, he conducted another survey about the same utterance using correctness predicates, obtaining the following result, which shows even more dramatic divergence in judgments with a huge effect size (p < 0.0001, φ = 0.59) between English and Japanese (Fig. 3.2). Thus, even though about 40% of Japanese judged Tom’s utterance true (shin), it should be natural for Japanese philosophers (especially if they had known nothing about Grice) to conclude in their theoretical considerations that, since the content of Tom’s utterance is incorrect, it is also false (in Japanese, gi).11 It then follows that, at least, the truth conditional content depends on the background theory of truth, as based on the use of truth (and correctness) predicates, which may vary from language to language, leading even to theoretical disagreements. If so, we cannot assume a single universal truth predicate in discussing truth-conditions, at least not the truth predicate(s) of natural language. Even apart from such data, the results reported in Sects. 3.2 and 3.3 also undermine the truth-conditional conception of what is said, or at least as the criterion of the equivalence of meaning based on truth-conditions. For, if the content of what is said is (or is captured fully only by) truth-conditional content, ex hypothesi there can never be a divergence in truth-values between two utterances with the same content.12 From this it follows that the translations in such studies are all mistaken. However, professional bilingual translators and scholars who understand

11 A twist here is that, in the same study when he used a morally neutral vignette (with the analogous utterance with “but” that this time has no moral-political implication), the cross-linguistic difference completely disappeared in the results. This does not affect our argument here, as long as there is (huge) cross-linguistic difference in truth judgments about some utterances.

12 In fact, though we cannot discuss in detail, Davidson’s argument against the idea of conceptual scheme (Davidson, 1974) heavily uses this assumption. His formulation of the idea was “largely


both English and Japanese find no fault in the translations, in particular no such fatal one that could produce such a radical difference in truth-value judgment. If the Japanese knowing how construction and truth predicates used in such studies were judged to be improper translations, then since no other Japanese terms and phrases would be appropriate, Japanese scholars would not be able to discuss knowledge-how or even truth in Japanese at all. Thus, those who still believe in truth-conditional semantics would again be committed to a massive error theory, of all the scholars who understand both languages. This is an even more implausible thesis wanting substantial evidence, and we should rather admit an independent criterion of translation distinct from truth-conditions. It is not our purpose here to criticize truth-conditional semantics. The point is rather that the results of Type-1 (cross-linguistic) studies of X-Phi about the use of the relevant terms or phrases can provide evidence against the semantics/pragmatics distinction based on our intuitions about truth-conditions. Here, one might alternatively appeal to the analogue of the Chomskyan competence/performance distinction, which some experimental philosophers even seem to adopt.13 Given this distinction, one may make conceptual (or semantic) competence to be independent of the use of the term/phrase (performance) in natural language, or what Chomsky calls E-Language, thereby securing the semantics/pragmatics distinction. However, the competence/performance distinction itself also suffers from the same linguistic divergence challenge. 
For example, if we are to admit multiple semantic competences for knowledge verbs, that is to admit their dependence on the use, but if we are to stipulate a single universal semantic competence, we would be committed to a radical error theory again: We would have to claim that at least one whole linguistic community is always making performance errors concerning a usage of some word or phrase. One may then try to cash out the notion of conceptual competence independently of performance, in terms of knowledge-how. That is, possessing conceptual competence is knowing how to use the relevant expression.14 However, we have seen that the very notion of knowledge-how also varies from language to language. Thus,

true but not translatable”. But instead, we have a dilemma: “if taken to be true, not translatable, but if translatable, not true”.

13 Devitt (2012) argues that experimental philosophers implicitly assume what he calls the Voice of Competence (VoC) view (Devitt, 2006), which he argues is wrong. But if his preferred view, the “Moderate Explanation” (ME), is correct, it follows, according to him, that expert intuitions better serve as evidence than folk intuitions (but see Machery, 2012, p. 227). We cannot go into the detail here, but Devitt seems to assume that in either view the competence in question is innate and prior to usage (the latter being a mere result of the former), whereas what we should point out in this context is that this Chomskyan notion of competence is now replaced by the notion of I-language, or innate, internal and individual language (cf. Chomsky, 2000), as opposed to “shared, public language”, or E-language. What we are concerned with here is the competence of E-language, and Chomsky would agree that the use is constitutive of competence (especially conceptual competence) in this sense, though he is (with Davidson) skeptical of the very existence of E-language.


however we conceive of the conceptual competence, it is a product of everyday usage, in which even the uses of truth predicates are included. We can stipulate as many arbitrary distinctions as we like between semantics and pragmatics based on one’s own theory, but if we have no other independent intuitions about the distinction (as those which could justify the distinction), “meaning,” at least in the pre-theoretic sense, should be primarily explained by use after all, and therefore the results of X-Phi (with the linguistic variance data) seem to save OLP. The remaining (intrinsic) problem of OLP is, then, the lack of empirical evidence. Indeed, OLP can be seen as a primary target of the negative program of X-Phi.

3.6 OLP as a Target of Negative X-Phi

In fact, OLP was criticized for a lack of evidence as early as the 1950s. Benson Mates (even before Quine’s “Two Dogmas”) writes in 1950:

it seems to me doubtful that any adequate definition of “synonymity” [ . . . ] will ever be found by means of the usual armchair methods of philosophizing. We need empirical research regarding the [sic] ordinary language in order to determine which expressions are in fact synonymous, and with the help of these data it may be possible to find an acceptable definition of “synonymity” for some language. (Mates, 1950, pp. 208–209, emphasis added)

Mates then cites Naess (1949). Specific targets of Mates’s criticism in his later paper (Mates, 1958) included Ryle’s view about voluntary action, which we saw in Sect. 3.4. It was criticized for being inconsistent with what another prominent ordinary language philosopher says, namely Austin: “For example, take ‘voluntarily’ and ‘involuntarily’: we may join the army or make a gift voluntarily [ . . . ]” (Austin, 1956, p. 17). Mates then asks, “If agreement about usage cannot be reached within so restricted a sample as the class of Oxford Professors of Philosophy, what are the prospects when the sample is enlarged?” (Mates ibid., p. 165).15 Mates cites such examples to argue against the “comfortable suggestion” that “the average adult has already amassed such a tremendous amount of empirical information about the use of his native language, that he can depend upon his own intuition or memory and need not undertake a laborious questioning of other people, even when he is dealing with the tricky terms which are central in philosophical problems.” Against such views he claims, Such an assertion [sic] is itself an empirical hypothesis, of a sort which used to be invoked in favor of armchair psychology, and it is not born out by the facts. (Mates ibid.)

This may sound familiar to the ear of contemporary philosophers involved in the meta-philosophical debates over X-Phi. But then, why did the X-Phi movement not

14 Admittedly, even English speakers grant that a professional skier knows how to ski even if she has a broken leg at the moment, but such a case does not constitute evidence against the radical linguistic difference between English and Japanese we saw above.

15 But see the response by Cavell (1958).


occur in the 1950s, in the wake of Naess? This seems to be because OLP, the target itself, simply became obsolete, if not refuted (given the results of X-Phi we saw above). That is, even though the method of cases (which in fact relies on a particular language in describing the case and formulating the questions) is still widely used today,16 few philosophers (except DeRose and some others) regard themselves as ordinary language philosophers any more. And that may be why, when the negative program of X-Phi re-started in the twenty-first century (with Weinberg et al., 2001), it started as a criticism of the use of intuitions in the case method, rather than of OLP. It therefore seems that the lack of empirical evidence still remains the last intrinsic problem of OLP. Note however that it is also a problem of contemporary armchair philosophy in general, according to the negative program of X-Phi. Besides, even though OLP seems to suffer from lack of empirical evidence, contemporary armchair philosophy is mostly engaged in armchair theory construction, while OLP is known for its non-systematic, diagnostic approach, and usually taken as a negative (therapeutic) program (called quietism) that, typically, rather aims for dissolving (quasi-)philosophical problems.17 Thus conceived, OLP seems to be immune from such criticism, for it just draws our attention to a particular (obvious) usage in specific situations (rather than controversial cases of philosophical dispute) in order to reveal our misunderstandings of how language works (even though, as we saw, ordinary language philosophers can still be mistaken). There, OLP would not be committed to empirical claims about ordinary usage except in paradigmatic (uncontroversial) cases, not trying to draw a theoretical boundary in borderline cases. Indeed, understood as a negative program, OLP rather helps X-Phi, as we shall argue in the next section.

3.7 OLP Helps X-Phi

As a matter of fact, contemporary X-Phi does not criticize OLP as its target. Rather, X-Phi seems to require OLP to make its criticism of contemporary armchair philosophy effective. For many armchair philosophers are actually not moved by the data of negative X-Phi studies showing that philosophers’ intuitions are not reliable (despite all the works surveyed in Stich & Tobia, 2016, for example). That is because, first (as we saw in Sect. 3.1), it is often claimed against Type-1 X-Phi that the data of the X-Phi studies reveal only ordinary people’s use (and/or judgement

16 See Mallon et al. (2009) for a criticism of the use of this method to support a theory of reference, which is based on the X-Phi data of the cultural variance of intuitions.

17 See, for instance, Horwich (2012) for a defense of such meta-philosophy. Fischer (2018) proposes a picture of the friendly relationship between X-Phi and OLP (Wittgenstein) similar to the one in this paper, but much more focused on Type-2 X-Phi studies.


about the use) of some expression(s), whereas, such armchair philosophers claim, they are rather concerned with properties or relations in the world. The argument from cross-linguistic diversity was a response to such a naturalistic view. Alternatively, armchair philosophers may assume that concepts are independent of our uses of the relevant terms, waiting for analysis by experts. For example, it may turn out that mathematicians are also susceptible to mathematical errors in complex calculations. But mathematicians would not be moved by such data. They are experts anyway, and for them there is no other route to access the mathematical reality than their intuitions. Such Platonists may also hold that concepts are in general abstract entities independent of the minds of individuals. Or those who fear that the thesis of meaning as use would erase the distinction of correct use and misuse (or “performance error”) may appeal to the competence/performance distinction discussed in Sect. 3.5, holding that conceptual competence can be totally independent of performance. As long as a concept is explained by such competence, according to this view, ordinary people’s uses (or judgments about them) are merely derivative phenomena (being subject to performance errors), constituting at most indirect evidence of the competence, not playing any constitutive role for explaining meaning. All these views, however, assume that philosophers themselves are free from the influence of their own language. We suggested that the argument from linguistic diversity undermines this assumption. However, the advocates of the latter two views might hold monism about the relevant concept, thinking that a radical error theory is a real possibility. 
They admit that two people can possess the same conceptual competence while their uses (and judgments about them) are systematically different, and therefore the data of linguistic variance would rather constitute a good reason for them to ignore any linguistic data (cf. Hazlett, 2010, 2018). This is why at least Type-1 X-Phi studies must also assume the central thesis of OLP, meaning as use,18 which, applied to concepts, can be formulated as:

• Most concepts19 people possess are constitutively expressed or manifested in their use (or the judgment about the use) of the corresponding term or phrase.

This was also part of the reason why, as we saw at the beginning, some philosophers pointed out the connection between them.20 At the same time, this is where OLP can help X-Phi. For, whether naturalists, Platonists, or Chomskyans, their disagreement with experimental philosophers is more

18 See Horwich (1998) for a defense of the use theory of meaning, as part of which he lists 22 (!) objections to it and answers all of them one by one.

19 Note Wittgenstein’s qualification about this thesis at PI 43, “For a large class of cases”. This is why it is not a theory that aims to be universally valid. It should in particular be conceived in the context of his negative program, as we shall discuss later. In any case, the thesis above should hold for many philosophically important concepts.

20 Though some defenders of OLP explicitly criticize X-Phi (e.g., Baz, 2012), which will be briefly discussed in a later footnote of this section.


metaphysical than methodological. OLP, or Wittgenstein’s negative program,21 is then supposed to show that such (if implicit) metaphysical assumptions are based on misunderstandings about how language works (in our form of life). Though we cannot get into the details here, the concept conceived of independently of its use, or the conceptual competence conceived of as being independent of any performance, would be undermined by rule-following considerations, for instance.22 If use (performance) is considered constitutive of the relevant concept (conceptual competence), to which we refer by using the term or phrase in question, however, any theory about a particular concept should always be full of exceptions and apparent counterexamples, while the very boundary between exception and counterexample is always controversial. Ordinary language philosophers would say it is always arbitrary. For our concepts are in general vague, lacking precise boundaries, and in this sense incomplete (see PI 69, 71, 76, 88, etc.), even resisting clear analysis with their instances having only family resemblances (ibid. 66–67). If so, trying to construct an armchair theory about a concept or property/relation (like knowledge) is a dubious project, which is an attempt to draw a boundary where there is none. Here, negative OLP and the negative program of X-Phi would agree.23 Indeed, even Knobe would join us, for, as we saw he held that there is “only a hodgepodge of facts” about concepts, and proper theories cannot be expected there.

3.8 What Does X-Phi Do?

As a matter of historical fact, when A. Naess first began his survey on truth in the 1930s (Naess, 1938a, b), his project was essentially negative, in the sense that he was trying to undermine philosophers’ assumption that ordinary people’s view about truth is naïve and simplistic. Also, the paper by Weinberg et al. (2001), allegedly the first contemporary work of X-Phi, was meant to provide evidence against antecedently held philosophical assumptions through showing the cross-cultural variance in intuitions. In this sense, X-Phi began twice as a negative program (of Type-1 studies).24

21 Though it is a matter of controversy whether we can treat Wittgenstein as an ordinary language philosopher, the point here is that OLP was generally reluctant to construct a theory, and Wittgenstein was no doubt at the center of such a trend.

22 Famously, Kripke (1982) explicitly mentioned the Chomskyan competence/performance distinction in his discussion of the rule-following considerations. And note that the radical skepticism about meaning (which Kripke thought would follow) does not follow from such considerations, on which almost all commentators on Kripke’s book agree.

23 For example, when Baz (2012) criticizes X-Phi for assuming that “answers to the theorist’s question—be they the philosopher’s or the layman’s [ . . . ]—are our indispensable and best guide when we seek to elucidate our concepts and the phenomena they pick out” (pp. 91–92), he assumes that X-Phi is trying to answer theorists’ questions, and thereby contributing to theory-constructions. In other words, he has in mind the positive program of X-Phi there.


Also, consider the case of the theory of reference (which is more likely to be taken as Type-2). Machery et al. (2004) claimed that their data of cultural variance of intuitions about reference “raises questions about the nature of the philosophical enterprise of developing a theory of reference” (B1). More recently, Machery says, “it is dubious that philosophers’ theorizing about reference can really enhance the reliability of their intuitions about reference” (Machery, 2012, p. 224). Stich, who had been especially skeptical about the theory of reference based on folk intuitions, rather than the one as proto-science (Stich, 1996, pp. 37–51), cites Machery et al. (2004) and expresses his skepticism about the very reality of the reference relation, agreeing with deflationists like Field and Horwich (Stich, 2009, p. 199. See also Mallon et al., 2009). Thus, it seems that there is a tendency for X-Phi studies, in particular this kind of cross-linguistic/cross-cultural X-Phi studies (whether Type-1 or Type-2),25 to be negative–whether intentionally or not. In fact, there are also reasons for X-Phi in general to be critical about theory-construction in general, armchair or empirical. For if, as we suggested in the earlier sections, most branches of philosophical inquiry turned out to be language-specific, one may legitimately wonder what (allegedly universal) philosophical theories are for. It seems that such data will invite a kind of skepticism about the philosophical debate in question, just as (faultless) peer disagreement in philosophy is often considered to invite (for conciliationists, as opposed to steadfasters) skepticism about the rationality of one’s own doxastic attitude on the topic in question (cf. Feldman & Warfield, 2010), or disagreement in general can be taken as a reason to deny knowledge (cf. 
Frances, 2018).26 It is expected that more and more cross-linguistic and cross-cultural data of X-Phi will contribute to this trend in philosophy, undermining various armchair philosophical theories. Moreover, though the distinction between Type-1 and Type-2, and the one between the positive program and the negative program are orthogonal, studies that test the universality of intuitions are typically of Type-1, and they are almost

24 See Stich and Tobia (2016) for what the negative program of X-Phi does, and see also Weinberg (2016) for its positive contribution to philosophy, through what he calls the “wheat-from-chaff” project. Weinberg’s project is however much more positive in spirit and is possibly even inconsistent with our picture here.

25 Engaged in (2) of the three goals of X-Phi in Nichols and Knobe (2008), mentioned in footnote 1 above.

26 Note however that this assumes that there is only one rational attitude to the totality of evidence, or uniqueness (Kopec & Titelbaum, 2016). In the present context, insofar as one assumes uniqueness (whether conciliationists or steadfasters), one is committed to a massive error theory about either one’s own linguistic community or the other community, the implausibility of which arguably leads to skepticism about the subject matter. On the other hand, the denial of uniqueness (permissivism) here corresponds to pluralism (I would like to thank an anonymous reviewer for pressing me to elaborate this point). Indeed, the results we reviewed earlier seem to suggest not only pluralism about specific philosophical concepts, but pluralism about philosophy itself. Cross-linguistic and cross-cultural studies of this kind may therefore even strike one as anti-philosophical. We shall discuss this implication in the last section.


destined to be a negative program in the long run, for they will reveal more and more demographic variances of folk intuitions, which will accumulate and never decrease. Cross-linguistic variance is special among such possible demographic variances in that, if it is variance in lexical semantics, it shows the difference of linguistic concepts, or different linguistic norms, behind the use of the relevant expression. There, the theories about a particular concept cannot be theories that aim to capture a universal truth, as is typical of theories in the natural sciences.27 If they are not theories in natural science, however, they are arguably local accounts, which are worth having only insofar as they contribute to a deeper understanding of the local facts.28 But such local accounts can be better than “universal” theories, if the latter involve strained or ad hoc generalizations. Here again, Type-1 X-Phi studies agree with OLP, and both are in fact primarily negative programs. However, we may even think that Type-2 X-Phi is not primarily engaged in theory construction either, only trying, instead, to report new findings about how the mind works, or “to identify and explore a specific effect” (Knobe, 2016, sec. 3.3). And, if empirically well-informed universally true theories are presented by Type-2 studies, such theories and studies may not be philosophy anymore, at least for more traditional armchair philosophers and even some of Type-1 experimental philosophers. Though this may be a nominal issue (as Knobe and others think), Knobe also thinks that findings of Type-2 must be incorporated into a general theory in cognitive science (because X-Phi is cognitive science). However, the same negative considerations should apply to cognitive science itself. Cognitive science has already been said to be more engineering than the standard natural sciences,29 but if knowledge, belief, intention, understanding, etc. 
(or properties of them) are all language-specific, being delineated by a specific language of the researchers, then one may legitimately wonder in what sense cognitive science (which uses such notions) is science. Thus, according to lexical semanticists of what is called the NSM (natural semantic metalanguage) approach, Anglophone scholars in the human sciences often unwittingly frame their research hypotheses in English-specific terms. For example, when evolutionary biologists postulate a “universal sense of right and wrong” or puzzle over the evolutionary origins of “animal altruism”, there is little awareness of the problematical fact that their words right, wrong, and altruism are English-specific constructs that lack precise equivalents in many languages of the world, including many European languages. (Goddard & Wierzbicka, 2014, p. 251)

They in fact claim that epistemological terms like “belief,” “justification,” or even “truth,” are not universal (thus not semantic primes) either (Wierzbicka, 2018).30

27 Though this conception of scientific theory is certainly naïve, other conceptions like instrumentalism are generally compatible with the claim that it is a heuristic device.

28 Richard Rorty once contrasted philosophy-as-discovery with philosophy-as-proposal (Rorty, 1992), and this conception of philosophy (philosophical theory) loosely belongs to the latter category (proposals about how we should talk and think about the world), in which Wittgenstein’s quietism is also included.

29 In this connection, Daniel Dennett once claimed that cognitive science is reverse engineering (Dennett, 1994) and biology is engineering (Dennett, 1995).


Interestingly, in this connection, Stephen Stich had infamously insisted that the notion of belief, or propositional attitudes in general (with representational content), have no role in serious cognitive psychology and therefore should be eliminated in such sciences (Stich, 1983). Stich later retracted this claim, in view of allegedly legitimate scientific research on the theory of mind, especially the later development of the research program of the false belief task in developmental psychology (cf. Stich, 2009, p. 205). However, if the notion of belief is English-specific, maybe Stich’s earlier view was correct, and the scientific inquiry based on such English-specific folk notions cannot be a serious cognitive science, at least as an inquiry aiming for a language-independent universal theory. Also, see for yet another example the discussion on knowledge-how. Although Bengson and colleagues say that they “believe that the status of animal know-how is best left to experts on animal cognition” (Bengson et al., 2009, p. 399), they add in a footnote that this “does not force [them] to admit the truth of any and all attributions of knowledge-how by contemporary scientists” (ibid.). Then they suggest that

many attributions of know-how to cognitively unsophisticated animals, such as the caddis fly larvae, may in fact be ultimately scientifically dispensable. For, presumably, many such attributions can be replaced without loss by attributions of some sort of ability. (ibid.)

This assumes that the experts in animal cognition (cognitive ethologists) often attribute knowledge-how to non-human animals with the relevant ability, which is however (if we are right) English-specific in the sense that Japanese ethologists would not attribute Japanese knowledge-how to such animals. Being intellectualists, then, Bengson and colleagues propose to replace such talks of knowledge-how by talks of ability and propositional knowledge, but propositional knowledge itself is also language-specific, as we saw in Sect. 3.3. Moreover, consider Stanley’s argument, in which he presents the following instance of the disquotational schema, “Ana knows how to swim” is true if and only if Ana knows how to swim, and claims that “it could hardly be that science could discover that knowing how to swim was a distinct state than is expressed by “knowing how to swim”” (Stanley, 2011, p. 144)

The assumption here is that the right-hand side of this biconditional is independent of language. However, a Japanese cognitive scientist with a Japanese translation of this biconditional should end up with a very different result of scientific investigation than that of an English-speaking cognitive scientist. All this suggests that there may be different cognitive sciences depending on the (meta-)language scientists use, which further distinguishes cognitive science from other branches of natural science. But even at the level of phenomena, though cognitive scientists and even psychologists are committed to the claims about sub-personal processes (based on studies of various natural sciences), psychology and cognition (qua explananda) are ultimately phenomena at the personal level (with theories about sub-personal

30 They think “know” is an exception, but see Mizumoto (2021).


processes promoting our understanding of what is happening at that level). Also, cultural variance of cognitive processes itself is undeniable as a phenomenon routinely discussed in cultural psychology, but, since not all the neural structures of the brain are innate, the neural bases of cultural-psychological differences are rather products of (the history of) human activities at the personal level (the form of life), which is the subject matter of social science. Indeed, as we suggested in Sect. 3.4, even if we have the same effect at the personal level, if the concepts involved are different, the details of the underlying cognitive processes should also be different. For example, even when we have the same Knobe effect, due to the difference of the concepts of intentional action encoded in the respective languages it can turn out that one judgment is simply linguistic, following linguistic norms, while the other is fully psychological. So, identification of the cognitive process is also subject to linguistic variance and requires prior analysis of linguistic usage (empirical or armchair). Moreover, if, as we have suggested in Sect. 3.3, the concept underlying the use of the relevant expression may well be very simple, while any further complication concerning our judgments is explained by the underlying cognitive processes, we still need careful conceptual analysis and empirical study to determine whether it is the concept or the cognitive process that is a “hodgepodge” of facts. These considerations invite us to revise our understanding of cognitive science as a whole. Even though cognitive science no doubt involves natural sciences, as long as it uses folk notions it provides universal truths only insofar as English is considered a universal language about human cognition. 
Indeed, the overall project is better conceived, if not as engineering, then ultimately (at least at the personal level) as ethno-science,31 a sub-field of anthropology or cultural psychology aiming at understanding the human mind through investigating the cognitive processes of people with a particular culture or language. Theories in cognitive science are in this sense also local answers to local questions, to reach a deeper understanding of local facts. But if so, Type-2 X-Phi studies should be no more positive a program than Type-1 studies, in the sense that they do not in fact aim at providing a universally true theory (of cognitive processes) either, being concerned with particular effects, of particular people with a particular language, undermining more and more alleged universal theories (if there are any) about the mind at the personal (rather than subpersonal) level.

31 This term appears in Chomsky’s writings (e.g., Chomsky, 2000), but he seems to use it to refer to scientific studies of folk theories (such as folk psychology and folk physics). Our use here is rather to refer to cognitive science as part of social sciences like anthropology, though this may still be consistent with Chomsky’s usage.


3.9 Concluding Remarks

We have argued that (1) there is still a role for OLP to play in contemporary philosophy along with X-Phi, even if not a positive one in theory construction, and (2) Type-1 studies of X-Phi are still philosophically interesting and important as a negative program, in the sense of undermining armchair theories. Indeed, (3) the primary aim of studies of both Type-1 and Type-2 is not to construct a (universally valid) theory (philosophical or otherwise), but to undermine or assess armchair theories with empirical data about the use of expressions. Moreover, (4) once we take both types of studies to be engaged in the negative program, the negative armchair program (i.e., OLP) and the negative experimental program (of X-Phi) even support each other, both discouraging the armchair positive program of constructing normative, a priori, metaphysical theories. Of course, for those who think that the primary goal of philosophy in general is to construct (universally valid) philosophical theories (with other activities being philosophical only insofar as they contribute to theory construction), neither OLP nor the negative program of X-Phi is good philosophy, both being “unproductive.” However, if Western philosophy began as criticism of sophists, negative (critical) philosophy, or at least the negative aspect of philosophy (including Kant’s Critiques and Quine’s “Two Dogmas”), has been even more authentic than the positive programs that followed. As C. Wright says, “Philosophy in the Western tradition is an essentially critical discipline, so it is unsurprising that its historical record is one of sustained self-criticism. Philosophers of every period have selectively dismissed the methods and objectives of their predecessors” (Wright, 1993, p. 1).
This is indeed what is going on in the analytic tradition of the twenty-first century too, and OLP and X-Phi, the cooperative picture of which we have presented here, can together promote and accelerate this trend further.32,33

References

Austin, J. L. (1956). Performative utterances. In J. O. Urmson & G. J. Warnock (Eds.), Philosophical papers (pp. 220–239). Clarendon Press.
Baz, A. (2012). When words are called for: A defense of ordinary language philosophy. Cambridge University Press.
Beebe, J. R., & Undercoffer, R. J. (2015). Moral valence and semantic intuitions. Erkenntnis, 80(2), 445–466.

32 In this sense, Stich and Wittgenstein are in fact good friends, even though Stich may not like this picture.
33 I would like to thank Joachim Horvath, Simon Dominik Vonlanthen, and Stephen Stich, as well as participants in Einladung zum Vortrag im Institutskolloquium at Ruhr University of Bochum and Dianoia Seminar Series at Australian Catholic University, and also David Bordonaba-Plou and anonymous reviewers, for their kind and helpful comments and suggestions.


Bengson, J., Moffett, M. A., & Wright, J. C. (2009). The folk on knowing how. Philosophical Studies, 142(3), 387–401.
Bluhm, R. (2012). Selbsttäuscherische Hoffnung: Eine sprachanalytische Annäherung. Mentis.
Bluhm, R. (2013). Don’t ask, look! Linguistic corpora as a tool for conceptual analysis. In M. Hoeltje, T. Spitzley, & W. Spohn (Eds.), Was dürfen wir glauben? Was sollen wir tun? Sektionsbeiträge des achten internationalen Kongresses der Gesellschaft für Analytische Philosophie e.V. (pp. 7–15). DuEPublico.
Bluhm, R. (2016). Corpus analysis in philosophy. In M. Hinton (Ed.), Evidence, experiment and argument in linguistics and philosophy of language (pp. 91–109). Peter Lang.
Brown, J., & Gerken, M. (2012). Knowledge ascriptions: Their semantics, cognitive bases, and social functions. In J. Brown & M. Gerken (Eds.), Knowledge ascriptions (pp. 1–30). Oxford University Press.
Cavell, S. (1958). Must we mean what we say? Inquiry, 1(1–4), 172–212.
Chomsky, N. (1977). Essays on forms and interpretation. North-Holland.
Chomsky, N. (2000). New horizons in the study of language and mind. Cambridge University Press.
Davidson, D. (1973). Radical interpretation. In Inquiries into truth and interpretation (pp. 125–140). Clarendon Press.
Davidson, D. (1974). On the very idea of a conceptual scheme. In Inquiries into truth and interpretation (pp. 183–198). Clarendon Press.
Davidson, D. (1975). Thought and talk. In Inquiries into truth and interpretation (pp. 155–170). Clarendon Press.
Dennett, D. C. (1994). Cognitive science as reverse engineering: Several meanings of ‘top-down’ and ‘bottom-up’. In D. Prawitz, B. Skyrms, & D. Westerståhl (Eds.), Proceedings of the 9th international congress of logic, methodology and philosophy of science (pp. 679–689). North-Holland.
Dennett, D. C. (1995). Darwin’s dangerous idea: Evolution and the meanings of life. Simon & Schuster.
DeRose, K. (2005). The ordinary language basis for contextualism, and the new invariantism. The Philosophical Quarterly, 55(219), 172–198.
Deutsch, M. (2009). Experimental philosophy and the theory of reference. Mind and Language, 24, 445–466.
Devitt, M. (2006). Ignorance of language. Clarendon Press.
Devitt, M. (2012). Whither experimental semantics? Theoria, 73, 5–36.
Feldman, R., & Warfield, T. (Eds.). (2010). Disagreement. Oxford University Press.
Fischer, E. (2018). Wittgensteinian ‘therapy’, experimental philosophy, and metaphilosophical naturalism. In K. M. Cahill & T. Raleigh (Eds.), Wittgenstein and naturalism (pp. 260–286). Routledge.
Frances, B. (2018). Disagreement and scepticism. In D. E. Machuca & B. Reed (Eds.), Skepticism: From antiquity to the present (pp. 581–591). Bloomsbury.
Glock, H.-J. (2003). Quine and Davidson on language, thought and reality. Cambridge University Press.
Goddard, C., & Wierzbicka, A. (2014). Words and meanings: Lexical semantics across domains, languages, and cultures. Oxford University Press.
Hanfling, O. (2000). Philosophy and ordinary language. Routledge.
Harman, G. (1976). Practical reasoning. The Review of Metaphysics, 29(3), 431–463.
Hazlett, A. (2010). The myth of factive verbs. Philosophy and Phenomenological Research, 80(3), 497–522.
Hazlett, A. (2018). Theory of knowledge without (comparative) linguistics. In M. Mizumoto, S. Stich, & E. McCready (Eds.), Epistemology for the Rest of the World (pp. 251–266). Oxford University Press.
Horwich, P. (1998). Meaning. Clarendon Press.
Horwich, P. (2012). Wittgenstein’s metaphilosophy. Oxford University Press.
Knobe, J. (2003). Intentional action and side effects in ordinary language. Analysis, 63, 190–193.


Knobe, J. (2007). Experimental philosophy and philosophical significance. Philosophical Explorations, 10, 119–122.
Knobe, J. (2010). Person as scientist, person as moralist. Behavioral and Brain Sciences, 33(4), 315–329.
Knobe, J. (2016). Experimental philosophy is cognitive science. In J. Sytsma & W. Buckwalter (Eds.), A companion to experimental philosophy (pp. 78–96). Wiley-Blackwell.
Knobe, J., & Nichols, S. (2008). An experimental philosophy manifesto. In J. Knobe & S. Nichols (Eds.), Experimental philosophy (Vol. 1, pp. 3–14). Oxford University Press.
Kopec, M., & Titelbaum, M. G. (2016). The uniqueness thesis. Philosophy Compass, 11(4), 189–200.
Kripke, S. (1982). Wittgenstein on rules and private language. Harvard University Press.
Ludlow, P. (2005). Contextualism and the new linguistic turn in epistemology. In G. Preyer & G. Peter (Eds.), Contextualism in philosophy: Knowledge, meaning, and truth (pp. 11–51). Oxford University Press.
Machery, E. (2012). Semantic epistemology: A brief response to Devitt. Theoria, 74, 223–227.
Machery, E., Mallon, R., Nichols, S., & Stich, S. P. (2004). Semantics, cross-cultural style. Cognition, 92, B1–B12.
Mallon, R., Machery, E., Nichols, S., & Stich, S. P. (2009). Against arguments from reference. Philosophy and Phenomenological Research, 79(2), 332–356.
Mates, B. (1950). Synonymity. In D. S. Mackay, G. P. Adams, & W. R. Dennes (Eds.), Meaning and interpretation: Lectures delivered before the Philosophical Union of the University of California, 1948–1949 (pp. 199–226). University of California Press.
Mates, B. (1958). On the verification of statements about ordinary language. Inquiry, 1(1–4), 161–171.
Mejía-Ramos, J. P., Alcock, L., Lew, K., Rago, P., Sangwin, C., & Inglis, M. (2019). Using corpus linguistics to investigate mathematical explanation. In E. Fischer & M. Curtis (Eds.), Methodological advances in experimental philosophy (pp. 239–264). Bloomsbury.
Mizumoto, M. (2018a). “Know” and Japanese counterparts; “Shitte-iru” and “Wakatte-iru”. In M. Mizumoto, S. Stich, & E. McCready (Eds.), Epistemology for the Rest of the World (pp. 77–122). Oxford University Press.
Mizumoto, M. (2018b). A simple linguistic approach to the Knobe effect, or the Knobe effect without any vignette. Philosophical Studies, 175, 1613–1630.
Mizumoto, M. (2021). The plurality of KNOW: A response to Farese. Language Sciences, 85(1), 101369.
Mizumoto, M. (2022). A prolegomenon to the empirical cross-linguistic study of truth. Theoria, 88(6), 1248–1273.
Mizumoto, M. (manuscript). Psychological factor and linguistic factor in the Knobe effect, or how to (re)start ordinary language philosophy.
Mizumoto, M., Stich, S. P., & McCready, E. (Eds.). (2018). Epistemology for the Rest of the World. Oxford University Press.
Mizumoto, M., Tsugita, S., & Yu, I. (2020a). Knowing how and two knowledge verbs in Japanese. In M. Mizumoto, J. Ganeri, & C. Goddard (Eds.), Ethno-epistemology – New directions for global epistemology (pp. 43–76). Routledge.
Mizumoto, M., Ganeri, J., & Goddard, C. (Eds.). (2020b). Ethno-epistemology – New directions for global epistemology. Routledge.
Mizumoto, M., Yu, I., & Tsugita, S. (manuscript). Knowledge how, ability, and linguistic variance.
Næss, A. (1938a). “Truth” as conceived by those who are not professional philosophers (Skrifter Utgitt av Det Norske Videnskaps-Akademi I Oslo Il. Hist.-Filos. Klass 1938 No. 4). I Komisjon Hos Jacob Dybwad.
Næss, A. (1938b). Common-sense and truth. Theoria, 4, 39–58.
Næss, A. (1949). Toward a theory of interpretation and preciseness. Theoria, 15(1–3), 220–241.
Nerlich, G. (1964). Resurgence of metaphysics. Quadrant, 8(2), 58–66.
Noë, A. (2005). Action in perception. MIT Press.


Parker-Ryan, S. (2012). Ordinary language philosophy. In Internet Encyclopedia of Philosophy. https://iep.utm.edu/ord-lang/
Pettit, D., & Knobe, J. (2009). The pervasive impact of moral judgment. Mind and Language, 24, 586–604.
Rorty, R. (Ed.). (1992). The linguistic turn. University of Chicago Press.
Ryle, G. (1949). The concept of mind. Hutchinson & Co.
Sandis, C. (2010). The experimental turn and ordinary language. Essays in Philosophy, 11(2), 181–196.
Schaffer, J. (2004). Skepticism, contextualism, and discrimination. Philosophy and Phenomenological Research, 69(1), 138–155.
Soames, S. (2006). The philosophical significance of the Kripkean necessary a posteriori. Philosophical Issues, 16(1), 288–309.
Stanley, J. (2011). Know how. Oxford University Press.
Stanley, J., & Williamson, T. (2001). Knowing how. Journal of Philosophy, 98(8), 411–444.
Stich, S. P. (1983). From folk psychology to cognitive science. MIT Press.
Stich, S. P. (1996). Deconstructing the mind. Oxford University Press.
Stich, S. P. (2009). Replies. In D. Murphy & M. Bishop (Eds.), Stich and his critics (pp. 190–252). Wiley-Blackwell.
Stich, S. P., & Mizumoto, M. (2018). Manifesto. In M. Mizumoto, S. Stich, & E. McCready (Eds.), Epistemology for the Rest of the World (pp. vi–xv). Oxford University Press.
Stich, S. P., & Tobia, K. P. (2016). Experimental philosophy and the philosophical tradition. In J. Sytsma & W. Buckwalter (Eds.), A companion to experimental philosophy (pp. 5–21). Wiley-Blackwell.
Sytsma, J., & Buckwalter, W. (Eds.). (2016). A companion to experimental philosophy. Wiley-Blackwell.
Sytsma, J., Bluhm, R., Willemsen, P., & Reuter, K. (2019). Causal attributions and corpus analysis. In E. Fischer & M. Curtis (Eds.), Methodological advances in experimental philosophy (pp. 209–238). Bloomsbury.
Tannenbaum, D., Ditto, P. H., & Pizarro, D. A. (2007). Different moral values produce different judgments of intentional action. Unpublished manuscript. University of California-Irvine.
Tsugita, S., Izumi, Y., & Mizumoto, M. (2022). Knowledge-how attribution in English and Japanese. Knowers and Knowledge in East-West Philosophy: Epistemology Extended, 63–90.
Turri, J. (2018). Primate social cognition and the core human knowledge concept. In M. Mizumoto, S. Stich, & E. McCready (Eds.), Epistemology for the Rest of the World (pp. 279–290). Oxford University Press.
Ulatowski, J., Weijers, D., & Sytsma, J. (Eds.). (forthcoming). Experimental philosophy and corpus methods.
Weinberg, J. M. (2016). Going positive by going negative: On keeping X-phi relevant and dangerous. In J. Sytsma & W. Buckwalter (Eds.), A companion to experimental philosophy (pp. 71–86). Wiley-Blackwell.
Weinberg, J. M., Nichols, S., & Stich, S. P. (2001). Normativity and epistemic intuitions. Philosophical Topics, 29, 429–460.
Wierzbicka, A. (2018). I KNOW: A human universal. In M. Mizumoto, S. Stich, & E. McCready (Eds.), Epistemology for the Rest of the World (pp. 215–250). Oxford University Press.
Wittgenstein, L. (1953). Philosophical investigations (PI), G.E.M. Anscombe and R. Rhees (Eds.), G.E.M. Anscombe (Trans.). Blackwell.
Wright, C. (1993). Realism, meaning, and truth. Blackwell.

Masaharu Mizumoto is an associate professor at the Japan Advanced Institute of Science and Technology. He is the author of books and papers in epistemology, philosophy of mind, Wittgenstein, etc., and has edited Epistemology for the Rest of the World (OUP, 2018) and Ethno-Epistemology: New Directions for Global Epistemology (Routledge, 2020). He obtained his PhD (Social Science) at Hitotsubashi University, Tokyo, Japan.

Chapter 4

Does Scientific Conceptual Analysis Provide Better Justification than Armchair Conceptual Analysis?

Hristo Valchev

Abstract The present paper is concerned with the question of whether scientific conceptual analysis provides better justification than armchair conceptual analysis. In order to address this question, I provide exact definitions of armchair conceptual analysis and scientific conceptual analysis. Furthermore, I use a certain criticism of armchair conceptual analysis, raised by experimental philosophers, as a basis for an argument to the conclusion that scientific conceptual analysis provides better justification than armchair conceptual analysis, and consider the expertise defence as a possible response to this argument. The argument is based on the idea that the concept of a common usage implies a certain degree of uniformity among different speakers, and can be called ‘argument from uniformity of agreement’. The expertise defence can be understood as an attack on one of the premises of this argument. Finally, I present and discuss the results from an empirical study in which scientific conceptual analysis was used in order to gather evidence as regards the soundness of the argument from uniformity of agreement and the expertise defence.

4.1 Introduction

Conceptual analysis is one of the main philosophical methods. Traditionally, it has been conducted from the armchair, but with the rise of experimental philosophy philosophers have started conducting it empirically as well. We can call the conceptual analysis based on empirical studies ‘scientific conceptual analysis’. The present inquiry is concerned with the question of whether scientific conceptual analysis provides better justification than armchair conceptual analysis. The text is divided into six sections. In Sect. 4.2, I provide exact definitions of armchair conceptual analysis and scientific conceptual analysis. In Sect. 4.3, I present an argument to the conclusion that scientific conceptual analysis does provide better

H. Valchev, Guangdong University of Foreign Studies, Guangzhou, China
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_4


justification. This argument is based on one of the main criticisms experimental philosophers have raised against armchair conceptual analysis and can be called ‘argument from uniformity of agreement’. In Sect. 4.4, I address the most influential response to the said criticism – ‘the expertise defence’ – and the way this response is related to the argument from uniformity of agreement. In Sect. 4.5, I present and discuss the results from an empirical study in which scientific conceptual analysis was used in order to gather evidence as regards the soundness of the argument from uniformity of agreement and the expertise defence. Finally, Sect. 4.6 contains some concluding remarks.

4.2 Armchair Conceptual Analysis and Scientific Conceptual Analysis

Conceptual analysis, as it is traditionally understood in analytic philosophy, can be defined as follows: a method that consists in drawing a conclusion about what the definition of a predicate is on the basis of an armchair investigation into whether the predicate is semantically applicable in different possible cases (Valchev, 2018, p. 136).

The concept of semantic application is used in this definition in order to distinguish strictly conceptual matters from pragmatic matters, such as whether it is appropriate to say something in a given situation. It can be defined as follows. A predicate is semantically applicable to an object if and only if the object falls under the concept the predicate expresses. The conditions that it is necessary and sufficient for an object to satisfy in order for a predicate to be semantically applicable to it are the semantic application conditions of the predicate. It might be said that to know the semantic application conditions of a predicate is the same as to know its meaning. In this sense, the meaning of a predicate is the set of its semantic application conditions.

Traditional conceptual analysis is itself a viable philosophical method. Classic examples of its application include Plato’s analysis of the concept of beauty (Plato, 1997), Grice’s original analysis of the concept of meaning (Grice, 1950), Ayer’s analysis of the concept of knowledge (Ayer, 1956), etc. I believe, however, that it can be improved by adopting the following two changes to it:
• the investigation into whether the predicate is semantically applicable in different possible cases is not to serve as a basis for a conclusion about what the definition of the predicate is, but as a basis for a conclusion about whether this-and-this is an (1) only necessary, (2) only sufficient, (3) both necessary and sufficient, or (4) neither necessary nor sufficient condition for the predicate’s semantic application;1
• the investigation into whether the predicate is semantically applicable in different possible cases is done not only from the armchair, but also empirically.2 (Valchev, 2018, pp. 140–141).
If these two changes are adopted, the new definition of conceptual analysis can be formulated as follows: a method that consists in drawing a conclusion about the semantic application conditions of a predicate on the basis of an investigation into whether the predicate is semantically applicable in different possible cases (Valchev, 2018, p. 141).
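The four-way classification in the first of the two changes can be illustrated with a minimal sketch in Python. It is offered only as an illustration under stated assumptions, not as a method proposed in the text: the names `Status`, `Case` and `classify` are hypothetical, the verdicts attached to the three cases are stipulated for the sake of the example, and any verdict of "necessary" or "sufficient" holds only relative to the possible cases actually considered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Status(Enum):
    """The four possible statuses of a candidate condition (hypothetical names)."""
    ONLY_NECESSARY = "only necessary"
    ONLY_SUFFICIENT = "only sufficient"
    BOTH = "both necessary and sufficient"
    NEITHER = "neither necessary nor sufficient"


@dataclass(frozen=True)
class Case:
    """One possible case together with two judgements about it."""
    description: str
    condition_holds: bool    # does the candidate condition hold in the case?
    predicate_applies: bool  # is the predicate judged semantically applicable?


def classify(cases: Iterable[Case]) -> Status:
    """Classify a candidate condition relative to the cases considered."""
    cases = list(cases)
    # Necessary: no considered case where the predicate applies but the condition fails.
    necessary = all(c.condition_holds for c in cases if c.predicate_applies)
    # Sufficient: no considered case where the condition holds but the predicate fails.
    sufficient = all(c.predicate_applies for c in cases if c.condition_holds)
    if necessary and sufficient:
        return Status.BOTH
    if necessary:
        return Status.ONLY_NECESSARY
    if sufficient:
        return Status.ONLY_SUFFICIENT
    return Status.NEITHER


# Assumption: candidate condition = "justified true belief", predicate = 'knows'.
# The verdicts below are stipulated for illustration, not reported data.
cases = [
    Case("ordinary justified true belief", True, True),
    Case("Gettier-style case", True, False),
    Case("lucky guess without justification", False, False),
]
result = classify(cases)
print(result.value)
```

The sketch takes the judgements about each case as given and does not care where they come from: on the armchair variant they would come from reflection on one's own semantic intuitions, on the empirical variant from observed responses of competent speakers.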

The conceptual analysis in which the investigation into whether the predicate is semantically applicable in possible cases is done from the armchair can be called ‘armchair conceptual analysis’,3 and the conceptual analysis in which the said investigation is done empirically – ‘empirical conceptual analysis’. In order to clarify the distinction between armchair conceptual analysis and empirical conceptual analysis, we can make use of the concept of intuition. We can say that one’s judgements about whether a predicate is semantically applicable in a possible case are based on certain intuitions that the person has acquired while learning the given language. We can call these intuitions ‘semantic intuitions’. Thus, we can say that conceptual analysis consists in investigating semantic intuitions. From this point of view, the difference between armchair conceptual analysis and empirical conceptual analysis lies in the way semantic intuitions are investigated. We can say that a philosopher conducting armchair conceptual analysis investigates their own semantic intuitions by reflecting on them,4 whereas a philosopher conducting empirical conceptual analysis investigates semantic intuitions by observing their manifestations in the linguistic behaviour or brain activity of competent speakers (Valchev, 2022). Thus, armchair conceptual analysis and empirical conceptual analysis can respectively be defined as follows:
• Armchair conceptual analysis – a method that consists in drawing a conclusion about the semantic application conditions of a predicate on the basis of reflection on semantic intuitions.

1 This change is adopted, for example, in Kipper (2012, p. 252), Henderson and Horgan (2011, p. 39) and Chalmers and Jackson (2001, p. 322).
2 This change is adopted, for example, in Glasgow (2008, p. 333), Overton (2013, p. 1383) and Sytsma (2010, p. 427).
3 Armchair conceptual analysis, as understood in the present paper, is roughly the same as what is sometimes called ‘the method of cases’.
4 The term ‘reflection’ is used here to refer to the act of looking into oneself in order to acquire access to certain psychological facts. In this sense, reflecting on one’s own semantic intuitions is a kind of “self-reflection” or “introspection”. Thus, one can only reflect on their own semantic intuitions, and not on those of others.


• Empirical conceptual analysis – a method that consists in drawing a conclusion about the semantic application conditions of a predicate on the basis of observation of the manifestations of semantic intuitions.

It must be noted, however, that not every argument that makes use of empirical conceptual analysis meets the standards of justification in science. According to the definition provided above, even polling a class of students or asking a few colleagues about their semantic intuitions can serve as a basis for empirical conceptual analysis. In order to distinguish the empirical conceptual analysis based on such informal techniques from the empirical conceptual analysis that meets the standards of justification in science, we can call the latter ‘scientific conceptual analysis’. Furthermore, we can say that whereas empirical conceptual analysis in general is based on “observation” or “empirical investigation”, scientific conceptual analysis is based on “empirical studies”. Thus, scientific conceptual analysis can be defined as follows.
• Scientific conceptual analysis – a method that consists in drawing a conclusion about the semantic application conditions of a predicate on the basis of empirical studies of the manifestations of semantic intuitions.

Insofar as the concept of empirical conceptual analysis covers all instances of conceptual analysis that do not fall under the concept of armchair conceptual analysis (see Valchev, 2022), and thus makes the classification of kinds of conceptual analysis complete, it is important from a purely conceptual point of view. Yet, philosophers do not generally refer to results from empirical conceptual analysis that is different from scientific conceptual analysis. This does not necessarily mean that they do not use this kind of empirical conceptual analysis. They might be asking their students or certain colleagues about their semantic intuitions, etc., but simply not mentioning it in their writings.
If, however, someone has conducted an empirical study, they would surely refer to it. On the other hand, philosophers also refer to the results from armchair conceptual analysis they have conducted. Thus, we can say that the two kinds of conceptual analysis used by philosophers in their writings are armchair conceptual analysis and scientific conceptual analysis. According to the definitions provided above, traditional conceptual analysis is not the same as armchair conceptual analysis. Both of them are conducted from the armchair, but the conclusions drawn when conducting traditional conceptual analysis are only definitions, whereas the conclusions drawn when conducting armchair conceptual analysis also include other statements about the semantic application conditions of the given predicate. Thus, Gettier (1963) used an armchair investigation into whether the predicate ‘knows’ is semantically applicable in possible cases in order to criticize the “traditional” definition of knowledge (proposed by Plato, Ayer and Chisholm); Putnam (1975) and Kripke (1980) used an armchair investigation into whether the predicate ‘refers to’ is semantically applicable in possible cases (the most famous of which being the one about Twin-Earth and the one about Gödel), in order to criticize the descriptivist theory of reference; and Searle (1980) used an armchair investigation into whether the


predicate ‘understands’ is semantically applicable in possible cases (namely, the one about the Chinese room), in order to argue that possessing a mind is a necessary condition for the semantic application of that predicate. Scientific conceptual analysis, on the other hand, has been conducted as a part of experimental philosophy. The most common way of investigating the manifestations of semantic intuitions of competent speakers has been through questionnaire surveys. Thus, Weinberg et al. (2001) conducted a questionnaire survey, in which respondents were presented with a possible case of the type described by Gettier and asked whether the predicate ‘knows’ was semantically applicable in it. In a similar way, Machery et al. (2004) conducted a questionnaire survey, in which respondents were presented with a possible case resembling that about Gödel and asked whether the predicate ‘refers to’ was semantically applicable in it. Apart from knowledge and reference, scientific conceptual analysis has also been used as a tool for investigating free will (Nahmias et al., 2005, 2006), hope (Bluhm, 2012, 2013), personal identity (Strohminger & Nichols, 2014), etc. Apart from questionnaire surveys, experimental philosophers have also employed qualitative methods (Strohminger & Nichols, 2014), as well as corpus analysis (Bluhm, 2012, 2013; Bordonaba-Plou, 2021; Caton, 2020; Hansen et al., 2019; Hinton, 2021).

4.3 The Argument from Uniformity of Agreement

The answer to the question of whether scientific conceptual analysis provides better justification than armchair conceptual analysis depends on whose usage of the given predicate the philosopher conducting conceptual analysis is concerned with. Grice insisted that when conducting conceptual analysis, he was concerned only with his own usage (Grice, 1958, p. 175), but a number of other philosophers seem to be concerned with the usage of other speakers as well. Weinberg et al. (2001) and Machery et al. (2004) base their conclusions about the semantic application of the predicates ‘knows’ and ‘refers to’ on the results from questionnaire surveys conducted among multiple speakers. Furthermore, when analysing the concept of knowledge, Ayer refers to the way “we” use the predicate ‘knows’ (Ayer, 1956, p. 26); when presenting the thought experiment about Gödel, Kripke refers to the way “we” or “an ordinary man” would use the name ‘Gödel’ (Kripke, 1980, p. 84); and when presenting the thought experiment about Twin-Earth, Putnam refers to what ‘an English speaker’ would have called ‘water’ in 1750 (Putnam, 1975, p. 142). We can say that in such cases, the philosopher conducting conceptual analysis is concerned with the common usage of the given predicate. In principle, one can conduct scientific conceptual analysis, even if they are concerned only with their own usage of the given predicate. This can be done, for example, by conducting a corpus analysis of the corpus of their own writings. Generally speaking, however, a much easier and more reliable option would be for them to just imagine certain possible cases and ask themselves whether the predicate is semantically applicable in them, i.e. to conduct armchair conceptual


analysis.5 Thus, we can say that if one is concerned only with their own usage of the predicate, armchair conceptual analysis provides better justification than scientific conceptual analysis. If, however, the philosopher is concerned with the common usage, the situation is more complicated. An argument to the conclusion that in this case scientific conceptual analysis provides better justification can be built on a certain criticism of armchair conceptual analysis raised by experimental philosophers. Experimental philosophers have argued that results from armchair conceptual analysis can be undermined by two kinds of empirical studies: (1) studies which suggest that semantic intuitions are sensitive to irrelevant factors such as the emotions induced by a possible case (Cameron et al., 2013), the order in which possible cases are presented (Petrinovich & O’Neill, 1996; Swain et al., 2008; Wright, 2010) and the way an outcome is described (e.g. Petrinovich & O’Neill, 1996; Schwitzgebel & Cushman, 2015); (2) studies which suggest that the semantic intuitions of the person who has conducted armchair conceptual analysis are different from those of the majority of speakers (Sytsma, 2010; Nahmias et al., 2005, 2006) or of the speakers from certain theoretically interesting social groups, defined by factors such as cultural background (Weinberg et al., 2001; Machery et al., 2004) and gender (see, e.g. Buckwalter & Stich, 2014; Friesdorf et al., 2015). We can say that the latter kind of empirical study consists in scientific conceptual analysis of the common usage of the given predicate. Thus, if the results from such studies can undermine results from armchair conceptual analysis, then results from scientific conceptual analysis of the common usage can undermine results from armchair conceptual analysis of the common usage.
5 A possible argument for the superiority of scientific conceptual analysis over armchair conceptual analysis states that if a predicate’s semantic application conditions suggested by a person’s reflection on their semantic intuitions are different from those suggested by observation of the person’s linguistic behaviour, then the person is confused about the way they actually use words. If this is true, then the results from scientific conceptual analysis of one’s own usage could undermine the results from armchair conceptual analysis of one’s own usage, and therefore, scientific conceptual analysis would actually provide better justification. However, this argument is based on the incorrect assumption that armchair conceptual analysis relies on judgements about the way the given person uses words. As I understand the method (and I believe this is the way it is usually understood), when one is considering possible cases as a part of conducting armchair conceptual analysis, the question they are asking themselves is not “Do I normally apply predicate P to object O?”, but “Is predicate P applicable to object O?” This is not to say that a predicate’s semantic application conditions suggested by a person’s reflection on their semantic intuitions cannot differ from those suggested by observation of the person’s linguistic behaviour. One might imagine object O and think that predicate P is not applicable to it, but then encounter object O in real life and apply predicate P to it. This would be an interesting psychological phenomenon, which is in need of explanation, but we have no reason to assume a priori that in the first instance the person has made a mistake. There might be a mistake due to the effect of some cognitive bias, but this is to be determined by further investigation, and the mistake might be in the second instance as well, i.e. in the case of “actual” usage.

Thus, the statement that results from such studies can undermine results from armchair conceptual analysis can be used in an argument to the conclusion that scientific conceptual analysis provides better justification than armchair conceptual analysis (see Sytsma, 2010; Valchev, 2022). This argument can be called ‘Argument from Uniformity of Agreement’ and formulated as follows.

4.3.1 Argument from Uniformity of Agreement

1. If the results from applying one method can be undermined by results from applying another method, then the latter method provides better justification than the former.6
2. The results from armchair conceptual analysis can be undermined by results from scientific conceptual analysis which suggest that the relevant semantic intuitions of the majority of speakers or of the speakers from certain theoretically interesting social groups are different from those of the person who has conducted armchair conceptual analysis.
3. Therefore, scientific conceptual analysis provides better justification than armchair conceptual analysis.

This argument is based on the idea that the concept of a common usage implies a certain degree of uniformity among different speakers. However, in the formulation provided above this idea is not made explicit. In order to further elaborate on the argument, I also constructed a more complex version of it, in which the said idea is made explicit. Using the resources of formal logic, this more complex version can be formulated as follows.

4.3.2 Argument from Uniformity of Agreement – Complex Version7

1. CommonU(SCA) . CommonU(ACA)
2. MoreS(SCA; ACA)

6 In this formulation, I am using the predicate ‘undermines’ in the meaning in which it is used by the experimental philosophers who have raised the criticism I referred to. The responses to this criticism have not explicitly addressed the question of the meaning of the predicate, but it is generally possible that the philosophers who have made these responses ascribe to the predicate a slightly different meaning.

7 Designations: M – the set of all methods; SCA – the method of scientific conceptual analysis; ACA – the method of armchair conceptual analysis; m, m1, m2 – arbitrary members of M; CommonU(m) – m is a method for justifying statements about what the common usage is; UniformU(m) – m is a method for justifying statements about what the usage of the majority of speakers, as well as speakers from certain theoretically interesting social groups is; MoreS(m1; m2) – m1 allows for the usage of multiple speakers, as well as speakers from certain theoretically interesting social groups to be taken into account, and m2 does not; BetterJ(m1; m2) – m1 provides better justification than m2.


H. Valchev

3. ∀m, CommonU(m) <=> UniformU(m)
4. ∀m1, ∀m2, [UniformU(m1) . UniformU(m2)] => [MoreS(m1; m2) => BetterJ(m1; m2)]
5. ∀m1, ∀m2, [CommonU(m1) . CommonU(m2)] => [MoreS(m1; m2) => BetterJ(m1; m2)] – from 3) and 4)
6. BetterJ(SCA; ACA) – from 1), 2) and 5)

The sixth proposition in this formulation is the conclusion of the argument. It states that scientific conceptual analysis provides better justification than armchair conceptual analysis and is logically entailed by the first four propositions, which are the premises of the argument. The first premise states that scientific conceptual analysis and armchair conceptual analysis are methods for justifying statements about what the common usage is. The second premise states that unlike armchair conceptual analysis, scientific conceptual analysis allows for the usage of multiple speakers, as well as speakers from certain theoretically interesting social groups, to be taken into account. The third premise states that a method is a method for justifying statements about what the common usage is if and only if it is a method for justifying statements about what the usage of the majority of speakers, as well as of speakers from certain theoretically interesting social groups, is. Finally, the fourth premise states that a method that allows for the usage of multiple speakers, as well as speakers from certain theoretically interesting social groups, to be taken into account provides better justification of statements about the said kind of uniformity than a method which does not allow this. The first two premises are not in need of any additional explanations. The truth of the third premise stems from the idea that the concept of a common usage implies a certain degree of uniformity among different speakers. We can say that a given usage of a predicate is common, if and only if it is employed by the majority of speakers, as well as by the speakers from certain theoretically interesting social groups. 
These theoretically interesting groups are groups that might reasonably have been predicted to show variation in their judgements about the semantic application of the given predicate (see Sytsma & Livengood, 2011). However, which groups exactly are theoretically interesting and what percentage of the speakers would constitute a majority are questions that are up for debate. Finally, the truth of the fourth premise can be said to be based on the theory of probability.
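The claim that the conclusion is entailed by the four premises can be checked mechanically. The following Lean 4 fragment is only an illustrative sketch and not part of the original formulation: it assumes that ‘.’ is read as conjunction and ‘=>’ as material implication, and the identifier names simply mirror the designations given for the complex version.

```lean
-- Illustrative rendering of the complex version; all names are assumptions
-- that mirror the chapter's designations (M, SCA, ACA, CommonU, ...).
variable {M : Type} (SCA ACA : M)
variable (CommonU UniformU : M → Prop) (MoreS BetterJ : M → M → Prop)

theorem better_justification
    (p1 : CommonU SCA ∧ CommonU ACA)                 -- premise 1
    (p2 : MoreS SCA ACA)                             -- premise 2
    (p3 : ∀ m, CommonU m ↔ UniformU m)               -- premise 3
    (p4 : ∀ m1 m2, UniformU m1 ∧ UniformU m2 →
            MoreS m1 m2 → BetterJ m1 m2) :           -- premise 4
    BetterJ SCA ACA :=                               -- proposition 6
  -- Turn CommonU into UniformU via premise 3, then apply premise 4.
  p4 SCA ACA ⟨(p3 SCA).mp p1.1, (p3 ACA).mp p1.2⟩ p2
```

The sketch establishes validity only; whether the argument is sound still depends on the truth of the premises. It also uses only the left-to-right direction of premise 3, which suggests that the derivation does not require the full biconditional.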

4.4 The Expertise Defence

The proponents of armchair conceptual analysis have responded to the criticisms raised by experimental philosophers in different ways (see Horvath & Koch, 2021 for an overview), but their most influential response has been to argue that appeals to the semantic intuitions of philosophers have a higher evidential value than appeals to the semantic intuitions of laypeople. If this is true, then philosophers might be said to be experts as regards the semantic application of predicates in possible cases.8 Thus, this response has become known as ‘the expertise defence’. The expertise defence can be understood as an attack against the second premise of the argument from uniformity of agreement in its simple version. This attack can be formulated as an argument in the following way.

4.4.1 The Expertise Defence

1. If the person who has conducted armchair conceptual analysis is an expert as regards the semantic application of predicates in possible cases, then it is not true that the results from armchair conceptual analysis can be undermined by results from scientific conceptual analysis which suggest that the relevant semantic intuitions of the majority of speakers or of the speakers from certain theoretically interesting social groups are different (i.e. point to a different semantic application of the given predicate) from those of the person who has conducted armchair conceptual analysis.
2. Professional philosophers are experts as regards the semantic application of predicates in possible cases.
3. Therefore, if the person who has conducted armchair conceptual analysis is a professional philosopher, then it is not true that the results from armchair conceptual analysis can be undermined by results from scientific conceptual analysis which suggest that the relevant semantic intuitions of the majority of speakers or of the speakers from certain theoretically interesting social groups are different from those of the person who has conducted armchair conceptual analysis.

The first premise of this argument has not been discussed in the literature, but the second premise has been a subject of debate. Kauppinen (2007) argues that when considering possible cases, philosophers are able to employ a distinctive kind of reflection. Ludwig (2007) argues that when considering possible cases, philosophers are able to base their judgements solely on their semantic competence. Williamson (2011) and Nado (2014) argue that philosophical expertise consists in improved performance in tasks like evaluation, assessment and criticism of possible cases.

8

The term ‘expert’ can be understood in two ways. On the one hand, an expert about predicates of the said kind can be someone who is in a position to legislate their usage – as a part of what Putnam calls the ‘division of linguistic labour’ (Putnam 1975). Thus, botanists are in a position to legislate the usage of terms like ‘oak tree’ and ‘elm tree’, grammarians are in a position to legislate the usage of grammar, etc. If, however, the philosopher is an expert in the sense of someone who is in a position to legislate the usage of predicates like ‘knows’ and ‘refers to’, then their aim would be not to determine what the common usage of these predicates is, but what it should be. This understanding is inconsistent with conceptual analysis, as far as conceptual analysis, as it is normally practiced, is not a prescriptive, but a descriptive activity (for a discussion see Valchev 2018). Accordingly, the expertise defence does not refer to this understanding.


Critics of the expertise defence, on the other hand, have based their arguments on empirical research which suggests that the semantic intuitions of both philosophers and non-philosophers are affected by certain irrelevant factors, including their temperaments (Schulz et al., 2011), the order in which they are presented with the possible cases (Schwitzgebel & Cushman, 2012), whether the possible cases are framed in the second person or the third person (Tobia et al., 2013), and even the smell of the questionnaire (Tobia et al., 2013). There is some empirical research which suggests that philosophers are indeed less susceptible to a limited range of relevant cognitive biases (see Egler & Ross, 2020), but we can say that on the whole the expertise defence is not supported by the empirical data.

4.5 The Study

4.5.1 Design of the Study

The study consisted in a questionnaire survey conducted among undergraduate students from Sofia University, Bulgaria. Assuming that there might be correlations between respondents’ answers and their majors, I wanted to reach students from diverse majors. The majors I chose were Philosophy, Psychology, Law, International Relations, Computer Sciences, and Informatics. In order to reach the students from these majors, I would turn up at the beginning of their lectures and ask the professor for permission to conduct the survey among the class. This way I was able to recruit a total of 483 respondents, 38 of whom were majoring in Philosophy, 67 in Psychology, 198 in Law, 66 in International Relations, 53 in Computer Sciences, and 61 in Informatics. The questionnaire was built around a scenario in which a philosophy professor named Ivanov is considering a Gettier-type possible case. Ivanov holds a certain belief about whether the predicate ‘knows’ is semantically applicable in the possible case, but decides to check whether other people would agree with him by conducting a questionnaire survey. It turns out that a certain group of people disagree with him. For clarity, and because I thought that the semantic intuitions of the respondents might be influenced by a certain cognitive bias, I decided to use a vignette with the Gettier-type possible case as the first question in the questionnaire. The vignette was directly taken from Weinberg et al. (2001) and was designated as ‘Question 1’. The second question in the questionnaire – designated as ‘Question 2’ – consisted in a vignette with the possible case about Ivanov. 
After becoming acquainted with the possible case, the respondents had to indicate the extent to which they agreed on a Likert scale from 1 (completely disagree) to 5 (completely agree) with the following two statements: that Ivanov is an expert on Question 1; and that the results from the study conducted by Ivanov undermine his belief about the answer to Question 1. Finally, the respondents were asked to indicate their major, their gender, and whether they had previously taken any courses in philosophy.


In order to account for the possible cognitive bias mentioned above and for the different possible ways in which the results from Ivanov’s study could potentially undermine his belief, I created different variants of Question 2. The possible cognitive bias consisted in the following. I thought that the respondents who agree with Ivanov about the answer to Question 1 might be more likely to say that the results from his study don’t undermine his belief, whereas the respondents who disagree with him about the answer to Question 1 might be more likely to say that the results from his study undermine his belief. In order to gather better evidence as regards this possible cognitive bias, I created two variants of the possible case with Ivanov – one in which Ivanov believes that the predicate ‘knows’ is semantically applicable in the Gettier-type possible case and one in which he believes it is not.9 Furthermore, the study conducted by Ivanov had the following six possible outcomes.

1. 25% of all respondents disagree with Ivanov.
2. 50% of all respondents disagree with Ivanov.
3. 75% of all respondents disagree with Ivanov.
4. The majority of all respondents agree with Ivanov, but the majority of Muslims disagree.
5. The majority of all respondents agree with Ivanov, but the majority of black respondents disagree.
6. The majority of all respondents agree with Ivanov, but the majority of respondents who grew up in East Asia disagree.

For each of these possible outcomes I created a separate variant of the possible case with Ivanov. Combined with the two variants of Ivanov’s belief, this gave the possible case a total of twelve variants. Each of these variants corresponded to a separate variant of the questionnaire. Thus, altogether there were twelve variants of the questionnaire. Each participant was randomly given one of them. I have included one of these variants in the appendix. The questionnaire survey I conducted was related in the following way to the question of whether scientific conceptual analysis provides better justification than armchair conceptual analysis. If the level of agreement with the statement that the results from Ivanov’s study undermine his belief was high for a given outcome of his study, this would provide us with some evidence suggesting that the given outcome is a sufficient condition for the semantic application of the predicate ‘undermine’ i.e.

9

On the basis of the results from Weinberg et al. (2001) and several other studies, I assumed that there is a significant chance that most respondents answer that the predicate ‘knows’ is not semantically applicable in the Gettier-type possible case. If this turned out to be the case, then no matter if Ivanov believes that the predicate is semantically applicable or not, the level of agreement with him would be much different (much lower or much higher) from the level of disagreement. However, the evidence regarding the influence that agreement or disagreement with Ivanov might have on the answers to Question 2 would be better if the level of agreement is similar to the level of disagreement. The difference between the two could be minimized by creating the said two variants of the possible case with Ivanov (one in which Ivanov believes that the predicate is semantically applicable and one in which he believes it is not).


that it does undermine Ivanov’s belief. This, in turn, would be evidence suggesting that the second premise of the argument from uniformity of agreement is true. Analogously, if the level of agreement with the statement that the results from Ivanov’s study undermine his belief was low for a given outcome of his study, this would provide us with some evidence suggesting that the given outcome is not a sufficient condition for the semantic application of the predicate ‘undermine’, i.e. that it does not undermine Ivanov’s belief – which, in turn, would be evidence suggesting that the second premise of the argument from uniformity of agreement is false. Furthermore, if the second premise is true, the levels of agreement with the statement for outcomes (1), (2) and (3) could be expected to be progressively higher, whereas if the second premise is false, the said levels of agreement could be expected not to be progressively higher.

The level of agreement with the statement that Ivanov is an expert on Question 1, on the other hand, could not be considered to be evidence about whether the predicate ‘is an expert’ is semantically applicable to Ivanov or any other professional philosopher, because the respondents were not given any information about the evidential value of Ivanov’s appeal to his own semantic intuitions, i.e. about his ability to make judgements about possible cases, his level of immunity to cognitive biases, etc., apart from the fact that he is a philosophy professor. Thus, the level of agreement with the statement that he is an expert could not be considered to be evidence for the truth value of the second premise of the expertise defence. However, it would be indicative of the level of expertise the respondents would ascribe to Ivanov on the basis of the fact that he is a philosophy professor, which would allow me to draw some conclusion about the truth of the first premise of the expertise defence. 
If the first premise is true, then the level of agreement with the statement that Ivanov is an expert could be expected to be negatively correlated with the level of agreement with the statement that the results from his study undermine his belief. Accordingly, if the first premise is false, then the levels of agreement with the two statements could be expected not to be negatively correlated.
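The structure of the design described above (two variants of Ivanov’s belief crossed with six survey outcomes, one variant handed to each respondent at random) can be sketched in a few lines of Python. This is a hypothetical illustration rather than the procedure actually used: the labels, the function names and the choice of simple independent randomisation are all assumptions, since the text does not specify how the random assignment was carried out.

```python
import itertools
import random

# Two variants of Ivanov's belief about the Gettier-type case (labels are illustrative).
BELIEFS = ["knows_applicable", "knows_not_applicable"]

# The six outcomes of Ivanov's survey, paraphrased from the text.
OUTCOMES = [
    "25% of all respondents disagree",
    "50% of all respondents disagree",
    "75% of all respondents disagree",
    "majority agree, majority of Muslims disagree",
    "majority agree, majority of black respondents disagree",
    "majority agree, majority of respondents who grew up in East Asia disagree",
]


def questionnaire_variants():
    """Return every (belief, outcome) pair, one per questionnaire variant."""
    return list(itertools.product(BELIEFS, OUTCOMES))


def assign_variants(n_respondents, seed=0):
    """Draw one variant per respondent (simple randomisation is an assumption)."""
    rng = random.Random(seed)
    variants = questionnaire_variants()
    return [rng.choice(variants) for _ in range(n_respondents)]


variants = questionnaire_variants()
assignment = assign_variants(483)  # 483 is the reported number of respondents
```

Crossing the two factors yields the twelve variants mentioned in the text. Simple randomisation does not guarantee equal cell sizes, so a real study might prefer a balanced scheme; that choice is left open here.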

4.5.2 Results and Discussion

The average Likert scale value for the statement that the results from Ivanov’s study undermine his belief was 2.94 with a standard deviation of 1.22, and the average Likert scale values for the six possible outcomes did not differ significantly. Furthermore, the average Likert scale values corresponding to outcomes (1), (2) and (3) were 2.92 (sd: 1.21), 3.13 (sd: 1.14), and 2.87 (sd: 1.24) respectively, which means that they were not progressively higher. The fact that they were not progressively higher provides us with some evidence that the second premise of the argument from uniformity of agreement is false (or at least that results from armchair conceptual analysis are not undermined by results from empirical studies which suggest that the majority of speakers have semantic intuitions that are different from those of the person who has conducted armchair conceptual analysis),


but considering the fact that the level of agreement with the statement was not low, we can say that this evidence is weak. We can say that on the whole the results did not provide any clear evidence as regards the truth value of the second premise of the argument from uniformity of agreement. The Likert scale values for the statement that Ivanov is an expert were not negatively correlated with the Likert scale values for the statement that the results from his study undermine his belief i.e. the respondents who ascribed a high Likert scale value to the former statement were not significantly more likely to ascribe a low Likert scale value to the latter statement, and the respondents who ascribed a low Likert scale value to the former statement were not significantly more likely to ascribe a high Likert scale value to the latter statement. Thus, the results provide some evidence that the first premise of the expertise defence is false, and therefore that the expertise defence is not a sound argument. The results also suggested that the respondents were indeed influenced by the cognitive bias described above. Those who gave the same answer to Question 1 as Ivanov ascribed a significantly lower (p < 0.01) Likert scale value to the statement that the results from his study undermine his belief (i.e. were significantly more likely to disagree that the results from his study undermine his belief), and those who gave to Question 1 an answer which was different from the answer given by Ivanov ascribed a significantly higher (p < 0.01) Likert scale value to the statement that the results from his study undermine his belief (i.e. were significantly more likely to agree that the results from his study undermine his belief). 
Furthermore, the respondents who were majoring in psychology ascribed a significantly higher (p < 0.01) Likert scale value to the statement that Ivanov is an expert, whereas the respondents majoring in law ascribed a significantly lower (p < 0.01) Likert scale value to the same statement. The gender and experience in philosophy of the respondents, on the other hand, did not have any significant influence on the results. The effect of the said cognitive bias is one possible reason why the results from the study did not provide any clear evidence as regards the truth value of the second premise of the argument from uniformity of agreement. However, this effect was relatively small. The respondents who agreed with Ivanov did ascribe a significantly lower average Likert scale value to the statement that the results from his study undermine his belief, but this Likert scale value was still close to 3 (it was 2.72 (sd: 1.18)). Analogically, the respondents who disagreed with Ivanov did ascribe a significantly higher average Likert scale value to the statement that the results from his study undermine his belief, but this Likert scale value was still close to 3 (it was 3.17 (sd: 1.22)). Thus, even if the respondents weren’t influenced by this cognitive bias, the results still wouldn’t provide any clear evidence as regards the truth value of the second premise of the argument from uniformity of agreement. Another possible reason why the results didn’t provide any clear evidence as regards the truth value of the second premise of the argument from uniformity of agreement is an ambiguity in the semantic application conditions of the predicate ‘undermines’. In fact, it can be assumed that the existence of such an ambiguity has allowed for the existence of the cognitive bias. In any case, the existence of such an ambiguity and of the cognitive bias can also partially explain the fact that the


agreement with the statement that Ivanov is an expert was not negatively correlated with the agreement with the statement that the results from his study undermine his belief. This fact, however, can also be partially explained by the fact that even experts sometimes make mistakes. It can be assumed that the respondents who ascribed high Likert scale values to both the statement that Ivanov is an expert and the statement that the results from his study undermine his belief based their reasoning exactly on this fact. Finally, the existence of an ambiguity in the semantic application conditions of the predicate ‘undermines’ and the existence of the said cognitive bias can also partially explain the existence of disagreement amongst philosophers as regards the justification of armchair conceptual analysis and scientific conceptual analysis. As I noted in the previous section, empirical research suggests that philosophers are affected by different cognitive biases when considering the semantic application of predicates in possible cases. They might as well be affected by the one identified as a part of the present study (although further research needs to be conducted in order to determine whether this is actually the case). On the other hand, an ambiguity in the semantic application conditions of a predicate allows different speakers to ascribe to the predicate different semantic application conditions, which, in turn, can lead to certain conflicts.
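Two of the quantitative points made in this section can be reproduced from the reported summary statistics alone. The sketch below is illustrative and is not the analysis actually run for the study: the function names are hypothetical, and the standardised difference (Cohen's d) pools the two standard deviations with equal weight, which assumes equal group sizes because the text does not report how many respondents agreed or disagreed with Ivanov.

```python
import math


def is_progressively_higher(values):
    """True if each value is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardised mean difference with equally weighted pooled SD.

    Equal weighting assumes equal group sizes, which the text does not report.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# Reported mean agreement for outcomes (1), (2) and (3).
outcome_means = [2.92, 3.13, 2.87]
monotone = is_progressively_higher(outcome_means)

# Reported mean (sd) for respondents who agreed vs. disagreed with Ivanov.
d = cohens_d(2.72, 1.18, 3.17, 1.22)

print(monotone, round(d, 2))
```

The monotonicity check fails for the three reported means, in line with the observation that agreement did not rise from outcome (1) to outcome (3). Under the equal-size assumption the standardised difference comes out at roughly 0.37, which is conventionally read as small to moderate and fits the remark that the effect of the bias was relatively small.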

4.6 Conclusion

The present inquiry was concerned with the question of whether scientific conceptual analysis provides better justification than armchair conceptual analysis. In order to address this question, I provided exact definitions of armchair conceptual analysis and scientific conceptual analysis. Armchair conceptual analysis can be defined as a method that consists in drawing a conclusion about the semantic application conditions of a predicate on the basis of reflection on semantic intuitions, whereas scientific conceptual analysis can be defined as a method that consists in drawing a conclusion about the semantic application conditions of a predicate on the basis of empirical studies of the manifestations of semantic intuitions.

Furthermore, I used a certain criticism of armchair conceptual analysis, raised by experimental philosophers, as a basis for an argument to the conclusion that scientific conceptual analysis provides better justification than armchair conceptual analysis, and considered the expertise defence as a possible response to this argument. I called the argument ‘argument from uniformity of agreement’ and formulated two versions of it – a simple and a complex version. In short, the simple version states that scientific conceptual analysis provides better justification than armchair conceptual analysis, because the results from armchair conceptual analysis can be undermined by results from scientific conceptual analysis which suggest that the relevant semantic intuitions of the majority of speakers or of the speakers from certain theoretically interesting social groups are different from those of the person who has conducted armchair conceptual analysis. The expertise defence can be understood as an attack on one of the premises of this argument.

Finally, I presented and discussed the results from an empirical study in which scientific conceptual analysis was used in order to gather evidence as regards the soundness of the argument from uniformity of agreement and the expertise defence. The study consisted in a questionnaire survey in which respondents were presented with two vignettes. The first one consisted in a Gettier-type possible case, whereas the second one described a scenario in which a philosophy professor named Ivanov is considering the Gettier-type possible case. Ivanov holds a certain belief about whether the predicate ‘knows’ is semantically applicable, but decides to check whether other people would agree with him by conducting a questionnaire survey. It turns out that a certain group of people disagree with him. After becoming acquainted with this scenario, the respondents in the study were asked about the extent to which they agreed with the following two statements: that Ivanov is an expert about the Gettier-type possible case; and that the results from his study undermine his belief about the Gettier-type possible case. It turned out that on the whole the respondents were uncertain about whether the results from Ivanov’s study undermine his belief, and that their agreement with the statement that his belief is undermined was not negatively correlated with their agreement with the statement that he is an expert. 
The latter result was interpreted as providing some evidence against one of the premises of the expertise defence, and thus against the expertise defence itself, whereas the former result was interpreted as suggesting an ambiguity in the semantic application conditions of the predicate ‘undermine’. Furthermore, the results suggested the existence of a cognitive bias consisting in the fact that the respondents who agreed with Ivanov about the Gettier-type possible case were more likely to disagree that the results from his study undermine his belief, whereas the respondents who disagreed with him were more likely to agree that the results from his study undermine his belief. Finally, it was speculated that the existence of this cognitive bias and of an ambiguity in the semantic application conditions of the predicate ‘undermine’ might partially explain the existence of disagreement amongst philosophers as regards the justification of armchair conceptual analysis and scientific conceptual analysis.


Appendix

Question 1

Bob has a friend, Jill, who has driven a Buick for many years. Bob therefore thinks that Jill drives an American car. He is not aware, however, that her Buick has recently been stolen, and he is also not aware that Jill has replaced it with a Pontiac, which is a different kind of American car. Does Bob really know that Jill drives an American car, or does he only believe it?

(a) really knows.
(b) only believes.

Question 2

Ivanov is a Philosophy professor. He believes that the answer to Question 1 is (b), but decides to check whether other people would agree with him by conducting a questionnaire survey among a great number of people. It turns out that 75% of the respondents answer with (a).

Please, indicate the extent to which you agree with the following statement on a scale from 1 (completely disagree) to 5 (completely agree).

Ivanov is an expert on Question 1.
1 2 3 4 5

The results from the survey conducted by Ivanov undermine his belief that the answer to Question 1 is (b).
1 2 3 4 5

Major:
Gender:
Have you completed any course in philosophy before?

References

Ayer, A. J. (1956). The problem of knowledge. Penguin Books Ltd. Bluhm, R. (2012). Selbsttäuscherische Hoffnung. Mentis. Bluhm, R. (2013). Don’t ask, look! Linguistic corpora in philosophical analyses. In M. Hoeltje, T. Spitzley, & W. Spohn (Eds.), Was dürfen wir glauben? Was sollen wir tun? (pp. 7–15). DuEPublico. Bordonaba-Plou, D. (2021). An analysis of the centrality of intuition talk in the discussion on taste disagreements. Filozofia Nauki, 29(2), 133–156. Buckwalter, W., & Stich, S. (2014). Gender and philosophical intuition. In J. Knobe & S. Nichols (Eds.), Experimental philosophy (Vol. 2, pp. 307–346). Oxford University Press.



Hristo Valchev completed a PhD in Epistemology at Sofia University, Bulgaria, in 2018, with a dissertation on the topic “The Method of Conceptual Analysis in Analytic Philosophy”. From 2019 to 2022, he worked as a postdoctoral researcher at Sun Yat-sen University, China, doing research related to the cultural embeddedness of argumentation. He is currently employed at Guangdong University of Foreign Studies, China. His areas of interest include epistemology, logic, philosophy of language, experimental philosophy, and cultural studies.

Chapter 5

Distributional Theories of Meaning: Experimental Philosophy of Language

Jumbly Grindrod

Abstract

Distributional semantics is an area of corpus linguistics and computational linguistics that seeks to model the meanings of words by producing a semantic space that captures the distributional properties of those words within a corpus. In this paper, I provide an overview of distributional semantic models, including a broad sketch of how such models are constructed. I then outline the reasons for and against the claim that distributional semantic models can serve as a theory of meaning, paying special attention to those within the field who have defended this claim. Finally, I conclude by arguing that despite the fact that such models are holistic, they nevertheless avoid the objections raised against holistic theories of meaning, particularly those from Fodor & Lepore (1992, 1999).

5.1 Introduction

One central question in philosophy of language has been whether a formal theory of meaning in a language is possible and if so what the format of such a theory should be. In this paper, I will focus on distributional semantics: a mathematical approach to meaning that is quite far removed from the kinds of theory typically found in philosophy of language and formal semantics. It is an approach that arguably has the wind in its sails insofar as it is the basis for much of the recent progress in natural language processing. This has led some working in distributional semantics to endorse the claim that some distributional semantic model will form the basis for a theory of meaning in a language (Sahlgren, 2008; Baroni et al., 2014a; Westera & Boleda, 2019). In this paper, I will consider the plausibility of this claim by considering both the reasons that speak in favour of and against it. I will also argue that despite the fact that a distributional theory of meaning would be a holistic

J. Grindrod
Department of Philosophy, Edith Morley, Whiteknights, University of Reading, Reading, UK
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_5


theory of meaning, the extent to which the objections typically raised against holistic theories would also apply to a distributional theory is not straightforward. In that respect, a philosophical investigation of distributional semantics proves valuable insofar as it opens up theoretical possibilities that have largely been overlooked in the philosophical literature. The paper will proceed as follows. In Sect. 5.2, I will outline the distributional hypothesis as the theoretical basis for distributional semantics, and distinguish between two forms of distributional hypothesis: one weaker and one stronger. Roughly, the weaker form of the distributional hypothesis posits a correlation between the meaning of a term and its distribution across a suitable corpus, while the stronger version claims that the distribution of an expression serves to (at least partially) represent the meaning of a term. In Sect. 5.3, I will provide an overview of how distributional semantic models are constructed. In Sect. 5.4, I will consider the various reasons in favour of the distributional hypothesis (in both its weaker and stronger form), while in Sect. 5.5 I will consider the reasons for resisting the stronger form of the distributional hypothesis. Finally, in Sect. 5.5.4, I will focus on the holistic nature of distributional semantic models, and argue that the extent to which distributional semantic models are subject to worries usually raised against holistic theories is not straightforward, and that their holistic nature should not be taken as reason to reject them.

5.2 Distributional Semantics and the Distributional Hypothesis

Corpus linguistics is a sub-discipline of linguistics that is not defined by its explanandum – as the likes of syntax, semantics, phonology, etc. are. Instead, it is defined by its methodology. Simply put, corpus linguistics draws insights from corpora, and this can be done in a wide variety of ways. While the history of corpus linguistics is certainly an interesting one – in the twentieth century it was arguably a victim of the sudden rise in generative linguistics brought forth by Chomsky and others (McEnery & Wilson, 1996, chap. 1) – it is fair to say that it has received much greater attention in recent years, particularly as corpora have become larger and more readily available, and the methods used to analyse corpora have proliferated and become more easily implementable. It would of course be no surprise that corpus linguistics is a worthwhile approach if one is interested, say, in the frequency of certain expressions or phrases (or any other linguistic item). Indeed, Gries (2016, p. 11) points out that “strictly speaking at least, the only thing corpora can provide information on is frequencies”. But one of the fundamental commitments of corpus linguistics is that information on frequencies can be put to work in a number of different ways, to generate insights regarding a wide range of linguistic properties, including morphological, syntactic, phonological, semantic, and pragmatic. (For an overview and extended discussion, see O’Keeffe and McCarthy (2010).)


This article is concerned with the use of corpus analysis to generate insights regarding meaning. But how could information on the frequency of words and phrases within a corpus shed light on the meanings of those expressions? The starting point is what is known as the distributional hypothesis – the claim roughly that “Lexemes with similar linguistic contexts have similar meanings” (Lenci, 2018, p. 152).1 Zellig Harris was one of the first to propose the distributional hypothesis:

If we consider words or morphemes A and B to be more different in meaning than A and C, then we will often find that the distributions of A and B are more different than the distributions of A and C. In other words, difference of meaning correlates with difference of distribution. (Harris, 1954, p. 156)

The idea behind the distributional hypothesis is simple enough. If two terms were exact synonyms and in fact only differed in their form, then one would expect that they get used in exactly the same way. To use Harris’ example, we would expect that the terms “oculist” and “eye doctor” get used in largely the same ways, because they seem roughly synonymous.2 When it comes to a corpus, we can understand this in terms of an expression’s or phrase’s distribution across a corpus, i.e. where the term appears in the corpus. The distributional hypothesis reflects this in stating that two terms that have similar distributions also have similar meanings. Notice that the distributional hypothesis as outlined in the quotations from Lenci and Harris is merely a claim about correlation between meaning and distribution. Indeed, it is worth distinguishing between two forms of the distributional hypothesis:

• DH1: Expressions with similar meanings will have similar distributions across a corpus.
• DH2: The distribution of an expression across a corpus can serve as a representation of its meaning, such that a theory of meaning should at least capture the distributional properties of expressions.

Clearly DH2 is much stronger than DH1 .3 But the truth of either of these claims would open the possibility of investigating meaning via an account of distributional facts. DH1 posits a mere correlation between distribution and meaning, and so it is consistent with the (plausible enough) idea that it is the meaning of a term that serves to partially determine its distribution, such that we can investigate the meaning of a term via a form of “reverse engineering” (Boleda, 2020, p. 2). In this respect, DH1 is a relatively modest claim that is consistent with a range of theoretical

1 See also: Firth (1957, p. 11) “You shall know a word by the company it keeps!”.
2 It seems likely that there is a difference in register here, and that this difference will affect the ways in which these expressions are distributed across corpora. See: § 5.4 for discussion of this issue with regard to distributional semantic models of meaning.
3 Lenci (2008) distinguishes between a weak form of the distributional hypothesis – equivalent to DH1 – and a strong form. However, Lenci’s strong form is a cognitive hypothesis that distributional structures serve as part of the explanation of how expressions within a language are cognized. My focus here is on whether distributional semantics can serve as a theory of meaning and it is at least possible that a theory of meaning need not capture such cognitive facts. For further discussion of this issue, see § 5.3.


views regarding the nature of linguistic meaning, such that the instrumental use of distributional semantics in investigating meaning should be viewed as an exciting prospect regardless of one’s prior theoretical commitments. In particular, the success of distributional semantics means that new methodologies have become available in investigating the meanings of individual expressions. On the other hand, as we will see in Sects. 5.3, 5.4, and 5.5, many have argued for the interesting and controversial claim that a distributional model could serve as a theory of meaning, and this is captured in DH2.

One way in which the distributional hypothesis can be implemented is via a collocation analysis. For a given term, we could produce a list of the terms that most frequently appear alongside it.4 Doing so can reveal aspects of the target term’s meaning, such as the subject matter it belongs to, its register, its semantic prosody, etc. For instance, if the most frequent collocates of a given term include “hell”, “damned”, “forbidden”, and “damnation”, it seems reasonable to conclude that the term belongs to a religious subject matter and further that it has negative connotations. In a collocation analysis of the above kind, we use our understanding of the collocate terms to gain insight into the target term. But a more radical way of employing the distributional hypothesis is to make no prior appeal to our understanding of other terms in providing an account of a term’s meaning via its distribution. Distributional Semantic Models (DSMs) take this approach. In DSMs, the distribution of a given term across a corpus is represented mathematically as a vector.5 The central benefit of doing so is that the similarity of two distributions (and so, according to the distributional hypothesis, the meanings) can be measured in terms of geometric positions, for example, by measuring the cosine of the angle between two vectors.
In the following section, I will provide a broad overview of how DSMs are constructed.
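
The collocation analysis described above can be illustrated with a minimal sketch. It simply counts the words that fall within a fixed window of a target term. The function name `collocates`, the sample text, and the window size are all illustrative assumptions rather than anything specified in the chapter, and the window here naively crosses sentence boundaries.

```python
import re
from collections import Counter


def collocates(text, target, window=2):
    """Count words appearing within `window` tokens of each occurrence of `target`."""
    # Crude tokenisation: lowercase alphabetic strings (assumption for illustration).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text, invented for this sketch.
sample = "They feared hell and damnation. The damned feared damnation."
counts = collocates(sample, "damnation", window=2)
print(counts.most_common(3))
```

Raw counts of this kind tend to favour words that are frequent everywhere, which is why, as the chapter's note on collocation observes, an association score such as Mutual Information is often preferred in practice.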

5.3 Constructing DSMs: An Overview

If DSMs model the meanings of words in terms of vectors in a high-dimensional space, then how are such vectors generated? In what follows I will provide a broad overview of the approach so as to give the reader an idea of its core elements.6

4 Rather than looking at the most frequent terms, collocation analyses will often use association scores to pick collocates. This is reflective of the fact that often the most frequent collocates are simply the more frequent terms in a language, and so it is better instead to look at the collocates that bear the strongest statistical association, such as the highest Mutual Information score.
5 More complex models may capture the meanings of certain terms using some other mathematical object such as a tensor (see: Baroni et al., 2014a). We will ignore this complication for now.
6 Readers seeking greater technical detail may want to turn to: (Erk, 2012; Clark, 2015; Kiela & Clark, 2014; Lenci, 2018; Boleda, 2020).


A vector consists of n components that correspond to the coordinates in an n-dimensional space. The simplest form of DSM will have a corpus dictionary of size n, where each component of the vector will equal the number of times that a member of the corpus dictionary co-occurs with the vector term. To use a dummy example, consider the following (tiny) corpus:

• Corpus: Dogs bark. Because dogs bark, we know that dogs don’t meow. Cats meow, but did you know that cats purr?

We will use the following corpus dictionary, consisting of 4 terms:

• Corpus dictionary:

Now we need to decide what counts as co-occurrence. Let’s suppose that we will treat two terms as co-occurring iff one immediately follows the other (i.e. we use a context window of 1-1). Now we can construct vectors for terms like “dog” and “cat”, by representing the number of times each term co-occurs with each member of the corpus dictionary:

• Dog:
• Cat:

These vectors lie inside a 4-dimensional space. Typically, the corpus dictionary will be much larger however, consisting of hundreds or even thousands of terms, and so the vectors will typically belong to a space of many more dimensions. We are now able to measure the similarity of our vectors in terms of their position in the 4-dimensional space. A standard way to measure the similarity is in terms of the cosine of the angle between the two vectors.7 A score of 0 would mean total dissimilarity, and a score of 1 would mean that the two vectors point in exactly the same direction. Here, the terms receive a similarity score of 0.26 (although in a corpus of this size such a score means very little).

This provides an outline of the simplest possible approach, and there are a number of ways that DSMs can (and nearly always do) become more sophisticated. First, as previously mentioned, there is the question of what is selected as the corpus dictionary.
We could even allow for every term in the corpus to be included in the corpus dictionary, although pragmatic considerations regarding computational efficiency may well lead to a limited set being used. Kiela and Clark (2014) tested the performance of vectors of different size ranging up to vectors with 500,000 components. They found no improvement in performance when the vector size goes beyond 50,000 components. Second, there is the question of the “context window”

7 The cosine of an angle in a right-angled triangle is calculated by dividing the adjacent side by the hypotenuse. There are alternative measures of similarity, such as Euclidean distance or the dot product, which may be called for in particular investigations. One advantage of cosine similarity with regard to investigating meaning is that it does not take into account the magnitude of the vector and so is not affected by the overall frequency of the terms in the way that Euclidean distance and the dot product are. This accords with the idea that two terms could be very similar in meaning even if one is used much more frequently than the other.


i.e. how far an expression has to occur on either side of the target term to count as co-occurring. This could be very narrow, as it was in the toy example, or could be very wide. Indeed, some models will use whole documents as a context window (e.g. Landauer and Dumais’ (1997) Latent Semantic Analysis model). There are also possibilities for more sophisticated context windows. For example, rather than treating any co-occurrence within a window as the same, we might instead choose to weight a co-occurrence depending on how close to the target term it appeared. The context window could also appeal to syntactic relations if the model is being applied to corpora with POS-tagging (e.g. Padó & Lapata, 2007; Erk & Padó, 2008). Typically, the vectors representing the raw co-occurrences will also be transformed so that particular components in the vector are weighted. The reason for this is that corpora tend to have relatively few highly-frequent terms and many more less-frequent terms, and that it will often be the co-occurrences with less frequent terms that better capture the target term’s meaning. For instance, function terms such as “the” or “a” are among the most common terms in any corpus, but that a given term often co-occurs with such terms tells us little about its meaning.8 Dimensionality reduction is also commonly used, where the result is a vector of a lower dimensionality and where the dimensions no longer track co-occurrence data regarding particular terms in the corpus dictionary, but instead track more abstract features of the distributional data as a whole. Singular value decomposition is one of the most common methods used (see Landauer & Dumais, 1997).
This usually proves to be an important step because large vectors constructed from co-occurrence data are often quite sparse – many of the vector components will be zero because many words never co-occur with other words – and dimensionality reduction provides a way of representing the same information (to some approximate level) within a less complex object.9,10 The approach outlined above is sometimes called a “count model” (Baroni et al., 2014b) because the vectors are constructed, initially at least, by counting the number of co-occurrences between the corpus dictionary and the vector term. An alternative approach that has become more prominent in recent years is to instead use a neural network to automatically construct vectors (known specifically in this context as “embeddings” (Lenci, 2018; Westera & Boleda, 2019)) for each term in a corpus as part of its training phase. Typically, the neural network is set the task of producing a language model i.e. a model designed to give the probability that a word arises at a given point in a text. For instance, the widely-used word2vec package created by Mikolov et al. (2013a) provides two types of model architecture. A “continuous bag

8 For an overview of various weighting functions, see Kiela and Clark (2014).
9 Thanks to an anonymous reviewer for emphasizing this point.
10 Landauer and Dumais (1997) and Lenci (2018) both emphasise the importance of dimensionality reduction in the process of producing a model that captures meaning. Both suggest that dimensionality reduction serves as an abstraction mechanism that picks up on latent patterns in the distributional data that would not be detected by a model operating on raw frequency statistics. In this respect, dimensionality reduction may be an important step that brings greater benefit than just computational efficiency.


of words” model predicts which word will appear at a particular point in a text given the expressions that surround it, while a “skip gram” model predicts the words that will appear around a given word. In order to learn how to complete this task, the network takes as its training data a corpus organized into input/output pairs (e.g. for a “continuous bag of words” model, the input will be words surrounding the target term, and the output will be the target term). Over the training period, it then repeatedly adjusts the way it assigns output probabilities to each term until it approximates the input/output patterns in the training data to within some minimal level of error. While there has been a great deal of discussion about whether count or so-called “predict” models are preferable (Baroni et al., 2014b; Lenci, 2018), predict models will be treated in the remainder of this paper as no different to count models in embodying the DSM approach. The above sketch of how DSMs are constructed will be sufficient for the purposes of this paper. However, there are several further parameters that a more thorough survey would give greater attention to, including: the selection of the corpus; whether there is a “stopword” list of terms excluded from the model; whether context words are lemmatized (e.g. whether “have”, “had”, “has” are treated as distinct words or as instances of the same lemma); whether alternative similarity metrics beyond cosine can be used; and whether a DSM can draw upon data from non-textual modalities e.g. image data.
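
The simple count-model construction described earlier in this section can be sketched in a few lines, using the toy corpus about dogs and cats. Several things here are assumptions made for illustration: the four-term dictionary `["bark", "meow", "purr", "that"]` is a hypothetical choice (it is one dictionary that happens to yield a cosine of about 0.26 under a 1-1 window), "dogs" and "cats" are mapped to "dog" and "cat" by a hand-written table, and co-occurrence is not counted across sentence boundaries.

```python
import math
import re

CORPUS = ("Dogs bark. Because dogs bark, we know that dogs don't meow. "
          "Cats meow, but did you know that cats purr?")

# ASSUMPTION: hypothetical dictionary chosen for this sketch.
DICTIONARY = ["bark", "meow", "purr", "that"]
# ASSUMPTION: hand-written lemma table standing in for real lemmatisation.
LEMMAS = {"dogs": "dog", "cats": "cat"}


def sentences(text):
    """Yield each sentence as a list of lowercased, lemmatised tokens."""
    for chunk in re.split(r"[.?!]", text.lower()):
        tokens = [LEMMAS.get(t, t) for t in re.findall(r"[a-z']+", chunk)]
        if tokens:
            yield tokens


def count_vector(target, text, dictionary):
    """Count dictionary terms immediately before or after `target` (1-1 window)."""
    index = {word: k for k, word in enumerate(dictionary)}
    vec = [0] * len(dictionary)
    for tokens in sentences(text):
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(tokens) and tokens[j] in index:
                    vec[index[tokens[j]]] += 1
    return vec


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


dog = count_vector("dog", CORPUS, DICTIONARY)
cat = count_vector("cat", CORPUS, DICTIONARY)
similarity = cosine(dog, cat)
print(dog, cat, round(similarity, 2))
```

Changing any of the parameters listed above (the dictionary, the window, whether sentence boundaries count, whether words are lemmatized) changes the vectors and hence the similarity score, which is one way to see why those modelling choices matter.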

5.4 The Success of the Approach

At this stage, we may ask: why think that modelling the distributions of expressions using the above method provides a way of investigating or representing word meaning? That is, what reason is there, beyond any initial plausibility, for accepting DH1 or DH2? The question is much more pressing in the case of DH2 because, as we will see in Sect. 5.5, there are many straightforward objections against the idea that a theory of meaning should be distributional in nature. In this section, I will outline the positive features of the approach regarding the potential modelling of word meaning. First, and perhaps most importantly, the proof of the distributional hypothesis (DH1, specifically) is very much in the pudding. That is, DSMs are not produced on the assumption that capturing distributional properties will thereby provide a way of modelling meaning; instead, such models are tested and modified depending on the extent to which they can complete certain natural language processing tasks that are related to meaning. DSMs have been shown to perform well in a number of different natural language processing tasks. Among others, Baroni and Lenci (2010) have shown that DSMs can predict quantitative human judgements of meaning similarity between word pairs. DSMs also perform well in synonym detection tasks, where a synonym for a target word must be selected from a range of candidates (Landauer & Dumais, 1997; Bullinaria & Levy, 2007; Baroni & Lenci, 2010). They also perform well in noun categorisation tasks (Lund & Burgess, 1996) and in selectional


preference tasks (Padó et al., 2007). All of these tasks in one respect or another confirm the idea that similarity in the vector space (usually understood as cosine of the angles between two vectors) is tracking similarity in meaning. A second benefit of the distributional approach, as Boleda (2020, p. 4) notes, is that it is radically empirical insofar as word meanings are extracted from natural language data using automated methods that are often easily scaled up in application to any number of expressions in a language. This contrasts strikingly with much of what occurs in formal semantics and philosophy of language, where the theorist’s semantic intuitions are often appealed to in order to make claims about the meanings of given terms, and there has of course been a great deal of discussion about the potential pitfalls of relying upon intuitions in this way (Devitt, 2006; Maynes & Gross, 2013). One remedy has been to move from the intuitions of the theorist to the intuitions of experimental participants, perhaps by surveying them on particular word meanings or particular uses. But the benefit of the distributional approach is that it models word meaning in a way that makes no direct appeal to intuitions.11 It also arguably possesses greater ecological validity insofar as the language usage that the model is based on is usually authentic language use “in the wild” rather than from any experimental setting that may come with certain confounds or limitations.12 A third benefit of the distributional approach is that it puts semantic similarity at its core. The very idea of semantic similarity has attracted criticism (see Sect. 5.5.4; Fodor & Lepore 1992, 1999), but here I just want to note that such a notion does bring initial benefits with it. There does seem to be a clear sense in which we can speak of some terms being more similar in meaning than others. “Cat” is more similar in meaning to “dog” than it is to “communist”. 
Similarity in meaning is not something that is straightforwardly captured in a standard truth-conditional approach to meaning such as that exemplified by Heim and Kratzer (1998). For instance, if we are to understand the meaning of “cat” in terms of the set of all cats (or the characteristic function of that set (Heim & Kratzer, 1998, p. 24)), and the equivalent point applies for both “dog” and “communist”, it seems there is no straightforward way to pick up on the similarity relations that exist between the three terms. By contrast, DSMs can happily capture the above facts. Fourth, DSMs have enjoyed some success in partly capturing phenomena that are often seen as a thorn in the side of the formal semantic tradition: polysemy effects. One way in which we see polysemy arise is in the fact that what a word means on a given occasion of use depends in part on the terms that it is combined

11 There may still be indirect appeal to intuitions. For instance, if our model is constructed according to whether it can predict human judgments of semantic similarity, then clearly meaning intuitions are playing a role in the evaluation of the model. But even if we acknowledge this, there is still a sense in which the role of intuitions is being minimized. We may allow that meaning intuitions are playing a role at the point of model evaluation, but once we have a model that passes the evaluation, and so (ideally) works, this will be able to inform us about the meanings of terms not included in the evaluation task. 12 Thanks to Emma Borg for emphasising this point.


with. For instance, “cutting” in “cutting cake” refers to a different type of action than in “cutting grass” (see Asher, 2011, pp. 16–17 for similar examples).13 This phenomenon has been approached in a number of different ways in the distributional tradition, but here I will outline just two. First is what could be described as a sense enumeration approach. The thought is that if a term is polysemous, it must have a finite number of meanings that we need to identify. A notable work of this kind in the distributional tradition is Schütze (1998). Schütze outlines a method whereby rather than producing vectors for each word in a corpus (let us call these word vectors), we instead produce a vector for each context that a word token appears in. For each word token, we reach the context vector by summing the word vectors of all the terms that surround that word token within a given window. For a given word, we will end up with one context vector for each token. We then use a clustering algorithm to group the various context vectors into a pre-set number of clusters. This provides a way of clustering all the various uses of e.g. “cutting” into distinct senses. Of course, with any unsupervised clustering algorithm, there is no guarantee that the resultant clusters align neatly with intuitive senses, but the clusters can then be inspected by considering whether the individual tokens within a cluster align to some recognisable usage. An alternative approach is to combine the word vector of the term of interest with the term that it appears alongside. For instance, we could combine the vector “cutting” with “grass” on the one hand and “cake” on the other. Mitchell and Lapata (2008) have shown that quite simple forms of combination such as vector addition or vector multiplication can lead to effective results. For instance, we might want to capture the difference in meaning of “ran” in “horse ran” vs “colour ran”.
Mitchell and Lapata showed that vectors produced via vector multiplication correlate well with human judgments of similarity and also with vectors of phrases closely-related to the polysemous meaning (e.g. that “horse ran” is very similar in meaning to “horse galloped”). Many more sophisticated models have been proposed to represent the way two terms combine in meaning (see Boleda (2020) for an overview). Notably, merely using addition or multiplication functions will take no account of word order, such that “dogs chase cats” and “cats chase dogs” would end up with identical vectors, and so more sophisticated models attempt to capture the non-commutative nature of meaning composition (Baroni et al., 2014a, p. 264 ff.). In sum, there is a good deal of evidence to suggest that DSMs can

13 Note that the kind of polysemy considered here is what might be termed compositional polysemy i.e. the variation in meaning that an expression displays when combined in various larger expressions. It could be argued that this phenomenon presents no particular problem for the formal semantic tradition provided that it is acknowledged first that e.g. “cutting” may be associated with more than one sense and second that which sense it contributes depends partly on the expression it is combined with. There would be no principled barrier to a formal semantic model capturing these facts, and so there is no thorn in the side of the semantic tradition here (Fodor & Lepore, 1998, p. 284; Borg, 2012, pp. 188–189). Even if this is right, the benefit of DSMs should still be noted i.e. that they seem to provide a way of modelling how the specific meanings of complex expressions can arise from their parts, rather than just providing a general model of how expressions of particular semantic types combine with one another.

84

J. Grindrod

model certain polysemy effects, particularly the manner in which the meaning of a given term will be affected by the terms it appears alongside. At the same time, it is important to note that there are related phenomena that either kind of distributional approach will not capture. For instance, the much-discussed Travis cases, associated with Charles Travis (1997), are cases where a shift in meaning across two distinct conversational contexts is more plausibly due to contextual features beyond the cotext, and so to the extent that a term can vary in its meaning while its linguistic context remains the same, DSMs will fall short in a complete account of polysemy. A fifth benefit of the distributional approach is that there is some evidence that vector-based models of lexical meaning can track certain very specific features of a given term’s meaning. This is most clearly seen in so-called “analogy tasks”, that seem to suggest that DSMs to capture feature relations that hold between terms. For instance, there is a sense in which the terms “king” and “uncle” share a semantic feature regarding gender, as do “queen” and “aunt”. Furthermore, “king” and “queen” arguably only differ with regard to this semantic feature. Mikolov et al. (2013a, b) showed that this is captured in their DSM via the use of fairly simple operations of vector subtraction. Roughly, if we take Ev to be the vector for expression E, then kingv minus queenv is roughly equivalent to unclev minus auntv . Mikolov et al. showed that this is true of several different semantic and syntactic properties. This opens up the possibility that further investigation of DSMs, even via the use of relatively simply operations, may reveal further insight into word meaning beyond simple meaning similarity claims. To summarize this section, while the distributional approach may seem like a significant departure from what is often seen as a viable theory of meaning, the approach has a number of benefits. 
However, even with all this in mind, it may still be thought that there are serious objections against DSMs understood as a theory of meaning. I turn to those objections in the following section.
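The offset relation described above for “king”, “queen”, “uncle” and “aunt” can be illustrated with a minimal sketch. The three-dimensional vectors below are invented for illustration and are not drawn from Mikolov et al.'s model or any trained DSM; the function names are likewise hypothetical. The sketch only shows what it means for two difference vectors to be roughly equivalent, here measured by cosine similarity.

```python
import math

# Hypothetical toy vectors, invented for illustration (not from a trained model).
vectors = {
    "king":  [0.90, 0.10, 0.80],
    "queen": [0.85, 0.15, -0.75],
    "uncle": [0.10, 0.90, 0.80],
    "aunt":  [0.15, 0.85, -0.80],
}

def subtract(a, b):
    """Component-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Component-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The two offsets point in nearly the same direction.
royal_offset = subtract(vectors["king"], vectors["queen"])
kin_offset = subtract(vectors["uncle"], vectors["aunt"])
offset_similarity = cosine(royal_offset, kin_offset)

# Analogy completion: which word lies closest to king - queen + aunt?
target = add(royal_offset, vectors["aunt"])
nearest = max(vectors, key=lambda word: cosine(target, vectors[word]))
```

With these invented values the two offsets are nearly parallel and the analogy query lands closest to “uncle”. In a real DSM the relation is only approximate and depends on the corpus and training settings.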

5.5 Objections to DSMs as Theories of Meaning

The benefits outlined in the previous section are plausibly taken as reasons in favour of DH1. In this regard, the prospect of using distributional semantics as a method of investigation that complements other methods should be viewed as relatively uncontroversial. In this section and the next, I will consider the plausibility of DH2 – the stronger claim that a theory of meaning should represent meanings of expressions in terms of their distributional properties. Before doing so, it is important to emphasise that the idea that DSMs could serve as a theory of meaning is an increasingly popular position within the distributional semantics literature. For instance, Sahlgren (2008, p. 33) argues that once certain foundational assumptions drawn from the structuralist tradition are made explicit, it becomes clear that “distributional representations do constitute full-blown accounts of linguistic meaning”. Similarly, Westera and Boleda (2019) argue that once a theory of expression meaning is properly delineated from an account of the truth and reference of language use, “distributional semantics on its own can in fact be a fully satisfactory model of expression meaning”.14 Meanwhile, Baroni et al. (2014a) state:

A cautious view of DSMs is that they are a handy engineering surrogate of a semantic lexicon. Various considerations support, however, the bolder stance that DSMs are models of a significant part of meaning as it is encoded in human linguistic competence. (Baroni et al., 2014a, p. 255)

They continue: We do endorse the view that distributional semantics is a theory of semantics, that DSMs are an important part of the semantic component of an adult speaker’s mental lexicon. In short, the claim is that a core aspect of the meaning of a word is given by (a function of) its distribution over the linguistic contexts (and possibly the non-linguistic ones) in which it occurs, encoded in a vector of real values that constitutes a feature-based semantic representation of the word. (Baroni et al., 2014a, p. 257)

There are of course theoretical differences across these authors, but they are united by their endorsement of DH2.15 One important point that will recur in this section is that endorsement of DH2 does not strictly amount to the view that DSMs serve as wholesale alternatives to the kinds of theories of meaning typical of formal semantics. Many in the distributional semantics field will argue that ultimately distributional models and formal semantic accounts will have to be combined in some way so as to reap the benefits of both traditions. But how exactly that combination would work remains something of an open question.16 For the purposes of this section, I will consider whether we can treat DSMs as even partial accounts of the meanings of expressions. As we will see, there are possible objections against a positive answer, but there are also responses to be found within the literature. Objections against DH2 naturally rely upon claims about the criteria for any suitable theory of meaning. This of course is a controversial topic in its own right and not one I intend to settle here. Instead, I will merely focus on whether the objections that the criteria give rise to actually apply to DSMs. I will not evaluate the plausibility of the criteria themselves.

14 Westera and Boleda’s account certainly warrants greater discussion than I will give it here. In particular, their proposal has clear points of similarity with other views (e.g., radical contextualist views, relevance theoretic views, Pietroski’s (2018) internalist account of meanings as procedures) that claim that a theory of meaning should not capture worldly phenomena such as truth and reference. Westera and Boleda arguably go a step further in claiming that a theory of meaning should not even capture entailment relations.

15 See also Lenci (2008).

16 McNally and Boleda (2017) propose a novel combination of discourse representation theory and distributional semantics in order to capture the conceptual composition of modified noun phrases (e.g. that “red” modifies “pen” in a different way to the manner in which it modifies “apple”), and particularly the way in which the conceptual composition is sometimes affected by features of the object referred to (what they call “referentially afforded interpretations”). In doing so, they develop an interesting proposal on how distributional representations can be viewed as encoding conceptual information for both simple and complex expressions.


5.5.1 No Understanding

Consider a language you have never encountered before. Imagine that in an attempt to familiarize yourself with that language, a powerful figure provides you with all the distributional facts about an incredibly large corpus in that language. For any given word, you have available the kinds of vectors we have been considering, and any other distributional information. You have an excellent understanding of when particular words turn up in the corpus, the terms they typically collocate with, and those they don’t. The first objection against DSMs is that this is the wrong kind of information required for an account of meaning – it would never provide you with an understanding of what the terms mean. If the language were Icelandic, for instance, these kinds of distributional facts might tell you all of the distributional patterns associated with “köttur”, including that it often collocates with “mjá”, but it wouldn’t tell you that the term is about or refers to cats. This, of course, bears a strong resemblance to the famous Chinese Room thought experiment devised by Searle (1980), and also to Lewis’ (1970, p. 18) famous dictum that “semantics with no treatment of truth conditions is not semantics”.17

One way of interpreting this problem is as what has become known as the symbol grounding problem, following Harnad (1990). Understood in this way, the problem lies specifically in the symbolic nature of the linguistic information being processed, and particularly in the idea that if the meaning of a symbolic expression is given in terms of other expressions that themselves await interpretation, then we have come no closer to capturing the meaning of the original expression. Following Harnad (1990), a popular solution to this problem is to claim that the meanings of the symbols in question have to be linked in some way to non-symbolic information, e.g. perceptual information or action-based information. Baroni et al. (2014a) take this approach specifically with regard to DSMs:

Since DSMs represent the meaning of a symbol (or word) in terms of a set of other symbols (the words or other linguistic contexts it co-occurs with), they are subject to the lack-of-grounding criticisms traditionally vented against symbolic models. If symbols are not grounded in the sensory-motor system and thus connected to the external world, they cannot really have “meaning”. (Baroni et al., 2014a, p. 257)

Their subsequent recommendation is that ultimately in order for a DSM to capture the kind of linguistic understanding a typical speaker has, the data drawn upon must not only be textual data, but data from other modalities, such as image data – thereby imitating the kind of information available to a typical speaker (e.g. the representation for “köttur” will not only represent corpus information but also information drawn from a large bank of photographs, some of which contain pictures of cats). In this way, the understanding that is claimed to be lacking in the original case might be gained by introducing some kind of causal link with the external world via a perceptual input. It is important to note that Searle (1980) rejected this kind of response, stating that the same kind of problem arises in the case of image data (see his “Robot reply” section) and so some causal link with the external world makes no difference. But whether he was right to claim this is itself a controversial topic that will not be settled here.

Rather than claiming that a DSM can plausibly provide the kind of understanding that is at issue with the symbol grounding problem, it could instead be claimed that capturing such understanding is not a requirement of a theory of meaning. As I have already noted, Westera and Boleda (2019) argue that a theory of meaning should be internalist insofar as it should not ascribe to expressions truth-conditional, referential, or entailment properties – these being phenomena that arise at the level of the speaker and the utterance. To the extent that the above objection relies on referential intuitions (e.g. that “köttur” refers to cats), this is a way of avoiding the objection. The challenge of course will be to make the case that a complete theory of meaning has thereby been given.

17 See Bender and Koller (2020) for a complaint of this kind applied specifically to the idea that language models capture meaning or understanding.

5.5.2 No Compositionality

While DSMs have enjoyed success in NLP tasks focused on single expressions, extending the method to complex expressions in a manner that captures the compositionality of language remains something of an open challenge. Of course, there is nothing preventing us constructing vectors for complex expressions of two terms or more in just the same manner as vectors are constructed for individual terms. However, particularly as we look to more complex expressions with higher numbers of constituent terms, the frequency of such expressions within any given corpus will decrease greatly. At its limit, this process will clearly not work for entire sentences, as there are obviously an infinite number of sentences that will not appear in any corpus of any size. Furthermore, even for complex expressions only consisting of two or three expressions that may have a healthy frequency within a corpus, it is not clear that using the above procedure to construct a vector to represent its meaning would capture anything about how the expression is composed. After all, it would result in a vector of the same type as single term vectors, with no further properties that clearly indicate its complex structure.

One approach to capturing compositionality within a distributional model would be to utilize some of the standard geometric operations available such as vector addition and vector multiplication. As mentioned earlier, Mitchell and Lapata (2008) take this approach with some measure of success, showing that the resultant vectors do correlate well with human judgements of particular phrases. However, as mentioned, the obvious problem with such an approach is that operations such as vector addition and vector multiplication are commutative (e.g. in the case of addition, a + b = b + a) whereas the same cannot be said for the meanings of complex expressions – in the case of sentences, “John loves Joe” does not mean the same thing as “Joe loves John”.
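The commutativity problem can be made concrete with a small sketch. The integer vectors below are hypothetical stand-ins for raw co-occurrence counts, and the helper names are invented for illustration; nothing here reproduces Mitchell and Lapata's actual models. The point is only that composing by component-wise addition or multiplication yields the same vector whatever the word order.

```python
def add(a, b):
    """Component-wise vector addition."""
    return [x + y for x, y in zip(a, b)]

def multiply(a, b):
    """Component-wise vector multiplication."""
    return [x * y for x, y in zip(a, b)]

def compose(word_vectors, op):
    """Fold a sequence of word vectors into one vector using op."""
    result = word_vectors[0]
    for vec in word_vectors[1:]:
        result = op(result, vec)
    return result

# Hypothetical co-occurrence count vectors (invented for illustration).
john = [2, 7, 1]
loves = [5, 3, 9]
joe = [6, 1, 4]

john_loves_joe_add = compose([john, loves, joe], add)
joe_loves_john_add = compose([joe, loves, john], add)

john_loves_joe_mul = compose([john, loves, joe], multiply)
joe_loves_john_mul = compose([joe, loves, john], multiply)

# Both word orders collapse into the same representation.
print(john_loves_joe_add == joe_loves_john_add)
print(john_loves_joe_mul == joe_loves_john_mul)
```

Because the two sentences receive identical vectors under either operation, any difference in their meaning is invisible to a model that composes in this way.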


It is here that we see that distributional semantics differs markedly from the formal semantic tradition, where the constraint of compositionality is typically baked into the very structure of the theory. As a result, there is a growing research program where distributional semantics is in some sense combined with the kind of compositional structure one finds in formal semantics (see Boleda and Herbelot (2016) for an overview of the topic). For instance, Erk and Padó (2008) propose a method whereby there are not only word vectors in the broad manner outlined above, but there are also selectional preferences for given syntactic categories. For example, for a term like “catch”, which as a transitive verb takes a subject and an object, a vector will be produced that stands for the kinds of terms that serve as the object of the verb, and the same is done for the kinds of terms that serve as the subject of the verb. For a given instance of the term being used in a larger phrase (e.g. “catch a ball”), these selectional preference vectors will be combined with the word vectors to represent what the word means within that linguistic context. To the extent that the resultant vectors take into account the grammatical roles of the expressions involved, the manner in which the expressions are combined is better captured. A perhaps clearer instance of a combination of the compositional structure from formal semantics with the account of word meaning given in distributional semantics is given by Baroni et al. (2014a). They take their cue from Heim and Kratzer (1998), who view all forms of semantic composition as a kind of function application. Roughly, the idea (from Frege (1997)) is that when any number of expressions combine, one of those expressions serves as a mathematical function that takes the other expressions as its input and provides some object as its output. 
For instance, in the sentence “Susan smokes”, the expression “smokes” can be viewed as a function that takes the semantic value of the expression it is combined with as its input (in this case, Susan) and provides a truth value as its output depending upon whether the object in question smokes. Frege drew a distinction here between “complete” expressions whose semantic value is an object or set of objects and “incomplete” expressions whose semantic value is a mathematical function. Baroni et al. (2014a) suggest that this distinction could be employed within a DSM, whereby complete expressions are represented by constructing vectors in the broad method outlined in Sect. 5.3, whereas incomplete expressions are represented by transformative functions that take a word vector or vectors as input and provide a new geometric object as output (such as a tensor of some higher order).18 It is, I think, fair to say that it is still early days for projects such as these, but they are sufficient to show that the issue of compositionality is widely seen not as a knock-down objection to distributional semantics, but as an open research question. While the basic proposal behind DSMs does not provide an obvious way of capturing compositionality, the live proposals on offer serve to show that the possibility of a DSM capturing compositionality to the extent that formal semantic approaches do is still open.

18 An alternative approach to capturing compositionality in a DSM has been to use recursive neural networks, where the vectors for individual words are used as input to a neural network that then produces a vector for the combination of those words (Socher et al., 2012).
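The function-application idea can be sketched in a few lines. The sketch assumes, purely for illustration, that an incomplete expression is a linear map: a matrix for a one-place predicate such as “smokes” and an order-3 tensor for a two-place predicate such as “loves”. All numerical values and function names are invented and are not taken from Baroni et al. (2014a).

```python
def apply_matrix(matrix, vec):
    """Apply a one-place predicate (a matrix) to an argument vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def apply_tensor(tensor, subj, obj):
    """Apply a two-place predicate (an order-3 tensor) to subject and object.

    out[i] = sum over j, k of tensor[i][j][k] * subj[j] * obj[k]
    """
    return [
        sum(plane[j][k] * subj[j] * obj[k]
            for j in range(len(subj))
            for k in range(len(obj)))
        for plane in tensor
    ]

# Hypothetical vectors for complete expressions.
susan = [3, 1]
john = [1, 0]
joe = [0, 1]

# Hypothetical function representations for incomplete expressions.
smokes = [[1, 2],
          [0, 1]]
loves = [
    [[0, 1], [0, 0]],  # output dimension 0
    [[0, 0], [1, 0]],  # output dimension 1
]

susan_smokes = apply_matrix(smokes, susan)
john_loves_joe = apply_tensor(loves, john, joe)
joe_loves_john = apply_tensor(loves, joe, john)

# Unlike addition or multiplication, swapping the arguments changes the result.
print(john_loves_joe != joe_loves_john)
```

Because the subject and object fill different argument slots of the tensor, “John loves Joe” and “Joe loves John” receive different representations, which is the order sensitivity that simple addition and multiplication lack.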

5.5.3 No Cognitive Plausibility

A distinct objection states that a complete theory of meaning must capture the cognitive processes behind linguistic understanding and that DSMs do not plausibly do this. The precise nature of such a cognitive constraint is itself controversial. Following Larson and Segal (1995), it might be thought, for instance, that a semantic theory is a theory of semantic knowledge, i.e. what must be known in order to know a language. Note that this is importantly distinct from, even if linked to, a theory of language acquisition, which would provide some account of how infants are able to go from having no language when they are born, to some language when they are older. This objection can be developed in terms of either theory. It could be claimed that DSMs do not capture what is known by a typical language user when they know a language. Or it could be claimed that DSMs do not provide a good basis for understanding how languages are acquired.

In response, however, advocates of DSMs have argued explicitly that speakers do plausibly employ some distributional method in order to acquire meanings or as part of their linguistic understanding. In fact, capturing the cognitive processes behind linguistic competence was one of the key motivations behind some of the earliest DSMs such as Landauer and Dumais’ LSA (1997) and Lund and Burgess’ HAL (Hyperspace Analogue to Language) (1996). For instance, Landauer and Dumais argued that positing a distributional element to language acquisition may prove important in answering the so-called poverty of the stimulus problem (Lasnik & Lidz, 2016). This is the idea that a language learner’s exposure to language use underdetermines the facts about language that they need to acquire in order to become competent speakers. But if speakers followed some procedure akin to the construction of a DSM, this may go some way to explaining fast language acquisition in an environment of relatively sparse data.
Similarly, Baroni et al. state:

Even those language acquisition theorists who stress the role of extra-linguistic cues recognize that the vocabulary size that teenagers command by end of high-school (in the order of tens of thousands of words) can only be acquired by bootstrapping from linguistic data. This bootstrapping is likely to take the form of distributional learning: we all have the experience of inferring the meaning of an unknown term encountered in a novel just from the context in which it occurs, and there is psycholinguistic evidence that statistical patterns of co-occurrence influence subjects’ intuitions about the meaning of nonce words just as they do DSMs. (Baroni et al., 2014a, p. 255)

Of course, these quite general considerations do not give reason to think that any specific DSM is implemented in the human brain, but perhaps only that something close to a distributional model is. As Landauer and Dumais state:

We, of course, intend no claim that the mind or brain actually computes a [singular value decomposition] on a perfectly remembered event-by-context matrix of its lifetime experience using the mathematical machinery of complex sparse-matrix manipulation algorithms. What we suppose is merely that the mind-brain stores and reprocesses its input in some manner that has approximately the same effect. (Landauer & Dumais, 1997, p. 218)

Nonetheless, the idea that distributional procedures may play some role in the acquisition, storage, or activation of semantic representations is now the topic of an expanding literature within cognitive science (e.g. Jones et al., 2006; Andrews et al., 2009; Marelli & Baroni, 2015; Mandera et al., 2017; Marelli et al., 2017). Particular attention is paid there to the extent to which DSMs can capture the results of a range of psycholinguistic tasks, such as similarity judgment tasks, semantic priming tasks, word association tasks, and compound term comprehension tasks. But consideration is also given to the extent to which it is plausible that the brain is performing the same algorithmic procedures as current DSMs (see Mandera et al., 2017, pp. 58–60). In this respect, whether the cognitive account of language acquisition and use must appeal to distributional procedures is still an open question, though one for which there is a growing body of supporting literature.

5.5.4 No Granularity of Meaning

The final objection is that the distributional approach inevitably leads to an account of meaning that is too coarse-grained. It is widely recognized that there is a range of different meaning properties that a given expression could have. In particular, beyond the truth-conditional meaning that a given expression possesses, we might also want to allow for expressivist meaning, procedural meaning, pejorative meaning, and perhaps even further properties that lie on the periphery of semantics such as connotation and register. Now there is no settled position regarding whether some of these categories really are distinct from one another, or whether some of them should be treated as pragmatic phenomena rather than stable semantic properties of the expressions, and so there is interesting theoretical work to be done in that regard. But an objection against DSMs is that they seem to inevitably lump all of these phenomena together, along with anything else that could affect a term’s distribution within a corpus. If two terms are judged by a DSM to have very similar, yet distinct meanings (i.e. the cosine of the angle between the two vectors is very high but less than 1), then this leads to the question of what exactly the difference in meaning is between the two terms. And here we get no indication as to whether the difference is one of truth-conditional meaning, procedural meaning, expressivist meaning, register, etc. Assuming that we do want our theory of meaning to capture the fact that these are indeed different phenomena (as intuitively seems the case), then the objection runs that this is not something a DSM will be able to deliver. Instead, all forms of meaning are lumped together and captured within the single vector.
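The worry about a single similarity score can be illustrated with a small sketch. The vectors below are invented for illustration, and their dimensions carry no assumed interpretation. Two terms that depart from a base term along different dimensions can receive exactly the same cosine score, so the score alone does not reveal in what respect each differs.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors (invented for illustration).
base = [1, 1, 1]
variant_one = [1, 1, 2]  # departs from base on the third dimension
variant_two = [2, 1, 1]  # departs from base on the first dimension

score_one = cosine(base, variant_one)
score_two = cosine(base, variant_two)

# High but below 1, and identical despite the different departures.
print(round(score_one, 3), round(score_two, 3))
```

Both variants count as similar to, yet distinct from, the base term, and nothing in the score distinguishes the two kinds of departure.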


In response, it is important to first note that it is already recognized that the kind of meaning that is captured within a DSM is partly dependent on how the model’s parameters are set. In particular, Lenci (2018) details how the size of the context window or the introduction of grammatically-encoded information can affect the kind of meaning that a DSM captures, with wider context windows capturing the broader topic that a given term belongs to while narrower context windows capture semantic features specific to the expression:

Experiments have shown that narrow context windows and syntactic collocates are best suited to capturing lexemes that are related by paradigmatic semantic relations (e.g. synonyms and antonyms) or that belong to the same taxonomic category (e.g. violin and guitar) because they share very close collocates. Conversely, collocates extracted with larger context windows are biased toward more associative semantic relations (e.g. violin and music), like region models. (Lenci, 2018, p. 158)

This clearly opens up the possibility that distinct DSMs may be able to track distinct semantic features, and so in this way the broad approach could potentially distinguish between different meaning phenomena, partly by modifying the type of context window used.

The second point to note in response is that while a vector-based representation of a word’s meaning may not make explicit the various facets of the word’s meaning in terms of the various kinds of meaning properties a term possesses, it does not follow that we cannot learn about such features via inspection of a DSM. We have already seen that semantic features such as gender can be represented in the semantic space insofar as e.g. $\mathrm{king}_v - \mathrm{queen}_v$ is roughly equivalent to $\mathrm{uncle}_v - \mathrm{aunt}_v$. So while the vector for e.g. “king” does not make it explicit that the semantic feature is present, such regularities across “king” and “uncle” can be uncovered by fairly simple operations in order to investigate the semantic space. It may be that the same is true of e.g. pejorative meaning, that some geometric operations will be able to uncover the regularities that hold across word vectors for those words that possess certain types of pejorative meaning. To restate a point made earlier, fully understanding what is captured in a DSM may require a great deal of further work, going beyond simple similarity scores in order to fully understand how the manner in which a word vector is extended into a semantic space represents aspects of the word’s meaning.

To summarize this section, the idea that a theory of meaning should be distributional in nature, that it should resemble at least in part a DSM, has attracted a number of criticisms, drawing on issues that are central to the study of meaning. We have also seen the manner in which each of those objections has been or can be responded to. In the following section, we will move beyond these objections to consider in greater detail how DSMs could be understood as a theory of meaning.
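The dependence on window size noted in this section can be illustrated with a minimal co-occurrence counter. The sentence, the target word, and the function name are invented for illustration; a real DSM would aggregate such counts over a large corpus and typically reweight them.

```python
from collections import Counter

def cooccurrence(tokens, target, window):
    """Count the tokens found within `window` positions of each target token."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts

# Hypothetical one-sentence corpus (invented for illustration).
tokens = ("she tuned the old violin before the evening "
          "concert of chamber music").split()

narrow = cooccurrence(tokens, "violin", window=1)
wide = cooccurrence(tokens, "violin", window=7)

print(sorted(narrow))
print(sorted(wide))
```

With a window of one, only the immediate neighbours “old” and “before” are recorded. With a window of seven, topically associated terms such as “concert” and “music” enter the counts as well, which loosely mirrors the contrast between narrow and wide windows drawn in the passage quoted from Lenci.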


5.6 DSMs as Holistic Theories of Meaning

In terms of the philosophical underpinnings of how meaning should be understood given the distributional approach, the view of later Wittgenstein that meaning is use is often appealed to.19 The distributional approach is supposed to capture this idea insofar as meaning representations are constructed directly out of a corpus of language usage. However, this appeal to Wittgenstein is of questionable value, partly because Wittgenstein was arguably defending a kind of theoretical quietism about meaning according to which no substantive theory of meaning in a language is to be given. I suggest that it is of greater benefit to understand the distributional approach via a distinct topic that has previously attracted a great deal of discussion within philosophy of language: meaning holism. For the purposes of this paper, we can understand meaning holism as the view that “relations of a certain kind (or kinds) that obtain among expressions of a given natural language (all of these expressions, or many of them) are constitutive for linguistic expressions to mean what they do.” (Dresner, 2012, p. 611). Similarly, outlining meaning holism in terms of interdependence of word meanings, Pagin (2008) characterizes the view as claiming that:

[E]xpressions in a language (public or mental) have certain non-semantic properties and stand in certain non-semantic relations to each other, such that the semantic properties of the sentences depend on, get determined or constituted by, or supervene on, these non-semantic properties and relations.

Characterising meaning holism in this way remains completely neutral on what the non-semantic relations are that the semantic properties depend upon. However, the philosophical literature has largely focused on meaning holism of only a few different types. First, there is Quine’s (1960) view of a theory of meaning for a language as akin to a theory of linguistic behaviours, where theories are confirmed or disconfirmed against the empirical evidence as a whole. On this view, expressions only have meaning insofar as they contribute to the predictions of the theory. On a similar note, Davidson (1967, 1973) famously held that meaning properties are attributable to all linguistic items via a process of radical interpretation, whereby the meanings of all utterances and the contents of all beliefs are allocated according to a principle of charity that seeks to maximize the accuracy of those beliefs and utterances. Finally, and arguably most prominently, many have argued for some form of inferential role semantics according to which the meaning of a given term is determined by the inferential relations that hold between sentences. So, rather than it being a result of the meaning of the word “bachelor” that in believing “John is a bachelor”, one can thereby believe “John is an unmarried man”, the explanatory relation runs in the other direction: the fact that one can infer one sentence from the other partly constitutes or determines the meaning of the word “bachelor” (Brandom, 1994).

For our purposes, it is important to see that DSMs understood as providing some theory of meaning meet the above definitions of meaning holism, but where the non-semantic relations that hold between expressions are instead distributional relations.20 This is important because there has been significant discussion over the plausibility of meaning holism in philosophy of language, and so it is fruitful to consider whether the objections raised against meaning holism (usually understood as some form of inferential role semantics) also apply to DSMs. The remainder of this section will be devoted to doing just that.21

One common form of objection, particularly pressed by Fodor and Lepore (1992, 1999), is that if the meaning of individual expressions depends upon some wider network, then communication will be unstable to the extent that any changes in that network are allowed to occur. The point has typically been pressed with regard to inferential role semantics. If the meaning of a term is dependent upon the inferences that an individual is able to make with that term, and the inferences that an individual is able to make are dependent upon the complete set of beliefs that an individual possesses, then it appears that expression meanings will vary across any individuals with different sets of beliefs, and also over time as an individual comes to modify their beliefs. But then this would spell trouble for an account of communication, if we are to understand successful communication in terms of the hearer grasping the same proposition that was communicated by the speaker.22 Although this objection is usually pressed with regard to inferential-role semantics, it is important to see that the objection really challenges the very structure of a holistic theory of meaning, and so it would also apply to a distributional theory.

19 See, e.g., Firth (1957); Lenci (2008, 2018); Sahlgren (2008); Erk (2012); Westera and Boleda (2019).

20 It is tempting to think that DSMs are only holistic according to the above definition if the corpus dictionary contains all other expressions contained within the corpus – and so the meaning of any given expression would be represented by its co-occurrence with all other expressions within the corpus. This needn’t be the case, however. The corpus dictionary just plays the role of capturing the distribution of a given expression within a corpus. Even if one had a limited corpus dictionary, it would still be the case that the distribution of one expression would be dependent upon the distribution of all other expressions including those not included within the corpus dictionary.

21 One objection against holistic theories of meaning is that holistic meanings are not compositional (Fodor & Lepore 1992, p. 175 ff.). As we saw in Sect. 5.5.2, whether DSMs can capture the compositionality of meaning is currently treated as an open research question within distributional semantics, and so I will not consider that objection any further here.

22 Fodor and Lepore emphasise other problems that arise from such instability. For instance, it would seem that an individual would never be able to change their mind regarding the truth of a sentence, as any change in mind would be a change of beliefs, and so what was meant by the sentence would then change as well. So strictly, rather than going from believing p to believing ¬p, the individual would then be considering some proposition other than p. They also emphasise that inferentialism understood as a theory of mental content will not be able to provide intentional explanations that generalise over propositional attitudes, as the possession of propositional attitudes will be dependent upon the particular beliefs of an individual. However, these problems are quite particular to a form of meaning holism that depends upon the complete set of beliefs for an individual, and so they will not be of concern here.


That is, small changes in the corpus that a DSM uses will lead to changes in the vectors that represent word meaning. Pagin (2008) is keen to emphasize that whether holistic views are as unstable as Fodor and Lepore claim is dependent on the precise nature of the non-semantic relations that determine meaning. But in the case of DSMs, this response will not be particularly helpful, as the effective corpora that individuals would use as a basis of word meaning (i.e. the collection of linguistic experiences) will differ significantly across individuals. There are a number of ways in which the distributional theorist might respond to this complaint; here I will consider three. First, it could be argued that while any given DSM will rely on a particular corpus, and that individuals may construct personalized corpora based upon their own experiences, in trying to capture meaning in a language via distribution, we should envisage a kind of ideal corpus of language use that any given actual corpus approximates towards. This would ensure stability in meaning across a language well enough, but it would completely rely on the somewhat hazy notion of the ideal corpus of a language. Would the corpus consist in all actual usage within the language? Or perhaps all possible usage? Or do we have some basis for filtering out certain data as noise? This would also potentially compromise the empirical basis for the distributional approach that was previously discussed as a benefit. Finally, it would also not touch the issue of communicative success as individuals would still have to construct their own corpora, and so the possibility of variation in corpora (and thus in meaning) across individuals remains open (and probably, to be expected). A more promising possibility, emphasized by Jackman (1999) and Pagin (2008), is that greater emphasis is put on the distinction between the holistic determination base (i.e. 
the set of non-semantic relations between expressions that give rise to the meaning properties for those expressions) and the resultant semantic properties. The charge of instability assumes that if there is a change in the determination base, then a change in meaning will result, but this need not be so. Discussing a form of holism where the determination base is a set of beliefs, Jackman illustrates the point in the following way:

If the function from belief to meaning turns out to be many-to-one, then content could be comparatively stable through changes in belief. One could thus claim that the meaning of “elephant” is a function of one’s “elephant” beliefs without implying that every change in one’s “elephant” beliefs will produce a corresponding change in the meaning of “elephant”. (Jackman, 1999, p. 363)

Understanding distributional properties in this way would allow for some form of response to the stability charge. The thought would be that while distributional properties do serve to determine semantic properties, semantic properties are nevertheless fairly stable over changes in distributional properties. This response would also provide an alternative way of viewing distributional accounts as compatible with truth-conditional accounts. Truth-conditional accounts typically seek to capture the semantic properties directly, whereas distributional accounts capture the underlying determination base.
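The many-to-one idea can be made concrete with a toy model. The following Python sketch is only an illustration under stated assumptions: the two miniature corpora, the five-word corpus dictionary, the use of whole sentences as contexts, and the choice of "nearest neighbour" as a stand-in for a resultant semantic property are all hypothetical and are not taken from the literature discussed here. It shows a case in which adding a single sentence changes the count vector for “elephant” (the determination base) while the nearest neighbour of “elephant” stays the same.

```python
import math
from collections import Counter

# Hypothetical corpus dictionary: the context words used as vector dimensions.
DICTIONARY = ["grey", "trunk", "large", "small", "whiskers"]

# Two invented mini-corpora that differ by a single sentence.
CORPUS_A = [
    "the elephant has a trunk",
    "the large elephant is grey",
    "the rhino is large and grey",
    "the mouse is small with whiskers",
]
CORPUS_B = CORPUS_A + ["a large grey elephant"]


def count_vector(target, corpus):
    """Count dictionary words that share a sentence with `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if target in tokens:
            counts.update(t for t in tokens if t in DICTIONARY)
    return [counts[word] for word in DICTIONARY]


def cosine(u, v):
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbour(target, candidates, corpus):
    """A many-to-one map: different count vectors can yield the same neighbour."""
    target_vec = count_vector(target, corpus)
    return max(candidates, key=lambda c: cosine(target_vec, count_vector(c, corpus)))


if __name__ == "__main__":
    for name, corpus in [("A", CORPUS_A), ("B", CORPUS_B)]:
        print(name,
              count_vector("elephant", corpus),
              nearest_neighbour("elephant", ["rhino", "mouse"], corpus))
```

Nothing in this sketch settles whether real semantic properties behave this way. It only shows that a function from distributions to some further property need not be sensitive to every distributional change.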


The third possible response, and arguably the one that fits most clearly with the distributional picture, is to replace the notion of meaning identity that gives rise to the communication objection with a notion of meaning similarity. The distributional approach already has an account of meaning similarity at its heart, and so it could be claimed that while it may be rare for two individuals to adopt identical semantic models or even to capture the meaning of a single expression in the same way, provided that what the hearer understands is similar enough to what the speaker said, communication can be successful. Interestingly, in their various attacks against meaning holism, Fodor and Lepore (1996, 1999) did previously discuss and reject this form of response at length, particularly with regard to a semantic space proposal defended by Churchland. However, their reasons for rejection do not straightforwardly apply to DSMs. One problem that they argue is fundamental to any account of meaning similarity is that, unlike identity, any two items can only be similar in some particular respect; they cannot be similar simpliciter (Fodor & Lepore, 1999, p. 392). This problem arises for a vector-based semantic space insofar as we must determine a set of dimensions within which the meaning of a term will be represented. However, Fodor and Lepore envision a semantic space where the dimensions in question are semantic dimensions, i.e. the dimensions essentially stand for certain primitive semantic notions along which the meanings of terms in a language can be represented. But this, they argue, leads to the thorny issue of which semantic properties we treat as primitive and which semantic properties will be represented within the space, for which they claim there is no good answer. DSMs seem to avoid this charge straightforwardly, as the dimensions used in the semantic space are not semantic dimensions, but are distributional. 
So it is not a question of identifying certain primitive semantic properties with which other semantic properties are explained. Instead, semantic properties are captured in terms of the underlying distributional properties. A second issue they raise is, perhaps, more serious (Fodor & Lepore, 1999, p. 384). It is widely thought that one outcome of a successful theory of meaning should be statements of the following kind: (a) ‘Nixon is dead (at t)’ is true iff Nixon is dead (at t). Where there is some guarantee that the right hand side of the biconditional is identical in meaning to the quoted sentence on the left-hand side. However, if meaning identity is replaced with meaning similarity, then we would not have the theoretical apparatus available to make that guarantee and deliver statements of this kind. As they note, “Nixon is in a coma” is similar in meaning to “Nixon is dead”, but plugging the former into the right-hand side of (a) would clearly be unacceptable. In that respect, an account that replaces semantic identity with semantic similarity will not be able to provide statements of this kind. At this stage, perhaps the clearest option for the distributional theorist would be to resort to the Pagin/Jackman response mentioned earlier. Perhaps there is a working notion of semantic identity that holds between meaning properties, and there is also a notion of semantic similarity that can be calculated according to the distributional


determination base. In that respect, the job description of semantic similarity may not be to capture communicative success or to provide structural constraints on meaning theorems, but the notion would still be a theoretically important one. An alternative form of response would be to deny that a semantic theory should provide statements like (a) because semantic theories are not in the business of providing truth conditions for sentences (Westera & Boleda, 2019). To summarize this section, I have sought to better understand the theoretical underpinnings of DSMs as theories of meaning by drawing upon previous philosophical work on meaning holism. Doing so is beneficial for both sides of the debate, where on the one hand, possible avenues for understanding how distributional models may be consistent with more traditional theories of meaning open up, while on the other hand, a better understanding of the possible forms of meaning holism is produced.
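The contrast between meaning identity and meaning similarity that drives the third response can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the text: the vectors are invented, cosine is used as the similarity measure, and the threshold of 0.9 for "similar enough" is an arbitrary placeholder, since the chapter does not specify one.

```python
import math

# Assumed cut-off for "similar enough"; the chapter does not specify a value.
SIMILARITY_THRESHOLD = 0.9

# Invented distributional vectors for one word, as built by two individuals
# from different linguistic experiences, plus a vector for an unrelated word.
SPEAKER = [4, 1, 3, 0]
HEARER = [5, 1, 2, 0]
UNRELATED = [0, 3, 0, 4]


def cosine(u, v):
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_meaning(u, v):
    """Meaning identity: the two representations must match exactly."""
    return list(u) == list(v)


def similar_enough(u, v, threshold=SIMILARITY_THRESHOLD):
    """Meaning similarity: the representations need only be close."""
    return cosine(u, v) >= threshold


if __name__ == "__main__":
    print("identical:", same_meaning(SPEAKER, HEARER))
    print("similar enough:", similar_enough(SPEAKER, HEARER))
    print("unrelated word similar enough:", similar_enough(SPEAKER, UNRELATED))
```

The sketch does nothing to answer the second objection from Fodor and Lepore: a threshold test of this kind cannot distinguish a sentence that is identical in meaning from one that is merely close, which is exactly the gap that statements like (a) expose.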

5.7 Conclusion

In this paper, I have provided an overview of DSMs, including how they are constructed and the extent to which they can meet the job description of a theory of meaning. Regarding the latter issue, I only hope to have shown that this is indeed a topic that warrants further philosophical investigation, particularly given that distributional semantics has been at the heart of much of the state of the art in natural language processing. Even if it is the case that DH2 is false, it is also important to note that the evidence in favour of DH1 already suggests that DSMs will at least serve as important tools in the philosophical investigation of particular expressions.

References

Andrews, M., Vigliocco, G., & Vinson, D. (2009). Integrating experiential and distributional data to learn semantic representations. Psychological Review, 116(3), 463–498. Asher, N. (2011). Lexical meaning in context: A web of words. Cambridge University Press. Baroni, M., & Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4), 673–721. Baroni, M., Bernardi, R., & Zamparelli, R. (2014a). Frege in space: A program for compositional distributional semantics. In Linguistic issues in language technology, volume 9, 2014 – Perspectives on semantic representations for textual inference. CSLI Publications. Baroni, M., Dinu, G., & Kruszewski, G. (2014b). Don’t count, predict!: A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (Volume 1: Long papers) (pp. 238–247). Association for Computational Linguistics. Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th annual meeting of the Association for Computational Linguistics, pp. 5185–5198.


Boleda, G. (2020). Distributional semantics and linguistic theory. Annual Review of Linguistics, 6(1), 213–234. Boleda, G., & Herbelot, A. (2016). Formal distributional semantics: Introduction to the special issue. Computational Linguistics, 42(4), 619–635. Borg, E. (2012). Pursuing meaning. Oxford University Press. Brandom, R. (1994). Making it explicit. Harvard University Press. Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word cooccurrence statistics: A computational study. Behavior Research Methods, 39(3), 510–526. Clark, S. (2015). Vector space models of lexical meaning. In S. Lappin & C. Fox (Eds.), The handbook of contemporary semantic theory (pp. 439–522). Blackwell. Davidson, D. (1967). Truth and meaning. Synthese, 17(3), 304–323. Davidson, D. (1973). Radical interpretation. Dialectica, 27(3/4), 313–328. Devitt, M. (2006). Intuitions in linguistics. British Journal for the Philosophy of Science, 57(3), 481–513. Dresner, E. (2012). Meaning holism. Philosophy Compass, 7(9), 611–619. Erk, K. (2012). Vector space models of word meaning and phrase meaning: A survey. Language and Linguistics Compass, 6(10), 635–653. Erk, K., & Padó, S. (2008). A structured vector space model for word meaning in context. In Proceedings of the 2008 conference on empirical methods in natural language processing (pp. 897–906). Association for Computational Linguistics. Firth, J. R. (1957). A synopsis of linguistic theory. In Studies in linguistic analysis (pp. 1–32). Blackwell. Fodor, J. A., & Lepore, E. (1992). Holism: A shopper’s guide. Blackwell. Fodor, J. A., & Lepore, E. (1996). Reply to Churchland. In R. N. McCauley (Ed.), The Churchlands and their critics (pp. 159–162). Blackwell. Fodor, J. A., & Lepore, E. (1998). The emptiness of the lexicon: Reflections on James Pustejovsky’s “The generative lexicon”. Linguistic Inquiry, 29(2), 269–288. Fodor, J. A., & Lepore, E. (1999). All at sea in semantic space: Churchland on meaning similarity. 
The Journal of Philosophy, 96(8), 381–403. Frege, G. (1997). Begriffsschrift: A formula language of pure thought modelled on that of arithmetic. In M. Beaney (Ed.), The Frege reader (pp. 47–78). Oxford Beaney. Gries, S. T. (2016). Quantitative corpus linguistics with R (2nd ed.). Routledge. Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1–3), 335–346. Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162. Heim, I., & Kratzer, A. (1998). Semantics in generative grammar. Blackwell. Jackman, H. (1999). Moderate holism and the instability thesis. American Philosophical Quarterly, 36(4), 361–369. Jones, M. N., Kintsch, W., & Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming. Journal of Memory and Language, 55(4), 534–552. Kiela, D., & Clark, S. (2014). A systematic study of semantic vector space model parameters. In Proceedings of the 2nd workshop on continuous vector space models and their compositionality, pp. 21–30. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240. Larson, R., & Segal, G. (1995). Knowledge of meaning: Introduction to semantic theory. MIT Press. Lasnik, H., & Lidz, J. L. (2016). The argument from the poverty of the stimulus. In I. Roberts (Ed.), The Oxford handbook of universal grammar (pp. 221–248). Oxford University Press. Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 20(1), 1–31. Lenci, A. (2018). Distributional models of word meaning. Annual Review of Linguistics, 4(1), 151–171.


Lewis, D. (1970). General semantics. Synthese, 22(1/2), 18–67. Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical cooccurrence. Behavior Research Methods, Instruments, & Computers, 28(2), 203–208. Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57–78. Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122(3), 485–515. Marelli, M., Gagné, C. L., & Spalding, T. L. (2017). Compounding as abstract operation in semantic space: Investigating relational effects through a large-scale, data-driven computational model. Cognition, 166, 207–224. Maynes, J., & Gross, S. (2013). Linguistic intuitions. Philosophy Compass, 8(8), 714–730. McEnery, T., & Wilson, A. (1996). Corpus linguistics: An introduction. Edinburgh University Press. McNally, L., & Boleda, G. (2017). Conceptual versus referential affordance in concept composition. In J. Hampton & Y. Winter (Eds.), Compositional and concepts in linguistics and psychology (pp. 245–268). Springer. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. ICLR Workshop. Mikolov, T., Wen-tau, Y., & Zweig, G. (2013b). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 conference of the North American chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 746–751). Association for Computational Linguistics. Mitchell, J., & Lapata, M. (2008). Vector-based models of semantic composition. In Proceedings of ACL-08: HLT (pp. 236–244). Association for Computational Linguistics. O’Keeffe, A., & McCarthy, M. (2010). Routledge handbook of corpus linguistics (1st ed.). 
Routledge. Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161–199. Padó, S., Padó, U., & Erk, K. (2007). Flexible, corpus-based modelling of human plausibility judgements. In Proceedings of the 2007 joint conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 400–409). Association for Computational Linguistics. Pagin, P. (2008). Meaning holism. In E. Lepore & B. C. Smith (Eds.), The Oxford handbook of philosophy of language (pp. 213–232). Oxford University Press. Pietroski, P. M. (2018). Conjoining meanings: Semantics without truth values. Oxford University Press. Quine, W. V. O. (1960). Word & object. MIT Press. Sahlgren, M. (2008). The distributional hypothesis. Italian Journal of Linguistics, 20(1), 33–53. Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97– 123. Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–424. Socher, R., Huval, B., Manning, C. D., & Ng, A. Y. (2012). Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (pp. 1201–1211). Association for Computational Linguistics. Travis, C. (1997). Pragmatics. In B. Hale & C. Wright (Eds.), A companion to the philosophy of language (pp. 87–107). Blackwell. Westera, M., & Boleda, G. (2019). Don’t blame distributional semantics if it can’t do entailment. In Proceedings of the 13th international conference on computational semantics – long papers, pp. 120–133.


Jumbly Grindrod is a lecturer in philosophy at the University of Reading. His research interests lie in epistemology and philosophy of language, and particularly where they intersect. More recently, his research has focused on how corpus linguistics and computational linguistics can help answer philosophical questions. He has previously published in journals such as Ergo, Mind & Language, and Episteme.

Part II

Experimental Philosophy of Language and Corpus Methods

Chapter 6

Are Moral Predicates Subjective? A Corpus Study

Isidora Stojanovic and Louise McNally

Abstract The nature of moral judgments, and, more specifically, the question of how they relate, on the one hand, to objective reality and, on the other, to subjective experience, are issues that have been central to metaethics from its very beginnings. While these complex and challenging issues have been debated by analytic philosophers for over a century, it is only relatively recently that more interdisciplinary and empirically-oriented approaches to such issues have begun to see light. The present chapter aims to make a contribution of that kind. We will present the results of an empirical – specifically, corpus linguistic – study that offers evidence that moral predicates exhibit hallmarks of subjectivity at the linguistic level, but also, that they differ significantly from paradigmatically subjective predicates.

6.1 Introduction

The nature of moral judgments, and, more specifically, the question of how they relate, on the one hand, to objective reality and, on the other, to subjective experience, are issues that have been central to metaethics from its very beginnings. While these complex and challenging issues have been debated by analytic philosophers for over a century (and by philosophers tout court since Plato and Aristotle), it is only relatively recently that more interdisciplinary and empirically-oriented

Authors Isidora Stojanovic and Louise McNally have equally contributed to this chapter.
I. Stojanovic, Institut Jean Nicod, CNRS, DEC, ENS, PSL, Paris, France
L. McNally, U. Pompeu Fabra, Barcelona, Spain
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_6


approaches to such issues have begun to see light. The present chapter aims to make a contribution of that kind. We begin in Sect. 6.2 by contextualizing our study in the literature on moral predicates and subjectivity. We briefly look at recent studies from experimental moral philosophy that suggest that moral judgments are more subjective than factual judgments, but less so than judgments on matters of aesthetic preference and personal taste. We also look at how subjectivity has been approached in semantics and, in particular, at the idea that embedding a predicate under subjective attitude verbs like “find” can serve as a criterion for subjectivity. Section 6.3 presents the corpus study. In a nutshell, we have looked at how basic moral predicates – “moral” and “immoral”, “ethical” and “unethical” – as well as predicates modified by “morally” and “ethically”, behave with respect to the verbs “find” and “consider”, both of which denote subjective attitudes, but of different kinds. Section 6.4 discusses the theoretical implications of the results of our study, and argues that moral predicates exhibit hallmarks of subjectivity at the linguistic level, but are also importantly different from paradigmatically subjective predicates such as “fun” and “boring” or “delicious” and “disgusting”. While the latter are clearly associated with “find”-like attitudes, the former show a preference for “consider”-like attitudes.
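The kind of count described in this overview can be illustrated with a deliberately crude Python sketch. It is not the procedure used in the study presented in Sect. 6.3: the sentences are invented, the lists of verb forms and adjectives and the four-token gap are assumptions made for illustration, and a real corpus query would rely on part-of-speech tagging or parsing rather than plain token matching.

```python
import re
from collections import Counter

# Assumed inflected forms for the two attitude verbs.
VERB_FORMS = {
    "find": ["find", "finds", "found", "finding"],
    "consider": ["consider", "considers", "considered", "considering"],
}
FORM_TO_LEMMA = {form: lemma
                 for lemma, forms in VERB_FORMS.items()
                 for form in forms}

# Adjectives mentioned in the chapter, grouped into two illustrative classes.
ADJECTIVE_CLASS = {
    "moral": "moral", "immoral": "moral",
    "ethical": "moral", "unethical": "moral",
    "fun": "taste", "boring": "taste",
    "delicious": "taste", "disgusting": "taste",
}

# Invented example sentences, not corpus data.
SENTENCES = [
    "I find this game boring.",
    "She considers the policy unethical.",
    "They consider lying immoral.",
    "We found the film fun.",
    "He finds lying immoral.",
]


def count_embeddings(sentences, max_gap=4):
    """Count (verb lemma, adjective class) pairs in which a listed adjective
    follows an attitude verb within `max_gap` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            lemma = FORM_TO_LEMMA.get(token)
            if lemma is None:
                continue
            for following in tokens[i + 1 : i + 1 + max_gap]:
                if following in ADJECTIVE_CLASS:
                    counts[(lemma, ADJECTIVE_CLASS[following])] += 1
                    break  # count at most one adjective per verb occurrence
    return counts


if __name__ == "__main__":
    for pair, n in sorted(count_embeddings(SENTENCES).items()):
        print(pair, n)
```

The counts produced from these toy sentences carry no evidential weight. The sketch only shows the shape of the comparison: for each class of adjective, how often it occurs under “find” as opposed to “consider”.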

6.2 Moral Predicates and Subjectivity: A Snapshot of a Long-Standing Philosophical Debate and the More Recent Empirical Turn

6.2.1 The Vexed Issue of Moral Subjectivity

Moral realism, that is, the view that “moral claims [...] purport to report facts and are true if they get the facts right” (Sayre-McCord, 2005, p. 1), is a long-standing position in metaethics, but also one that faces many challenges (see, i.a. Sayre-McCord, 2005; Railton, 2017). Consider the following two claims: (1) Mao Zedong’s Cultural Revolution was unethical. (2) Mao Zedong’s Cultural Revolution took place in China from 1966 to 1976. In its somewhat caricatural form, a moral realist holds that (1) and (2) are completely on a par: both purport to refer to objective facts, and whether they are true or false can be determined simply by looking at how things are, or were, in reality. However, while (2) indeed reports a historical fact that can be easily confirmed as true, it is far from obvious what kind of fact would play the same role for (1). Relatedly, disagreements on the two kinds of claims differ importantly. When and where the Cultural Revolution took place is hardly open to disagreement, and if it were, historians would be joining their forces in order to find the best way of establishing when and where it took place. Moral disagreements, in contrast, often


(though not always) tend toward the unresolvable, and moreover, are arguably such because of their very nature. A hard-core communist in China in the late 1960s who supported the Cultural Revolution and a hard-core opponent of communism differ precisely in that they endorsed radically opposed systems of values, and are unlikely to be ever able to reach an agreement over a statement such as (1). From the basic observation that moral claims like (1) and factual claims like (2) do not appear to be on a par – at least, not when taken at face value – one sees a proliferation both of alternatives to and refinements of moral realism (for overview, see e.g. Soria Ruiz et al., 2021). Among the alternatives, expressivist views (see e.g. Camp, 2017 for overview) hold that the function of moral statements is not to express factual information, but rather to convey subjective attitudes with respect to a moral issue; for example, (1) expresses the speaker’s disapproval of the Cultural Revolution. But expressivism also faces serious challenges. For one, subjective attitudes such as (dis)approval do not bear truth values, while moral claims, at least prima facie, do. One who is in disagreement with (1) can reply “That’s not true!” Similarly, claims such as (1) easily appear in contexts where they must be able to bear a truth value, as in (3): (3) If the Cultural Revolution was unethical, then its leaders do not deserve monuments in their honor. The challenge of explaining how a sentence such as (1) can express an attitude such as disapproval when uttered on its own, yet occur in complex constructions such as (3), is an instance of the so-called “Frege-Geach problem” and has been widely discussed in metaethics (see e.g. Woods, 2017 for overview). In particular, a family of accounts known as hybrid expressivist accounts aim to preserve the basic insights of expressivism while being able to account for the compositional behavior of moral language (see e.g. 
Björkholm, 2022 for overview and discussion). To bring the point home, moral statements appear to be neither (completely) objective nor (completely) subjective. This in-between status of moral statements is further confirmed by several recent empirical studies. The most famous are by Goodwin and Darley (2008, 2010, 2012). In their pioneering 2008 study, they investigated folk judgments regarding moral statements and made a four-tiered comparison between factual statements, statements on matters of social convention, statements on matters of personal taste and aesthetics, and moral statements. The task with which they presented their participants consisted of three steps. First, participants were presented with a statement and asked to which degree they agreed with the statement. Next, participants had to decide whether they thought it was a true statement, a false statement, or a matter of opinion. Finally, after a task of distraction during which the examiner would select five statements with which the participant agreed to a very high degree, participants were told that somebody else disagreed with these statements and were asked whether they thought the other person was surely wrong, whether it was possible that neither of them was wrong, or whether it was possible that the participant was wrong and the other person right. Combining these different measures, Goodwin and Darley created “a scale of objectivity” and found that moral statements were judged to


be less objective than factual statements, but more objective than statements on matters of social convention or personal taste (with the former being judged more objective than the latter). In follow-up studies, they further showed that the degree of perceived subjectivity varied significantly among moral judgments themselves. Thus judgments regarding what they call contested value of life issues, such as the permissibility of abortion or euthanasia, were found to be more subjective than others. These initial studies set into motion a rich and still incredibly active research agenda, whose goal is twofold. First, it applies experimental methodology to the study of folk judgments regarding the subjective vs. objective character of moral statements. Second, it aims to examine the implications of these empirical findings for theoretical issues discussed in metaethics.1 While space precludes summarizing these latter works here, the question of whether, and to what extent, moral judgments and moral statements are objective rather than subjective is still very much a matter of controversy. The study we present in Sect. 6.3 is a contribution to this debate.

6.2.2 Tracking Subjectivity Semantically

While experimental moral philosophy and psychology were concerned with subjectivity specifically with respect to morality, other issues concerning subjectivity emerged independently and came to be topics of great interest in semantics. More precisely, for the past two decades, there has been a growing interest in the semantics of subjective and evaluative predicates, among which the predicates of personal taste (PPTs, for short), such as “tasty”, “delicious”, “disgusting”, “fun” and “boring”, have been at the center of attention; see e.g. Lasersohn (2005), Stephenson (2007), Stojanovic (2007) for early discussions, and Umbach (2021), Stojanovic and Kaiser (2022: section 2) and Willer (forthcoming) for recent overviews. One of the things that sparked such a vivid interest in subjective predicates is the idea of faultless disagreement; that is, the idea that it makes sense to disagree over matters of personal taste. For example, we may debate whether Monopoly is fun or boring, even if we know that Monopoly can be fun to some people, or on some occasions, and boring to other people, or on other occasions. Faultless disagreement has generated an impressive amount of philosophical literature, and continues to be a hotly debated issue; for overviews, see, i.a. Bordonaba-Plou (2017), Stojanovic (2017), Karczewska (2019), Zakkou (2019), or Zeman (2020). While PPTs are particularly prone to generating scenarios of faultless disagreement, it has been widely noted in the literature that the phenomenon appears to be

1 For further studies and discussion, see i.a. Wright et al. (2013), Beebe and Sackris (2016), Pölzler (2017), Pölzler and Wright (2020a, b), and Sarkissian (2016).


much broader than matters of personal taste. For example, vague predicates (that is, those that generate instances of the sorites paradox) and relative gradable predicates in general, when used in a positive (as opposed to comparative or superlative) form, can easily give rise to what appears to be a faultless disagreement. Consider two friends who disagree over whether a 10€ bottle of wine is expensive. One of them can judge it to be expensive because they are used to buying wine that would cost 4€ a bottle, and the other, not expensive because their standard of reference is wines that cost over 15€ a bottle. Any predicate whose application makes reference to standards that can vary across contexts can, in principle, give rise to dialogues that take the form of a faultless disagreement (see e.g. Kennedy, 2013; Solt, 2018; Odrowąż-Sypniewska, 2021; see also Verheyen et al., 2018 for an experimental study of the subjectivity in gradable adjectives). What is more, even expressions such as “athlete” or “publication” can arguably generate faultless disagreement (see Sundell, 2011; Stojanovic, 2012) because the conditions for their application are not firmly settled, so that competent language users may disagree whether someone counts as an athlete or whether something counts as a publication without there being a completely objective way to settle the matter. Importantly for the present purposes, when it comes to moral predicates, there is considerable controversy as to whether they are prone to faultless disagreement or not. We have already seen that empirical studies such as Goodwin and Darley (2008, 2012) provide a mitigated answer: moral statements are more prone to faultless disagreement than factual statements, but significantly less so than statements on matters of personal taste, and among moral statements themselves, some are more prone than others. 
Similar findings are reported in the recent study in Soria Ruiz and Faroldi (2020), while Stojanovic (2019) argues, on more theoretical grounds, that disagreements over moral issues pattern differently from disagreement over matters of taste. Because of the elusive character of faultless disagreement, scholars have looked for other ways of identifying subjectivity at the semantic level. One diagnostic that has gained great popularity is the so-called “find” test (see Sæbø, 2009). The basic idea is that it is fine to embed subjective predicates under verbs of subjective attitude such as the English “find”, but not so with nonsubjective predicates, as illustrated by the following contrast (from Kennedy, 2013, p. 260): (4) Anna finds trippa alla romana tasty. (5) ?Anna finds trippa alla romana to be vegetarian. While embeddability under subjective attitude verbs such as “find” is often used as a test in order to identify PPTs, the test is not without problems. A particularly pressing problem is that there are predicates that seem to fall into something of a gray zone: while they are not outright infelicitous under “find”, they are not perfectly felicitous either. Thus McNally and Stojanovic (2017, p. 29) write:


I. Stojanovic and L. McNally

Another sign that “find” anti-selects for strictly evaluative predicates is the oddness of assertions like (18), in comparison to the more natural embedding under “consider” in (19).

(18) a. ?I find Miró’s mosaic on the Rambles mediocre.
     b. ?I find lying bad/worse than stealing.
(19) a. I consider Miró’s mosaic on the Rambles mediocre.
     b. I consider lying bad/worse than stealing.

Though (18b) is not unacceptable, it strongly implies that the speaker has made his or her evaluation about lying on the basis of specific experiences of doing it.

Moral adjectives are among the adjectives in the gray zone. Some authors, such as Franzén (2020) and Silk (2021), take moral adjectives to be felicitous under “find”, and take this as evidence to the effect that moral adjectives are subjective, while other authors, such as McNally and Stojanovic (2017) and Stojanovic (2019) take them to be marked under “find”, or as noted in the cited passage, felicitous only in the context of specific subjective experiences that are compatible with, but not inherent to, moral judgments. They thus take the “find” test to offer evidence that there is an important semantic difference between moral adjectives and PPTs. This continuing controversy surrounding the embeddability of moral adjectives under “find” and its implications is thus one of the motivations for the corpus study that we have conducted and present in the next section. Finally, even if moral predicates do not pattern in quite the same way as PPTs when it comes to faultless disagreement and embeddability under “find”, this does not mean that these predicates are completely objective either. The empirical studies of Goodwin and Darley (2008, 2012) and Soria Ruiz and Faroldi (2020) show that moral predicates still elicit a significantly high intuition of faultlessness. Furthermore, moral predicates are clearly felicitous under other verbs of subjective attitude, and in particular, the verb “consider”, which anti-selects for fully objective predicates (see Lasersohn, 2009; Kennedy & Willer, 2022): (6) #Anna considers the sum of two and two greater than four. Let us take stock. While it was PPTs that triggered the interest in subjectivity from a semantic point of view, one challenge that followed immediately was to know how far this notion of subjectivity extended. Appeal to the idea of faultless disagreement suggested that there were many more expressions beyond PPTs that were subjective, but the “find” test narrowed down again the range of putative subjective expressions. 
However, a major issue was, and continues to be, that for many expressions, the applicability of that test yields controversial results. To date, the data discussed on moral predicates and “find” have been, to our knowledge, entirely anecdotal and constructed for the purposes of making an argument. While such data can be legitimately used, the controversies described above suggest that it could be informative to take a broader, more systematic look at naturally occurring examples.

6 Are Moral Predicates Subjective? A Corpus Study


6.3 The Corpus Study

6.3.1 Corpus Used and Raw Data Collection Method

We took a snapshot of the distribution of moral adjectives with “find” through a study carried out on the Corpus of Contemporary American English (COCA), using the search tool at www.english-corpora.org (Davies 2008). COCA has over 1 billion words spanning the years 1990–2019 and offers a sample of English evenly distributed across language drawn from academic journals, magazines, newspapers, fiction, spoken language (TV and radio interviews), TV and movie subtitles, blogs and other web pages. In this respect, it constitutes what corpus linguists would consider a balanced, representative sample of the language. Since this is, to our knowledge, the first corpus study on this topic,2 and given the large size and the somewhat limited linguistic information that can be searched for in the corpus using the web search tool, we opted for a limited study, in the hope of inspiring future research on broader sets of data. We focused on uses of adjectives expressing moral judgments as predicative complements to “find” and, for comparison, “consider”. We chose the adjectives “moral”, “immoral”, “ethical” and “unethical”, which we considered to be simultaneously among the most prototypical examples of adjectives used for moral judgments and the least polysemous. Of course, other adjectives can be used to make moral judgments – candidates we considered include “good”, “bad”, “right”, “wrong”, and some additional, more specific adjectives like “(un)acceptable”. However, after some initial searches, we found that these all raised concerns due to their polysemy: Something can be good, bad, or (un)acceptable for moral or ethical reasons, or for other reasons not related to moral judgments. We wished to avoid having to make qualitative decisions concerning the interpretation of such adjectives. 
As an alternative, to broaden our dataset somewhat, we added to the search complements of the form [“morally”/“ethically” ADJECTIVE], assuming that a speaker who chooses to use the qualifiers “morally” or “ethically” is making explicit the nature of their judgment. All words in the online version of COCA are tagged for lemma (that is, the basic form that covers all inflected forms, such as “finds”, “found”, and “finding” for “find”), as well as for part of speech (noun, verb, etc.). However, COCA is not syntactically parsed: there is no way, for example, to distinguish adjectives used as predicates (as in “We found that unethical”) from those used as modifiers of nouns

2

While philosophers in metaethics and philosophy of language are increasingly eager to look at empirical evidence concerning morality and moral language, the main focus has been on collecting data through controlled experiments (e.g. eliciting acceptability judgments), rather than from corpora. A notable exception is the corpus study presented in Reuter et al. (manuscript), who use corpus data to argue that thick and thin evaluative (specifically moral) expressions are distinguishable from other types of expressions in terms of how they combine with intensifiers (“truly”, “really”, “very”).


(as in “We found that unethical politician collecting bribes”). We therefore could not search for adjectives specifically used as predicative complements to “find” or “consider”. The practical alternative that allowed us to collect the most examples was a search for the lemma for each verb within 9 words – the maximum window afforded by the search tool – to the left of each adjective and adverb.3 This strategy guaranteed that we could capture examples in which the complement to the verb to which the moral adjective or adverb was ascribed was quite long (e.g. “It is difficult to consider the employee of a company immoral”) or where adverbial or other material intervened (e.g. “those who consider it entirely immoral”). However, it also meant that we collected a lot of false positives which had to be filtered out (for example, “[Y]ou don’t say whether you consider eavesdropping to be a moral or ethical act”, where “ethical” modifies “act”, and moreover where the judgment is not about whether eavesdropping is ethical or not, but rather whether it is an act of an ethical nature). We offer additional details on the data filtering in the following subsection. Of course, our results will be better interpretable if we also have information about how adjectives behave with “find” and “consider” more generally. This requires having a sense of a) the base frequency of the two verbs and the moral adjectives (particularly when used as predicates); and b) the range and frequency of the other adjectives that occur as predicative complements to “find” and “consider”, as well as the base frequency of these latter adjectives, again, particularly as predicates. As already noted, because the corpus is not syntactically parsed, it is not trivial to extract this information reliably. However, we did attempt a broader quantitative comparison in two ways. 
First, we carried out an additional search to get a sense of how the frequencies of “(im)moral” and “(un)ethical” with “find” and “consider” compare with those of other adjectives that serve as complements to these verbs. To keep the data collection manageable and as comparable as possible, we collected frequency counts for all adjectives that occurred in the context of the lemmas for “find” or “consider” followed directly by the pronoun “it”, specifying in addition that the expression directly following the adjective not be a noun, to avoid picking up uses of the adjective as a modifier. As “it” is unambiguously a pronoun (unlike “that”, which also has a use as a determiner, as in “that ethical dilemma”), we minimized the collection of irrelevant examples – any adjective following “it” and not followed by a noun is highly likely to be a predicate, as in “consider it ethical”. At the same time, “it” is a highly frequent word, and therefore considered likely to produce a sufficient number of hits to allow for some preliminary analysis. Second, we examined the automatically calculated mutual information (MI) scores available in COCA for the verbs “find” and “consider” with all adjectives,

3

This sort of search is carried out using the collocation search option in the tool at www.english-corpora.org. For technical reasons, it was not possible to use an equivalent and intuitively more natural strategy of searching for the adjective within the same window to the right of the verb lemma.


as well as the mutual information scores for “moral”, “immoral”, “ethical”, “unethical”, “morally”, and “ethically” with all verbs. We provide further details on MI and why we looked at it in the next section.
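
The left-window search described in this subsection can be illustrated with a minimal sketch. It assumes plain whitespace-tokenized text and a small hand-written table of inflected forms; both are stand-ins for COCA's own lemma tagging and collocation tool, and the function name `left_window_hits` is hypothetical. As in the study, the search is purely positional, so it returns false positives that still need manual filtering.

```python
from typing import List, Tuple

# Hypothetical lemma table for illustration; COCA supplies its own lemma tags.
# Note that "found" is also a form of the unrelated verb "to found".
VERB_FORMS = {
    "find": {"find", "finds", "found", "finding"},
    "consider": {"consider", "considers", "considered", "considering"},
}


def left_window_hits(tokens: List[str], target: str, verb_lemma: str,
                     window: int = 9) -> List[Tuple[int, int]]:
    """Return (verb_index, target_index) pairs in which a form of
    `verb_lemma` occurs at most `window` tokens to the left of `target`."""
    forms = VERB_FORMS[verb_lemma]
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # Scan only the `window` tokens immediately preceding the target.
        for j in range(max(0, i - window), i):
            if tokens[j].lower() in forms:
                hits.append((j, i))
    return hits


# A relevant example quoted in the text: long material intervenes between
# the verb and the adjective, but the pair is still captured.
relevant = "It is difficult to consider the employee of a company immoral".split()
print(left_window_hits(relevant, "immoral", "consider"))

# A false positive quoted in the text: "ethical" modifies "act" here,
# yet the positional search cannot tell the difference.
false_positive = "you consider eavesdropping to be a moral or ethical act".split()
print(left_window_hits(false_positive, "ethical", "consider"))
```

Both calls return a hit, which is the point: the window search maximizes recall, and the distinction between predicative and attributive uses has to be made afterwards by a human reader.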

6.3.2 Initial Results and Data Filtering

The raw numbers of hits produced by first searches specifically for “(im)moral”, “(un)ethical”, “morally”, and “ethically”, are summarized in Table 6.1. However, these had to be filtered to eliminate duplicates as well as to restrict results to uses of the adjectives as complements to the verbs in question, and of the adverbs as modifiers of adjectival complements to the verbs. This required reading the examples individually, and was carried out by McNally, a trained linguist and native speaker of English (although the task did not present any particular difficulty). The output of this process includes all examples in which the syntactic functions of the expressions are respected, even if the surface word order varies (e.g. “consensual behavior he considers immoral”, where “immoral” is ascribed to “consensual behavior” from within a relative clause, or “is considered wrong ethically”, a marked but grammatical option in English). Examples in which the moral adjective complement was preceded by “as”, a stylistic option in English, were also left in (e.g. “Neocons consider lying as a standard operating procedure as perfectly ethical”), as were those in which the moral adjective appeared as the complement to “to be” in an infinitival complement to the verb (e.g. “We consider her actions to be immoral”). In the latter case, though the syntactic structure is technically different, we see no nuance of semantic or pragmatic difference of any sort. The results, after this initial filtering, appear in Table 6.2. As can be seen, filtering considerably reduced the number of examples. We provide further, qualitative commentary on these in the next section. As noted at the end of the last section, in order to put these results in context, it is relevant to take into account the overall frequency of both the two verbs and

Table 6.1 Raw occurrences in COCA of the lemmas for “find” and “consider” within a 9-word window to the left of “(im)moral”, “(un)ethical”, “morally”, and “ethically”

            Moral   Immoral   Ethical   Unethical   Morally ADJ   Ethically ADJ
FIND        376     64        156       56          176           32
CONSIDER    334     145       177       74          100           25

Table 6.2 Number of occurrences in COCA of the lemmas for “find” and “consider” within a 9-word window to the left of “(im)moral”, “(un)ethical”, “morally”, and “ethically”, after filtering

            Moral   Immoral   Ethical   Unethical   Morally ADJ   Ethically ADJ
FIND        4       45        11        25          138           15
CONSIDER    32      125       32        64          70            17


the individual adjectives and adverbs. As a first approximation, we carried out a further search for the lemmas for “find”/“consider”, followed immediately by “it”, then directly by any word of the category adjective, and then any category other than a noun (the specific search strings used were “FIND it ADJ -NOUN” and “CONSIDER it ADJ -NOUN”, where “-” is a Boolean negation operator). Due to imprecisions in the tagging, these searches also yielded various false positives that had to be manually filtered. These fell into two cases: a) examples where a noun appeared after the adjective (e.g. “consider it real progress”); and b) examples where the third item was not an adjective (e.g. “finding it – while”, where the dash punctuation constitutes the third item). As before, this filtering was carried out by McNally. It is worth pointing out that this filtering, in the case of “find”, leaves in examples that probably correspond to a distinct sense of the verb that does not involve subjective judgment, namely examples like (7).

(7) He opened the lid and found it empty. The two remaining seeds were gone.

In this example, the verb describes not a judgment but an event of encountering something in an objective state. This sense of “find” is highly salient with “empty” and a few other adjectives, such as “full”, “intact”, “(un)occupied”, and “vacant”. However, it cannot be reliably identified solely by considering the adjective alone – for example, one could use (8) to express a subjective judgment about a theater after a very poorly attended performance.

(8) I found it empty.

We chose not to exclude such examples, for three reasons: the examples we extracted were too numerous to verify individually; in some cases it is difficult to determine the interpretation of the verb with certainty; and overall the adjectives we considered likely to yield this interpretation constituted no more than 100 tokens, or about 0.7% of the total for “find”.
After filtering, a total of 14,536 tokens of FIND “it” ADJ and 831 tokens of CONSIDER “it” ADJ remained. We also searched for instances of “morally” or “ethically” ADJ in the same context. The results, including the number of tokens involving the four moral adjectives of interest, are summarized in Table 6.3. For comparison we extracted two additional sorts of counts. First, in Tables 6.4 and 6.5 we provide counts for the five most frequent adjectives that occur in this context with “find” and “consider”, respectively. Second, we looked at the frequencies in the same contexts of a sample of adjectives that have been repeatedly classified as PPTs in the philosophical and

Table 6.3 Number of occurrences in COCA (1) of all adjectives (including “(im)moral” and “(un)ethical”), (2) of “(im)moral” and “(un)ethical”, and (3) of “morally”/“ethically” ADJ directly following FIND/CONSIDER “it” and not followed by a noun, after filtering

                 ADJ      Moral   Immoral   Ethical   Unethical   Morally ADJ   Ethically ADJ
FIND it _        14,536   1       6         6         2           12            2
CONSIDER it _    831      1       7         2         6           5             1


Table 6.4 Number of occurrences in COCA of the five most frequent adjectives directly following FIND “it”

             Difficult   Interesting   Easier   Necessary   Impossible
FIND it _    2325        1028          786      656         506

Table 6.5 Number of occurrences in COCA of the five most frequent adjectives directly following CONSIDER “it”

                 Important   Necessary   Possible   Essential   Appropriate, unlikely (tie)
CONSIDER it _    50          42          25         20          19

Table 6.6 Number of occurrences in COCA of a sample of predicates of personal taste directly following FIND “it” and CONSIDER “it”

                 Boring   Delicious   Disgusting   Exciting   Fun   Tasty
FIND it _        67       11          38           49         53    3
CONSIDER it _    0        0           2            0          2     0

linguistics literature discussed in Sect. 6.2, specifically “boring”, “delicious”, “disgusting”, “exciting”, “fun”, and “tasty”. The results appear in Table 6.6. We now turn to some observations on the results of these searches.
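
The logic of the search strings “FIND it ADJ -NOUN” and “CONSIDER it ADJ -NOUN” can be sketched as a pattern match over part-of-speech-tagged tokens. This is only an approximation under stated assumptions: the tag names ("ADJ", "NOUN", and so on) are simplified placeholders rather than COCA's actual tagset, the form sets stand in for lemma tagging, the function name `verb_it_adj` is hypothetical, and treating the end of the input as a non-noun is a choice made for this sketch.

```python
from typing import List, Set, Tuple

Tagged = List[Tuple[str, str]]  # (word, simplified part-of-speech tag)

# Hypothetical form sets standing in for COCA's lemma tags.
FIND_FORMS = {"find", "finds", "found", "finding"}
CONSIDER_FORMS = {"consider", "considers", "considered", "considering"}


def verb_it_adj(tagged: Tagged, verb_forms: Set[str]) -> List[str]:
    """Collect adjectives matching VERB + "it" + ADJ + (anything but a noun)."""
    hits = []
    for i in range(len(tagged) - 2):
        (verb, _), (pron, _), (adj, adj_pos) = tagged[i], tagged[i + 1], tagged[i + 2]
        if verb.lower() not in verb_forms or pron.lower() != "it" or adj_pos != "ADJ":
            continue
        # "-NOUN": reject the match when the next token is tagged as a noun,
        # since the adjective is then likely a modifier rather than a predicate.
        follower_pos = tagged[i + 3][1] if i + 3 < len(tagged) else None
        if follower_pos != "NOUN":
            hits.append(adj.lower())
    return hits


# Example (7) from the text, with hand-assigned tags.
example_7 = [("He", "PRON"), ("opened", "VERB"), ("the", "DET"), ("lid", "NOUN"),
             ("and", "CONJ"), ("found", "VERB"), ("it", "PRON"), ("empty", "ADJ"),
             (".", "PUNCT")]
print(verb_it_adj(example_7, FIND_FORMS))

# With correct tags, "consider it real progress" is excluded by the noun filter.
modifier_use = [("consider", "VERB"), ("it", "PRON"), ("real", "ADJ"), ("progress", "NOUN")]
print(verb_it_adj(modifier_use, CONSIDER_FORMS))
```

The first call illustrates why the encountering sense of “find” survives the filter: nothing in the tag sequence separates it from the judgment sense. The second shows that false positives of type a) arise only when the tagger mislabels the following noun.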

6.3.3 Observations on the Corpus Data

First, from the counts in Tables 6.2 and 6.3, we can certainly conclude that some moral adjectives, as well as the adverbs “morally” and “ethically”, do appear with “find”. A sample example for each adjective/adverb is provided in (9)–(15).

(9) [Senator E. Kennedy]: As a matter of your own individual and personal moral beliefs, do you believe that abortion is moral or immoral? [Judge Souter]: Senator, I’m going respectfully to ask to decline to answer that question for this reason, that whether I do or do not find it moral or immoral will play absolutely no role in any decision I make if I am asked to make it on the question of what weight should or legitimately may be given to the interest which is represented by the abortion decision
(10) I would never vote for something I find immoral or unjust even if 90% of my voters were for it
(11) I agree that if I use and enjoy open source software, it is ethical for me to contribute back, and I find it most ethical to contribute in a fashion that can be used and enjoyed by all those whose contributions I enjoy
(12) [Talking about “pay-to-play” concerts] While we might find the practice unethical, disgusting and ugly, it’s not illegal
(13) In fact, I find it to be a moral responsibility that I take the knowledge that I am able to understand and help make it accessible to everyone. I find it to be very


unethical to provide poor or incomplete information (which is part of why my posts are so long)
(14) [To] vote for a third party candidate that I find less morally objectionable is for me the way to avoid any cooperation with an immoral act
(15) I still think even in debate vituperative insults can occasionally be useful, and in less structured discussions elsewhere I don’t find them ethically questionable, though often overutilized

That said, the data in Tables 6.2, 6.3, 6.4, 6.5 and 6.6 also clearly indicate that our sample moral adjectives and adverbs are as a whole used considerably less frequently with “find” than are the adjectives in our sample of PPTs, both in absolute and relative terms. The PPTs occur vastly more often after FIND “it” than after CONSIDER “it” (where their presence is virtually negligible). In contrast, the moral adjectives, with the exception of “ethical”, appear more often after CONSIDER “it” than after FIND “it” (though overall the numbers are very small), and Table 6.2 clearly shows a greater tendency to appear as a complement to “consider” than to “find”, including for “ethical”. Interestingly, however, this asymmetry is not found with the adverbs: indeed, “morally” appears more often as a modifier of an adjectival complement to “find” than it does with adjectival complements to “consider”, while “ethically” appears a similar number of times with both. Of course, these numbers have to be evaluated against the background of other frequency information. The overall frequency of any word will obviously influence how often it appears with other words. Moreover, some words are relatively unselective about the other words they appear with (for example, “be”), while others occur much more frequently with some words than others (such as “radiocarbon”, with “dating”). 
This selectivity, or strength of association, can occur for multiple reasons, both grammatical (“be” is a verb used in a wide range of constructions) and semantic/pragmatic (“radiocarbon dating” describes a particularly widely used method for dating objects, and we may talk considerably less frequently about other uses of radioactive isotopes of carbon and thus use “radiocarbon” infrequently as a modifier of other terms). In the case that interests us here, it would be interesting to know whether, indeed, there are distinctly different strengths of association between “find” and PPTs, on the one hand, and moral adjectives and “consider”, on the other. In corpus linguistics, one standard measure of strength of association is mutual information (MI), and COCA conveniently provides automatically calculated MI scores for word pairs in the corpus.4 We will not go into the technical details of MI here

4

Mutual Information is calculated in COCA as in (i), taken from https://www.english-corpora.org/mutualInformation.asp with minor modifications.
(i) MI = log((AB*sizeCorpus)/(A*B*span))/log(2), where
A = frequency of the word of interest (e.g. “moral”)
B = frequency of collocate (e.g. “find”)
AB = frequency of collocate near the node word (e.g. “find” near “moral”)


(see Evert, 2009 for very useful discussion), other than to note that one important limitation of the way in which MI is calculated in COCA is that it does not take into account the syntactic relations between words. It simply looks at cooccurrence within a specified window. Thus, a string like “find the moral responsibility”, which is irrelevant for our purposes, contributes to the MI score for “find” and “moral” in exactly the same way as the relevant (if grammatically incomplete) string “find it moral and”. In general, the higher a (positive) MI score, the stronger the (positive) association. In the english-corpora.org interface, the default suggestion for an MI search is to find scores of at least 2.5; Hunston (2002, p. 71) asserts that MI scores of “3 or higher can be taken to be significant.” We searched COCA’s frequency database for MI scores over 1 for the different adjectives and adverbs mentioned above in combination with “find” and “consider”, not placing any minimum threshold on absolute frequencies for the co-occurrences. With “consider”, we found MI scores over 1 for “ethical” (1.96), “unethical” (3.55), and “immoral” (3.63); indeed, “consider” was the verb with the strongest mutual information score for “unethical” and “immoral”. Similarly positive scores were found for “morally” (2.29) and “ethically” (2.41) with “consider”. “Moral” did not give a positive result in this search. None of the PPTs showed positive MI scores with “consider”, and none of the adjectives or adverbs at all showed an MI score over 1 with “find”, except for “boring” (1.37). Thus, despite the limitations of the MI scores as calculated in COCA, we have some reason to think that moral adjectives and adverbs are semantically different in some way from PPTs, despite the fact that both occur with “find”. 
We did a further search for the adjectives most strongly associated with “find”; the top 10 were “hard-pressed” (4.84), “distasteful” (4.52), “objectionable” (4.38), “amusing” (4.34), “gainful” (3.84), “guilty” (3.79), “off-putting” (3.76), “repulsive” (3.63), “abhorrent” (3.62), and “repugnant” (3.62). Among these, all but “hard-pressed”, “gainful”, and “guilty” are adjectives that imply an experiencer subject, and, to that extent, are arguably PPTs—even if “objectionable”, “repulsive”, “abhorrent” and “repugnant” can be used for the purpose of assessing moral actions.5
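
For readers who wish to see how the score behaves, the formula quoted in footnote 4 can be transcribed directly. The sketch below is a literal rendering of that formula and nothing more: the function name `mutual_information` and the counts passed to it are invented for illustration, and none of the numbers are COCA frequencies. Dividing a base-10 logarithm by log10(2) simply yields a base-2 logarithm.

```python
import math


def mutual_information(ab: float, a: float, b: float,
                       corpus_size: float, span: float) -> float:
    """MI = log((AB * sizeCorpus) / (A * B * span)) / log(2), per footnote 4.

    ab: frequency of the collocate near the node word
    a: frequency of the word of interest
    b: frequency of the collocate
    corpus_size: number of words in the corpus
    span: size of the co-occurrence window
    """
    if min(ab, a, b, corpus_size, span) <= 0:
        raise ValueError("all arguments must be positive")
    return math.log10((ab * corpus_size) / (a * b * span)) / math.log10(2)


# Made-up counts, chosen so that the ratio inside the logarithm is exactly 8.
score = mutual_information(ab=16, a=1_000, b=2_000, corpus_size=1_000_000, span=1)
print(round(score, 2))  # 3.0

# Doubling the co-occurrence count raises the score by one unit.
doubled = mutual_information(ab=32, a=1_000, b=2_000, corpus_size=1_000_000, span=1)
print(round(doubled - score, 2))  # 1.0
```

Because the score is a logarithm of a ratio of observed to expected co-occurrence, rare words with a handful of co-occurrences can obtain high values, which is relevant to the decision above not to impose a minimum frequency threshold.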

6.3.4 Discussion

The present work lies within a broader philosophical enterprise of understanding the nature of morality. More precisely, we wish to know to what extent moral judgments are subjective in the way in which judgments of personal taste are. We

sizeCorpus = the number of words in the corpus
span = span of words (in COCA, this is 3 to left and 3 to right of word of interest)
log(2) is literally the log10 of the number 2: .30103

5

Note that in “(not) guilty”, “find” often occurs not as a subjective attitude verb but rather acquires a legal sense describing a jury officially deciding on an accused individual’s guilt.


approach this question by studying moral language; specifically, by looking at how paradigmatic moral predicates – “(im)moral” and “(un)ethical” – combine with subjective attitude verbs. Overall, our findings show that moral predicates exhibit certain linguistic hallmarks of subjectivity, but, at the same time, behave differently from PPTs. Both “consider” and “find” (in one of its senses) are verbs that express subjective attitudes. Lasersohn (2009, p. 365) observes that “consider” “is much more limited than ‘believe’ in the types of complement clause it may combine with. It combines quite naturally with clauses expressing personal taste, but normally does not combine with clauses expressing completely objective matters of fact”. “Find”, as amply discussed in the literature, clearly tracks subjective judgment and is more restrictive than “consider”, since it does not accept complements such as “vegetarian”, which are subjective only to the extent that different speakers may appeal to different criteria in classifying things as vegetarian or not. Our corpus study provides evidence of natural occurrences of moral adjectives with both verbs, so one may be tempted to simply conclude that moral adjectives are therefore subjective, just like PPTs. But this would be a hasty and oversimplified conclusion. Our findings show that moral predicates prefer to occur with “consider” rather than “find” (despite occasionally occurring with the latter), whereas in the case of PPTs, it is the other way around. We draw this more nuanced conclusion from the fact that, as can be seen from Table 6.2, moral predicates are about three times more likely to occur with “consider” than with “find”.6 While this already reveals a proportional preference for “consider” over “find”, it bears noting that the preference is actually much higher, given that the verb “find” is much more frequent than “consider”. 
Table 6.3, which looks specifically at “find”/“consider it” ADJ constructions, points to a similar pattern. That is to say, while the number of occurrences is altogether low, the much greater frequency of “find” over “consider” suggests that, proportionally, moral predicates show a preference for the latter over the former. On the other hand, as Table 6.6 shows, PPTs are largely absent in the “consider it” ADJ construction, but very frequent in the “find it” ADJ construction (for instance, the antonyms “fun” and “boring” have over 50 occurrences each, while among moral predicates, “immoral” and “ethical” score highest, with only six occurrences each). Our findings are therefore completely in line with the observation from McNally and Stojanovic (2017), mentioned earlier, that evaluative adjectives (of which moral adjectives are a subtype) occur more naturally with “consider” rather than “find”. The corpus study presented here has significant implications for theoretical research on subjectivity. It is generally assumed that “find” is more restrictive than

6

Note that this does not extend to predicates that are modified with “morally” (where we see the reverse pattern) and “ethically” (equally likely to occur with either verb). In the corpus data, we often see “find” embed adjectives such as “reprehensible”, “objectionable” and “repugnant” modified by “morally”. We believe that it is these adjectives that are driving the preference for “find” over “consider”, while the adverb “morally” primarily serves to endow the adjectives with a more specific sense.


“consider”, which in turn is more restrictive than “believe”. But the picture that emerges from our corpus search appears to be more subtle. For one thing, pace Lasersohn, “consider” does not “combine quite naturally with clauses expressing personal taste”; witness the fact that predicates such as “boring”, “fun”, “delicious”, “disgusting”, and “tasty” hardly ever occur with “consider”. This suggests that the relationship between the subjective attitudes that are expressed with the two verbs is not one of subordination. Rather, the attitudes that the two verbs express, considering and finding, may be plausibly seen as involving different types of subjectivity. A further question is what kind of theory accounts best for the data observed. While there have been a number of interesting and plausible proposals concerning the semantics of “find” (see e.g. Willer, forthcoming, for references and overview), the question of how it differs from other subjective attitude verbs has been less discussed. A notable exception is the work of Kennedy and Willer (2016, 2022), whose proposal is largely driven by the motivation of capturing the differences between “find” and “consider”. They start by outlining their account of attributions of subjective attitudes, based on the idea of what they call “counterstance contingency”: “a subjective attitude ascription asserts belief in the proposition expressed by the complement clause, and presupposes the contingency of this belief across a set of contextually provided alternatives to the attitude holder’s doxastic state, all of which agree on the salient facts of the matter but disagree on judgments about those facts. We label these alternatives counterstances and the contingency across them counterstance contingency” (Kennedy & Willer, 2022, p. 13). After motivating the idea of counterstance contingency, they note: “It remains to explain the more fine-grained differences between consider-type and find-type subjective attitude verbs. 
Our key proposal is that the latter presuppose a distinguished kind of subjectivity that we label radical counterstance contingency, which flows from a distinguished kind of pragmatic underdetermination (...)” (Kennedy & Willer, 2022, p. 15). While mere counterstance contingency tends to result from incidental underdetermination, radical counterstance contingency results from essential underdetermination. In other words, in the former case, speakers can avoid underdetermination by stipulating that terms be understood in one way rather than another, while in the latter case, their views and experiences diverge more radically and cannot be brought into agreement by mere stipulation. Kennedy and Willer’s account can explain why terms such as “vegetarian” can occur with “consider” but not with “find”. It can also provide a plausible story as to why PPTs are more likely to occur with “find” instead of “consider”.7 However, the predictions of their view, when it comes to moral judgments, are less clear. On the one hand, whether a belief is counterstance contingent, and whether it is radically

7

The idea would be, roughly, that if one assumes that judgments of personal taste systematically involve radical counterstance contingency, then speakers should preferably use a verb that triggers this presupposition (to wit, “find”) rather than a verb such as “consider”, which triggers a weaker presupposition.


so or not, is conversation-dependent, which would fit well with the observation that moral predicates can occur both in “consider” and in “find” constructions. On the other, this suggests that the interpretation of the attributions of “find”-attitudes vs. “consider”-attitudes should differ precisely along these lines; that is to say, a speaker who uses “consider” presupposes that their (or the attributee’s) divergence on moral issues could be settled by stipulation, whereas if they use “find”, they presuppose a more radical divergence. It also suggests that the preference for “consider” that we have observed for moral predicates should mirror the tendency of incidental rather than essential underdetermination when it comes to moral judgments. Whether the two predictions are borne out would require examining the examples closely, including a qualitative analysis of the contexts in which they occurred. However, this important task lies beyond the scope of this paper.
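
The proportional claims made in this discussion can be re-derived from the reported counts. The sketch below copies the figures from Tables 6.2, 6.3 and 6.6; pooling the four moral adjectives and the six PPTs into single totals is a simplification adopted here for illustration, not a step described in the study. Note also that Table 6.2 comes from the 9-word window search, whereas Tables 6.3 and 6.6 come from the “it” ADJ frame, so only the latter two are directly comparable with each other.

```python
# Filtered 9-word-window counts from Table 6.2: adjective -> (FIND, CONSIDER).
TABLE_6_2 = {
    "moral": (4, 32),
    "immoral": (45, 125),
    "ethical": (11, 32),
    "unethical": (25, 64),
}

# "it" ADJ frame counts from Table 6.3: adjective -> (FIND it _, CONSIDER it _).
TABLE_6_3_MORAL = {
    "moral": (1, 1),
    "immoral": (6, 7),
    "ethical": (6, 2),
    "unethical": (2, 6),
}

# "it" ADJ frame counts for the PPT sample from Table 6.6.
TABLE_6_6_PPT = {
    "boring": (67, 0),
    "delicious": (11, 0),
    "disgusting": (38, 2),
    "exciting": (49, 0),
    "fun": (53, 2),
    "tasty": (3, 0),
}

# All-adjective totals for each frame, from the ADJ column of Table 6.3.
FRAME_TOTALS = (14536, 831)


def totals(table):
    """Sum the FIND and CONSIDER columns of a count table."""
    find = sum(f for f, _ in table.values())
    consider = sum(c for _, c in table.values())
    return find, consider


def frame_shares(table, frame_totals=FRAME_TOTALS):
    """Share of each frame's adjective tokens accounted for by the table."""
    find, consider = totals(table)
    return find / frame_totals[0], consider / frame_totals[1]


find_62, consider_62 = totals(TABLE_6_2)
print(f"Table 6.2: FIND={find_62}, CONSIDER={consider_62}, "
      f"CONSIDER/FIND={consider_62 / find_62:.2f}")

for label, table in (("moral adjectives", TABLE_6_3_MORAL), ("PPTs", TABLE_6_6_PPT)):
    share_find, share_consider = frame_shares(table)
    print(f"{label}: {share_find:.2%} of FIND it ADJ, "
          f"{share_consider:.2%} of CONSIDER it ADJ")
```

The pooled Table 6.2 totals are 85 for “find” and 253 for “consider”, a ratio of roughly 2.98, which matches the "about three times" figure cited above. Expressing the Table 6.3 and 6.6 counts as shares of each frame makes the frequency adjustment explicit: the moral adjectives take up a larger share of the CONSIDER frame than of the FIND frame, while the PPT sample shows the opposite pattern.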

6.4 Conclusion

While the nature of morality has been a core topic of interest for decades, the twenty-first century marks what may be called the empirical turn in philosophy in general, and moral philosophy and philosophy of language in particular. However, the empirical methods used so far have predominantly involved controlled studies involving, for example, the elicitation of acceptability judgments. The present chapter offers new insights on the nature of moral judgments based on corpus methodology. We have presented a corpus study that investigates the subjective character of moral predicates, by examining how they combine with two subjective attitude verbs, “find” and “consider”. The study shows, in a nutshell, that moral predicates can occur naturally with both verbs. Nevertheless, they show a clear preference for “consider” over “find”. In this respect, moral predicates differ significantly from predicates of personal taste, which embed frequently and naturally under “find”, but hardly ever under “consider”. This suggests, in turn, that the subjectivity that one sees in moral judgments may well be of a different kind than the subjectivity of personal taste.

Acknowledgments We would like to thank David Bordonaba Plou, Christopher Kennedy, Malte Willer and an anonymous reviewer for comments. Isidora Stojanovic acknowledges support from the COST Action CA17132 in the Horizon 2020 Framework Programme of the European Union, and from the Agence nationale de la recherche (ANR) under the grant agreement ANR-17-EURE0017 FrontCog. Louise McNally acknowledges support through an ICREA Academia award.

References

Beebe, J. R., & Sackris, D. (2016). Moral Objectivism across the lifespan. Philosophical Psychology, 29, 912–929.
Björkholm, S. (2022). The duality of moral Language. On hybrid theories in metaethics. Dissertation, University of Stockholm.
Bordonaba Plou, D. (2017). Operadores de orden superior y predicados de gusto: Una aproximación expresivista. Dissertation, Universidad de Granada.
Camp, E. (2017). Metaethical expressivism. In T. McPherson & D. Plunkett (Eds.), The Routledge handbook of metaethics (pp. 87–101). Routledge.
Davies, M. (2008). The Corpus of Contemporary American English (COCA). Brigham Young University. https://www.english-corpora.org/coca/
Evert, S. (2009). Corpora and collocations. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 2, pp. 1212–1248). De Gruyter Mouton.
Franzén, N. (2020). Evaluative discourse and affective states of mind. Mind, 129, 1095–1126.
Goodwin, G. P., & Darley, J. M. (2008). The psychology of meta-ethics: Exploring objectivism. Cognition, 106, 1339–1366.
Goodwin, G. P., & Darley, J. M. (2010). The perceived objectivity of ethical beliefs: Psychological findings and implications for public policy. Review of Philosophy and Psychology, 1, 1–28.
Goodwin, G. P., & Darley, J. M. (2012). Why are some moral beliefs perceived to be more objective than others? Journal of Experimental Social Psychology, 48, 250–256.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge University Press.
Karczewska, N. (2019). Faultless disagreement in contemporary semantic theories. Dissertation, University of Warsaw.
Kennedy, C. (2013). Two sources of subjectivity: Qualitative assessment and dimensional uncertainty. Inquiry, 56, 258–277.
Kennedy, C., & Willer, M. (2016). Subjective attitudes and counterstance contingency. Proceedings of Semantics and Linguistic Theory, 26, 913–933.
Kennedy, C., & Willer, M. (2022). Familiarity inferences, subjective attitudes, and counterstance contingency: Toward a pragmatic theory of subjective meaning. Linguistics and Philosophy, 45(6), 1395–1445. https://doi.org/10.1007/s10988-022-09358-x
Lasersohn, P. (2005). Context dependence, disagreement, and predicates of personal taste. Linguistics and Philosophy, 28, 643–686.
Lasersohn, P. (2009). Relative truth, speaker commitment and control of implicit arguments. Synthese, 166, 359–374.
McNally, L., & Stojanovic, I. (2017). Aesthetic adjectives. In J. Young (Ed.), The semantics of aesthetic judgment (pp. 17–37). Oxford University Press.
Odrowąż-Sypniewska, J. (2021). Vagueness in natural language. In P. Stalmaszczyk (Ed.), The Cambridge handbook of the philosophy of language (pp. 434–449). Cambridge University Press.
Pölzler, T. (2017). Revisiting folk moral realism. Review of Philosophy and Psychology, 8, 455–476.
Pölzler, T., & Wright, J. C. (2020a). An empirical argument against moral non-cognitivism. Inquiry, 1–29. https://doi.org/10.1080/0020174X.2020.1798280
Pölzler, T., & Wright, J. C. (2020b). Anti-realist pluralism: A new approach to folk metaethics. Review of Philosophy and Psychology, 11, 53–82.
Railton, P. (2017). Naturalistic realism in metaethics. In T. McPherson & D. Plunkett (Eds.), The Routledge handbook of metaethics (pp. 43–57). Routledge.
Reuter, K., Baumgarten, L., & Willemsen, P. (manuscript). Tracing thick and thin concepts through corpora. Unpublished manuscript. http://philsci-archive.pitt.edu/20584/
Sæbø, K. J. (2009). Judgment ascriptions. Linguistics and Philosophy, 32, 327–352.
Sarkissian, H. (2016). Aspects of folk morality: Objectivism and relativism. In W. Buckwalter & J. Sytsma (Eds.), A companion to experimental philosophy (pp. 212–224). Wiley Blackwell.
Sayre-McCord, G. (2005). Moral realism. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. The Metaphysics Research Lab. Summer 2021 Edition. https://plato.stanford.edu/archives/sum2021/entries/moral-realism/
Silk, A. (2021). Evaluational adjectives. Philosophy and Phenomenological Research, 102, 127–161.
Solt, S. (2018). Multidimensionality, subjectivity and scales: Experimental evidence. In E. Castroviejo, L. McNally, & G. W. Sassoon (Eds.), The semantics of gradability, vagueness, and scale structure (pp. 59–91). Springer.
Soria Ruiz, A., & Faroldi, F. L. (2020). Moral adjectives, judge-dependency and holistic multidimensionality. Inquiry, 64, 1–30.
Soria Ruiz, A., Cepollaro, B., & Stojanovic, I. (2021). The semantics and pragmatics of value judgments. In P. Stalmaszczyk (Ed.), The Cambridge handbook of the philosophy of language (pp. 434–449). Cambridge University Press.
Stephenson, T. (2007). Judge-dependence, epistemic modals, and predicates of personal taste. Linguistics and Philosophy, 30, 487–525.
Stojanovic, I. (2007). Talking about taste: Disagreement, implicit arguments, and relative truth. Linguistics and Philosophy, 30, 691–706.
Stojanovic, I. (2012). Emotional disagreement. Dialogue, 51(1), 99–117.
Stojanovic, I. (2017). Context and disagreement. Cadernos de Estudos Lingüísticos, 59, 9–22.
Stojanovic, I. (2019). Disagreements about taste vs. disagreements about moral issues. American Philosophical Quarterly, 56, 29–42.
Stojanovic, I., & Kaiser, E. (2022). Exploring valence in judgments of taste. In J. Wyatt, J. Zakkou, & D. Zeman (Eds.), Perspectives on taste (pp. 231–259). Routledge.
Sundell, T. (2011). Disagreements about taste. Philosophical Studies, 155, 267–288.
Umbach, C. (2021). Evaluative predicates. Beyond fun and tasty. In D. Gutzmann, L. Matthewson, C. Meier, H. Rullmann, & T. E. Zimmermann (Eds.), The Wiley Blackwell companion to semantics. https://doi.org/10.1002/9781118788516.sem127
Verheyen, S., Dewil, S., & Egré, P. (2018). Subjectivity in gradable adjectives: The case of tall and heavy. Mind and Language, 33(5), 460–479. https://doi.org/10.1111/mila.12184
Willer, M. (forthcoming). Subjectivity. In E. Lepore & U. Stojnic (Eds.), The Oxford handbook of contemporary philosophy of language. Oxford University Press.
Woods, J. (2017). The Frege-Geach problem. In T. McPherson & D. Plunkett (Eds.), The Routledge handbook of metaethics (pp. 226–242). Routledge.
Wright, J. C., Grandjean, P., & McWhite, C. (2013). The meta-ethical grounding of our moral beliefs: Evidence for meta-ethical pluralism. Philosophical Psychology, 26, 336–361.
Zakkou, J. (2019). Faultless disagreement: A defense of contextualism in the realm of personal taste. Klostermann.
Zeman, D. (2020). Faultless disagreement. In M. Kusch (Ed.), The Routledge handbook of philosophy of relativism (pp. 486–495). Routledge.

Isidora Stojanovic is a senior researcher at the Centre National de Recherche Scientifique (CNRS), at the Jean-Nicod Institute in Paris, France. She holds a PhD in Philosophy from Stanford University and a PhD in Cognitive Science from the Ecole Polytechnique. She has published in American Philosophical Quarterly, Erkenntnis, Linguistics and Philosophy, Inquiry, Synthese, as well as many handbooks and collective volumes. Her research lies at the interface between philosophy of language and formal semantics and combines theoretical tools with empirical methods. Louise McNally is Professor of Linguistics at Universitat Pompeu Fabra, Barcelona. Her research focuses on the interaction of lexical and compositional semantics and on the form-meaning interface. She is currently co-Editor in Chief of the journal Semantics and Pragmatics.

Chapter 7

Linguistic Corpora and Ordinary Language: On the Dispute Between Ryle and Austin About the Use of ‘Voluntary’, ‘Involuntary’, ‘Voluntarily’, and ‘Involuntarily’

Michael Zahorec, Robert Bishop, Nat Hansen, John Schwenkler, and Justin Sytsma

Abstract The fact that Gilbert Ryle and J.L. Austin seem to disagree about the ordinary use of words such as ‘voluntary’, ‘involuntary’, ‘voluntarily’, and ‘involuntarily’ has been taken to cast doubt on the methods of ordinary language philosophy. As Benson Mates puts the worry, ‘if agreement about usage cannot be reached within so restricted a sample as the class of Oxford Professors of Philosophy, what are the prospects when the sample is enlarged?’ (Mates, Inquiry 1:161–171, 1958, p. 165). In this chapter, we evaluate Mates’s criticism alongside Ryle’s and Austin’s specific claims about the ordinary use of these words, assessing these claims against actual examples of ordinary use drawn from the British National Corpus (BNC). Our evaluation consists in applying a combination of methods: first aggregating judgments about a large set of samples drawn from the corpus, and then using a clustering algorithm to uncover connections between different types of use. In applying these methods, we show where and to what extent Ryle’s and Austin’s accounts of the use of the target terms are accurate as well as where they miss important aspects of ordinary use, and we demonstrate the usefulness of this new combination of methods. At the heart of our approach is a commitment to the idea that systematically looking at actual uses of expressions is an essential component of any approach to ordinary language philosophy.

We want to thank David Bordonaba-Plou, Eugen Fischer, and Kevin Reuter for their helpful comments. Nat Hansen gratefully acknowledges support from the Alexander von Humboldt Foundation, John Schwenkler from the Alexander von Humboldt Foundation and the Notre Dame Institute for Advanced Study.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-28908-8_7.

M. Zahorec, Florida State University, Tallahassee, FL, USA
R. Bishop, California State University, San Bernardino, CA, USA
N. Hansen, University of Reading, Reading, UK
J. Schwenkler, Florida State University, Tallahassee, FL, USA
J. Sytsma, Victoria University of Wellington, Wellington, New Zealand

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_7

7.1 Introduction

In the middle of the twentieth century, some philosophers argued that reflection on the language of non-philosophers could be a way of making progress on perennial philosophical problems. This approach, which came to be called ‘ordinary language philosophy’, pointed to supposed differences between the way philosophers used expressions that play central roles in philosophical arguments (expressions such as ‘looks’ in arguments for sense data, or ‘knows’ in arguments for skepticism) and the way those expressions are used outside philosophy. Failing to pay sufficient attention to ordinary use of expressions got philosophers into trouble, the ordinary language philosophers argued, because philosophers ended up using those expressions in ways that were so different from their ordinary uses that they in effect were either giving these expressions entirely new meanings (cf. Waismann, 1997; Fischer, 2019) or using them without any real meaning at all (cf. Baz, 2017), and so in any case weren’t really talking about looking or knowing (or whatever topic they took themselves to be investigating) in any recognizable sense. Further, by distorting the meaning of expressions like ‘looks’ and ‘knows’, philosophers got themselves into unnecessary and unfruitful philosophical entanglements.1 The way out of those entanglements was to reflect on the ordinary uses of the relevant expressions.

During the 1950s and early 1960s, ordinary language philosophy was arguably ‘the most influential school of philosophy in Britain’ (Russell, 1953, p. 303), but it also attracted intense criticism from several different directions. The best-known objection to ordinary language philosophy holds that while philosophers may be departing from ordinary use when they employ expressions like ‘looks’ and ‘knows’ in philosophical arguments, they aren’t thereby distorting the meaning of those expressions (Grice, 1961; Stroud, 1984).
A lesser known, but even more fundamental, objection criticizes what should be the core strength of ordinary

1 For a contemporary version of this argument, see Fischer et al. (2021).

language philosophy: its claims about the ordinary use of terms and expressions. If ordinary language philosophers fail to correctly describe the way that expressions are ordinarily used, then they have stumbled at the very first hurdle in their criticism of the way philosophers use language. This is what Mates (1958) argued in an early debate with Cavell (1958). Mates held that there is reason to doubt the accuracy of ordinary language philosophers’ claims because Gilbert Ryle and J. L. Austin, two of the most influential and astute ordinary language philosophers, disagree in some of their claims about ordinary use. The conclusion that Mates draws from Austin and Ryle’s disagreement is posed as a (rhetorical) question: ‘If agreement about usage cannot be reached within so restricted a sample as the class of Oxford Professors of Philosophy, what are the prospects when the sample is enlarged?’ (Mates, 1958, p. 165). The purported disagreement that Mates finds between Ryle and Austin concerns the expressions ‘voluntary’, ‘involuntary’, ‘voluntarily’, and ‘involuntarily’. Mates observes that, in The Concept of Mind, Ryle makes the following claims: In their most ordinary employment ‘voluntary’ and ‘involuntary’ are used, with a few minor elasticities, as adjectives applying to actions which ought not to be done. We discuss whether someone’s action was voluntary or not only when the action seems to have been his fault . . . . In this ordinary use, then, it is absurd to discuss whether satisfactory, correct or admirable performances are voluntary or involuntary. (Ryle, 1949/2009, p. 56)

Ryle goes on to say, in accordance with the general strategy pursued by ordinary language philosophers, that philosophers apply these terms in ‘quite another way’ from the ordinary use, namely as applying to not only actions that ought not to be done, but also to ‘meritorious actions’ (ibid.). It is this, he argues, that leads philosophers into nonsense and confusion. Mates then points out that Ryle’s remarks seem inconsistent with something Austin says about ‘voluntarily’ and ‘involuntarily’ in the context of a discussion of expressions we use to make excuses in ordinary language: For example, take ‘voluntarily’ and ‘involuntarily’: We may join the army or make a gift voluntarily, we may hiccough or make a small gesture involuntarily. (Austin, 1957, p. 17)

Whatever one’s views about the morality of joining the army, it seems clear that making a gift, hiccoughing or making a small gesture are not usually seen as ‘actions which ought not to be done’. It therefore looks like Austin has given an example that shows that Ryle’s characterization of ordinary use is incorrect. And this is the way the apparent conflict between Austin and Ryle has been characterized in the literature on ordinary language philosophy in the 60 years since Mates made his argument: Austin’s example reveals Ryle’s mistake (e.g., Cavell, 1958, p. 174; Hacker, 1996, p. 235; Hanfling, 2000, p. 56; Norris, 2017, p. 30). We want to linger over this apparent disagreement. We say ‘apparent’ because, as any acute observer of ordinary language should notice, Austin’s examples don’t actually contradict what Ryle says in the passage that Mates quotes. This is because Ryle makes a claim only about the adjectives ‘voluntary’ and ‘involuntary’, while Austin’s claim is about the adverbs ‘voluntarily’ and ‘involuntarily’. However, not

only does all of the existing commentary that addresses Mates’s objection take it for granted (without argument) that Austin’s and Ryle’s claims are in conflict,2 but all existing commentary that we are aware of also fails to notice that, just a few pages later in The Concept of Mind, Ryle himself describes the use of ‘voluntarily’ in a way that is similar to Austin—right down to the example of volunteer soldiers: Very often we oppose things done voluntarily to things suffered under compulsion. Some soldiers are volunteers, others are conscripts; some yachtsmen go out to sea voluntarily, others are carried out to sea by the wind and tide. (Ryle, 1949/2009, p. 60)

Following this remark Ryle makes several further claims about different possible uses that ‘voluntarily’ and ‘involuntarily’ have in ordinary language, which we will discuss in detail below. It seems plausible, then, that Mates was simply wrong that there was disagreement between Ryle and Austin about the ordinary use of these expressions.3 Nevertheless, while the way that Mates frames the disagreement between Ryle and Austin is sloppy, it is possible to reconstruct a modified version of his challenge. In the passage quoted above, Ryle says that ‘very often we oppose things done voluntarily to things suffered under compulsion’, while Austin’s position is that ‘voluntarily’ and ‘involuntarily’ can only be used to modify normal verbs when the action named by the verb is done ‘in some special way or circumstances’: The natural economy of language dictates that for the standard case covered by any normal verb,—not, perhaps, a verb of omen such as ‘murder,’ but a verb like ‘eat’ or ‘kick’ or ‘croquet’—no modifying expression is required or even permissible. Only if we do the action named in some special way or circumstances, different from those in which such an act is naturally done (and of course both the normal and the abnormal differ according to what verb in particular is in question) is a modifying expression called for, or even in order . . . It is bedtime, I am alone, I yawn: but I do not yawn involuntarily (or voluntarily!) . . . To yawn in any such peculiar way is just not to just yawn. (Austin, 1957, p. 16)

At the root of Mates’s challenge is the following question: How could we tell whether Ryle or Austin (or both, or neither) is correct in their characterizations of ordinary language? In this chapter we will demonstrate the value to ordinary language philosophy of looking at a sample of actual language drawn from a linguistic corpus—a searchable body of text, purpose-built for answering linguistic questions (Bluhm, 2016, p. 91).4 The results of our analysis pull in several different directions. First, our results cast doubt on several of the specific claims that Ryle and Austin make about the

2 One exception is Hansen (2017), who does note that Ryle and Austin’s claims are not strictly speaking in conflict with each other.
3 For further discussion of this point, see Schwenkler (Forthcoming).
4 Experimental philosophers have increasingly been calling on tools from corpus linguistics in recent years, treating corpora as a further source of evidence that can help with testing philosophical hypotheses about language. See Liao and Hansen (2022), Hansen et al. (2021), Sytsma et al. (2019), Caton (2020), and Ulatowski et al. (2020) for recent discussions and examples.

ordinary use of our target terms. This reveals the pitfalls of doing ordinary language philosophy without a systematic survey of the varieties of actual use. Second, our results nevertheless vindicate, in unexpected ways, some more general aspects of Ryle’s and Austin’s claims about these expressions, including Austin’s insistence that the pairs ‘voluntary’ and ‘involuntary’, and ‘voluntarily’ and ‘involuntarily’, do not function as simple opposites, and Ryle’s claim that these words are used to mark a number of different non-overlapping conceptual distinctions. Third, we find that some characteristically philosophical uses of these expressions are closest in their use to one narrow type of ordinary use, namely one that categorizes bodily movements in physiological terms. This last finding anticipates a claim made by Anscombe (1963, p. 12) and raises the possibility of a novel response to Ryle and Austin’s claim that philosophical uses of these expressions depart from their ordinary use, namely that the philosophical use is actually continuous with one type of ordinary use. However, as we will discuss in Sect. 7.5, the philosophical significance of the similarity between philosophical and physiological uses of these expressions remains an open question.

7.2 Ryle on ‘Voluntary’ and ‘Involuntary’

Our first study began by considering the above-quoted passage from The Concept of Mind, focusing on the claims highlighted here in italics: In their most ordinary employment ‘voluntary’ and ‘involuntary’ are used, with a few minor elasticities, as adjectives applying to actions which ought not to be done. We discuss whether someone’s action was voluntary or not only when the action seems to have been his fault. (Ryle, 1949/2009, p. 56; emphasis added)

Ryle makes three claims in this passage about how ‘voluntary’ and ‘involuntary’ are ordinarily used: namely that (‘in their most ordinary employment’ and ‘with a few minor elasticities’) these words are used to (1) describe actions that (2) ought not be done, (3) in cases where the action seems to have been the fault of the agent. To aid in a qualitative assessment of the accuracy of these claims, we used the British National Corpus (BNC) to generate a sample of 100 uses of each of the terms ‘voluntary’ and ‘involuntary’, together with their surrounding context.5 These two

lists of 100 ‘key words in context’ (KWICs) comprised our sample corpora for this study, and we used them to evaluate Ryle’s three claims by posing the following three questions of each KWIC:
(Q1) Is the term of interest used to describe an action or some actions?
(Q2) Does the speaker say, suggest, or assume that the agent(s) (either potentially or actually) ought not to have performed the action(s)?
(Q3) Does the speaker say, suggest, or assume that the agent(s) are (either potentially or actually) at fault for something?
This study followed a simple procedure. Using Qualtrics, we presented each KWIC individually to each of the authors of this paper. Each author then answered each of Q1, Q2, and Q3 on a six-point scale with the response options ‘Not Applicable’, ‘No,’ ‘Probably Not,’ ‘Not sure,’ ‘Probably,’ and ‘Yes’, in reference to the passage in question. All five of the authors completed the survey before the responses were analyzed. Interrater reliability was fair to moderate across the questions and terms, suggesting that collectively we had some difficulty interpreting Ryle’s claims and applying them to real world examples, a point that we return to below.6 Histograms of responses for each question are shown in Fig. 7.1, and full results are given in the supplemental materials along with the full text of each KWIC we evaluated, indexed by target expression and position in the sample.7 The first things to note from our results are the high number of items that generated negative responses (either ‘No’ or ‘Probably Not’) to Q1, especially for uses of ‘voluntary’, as well as the correspondingly high number of ‘NA’ responses to Q2 and Q3, both questions that presume we are dealing with an action. This suggests that, contrary to Ryle’s initial assumption, these terms are fairly often used to describe events that are not actions. Indeed, we gave predominantly negative responses (i.e., plurality ‘No’ or ‘Probably not’ with less than half ‘Probably’ or

5 The BNC is a 100-million-word sample of British English from the late twentieth century (BNC Consortium, 2007). Ninety percent of the corpus is drawn from written materials, including newspapers, fiction and non-fiction books, and letters, while the remaining 10% comes from spoken language, including transcripts of government meetings, trials, and radio and television shows. In the BNC, there are 3849 uses of ‘voluntary’ and 359 uses of ‘involuntary’. To obtain our sample, we downloaded the entire BNC and used a straightforward text-sorting algorithm to isolate all the uses of each term of interest, along with the context in which each use appeared. With the help of a random number generator, we selected 100 entries from each of these lists. Subsequent examination revealed that two items out of the 200 were duplicates (both concerned uses of ‘involuntary’). These were removed from the analysis. The remaining entries, together with those from the study reported in the next section, are given in full in the supplemental materials. These are numbered by order in our original samples, with the items for ‘voluntary’ running from 1 to 100 and the items for ‘involuntary’ from 101 to 200. Shortened versions are shown in Figs. 7.3 and 7.4. Examples of usage taken from the British National Corpus were obtained under the terms of the BNC End User License. Copyright in the individual texts cited resides with the original IPR holders. For information and licensing conditions relating to the BNC, please see the web site at http://www.natcorp.ox.ac.uk/
6 Interrater reliability was measured in two ways. First, coding ‘NA’ responses as 0, ‘No’ responses as 1, and so on, the results were treated as interval on a 0–6 scale. Intraclass correlation coefficients with 95% confidence intervals were then calculated for each term and for each question, treating both the items and raters as random effects. Results for ‘voluntary’ were: (Q1) 0.67 [0.59, 0.74], (Q2) 0.53 [0.44, 0.62], (Q3) 0.53 [0.44, 0.62]. Results for ‘involuntary’ were: (Q1) 0.44 [0.34, 0.54], (Q2) 0.52 [0.43, 0.61], (Q3) 0.54 [0.45, 0.63]. Second, we treated responses as categorical, combined negative responses (‘No’ or ‘Probably Not’) and positive responses (‘Yes’ or ‘Probably’). We then calculated Fleiss’s Kappa for each term and for each question. Results for ‘voluntary’ were: (Q1) 0.55, (Q2) 0.52, (Q3) 0.52. Results for ‘involuntary’ were: (Q1) 0.18, (Q2) 0.22, (Q3) 0.21.
7 Supplemental materials available at [https://doi.org/10.1007/978-3-031-28908-8_7].
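The sampling procedure described in footnote 5 (isolate every use of a term with its context, draw 100 entries at random, then remove duplicates) can be sketched as follows. This is only a minimal illustration under our own assumptions: the function names `kwic` and `sample_kwics`, the context width, the bracket notation around the key word, and the fixed random seed are all hypothetical, and the chapter does not say how its own text-sorting algorithm was implemented.

```python
import random
import re


def kwic(text, term, width=60):
    """Return every whole-word use of `term`, each with up to `width`
    characters of context on either side (hypothetical helper)."""
    # \b keeps 'voluntary' from matching inside 'involuntary'.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append(f"{left}[{match.group(0)}]{right}")
    return hits


def sample_kwics(hits, k=100, seed=0):
    """Draw up to `k` entries at random, then drop verbatim duplicates
    while keeping the drawn order (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    drawn = rng.sample(hits, min(k, len(hits)))
    return list(dict.fromkeys(drawn))


if __name__ == "__main__":
    toy_corpus = (
        "The voluntary sector grew. An involuntary shudder followed. "
        "Voluntary work is common."
    )
    for entry in sample_kwics(kwic(toy_corpus, "voluntary")):
        print(entry)
```

Removing duplicates after the draw, rather than before it, mirrors the order of events reported in the footnote, where 98 of the 100 sampled ‘involuntary’ items remained.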

[Figure 7.1: six histogram panels, one per question ((Q1) Action?, (Q2) Ought not?, (Q3) At fault?), for ‘voluntary’ (top row) and ‘involuntary’ (bottom row). Horizontal axis: response options (NA, No, Probably Not, Not Sure, Probably, Yes); vertical axis: response counts (ticks from 0 to 400).]

Fig. 7.1 Histograms of responses for each of the three questions in our first study (Q1–Q3), broken down by term (‘voluntary’ top, ‘involuntary’ bottom). (Total responses are 500 for ‘voluntary’ (five coders for each of the 100 entries) and 490 for ‘involuntary’ (five coders for each of the 98 entries remaining after removing duplicates))
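The second agreement measure mentioned in footnote 6, Fleiss’s Kappa over collapsed response categories, could be computed along the following lines. This is a sketch under our own assumptions rather than the authors’ analysis code: the `COLLAPSE` mapping (in particular, keeping ‘Not Applicable’ and ‘Not sure’ as separate categories) and the function name `fleiss_kappa` are hypothetical, and the statistic is implemented from its standard textbook definition.

```python
from collections import Counter

# Assumed collapse of the six response options into four categories.
COLLAPSE = {
    "Not Applicable": "na",
    "No": "negative",
    "Probably Not": "negative",
    "Not sure": "unsure",
    "Probably": "positive",
    "Yes": "positive",
}


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`, a list with one entry per item, each
    entry being the list of category labels given by the raters.
    Every item must have the same number of raters (at least two)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    # Agreement expected by chance from the overall category proportions.
    expected = sum(
        (total / (n_items * n_raters)) ** 2
        for total in category_totals.values()
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    raw = [
        ["No", "Probably Not", "No", "No", "Not sure"],
        ["Yes", "Yes", "Probably", "Yes", "Yes"],
    ]
    collapsed = [[COLLAPSE[r] for r in item] for item in raw]
    print(fleiss_kappa(collapsed))
```

With five coders per item, as in the caption above, each list in `ratings` would hold five labels; the first, interval-scale measure in footnote 6 (the intraclass correlation) is not covered by this sketch.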

‘Yes’) to Q1 on 53 of the 100 items for ‘voluntary’ and 23 of the 98 items for ‘involuntary’. While it is unclear just what frequency of use might be countenanced as a ‘few minor elasticities’, each proportion is greater than and significantly different from a conservative noise threshold of 15%.8 Looking at the individual

8 ‘voluntary’: χ² = 110.3, p < .001, 95% CI [0.43, 0.63]; ‘involuntary’: χ² = 4.87, p = .027, 95% CI [0.16, 0.33].

items that generated these responses, a few groups stand out. One large group comprises items where ‘voluntary’ is used to say that certain ‘bodies’, ‘sectors’, ‘foundations’, or ‘organizations’ are neither for-profit nor part of the government, such as in the following: [57] The committee includes representatives of local authorities, health authorities, Government departments and the voluntary and private sector. Likewise, an example of a use of ‘involuntary’ that doesn’t apply to an action is the following, in which it is used to name that part of an animal’s nervous system that controls the muscles and glands of its internal organs: [165] Her work showed that the cells migrate to many different sites in the embryo developing into the skeletal elements of the head, all the pigment cells in the body, most of the nerves of the involuntary nervous systems, sensory nerves, and a variety of glands. Given that Ryle’s concern is only with the use of words like ‘voluntary’ and ‘involuntary’ in characterizing actions, we chose to exclude the items with predominantly negative responses to Q1 from the subsequent analysis. However, even after restricting the sample to uses of ‘voluntary’ and ‘involuntary’ that were judged to describe actions, Ryle’s other two specific claims about the ordinary use of ‘voluntary’ were still not supported by our findings. In fact, a majority of the authors answered ‘No’ or ‘Probably Not’ to both Q2 and Q3 for all but one of the remaining 47 items.9 The one exception, for which the authors’ ratings were ambivalent, is still not a clear example of the use of ‘voluntary’ that Ryle seems to have in mind. That example concerns ‘voluntary liquidation’, a process by which the assets of an insolvent entity (person or company) are sold off to pay creditors: [20] For another hour the inquisition continued, almost, I felt, as though the judges were scraping for any dirt they could find. 
One member asked for details of how my father had gone into voluntary liquidation. Strikingly, many of the examples of ‘voluntary’ in the sample are used in precisely the circumstances which Ryle suggests the term won’t be used—for the word often modifies actions which ought to be done and for which praise is called for, as in the following: [67] When it emerges that she has been doing her voluntary work in York for just 6 years, her enormous commitment becomes clear. Pets as Therapy is, of course, nothing new. But Joan ventures where few other dog owners would dare to tread . . . Other uses characterize actions that are evaluatively neutral, such as:

9

Needless to say, this proportion is above and significantly different from the 15% threshold: χ2 = 246.7, p < .001, 95% CI [0.87, 1.00].

7 Linguistic Corpora and Ordinary Language: On the Dispute Between Ryle. . .


[88] Furthermore, the clinical outcome seemed also to be related to the length of prolonged voluntary anal contraction achieved by patients. Overall, based on our sample, it appears that all of Ryle’s three claims about the ‘most ordinary employment’ of ‘voluntary’ are mistaken, at least on a reasonable interpretation of a ‘few minor elasticities’ and with regard to contemporary use. Ryle’s claims about the ordinary use of ‘involuntary’ fared slightly better. Of the 75 uses of ‘involuntary’ that we regarded as modifying actions, a majority of the authors answered ‘No’ or ‘Probably Not’ to both Q2 and Q3 for 44 of them.10 Indeed, there were a total of just six items that received positive responses from each of the authors for both Q2 and Q3, and another five that received majority positive responses to both questions. Five of the six items with unanimously positive responses employed the phrase ‘involuntary manslaughter’ ([106, 117, 119, 125, 126]), while the other expressed a worry from ‘Right-to-Life groups’ concerning the slippery slope leading toward ‘active and involuntary euthanasia’ ([194]). Further, many of the other items that received a majority positive response to Q2 and Q3 also seem to fit Ryle’s claims. For instance, the following sentence is naturally read as an excuse or apology for encroaching on someone’s private space: [111] My intrusion was involuntary, Dr. Vaughan. I can’t have meant to come here if it’s private property. Similarly, [189] describes an ‘involuntary remark’ that was ‘swiftly regretted’, while [198] details a question that ‘was totally involuntary’, noting that the speaker ‘could have bitten her tongue out for asking it’. Nevertheless, a clear majority of uses of ‘involuntary’ in our sample do not align with Ryle’s account. 
Indeed, as noted above, a majority of the items received a negative response from the majority of the authors for both Q2 and Q3, as in the following examples: [120] All are striking, some are beautiful, others startling and a few may invoke an involuntary shudder . . . [135] Her thoughts turned to Geoffrey Howe, for so long her most faithful lieutenant, and her right leg made an involuntary kicking movement. While each of these sentences might be read as implying that the agent did not explicitly want to do the thing described, none of them seem to require that the agent ought not have done this, nor that it is something for which they should be held at fault. Once again, Ryle’s claims about the ordinary use of ‘involuntary’ simply do not square with how the word is most often employed in our sample.

10 This proportion is, once again, significantly greater than the 15% threshold employed above: χ² = 108.8, p < .001, 95% CI [0.47, 0.70].
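The threshold comparisons reported in the footnotes can be illustrated with a short sketch. The chapter does not name the procedure used, so the function below is an assumption: a one-sample chi-square test with continuity correction, paired with a two-sided Wilson score interval (also continuity-corrected). The function name and its parameters are hypothetical. With the counts given in the text (46 of 47 items for 'voluntary', 44 of 75 for 'involuntary'), these choices appear to reproduce the reported statistics.

```python
from math import erfc, sqrt
from statistics import NormalDist


def one_sample_proportion_test(successes, n, p0=0.15, conf_level=0.95):
    """Continuity-corrected chi-square test of a proportion against p0.

    Returns (chi_square, p_value, (ci_low, ci_high)). The interval is a
    two-sided Wilson score interval with continuity correction.
    NOTE: this procedure is an assumption; the source does not specify it.
    """
    p_hat = successes / n
    # Yates correction, capped so it never exceeds the observed deviation.
    yates = min(0.5, abs(successes - n * p0))
    chi_square = (abs(successes - n * p0) - yates) ** 2 / (n * p0 * (1 - p0))
    # With one degree of freedom, the chi-square tail equals a two-sided normal tail.
    p_value = erfc(sqrt(chi_square / 2))

    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    z22n = z ** 2 / (2 * n)

    def bound(p_c, sign):
        # Guard the edges so the square root never sees a negative value.
        if p_c >= 1:
            return 1.0
        if p_c <= 0:
            return 0.0
        half_width = z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))
        return (p_c + z22n + sign * half_width) / (1 + 2 * z22n)

    ci_low = max(0.0, bound(p_hat - yates / n, -1))
    ci_high = min(1.0, bound(p_hat + yates / n, +1))
    return chi_square, p_value, (ci_low, ci_high)


# Counts taken from the text: 44 of the 75 actional uses of 'involuntary'.
chi_square, p_value, interval = one_sample_proportion_test(44, 75)
print(round(chi_square, 1), p_value < 0.001, [round(b, 2) for b in interval])
```

The 15% default for `p0` mirrors the threshold the authors use to operationalize a 'few minor elasticities'; any other threshold can be passed in the same way.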


M. Zahorec et al.

7.3 Further Claims: Austin’s ‘Special Circumstances’ and Beyond the Received View of Ryle

As we discussed in Sect. 7.1, the received view about Ryle’s account of the ordinary use of ‘voluntary’ and ‘involuntary’ is that it is simply mistaken. The results reported in the previous section look like clear support for this. But, as we discussed, the standard criticism fails to notice that Ryle goes on to make several further claims about this family of terms that go beyond what is said in the better-known passage about ‘voluntary’ and ‘involuntary’ that was the focus of Sect. 7.2. And Austin, like Ryle, also makes claims about the use of ‘voluntarily’ and ‘involuntarily’ that stand in need of assessment. We undertook this assessment in our second study, deriving a new set of five questions from Austin’s and Ryle’s further remarks and applying them to the sample KWICs generated in our first study for the terms ‘voluntary’ and ‘involuntary’, as well as to two further corpora, compiled from the BNC in the same way as before, of 100 uses each of the terms ‘voluntarily’ and ‘involuntarily’.11 Our first new question was derived from Austin. Above we quoted his different diagnosis of where the philosophical use of the modifying adverbs ‘voluntarily’ and ‘involuntarily’ goes awry. For Austin, the use of such a modifying expression is ‘required or even permissible’ only if the action it describes is done ‘in some special way or circumstances, different from those in which such an act is naturally done’ (Austin, 1957, p. 16). To assess this claim we used the following question: (Q4) In using the term of interest, does the speaker say, imply, or suggest that the agent(s) did the thing in question in some special way or circumstances, different from those in which such an act is naturally done? The remaining four questions in our second study derive from Ryle. 
Above we cited a neglected passage in The Concept of Mind where Ryle seems to anticipate the possibility of using a word like ‘voluntarily’ to describe an act, like becoming a soldier or going out to sea, that is not obviously something that ought not to be done nor that is someone’s fault. Here is the wider context of that remark, with a few crucial passages highlighted in italics: Very often we oppose things done voluntarily to things suffered under compulsion. Some soldiers are volunteers, others are conscripts; some yachtsmen go out to sea voluntarily, others are carried out to sea by the wind and tide . . . What is involuntary, in this use, is not describable as an act. Being carried out to sea, or being called up, is something that happens to a person, not something which he does . . . So sometimes the question ‘Voluntary or involuntary?’ means ‘Did the person do it, or was it done to him?’; sometimes it presupposes that he did it, but means ‘Did he do it with or without heeding what he was

11 The BNC contains 474 uses of ‘voluntarily’ and 214 uses of ‘involuntarily’. Of the 100 examples of each that were selected, six were found to be duplicates (one for ‘voluntarily’ and five for ‘involuntarily’). These were removed from the analysis. The remaining entries are given in full in the supplemental materials, numbered by order in our original sample (‘voluntarily’ items numbered from 201 to 300, ‘involuntarily’ items from 301 to 400), and shortened versions are shown in Figs. 7.5 and 7.6.


doing?’ or ‘Did he do it on purpose or inadvertently, mechanically, or instinctively, etc.?’ (Ryle, 1949/2009, p. 60; italics added)

The most striking feature of this passage is that Ryle seems to be identifying several different, and possibly non-overlapping, ways that our terms of interest can be used. We took this as a cue to ask the following further questions: (Q5) Does the speaker use the term of interest as a way of clarifying whether the agent(s) did the thing in question on purpose, in contrast with doing it merely by accident? (Q6) Does the speaker use the term of interest as a way of clarifying whether the agent(s) did the thing in question of their own accord, i.e., not under threat or compulsion (by another person or natural forces, for example)? (Q7) Does the speaker use the term of interest as a way of clarifying whether the agent(s) really did the thing in question at all, rather than its being something that merely happened to them? (Q8) Does the speaker use the term of interest as a way of clarifying whether the agent(s) did the thing in question heeding what they were doing, as opposed to doing it inadvertently, mechanically, or instinctively? It is worth pointing out that Q5–Q8 all ask whether the speaker is clarifying matters in regard to some particular contrast (e.g., ‘on purpose’ versus ‘merely by accident’ in Q5). Accordingly, in contrast with Q2–Q4, positive responses to these questions do not differentiate between cases where the speaker is using the term to say what falls on the first side of the contrast (e.g., that the action was done on purpose) and cases where the speaker is using the term to say what falls on the second side of the contrast (e.g., that the action was done by accident). Following the method of our first study, in our second study each of the authors answered these five questions for all the items in our samples of uses of ‘voluntary’ and ‘involuntary’, as well as for the sample uses of ‘voluntarily’ and ‘involuntarily’. 
As in our first study, each KWIC was presented individually via Qualtrics and the questions were answered using the same six-point scale. Interrater reliability was generally just fair to moderate, and often poor for Q4 and Q7, again indicating the difficulty of interpreting these criteria and applying them to actual examples.12

12 Interrater reliability was measured in the same two ways as for our first study. ‘voluntary’: (Q4) ICC = 0.45 [0.36, 0.55], Kappa = 0.16; (Q5) ICC = 0.53 [0.44, 0.62], Kappa = 0.35; (Q6) ICC = 0.68 [0.60, 0.75], Kappa = 0.33; (Q7) ICC = 0.54 [0.45, 0.63], Kappa = 0.32; (Q8) ICC = 0.63 [0.54, 0.71], Kappa = 0.36. ‘involuntary’: (Q4) ICC = 0.35 [0.26, 0.46], Kappa = 0.12; (Q5) ICC = 0.44 [0.35, 0.54], Kappa = 0.22; (Q6) ICC = 0.60 [0.51, 0.68], Kappa = 0.50; (Q7) ICC = 0.27 [0.18, 0.37], Kappa = 0.11; (Q8) ICC = 0.69 [0.61, 0.76], Kappa = 0.47. ‘voluntarily’: (Q4) ICC = 0.22 [0.13, 0.31], Kappa = 0.08; (Q5) ICC = 0.43 [0.34, 0.53], Kappa = 0.28; (Q6) ICC = 0.47 [0.38, 0.57], Kappa = 0.23; (Q7) ICC = 0.16 [0.08, 0.26], Kappa = 0.06; (Q8) ICC = 0.45 [0.36, 0.55], Kappa = 0.25. ‘involuntarily’: (Q4) ICC = 0.44 [0.34, 0.54], Kappa = 0.25; (Q5) ICC = 0.26 [0.17, 0.37], Kappa = 0.05; (Q6) ICC = 0.70 [0.63, 0.77], Kappa = 0.58; (Q7) ICC = 0.18 [0.10, 0.28], Kappa = 0.09; (Q8) ICC = 0.74 [0.67, 0.80], Kappa = 0.50.
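For readers unfamiliar with agreement statistics for more than two raters, the following is a minimal sketch of one such measure. It assumes that the kappa reported above is Fleiss' kappa computed over categorical responses; the section does not say which variant was used, nor whether the six-point scale was collapsed first, so both are assumptions. The function name and the toy ratings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels
    (one label per rater). Every item needs the same number of raters.
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of ordered rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    return (observed - expected) / (1 - expected)


# Hypothetical responses from five coders to three items (not the study data).
toy = [
    ["Yes", "Yes", "Probably", "Yes", "Not Sure"],
    ["No", "No", "No", "Probably Not", "No"],
    ["Not Sure", "Yes", "No", "Not Sure", "Probably"],
]
print(fleiss_kappa(toy))
```

Values near zero indicate agreement no better than chance, which is the sense in which the low figures for Q4 and Q7 signal difficulty in applying those criteria.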


Fig. 7.2 Histograms of responses for each of the five questions in our second study (Q4–Q8), broken down by term (from top to bottom: ‘voluntary’, ‘involuntary’, ‘voluntarily’, ‘involuntarily’) (Total responses are 500 for ‘voluntary’ (five coders for each of the 100 entries), 490 for ‘involuntary’ (five coders for each of the 98 entries remaining after removing duplicates), 495 for ‘voluntarily’ (five coders for each of the 99 non-duplicate entries), and 475 for ‘involuntarily’ (five coders for each of the 95 non-duplicate entries))

Histograms of responses for each question are shown in Fig. 7.2, and breakdowns of responses for each item are given in the supplemental materials. As with Q2 and Q3 from the previous study, our new questions all assume that we’re dealing with actions. It therefore isn’t surprising that we again found a high percentage of ‘NA’ responses for ‘voluntary’ (46.0%), and to a lesser extent ‘involuntary’ (9.3%), across our responses to these new questions. These responses were not evenly distributed, however. Across the two terms, 69 items (out of 198) had more than 30% ‘NA’ responses, accounting for 88.6% of the ‘NA’ responses overall. And, as expected, these were predominantly the items we classified as nonactions in Study 1 (88.4%). By contrast, ‘NA’ responses were rare for ‘voluntarily’


(0.4%) and ‘involuntarily’ (0.5%), suggesting that these terms almost always modify actions. (Given that these are adverbs, this is not surprising.) For the subsequent analysis, we restricted our samples to items that were classified as actions according to Q1 in our first study and then received less than one-third ‘NA’ responses to Q4–8.13 Our first question in the present study is Q4, which concerns Austin’s claim about ‘voluntarily’ and ‘involuntarily’ being used only to describe things that are done in ‘some special way or circumstances’. Only 20.2% of the uses of ‘voluntarily’ and 46.3% of the uses of ‘involuntarily’ in our sample were judged by a majority to satisfy Austin’s claim. Thus, the majority of the items for each term were not clearly used in the way that Austin claims is needed for the ‘modifying expression [to be] called for, or even in order’.14 Here are two examples that did seem to match Austin’s description, each receiving unanimous affirmative answers to Q4: [281] And the only examination that I ever voluntarily and with malice aforethought . . . failed . . . was the Bank examination . . . . [321] Once more he raised his arm involuntarily, as if in greeting. Arguably, these two sentences are examples in which the terms of interest are used in reference to acts that were done in some special way or circumstances—namely, failing an exam on purpose rather than, as is typical, doing so despite attempting to pass; or raising one’s arm, but not in the ‘usual’ way of waving hello. Alongside these Austin-friendly examples, however, many more of the occurrences in our sample used our terms of interest to describe actions that did not seem to be done in a special way or circumstances, given the kinds of actions that they are. (As Austin writes, ‘both the normal and the abnormal differ according to what verb in particular is in question’ (Austin, 1957, p. 16)). 
Consider shuddering, shivering, or trembling, for example: [343] Mait shuddered involuntarily. [350] She shivered involuntarily, a reaction prompted by something other than cold. [398] Quite involuntarily Isabel began to tremble. We will consider examples like these in more detail below in discussing Q8, but the thing to emphasize for present purposes is that what ‘involuntarily’ seems to be indicating in these examples is only that the act in question was a mere bodily

13 This left us with 40 items for ‘voluntary’ and 74 items for ‘involuntary’. Other than the duplicate items that were removed as described in Footnote 6, no further items were removed for ‘voluntarily’ or ‘involuntarily’, leaving us with the original set of 99 unique items for the former and 95 for the latter.

14 Both proportions are, of course, above and significantly different from the 15% threshold used previously: χ² = 321.0, p < .001, 95% CI [0.70, 0.87]; χ² = 108.5, p < .001, 95% CI [0.43, 0.64].
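The item restriction and the 'majority' tallies used throughout this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the response labels are 'Yes', 'Probably', 'Not Sure', 'Probably Not', 'No', and 'NA', that the 'NA' share is pooled over the questions considered, and that a majority means more than half of all coders for an item. All function names and the toy data are hypothetical.

```python
POSITIVE = {"Yes", "Probably"}


def na_share(item, questions):
    """Share of 'NA' responses pooled over the given questions."""
    pooled = [r for q in questions for r in item[q]]
    return pooled.count("NA") / len(pooled)


def majority_positive(responses):
    """True if more than half of the coders answered 'Yes' or 'Probably'."""
    return sum(r in POSITIVE for r in responses) > len(responses) / 2


def share_majority_positive(items, question, questions, max_na=1 / 3):
    """Among items below the NA cutoff, the fraction with a majority positive answer."""
    kept = [it for it in items.values() if na_share(it, questions) < max_na]
    return sum(majority_positive(it[question]) for it in kept) / len(kept)


# Hypothetical responses from five coders (not the study data).
items = {
    "a": {"Q4": ["Yes", "Probably", "Yes", "No", "Not Sure"], "Q6": ["Yes"] * 5},
    "b": {"Q4": ["No", "No", "Probably Not", "Yes", "Not Sure"],
          "Q6": ["No", "No", "Yes", "Probably", "No"]},
    "c": {"Q4": ["NA"] * 4 + ["No"], "Q6": ["NA"] * 5},  # dropped: mostly 'NA'
}
print(share_majority_positive(items, "Q4", ["Q4", "Q6"]))
```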


response (that is, that it was done ‘inadvertently, mechanically, or instinctively’, as Ryle puts it). And, of course, this is exactly what is typically the case for acts like shuddering, shivering, and trembling. These therefore look like counterexamples to Austin’s claim that ‘involuntarily’ is only used to describe actions done in ‘some special way or circumstances, different from those in which such an act is naturally done’. Still, charity suggests thinking further about these cases, as there is an argument to be made that these uses of ‘involuntarily’ are not simply redundant or inelegant, but instead are being used to convey the idea that the bodily response in question stems from some unconscious fear or desire.15 If this is correct, then perhaps these aren’t counterexamples to Austin’s claim after all: for example, the ‘normal’ way to tremble is in response to something fearful, rather than to underlying romantic desire. Yet this only leads to a more serious problem for Austin’s claim, namely that often it is totally unclear what counts as a ‘special way or circumstance’, given the difficulty of saying what counts as the normal way of, or circumstances for, doing the kind of thing in question. Consider, for example, the following: [213] Maj. Gen. Rodrigo Sanchez Casillas became the new Army Chief of Staff and Brig.-Gen. Garin Aguirre became the new Army Inspector General both replacing those who had ‘voluntarily resigned’ over the La Cutufa affair. What are the normal conditions for an officer to resign? One might expect that sometimes officers resign under pressure or out of necessity, and on other occasions they do so freely and without undue pressure, perhaps to take a private sector job or make a political statement. But since it’s not obvious which of these possibilities should be counted as ‘special’, it is hard to say how Q4 should be answered in relation to this item. 
The difficulty of applying Austin’s description to examples of ‘voluntarily’ and ‘involuntarily’ was a common one. Indeed, nearly half of our responses to Q4 for ‘voluntarily’ (49.9%) and a third of our responses for ‘involuntarily’ (32.8%) were ‘Not Sure’. Not surprisingly, across the two terms only a minority of items (32.8%) secured majority agreement to Q4, and even fewer found us agreeing unanimously (8.2%). And although Austin only mentions this principle with regard to the adverbs ‘voluntarily’ and ‘involuntarily’, it is worth noting that we encountered the same difficulty with ‘voluntary’ and ‘involuntary’ as well: 31.8% of the judgments for these items were ‘Not Sure’, and just 28.1% of items secured majority agreement (2.6% unanimous). The rest of the questions that we posed in this second study are drawn from Ryle’s distinctions between different ways that our terms of interest can be used. In contrast with the claim that we drew from Austin, Ryle’s claims generated distinctions that were more readily applied to the sample. Thus, while we were

15 Specifically, in [343] the shudder is a response to a remark about being close to open water, in [350] the shiver is a response to the unexpected silence of an empty castle, and in [398] the trembling is a response to the ‘low laugh’ of a seducer.


very often uncertain in our responses to Q4, this happened far less frequently for the remaining four questions, as we chose the response ‘Not sure’ just 8.9% of the time overall. Not only were we generally able to apply Ryle’s distinctions, but each seemed to clearly apply to some of the uses in our sample. Nonetheless, as we will see, these distinctions don’t apply equally clearly to each use, nor do they collectively exhaust the variety of ways the terms of interest are used—indeed, far from it. As is clear from Fig. 7.2, the questions about compulsion (Q6) and whether something was done inadvertently, mechanically, or instinctively (Q8) produced the most positive responses. A majority of us responded ‘Yes’ or ‘Probably’ to Q6 for 47.7% of items and to Q8 for 31.8% of items, but Q5 (on purpose vs. by accident) and Q7 (something really ‘done’ at all?) received majority positive responses for only 8.1% and 14.9% of the items, respectively. Interestingly, however, our positive responses to Q6 and Q8 were associated with different sets of terms, as Q6 tended to be answered positively in connection with the use of ‘voluntary’ and ‘voluntarily’, while for Q8 the positive responses were primarily for uses of ‘involuntary’ and ‘involuntarily’ instead. Specifically, we found that while a majority of us responded positively to Q6 for 67.5% of the uses of ‘voluntary’ and 85.9% of the uses of ‘voluntarily’, a majority of us gave positive responses for only 24.0% of uses of ‘involuntary’ and 17.9% of uses of ‘involuntarily’. The inverse was found for Q8, as a majority of us responded positively for just 2.5% of uses of ‘voluntary’ and 1.0% of uses of ‘voluntarily’, compared to 46.7% of uses of ‘involuntary’ and 64.2% of uses of ‘involuntarily’. As we discuss in Sect. 
7.5, this finding anticipates a point that Austin makes about the use of the adverbs ‘voluntarily’ and ‘involuntarily’: that in their ordinary use these words ‘are not opposed in the obvious sort of way that they are made to be in philosophy or jurisprudence’, and so ‘in spite of their apparent connexion, are fish from very different kettles’ (Austin, 1957, p. 17). With regard to the question of compulsion, Ryle draws the contrast, as framed in Q6, between things that are done and things that are suffered under compulsion, suggesting that we sometimes use ‘voluntarily’ (and ‘voluntary’) to mark the first category, and ‘involuntarily’ (and ‘involuntary’) to mark the second. And our study did turn up a number of examples that were judged to fit this characterization. For instance, consider the following uses of ‘voluntarily’: [252] . . . the Cabinet approved a plan which called on all groups and individuals voluntarily to hand over small-calibre weapons by mid-March. [260] . . . members dissatisfied with their union can voluntarily resign without the threat of losing their job. In each of these passages, the use of the target term seems to describe an agent as having done something of their own accord in a context where we might otherwise assume that this was done only due to coercion or the threat of retaliation. Interestingly, for many of the items where ‘voluntary’ and ‘voluntarily’ were used in this way, the contrast with coercion seems to carry with it a suggestion of pressure, sometimes even serving as a direct threat—i.e., that if the agent does not do the


thing in question ‘voluntarily’ then the desired outcome will be brought about in some other way. This is clearly the case for [252] above, which goes on to say that ‘the government threatened that after the deadline it would launch a campaign forcibly to collect the weapons, and warned that the “severest of penalties” would be inflicted on those violating the order’. On the other side of the distinction, we also find many examples of ‘involuntarily’ and ‘involuntary’ being used to mark things that a person suffered, often through having something done to them by force. For instance, [129] describes the Politburo’s ouster of Khrushchev as ‘the first and only involuntary departure of an established Kremlin leader’, while [315] describes the commitment of the English poet Christopher Smart to St. Luke’s Hospital for Lunatics in terms of his ‘entering the asylum involuntarily’, and [345] concerns a member of a merchant vessel crew who was pressed into service, so that he ‘found himself involuntarily a member of the Royal Navy’. In each case, the suggestion is not simply that some pressure or coercion was brought to bear on a person in a way that influenced their choice of action; rather, ‘involuntarily’ is used to indicate that something was forcibly done to them: Khrushchev was removed, Smart imprisoned, Lyell impressed. In this context, it is worth revisiting Q7, which was intended to track Ryle’s appeal to the contrast between ‘something that happens to a person, not something which he does’. While our initial expectation was that answers to this question would largely align with answers to Q6, in fact this is not what we found: a majority of us answered both Q6 and Q7 positively for only 11 total items, or 3.6% of our sample. Instead, responses to Q7 were much more aligned with those to Q8, which concerns the distinction between doing something ‘while heeding what [one was] doing, as opposed to doing it inadvertently, mechanically, or instinctively’. 
In particular, many of the items that generated positive responses to Q7 concerned bodily acts like shivering, shuddering, and trembling, while others concerned things that (in an ordinary sense of this phrase) were clearly done to people, often by a government agency or bureaucracy and described in a way that minimized responsibility. Examples include discussions of involuntary repatriation [148, 175] and involuntary reception into care [133, 153]. Indeed, only one item seemed to squarely fit with Ryle’s example of the yachtsmen carried out to sea by the wind and tide: [319] The pilot stated that as the aircraft rose above the treeline, at about 150 ft above the ground it involuntarily banked to the right, and despite maintaining the climb speed he could not prevent the roll to the right which continued past ninety degrees of bank. This suggests that our target terms are not generally used to specifically mark whether something merely happened to a person, but that they are much more frequently employed with regard to either whether something was done to a person or whether a person did something but without conscious intent. Turning finally to Q8—which contrasts voluntary action with things that are done inadvertently, mechanically, or instinctively—our results are essentially the reverse of what we saw for Q6: while the terms ‘involuntary’ and ‘involuntarily’ are very


often used to draw the distinction that is framed in Q8, the terms ‘voluntary’ and ‘voluntarily’ are used in this way much less often. Indeed, only two items in our samples for ‘voluntary’ and ‘voluntarily’ received majority positive responses to Q8. The first is an item, noted earlier, which describes medical research on the use of biofeedback to treat incontinence: [88] Furthermore, the clinical outcome seemed also to be related to the length of prolonged voluntary anal contraction achieved by patients. The second describes work on hypnosis, emphasizing that ‘the hypnotised subject is not a will-less automaton’: [264] A subject may take up the hypnotic suggestion that he is unable to bend his arm: He is actively, deliberately, voluntarily keeping his elbow stiff while simultaneously orchestrating for himself the illusion that he is really trying his best to bend it. The target expressions in these examples are plausibly intended to emphasize that the agent was doing the thing in question deliberately or on purpose—contrary, as Austin might have added, to how it is typically done (as in the first item), or to how it would have appeared to a naive observer (as in the second). In items [88] and [264], the point of using ‘voluntary’ and ‘voluntarily’ seems to be to say that the act in question was not merely a physiological response or incapacity, but rather something that fell within the agent’s control. Many of the uses of ‘involuntary’ that generated favorable responses to Q8 seemed to concern a similar distinction, such as describing shivering as ‘a form of involuntary muscular action [that] raises the metabolic rate and elevates body temperature’ in [101] or speaking of ‘uncontrollable involuntary bodily movements’ in [155]. 
Further, numerous uses of ‘involuntarily’ also seemed to evoke this distinction—among them, the shivering described in [350] that was unrelated to the cold, the description in [312] of a man’s ‘hands clenching involuntarily’ after a shock, the discussion in [394] of the range of stress symptoms resulting when ‘muscles are involuntarily clenched’, and the following first-person narrative: [351] Coming down the slippery track, I stumble. Involuntarily I reach out my arm. This last item describes a bodily movement where the contrasts drawn in Q6 and Q8 are both salient: the movement in question is performed ‘involuntarily’ insofar as it is inadvertent or mechanical, rather than deliberate or purposive. However, for many of the bodily movements at issue, this contrast is not nearly so salient. For instance, as noted above acts like shivering, shuddering, and trembling are not generally done on purpose: indeed, there is a sense in which ‘shivering on purpose’, for example, would not be to actually shiver, but to fake it. In closing our discussion of the results for our second set of questions, we want to note two further reasons for thinking that Ryle’s characterization of the different ways of using our terms of interest is less than fully adequate. First, we found that 10.4% of our items failed to elicit majority favorable responses to


any of the four questions based on his remarks, suggesting that they involve uses of our terms of interest that Ryle’s characterizations fail to capture. While this included items for each of the four terms, the largest subset was for ‘voluntary’, with 30% of the actional uses failing to receive a majority favorable response to any of our questions. These items included multiple instances of the phrases ‘voluntary workers’ ([46, 47]) and ‘voluntary work’ ([67]), as well as ‘voluntary contributions’ ([10]), ‘voluntary dog walkers’ ([30]), ‘voluntary helpers’ ([82]), and ‘voluntary assistance’ ([87]), among related uses. In these items, the point of describing the acts in question as ‘voluntary’ seems to be to say that they are things that the agents in question volunteered to do, as opposed to doing this for pay or as part of an official capacity. Ryle’s description of the many different ways that ‘voluntary’ can be used leaves out this possibility. Second, within those items that did elicit majority positive responses to any given question, there was often notable heterogeneity in what the terms of interest seemed to be used to say. Consider for example the following four sentences, each of which received mainly positive responses to Q5: [104] Tics are involuntary movements. Like Martial and Rabelais, Mozart’s lavatory humour in his letters, poems, and canons (for example, Leck mich im Arsch: lick my arse, K231) was not involuntary but intentional. [374] Beetles had fed on the pollen of cycads and they were among the first to transfer their attentions to the early flowers like those of magnolias and waterlilies. As they moved from one to another, they collected meals of pollen and paid for them by becoming covered in excess pollen which they involuntarily delivered to the next flower they visited. [88] Furthermore, the clinical outcome seemed also to be related to the length of prolonged voluntary anal contraction achieved by patients. 
[253] Taking these several points in combination we come to the particular significance of the shedding of blood in the ritual of circumcision. The belief was that to ritually and voluntarily—and I stress the word voluntarily—shed one’s own blood was to recommend oneself to and establish a link with the Creator of the Universe, and this is precisely what happened with circumcision. There are important differences between these examples. In [104] and [88] a contrast is drawn between an involuntary movement and an action that involves the agent’s conscious control, but in [374], the involuntary action (pollination) is something that is done inadvertently, as a by-product of the activity of feeding on pollen, while in [253] the contrast is between freely choosing to do something and being coerced into doing it by some outside power (choosing to shed one’s own blood freely versus ‘suffering’ the procedure or being coerced into it). As such, it seems that the distinction drawn in Q5 was insufficient on its own to distinguish between these different uses, as it cross-cut some other important differences in how our terms of


interest are used. The penultimate section of our chapter presents a further analysis that we conducted in order to see whether further differences like these could be brought out by looking at our responses taken in aggregate, in order thereby to provide a more comprehensive and systematic overview of the ways that our terms of interest are ordinarily used.

7.4 Categorizing Uses

In looking at actual examples of how people use the terms ‘voluntary’, ‘involuntary’, ‘voluntarily’, and ‘involuntarily’ in the past two sections, we’ve seen that while Ryle’s and Austin’s remarks pick up on some facets of the ordinary use of these terms, they are not wholly accurate, nor do they tell the full story. In many instances we found uses that were plausibly related to distinctions they drew, but nevertheless did not fit them squarely. In others we found uses that weren’t clearly related to these distinctions at all. Finally, in yet other cases we found ourselves unsure of how to respond to various items, or in disagreement amongst ourselves in how we did respond. In order to help reveal some deeper order in these response patterns, our final study used the technique of cluster analysis to group the items in our samples together based on our responses in our first two studies. Cluster analysis includes a broad range of statistical procedures that aim to group items together in a way that minimizes the differences between items in a group (or ‘cluster’) and maximizes the differences between groups, all with regard to some relevant set of measurements.16 For our purposes, the items we wanted to cluster are our sample KWICs for ‘voluntary’, ‘involuntary’, ‘voluntarily’, and ‘involuntarily’, and the measurements are the mean responses to the questions that were presented in our first two studies. To minimize ‘NA’ responses, we used the restricted set of items discussed at the start of Sect. 7.3, with any remaining ‘NA’ responses excluded from the calculation of the means. Given the scope of the restricted set of items, we excluded Q1 from the analysis of ‘voluntary’ and ‘involuntary’, and given the close correlation between Q2 and Q3, these questions were combined into a single dimension. Further, given the difficulties with applying Q4 that we described earlier, this question was excluded for all terms. 
For the analysis we employed agglomerative hierarchical clustering using Ward’s method with Euclidean distance.17 To aid the interpretation process, cluster dendrograms were generated separately for each of our four terms of interest. The dendrograms that were generated by this process, overlaid with the conceptual distinctions that we identified, are shown in Figs. 7.3, 7.4, 7.5, and 7.6.18 Our goal in performing the cluster analysis was to uncover some structure from the complicated variation in our judgments about the target expressions. To this end, after generating the dendrograms we went on to explore them qualitatively, looking for semantic unity within the different uses that were clustered together and offering our own interpretation of the conceptual distinctions that these clusters suggest. The analysis performed admirably in this capacity, highlighting both some broad differences in the use of our target terms and some more subtle distinctions.

Fig. 7.3 Annotated dendrogram for agglomerative hierarchical clustering using Ward’s method with Euclidean distance for the restricted set of 40 items for ‘voluntary’ (items classified as actions in our first study and receiving less than one-third ‘NA’ responses in our second study)

16 For examples of the use of cluster analysis in experimental philosophy, as well as further discussion of the method, see Levine et al. (2021), Fischer and Sytsma (2021), Sytsma and Snater (Forthcoming), and Woike et al. (2020). See Reuter et al. (manuscript) for another example of the use of cluster analysis in looking at linguistic corpora.
17 See Sytsma and Snater (Forthcoming) for a more detailed description of the procedure. As described there, if the analysis is being used for hypothesis testing, it is good practice to compare multiple clustering methods to test robustness. Given that our goal here was instead exploratory, aiming merely to help organize the items in a way that would be fruitful, we instead took a ‘proof is in the pudding’ approach, and as we’ll see our default setting produced intelligible clusterings of KWICs.
18 Items are labeled with their number from the original sample and abbreviated KWIC. Color versions of the figures are available in the supplemental materials. Cluster labels correspond with our interpretation of the conceptual distinctions captured by the cluster analysis for each of our four target terms, showing sub-distinctions where they could be discerned.
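The clustering procedure named above, agglomerative hierarchical clustering with Ward’s method and Euclidean distance, can be illustrated with a small sketch. It is not the authors’ code: it is a naive pure-Python version with invented two-dimensional vectors, and the function names are hypothetical. At each step it merges the two clusters whose union least increases the within-cluster sum of squared Euclidean distances, which is Ward’s criterion. In practice a statistics library would normally be used instead.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clustering(vectors, n_clusters=1):
    """Merge items bottom-up until n_clusters remain.

    Returns (clusters, merges): clusters is a list of lists of item indices,
    and merges records each step as (members_a, members_b, cost), which is
    the information a dendrogram is drawn from."""
    clusters = {i: [i] for i in range(len(vectors))}
    merges = []
    while len(clusters) > n_clusters:
        keys = sorted(clusters)
        cost, a, b = min(
            (ward_cost([vectors[i] for i in clusters[a]],
                       [vectors[i] for i in clusters[b]]), a, b)
            for pos, a in enumerate(keys) for b in keys[pos + 1:]
        )
        merges.append((clusters[a], clusters[b], cost))
        clusters[a] = clusters[a] + clusters.pop(b)
    return list(clusters.values()), merges

# Invented measurement vectors for four items: two tight pairs, far apart.
vectors = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
two_groups, _ = ward_clustering(vectors, n_clusters=2)
_, full_history = ward_clustering(vectors)
```

Reading the merge history from first to last step corresponds to reading a dendrogram from its leaves to its root: cheap early merges join similar items, and the final, costly merge joins the broadest groups.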

Fig. 7.4 Annotated dendrogram for agglomerative hierarchical clustering using Ward’s method with Euclidean distance for the restricted set of 74 items for ‘involuntary’ (items classified as actions in our first study and receiving less than one-third ‘NA’ responses in our second study)

Fig. 7.5 Annotated dendrogram for agglomerative hierarchical clustering using Ward’s method with Euclidean distance for the 99 non-duplicate items for ‘voluntarily’

Fig. 7.6 Annotated dendrogram for agglomerative hierarchical clustering using Ward’s method with Euclidean distance for the 99 non-duplicate items for ‘involuntarily’

Consider, as a first example, the clusters labeled (1) in the dendrograms, which tend to concern examples of things that were done on a volunteer basis rather than for the sake of compensation or reward, including volunteer labor and charitable donations. This is the use that we identified at the end of Sect. 7.3 as being overlooked in Ryle’s analysis, and a significant number of uses of the modifiers ‘voluntary’ and ‘voluntarily’, as seen in Figs. 7.3 and 7.5, were found to employ it. Notably, however, we did not find any examples at all of this use in connection with ‘involuntary’ and ‘involuntarily’, which is another point in favor of Austin’s advice not to treat these pairs as simple opposites. (For example, the opposite of ‘voluntary assistance’ [87] is not ‘involuntary assistance’, but paid assistance.) The second family of uses we identified, those labeled (2) in the dendrograms, has some presence among each of the four terms we investigated, but appears far more commonly in connection with ‘involuntary’ and ‘involuntarily’ than ‘voluntary’ and ‘voluntarily’. These uses have to do with what we have called a physiological notion of voluntariness, centering on whether an action or movement is in some way automatic rather than under conscious control. Within the clusters that exemplified this use we found a further distinction, not always sharp, between how ‘actional’ the behaviors are. Thus, in one sub-category of uses, identified in our clusters as (2a), we find that the term of interest modifies mere bodily movements,
as in ‘involuntary spasms’ [190] or ‘involuntary muscular action’ [101], as well as cases of shivering, shuddering, or trembling as discussed above. By contrast, in the second sub-category, identified as (2b), the examples tend to involve what are more naturally described as actions, such as in an ‘involuntary yell of alarm’ [173] or ‘involuntary sobs’ [163]. As we discuss in Sect. 7.5, this is the use of our terms of interest that arguably corresponds most closely to the philosophical notion of voluntariness as the basis of debates about free will. We therefore found it notable that it appears in our sample in connection with a relatively small range of descriptions, and—as observed above—hardly at all in the use of ‘voluntary’ and ‘voluntarily’. The third family of uses we identified corresponds roughly to Ryle’s distinction between ‘things done voluntarily’ and ‘things suffered under compulsion’, though we found this specific distinction to be more of an endpoint along a spectrum. Perhaps the furthest from Ryle’s description are a set of cases in (3a) that include phrases like ‘voluntarily unemployed’ [291], ‘involuntary childlessness’ [195], and ‘voluntary abstinence’ [69], where there does not seem to be any implication that the actions in question might have been compelled. Rather, by our lights the purpose of our terms of interest in these cases is to specify whether the agent chose to act in the way in question despite having been able to do something else, where it might have been expected that they would choose this alternative instead. The second set of choices we distinguished, in (3b), involves a reference to a codified prescriptive norm such as a rule or regulation. 
That is to say, these items typically concern whether the action in question was done because it was mandated or required: for example, describing a certain program as a ‘democratically-determined voluntary levy’ [49] or discussing whether the restaurant industry would ‘voluntarily list and explain’ the ingredients in their meals [290]. Often the items that fell under this subclass involve actions that we’d usually consider good, and they are most often described using ‘voluntary’ or ‘voluntarily’, in some cases as a way of highlighting that credit is deserved, in others as a way of noting that something wouldn’t be done unless further pressure was brought to bear (e.g., ‘farmers were not going voluntarily to raise wages’ [201]). Third, in (3c) we found items that seemed to make reference to some form of explicit pressure or coercion. Relatively few examples of this use were explicitly identified in our clusters, but we can think of this use as contrasting with the one just discussed. Indeed, we find an example of just such an explicit contrast, with a hypothetical payment being described as ‘voluntary if no improper pressure was brought to bear, and involuntary if it was’ [149]. Finally, we have a larger set of uses classified under (3d) that correspond with what Ryle calls the difference between suffering something, or having it done to one, and doing it in a strict sense: for example, the point of saying that a duke ‘was voluntarily disrobed’ [236] is to describe this as his own act rather than an act that he suffered, while ‘involuntary reception into care’ [153] means being put into care rather than going there on one’s own.

The last use of our terms of interest that we identified in our clusters, labeled (4) in our dendrograms, seems to center on whether the thing in question was done intentionally or not. The uses of ‘involuntary manslaughter’ tended to fall under this category, as did the description in [111] of an accidental intrusion. This use is made quite explicit in a legal discussion of whether certain payments ‘were made voluntarily in the sense of being made to close the transaction’ [255], and was also displayed in the recollection of having ‘voluntarily [thrown] away something that promised to be special, and very, very wonderful’ [267]. There are, of course, many points at which some of these uses shade off into other ones, and the items that we have highlighted in our figures do not all correspond exactly to a given use. Still, we are struck by the extent to which these distinctions in use are borne out in the results of our cluster analyses, which seem to us therefore to speak in favor of the utility of this method as a tool for the exploration of ordinary meaning.

To help validate the insights we drew from the cluster analyses, we employed a further technique from corpus linguistics, looking at where our terms of interest were located in a semantic space built from another common, general-purpose corpus (the Corpus of Contemporary American English). We used the best performing distributional semantic model from Sytsma et al. (2019) to check the nearest neighbors of our target terms in the semantic space, that is, the terms that the model says are closest in meaning. The results were striking, with the closest terms generated suggesting the dominant categories we arrived at. For instance, the nearest neighbor for ‘voluntary’ was ‘mandatory’, suggesting (3b) from our classification, which was the largest identified cluster. Similarly, the nearest neighbor for ‘involuntarily’ was ‘reflexively’, suggesting (2) from our classification, which again was the largest identified cluster.19
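The nearest-neighbor check described above rests on ranking the words of a semantic space by cosine similarity to a target word. The following is a minimal sketch of that ranking only. The three-dimensional vectors are invented for illustration and are not taken from the model in Sytsma et al. (2019), which is built from corpus co-occurrence data; the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(target, space, k=20):
    """The k words in the space with the highest cosine to the target word."""
    scores = [
        (word, cosine(space[target], vec))
        for word, vec in space.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Toy semantic space: the vectors are made up purely to show the mechanics.
space = {
    "voluntary": [1.0, 0.2, 0.0],
    "mandatory": [0.9, 0.3, 0.1],
    "spasm": [0.0, 0.1, 1.0],
    "reflexively": [0.1, 0.0, 0.9],
}
neighbors = nearest_neighbors("voluntary", space, k=3)
```

In a real distributional model each vector has many dimensions derived from co-occurrence counts, but the ranking step is the same: compute the cosine between the target and every other word, then keep the top k.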

19 See
Sytsma et al. (2019) for a further explanation of distributional semantic models and the semantic space used. The 20 nearest neighbors for each term of interest in terms of cosine (given in parentheses) are as follows. ‘voluntary’: mandatory (0.73), mandated (0.67), mandate (0.64), voluntarily (0.63), compulsory (0.63), compliance (0.63), comply (0.60), require (0.60), discriminatory (0.59), prohibit (0.59), implement (0.59), licensure (0.59), exempt (0.58), restrictive (0.58), participation (0.58), encourage (0.58), participate (0.58), requirement (0.57), stringent (0.57), cessation (0.57); ‘involuntary’: involuntarily (0.69), convulsive (0.61), spasmodic (0.60), manslaughter (0.59), forcible (0.57), induce (0.55), spasm (0.55), uncontrollable (0.54), agonized (0.54), convulsion (0.53), nausea (0.53), voluntary (0.53), bodily (0.52), breathlessness (0.52), gagging (0.52), tetany (0.52), immobility (0.51), asphyxia (0.51), peristaltic (0.51), forced (0.51); ‘voluntarily’: refuse (0.74), obligate (0.67), consent (0.65), permission (0.64), willingly (0.64), voluntary (0.63), legally (0.63), lawfully (0.61), allow (0.61), permit (0.61), request (0.61), unwilling (0.58), reluctant (0.57), reluctantly (0.56), coerce (0.56), subsequently (0.56), comply (0.56), illegally (0.56), notify (0.56), tacitly (0.55); ‘involuntarily’: involuntary (0.69), reflexively (0.63), volition (0.58), immobilize (0.57), shudder (0.57), weakly (0.58), paralyzed (0.56), flinch (0.56), numb (0.56), revolted (0.56), spasm (0.55), convulse (0.55), uncontrollably (0.55), contort (0.55), numbness (0.54), violently (0.54), convulsive (0.54), unconscious (0.54), unconsciousness (0.53), immediately (0.53).

7.5 Conclusion

Our close look at the sample of ordinary use drawn from the BNC has taken us far beyond Mates’s diagnosis of the supposed disagreement between Ryle and Austin about the use of ‘voluntary’, ‘involuntary’, ‘voluntarily’, and ‘involuntarily’. But we have vindicated Mates’s general worry that ordinary language philosophy should not be practiced without a systematic survey of the way language is ordinarily used, as we have seen how both Ryle and Austin overlook some types of ordinary uses of these expressions and how that leads to incomplete general observations about the features of ordinary language. For example, Austin warns against treating ‘voluntary’ and ‘involuntary’ and ‘voluntarily’ and ‘involuntarily’ as simple opposites. While we have identified one specific instance in which that observation holds true (the opposite of ‘voluntary assistance’ [87] is not ‘involuntary assistance’, but assistance that is paid), it clearly does not apply to all examples of the ordinary use of these expressions: the opposite of ‘voluntary anal contraction’ [88] is ‘involuntary anal contraction’. This physiological use of ‘voluntary’ and ‘involuntary’ is something both Ryle and Austin seem to fail to notice, or at least seem not to appreciate the significance of. Ryle and Austin’s neglect of this use was noted by G. E. M. Anscombe in Intention, where she argues that we should ‘reject a fashionable view of the terms “voluntary” and “involuntary” [according to which] they are appropriately used only when a person has done something untoward’, suggesting that anyone ‘tempted by this view . . . should consider that physiologists are interested in voluntary action, and they are not giving a special technical sense to that word’ (Anscombe, 1963, p. 12).
As we have discussed, many of the sentences falling under category (2) in our dendrograms supply clear illustrations of the type of use that Anscombe highlights, revealing that this is a point at which Ryle and Austin miss the significance of one aspect of the ordinary use of these expressions. The fact that both Austin and Ryle overlook the physiological use of these expressions raises a more challenging question about the methodology of ordinary language philosophy, a question on which the authors of this chapter are divided: can the use of these words that Anscombe highlights be drawn on in a defense of traditional philosophical discussions of voluntary action? Consider the following examples of clearly philosophical uses of our terms of interest, both of which appeared in the cluster of physiological uses:

[107] What, then, is moral luck? Nagel observes that it is intuitively plausible that people cannot be morally assessed for what is not their fault, or for what is due to factors beyond their control. Of course this makes sense in the case of insanity, automatism or involuntary movement but the range of factors over which one has no control is obviously wider than such clear instances of total lack of control.

[144] Notice also that expressive behaviour in this definition does not distinguish between voluntary and involuntary behaviour. If I jump with surprise when a dog suddenly barks at me, my behaviour is no less expressive than if I shout at it to shut up.

It is, we think, significant that both of these uses of ‘involuntary’ in abstract philosophical discussions are clustered alongside physiological uses like ‘involuntary discharge of faeces and urine’ [142] and ‘involuntary muscular action’ [101]. Since, as Anscombe says, the latter use of ‘involuntary’ does not involve giving the word any ‘special technical sense’, perhaps the same is true of the philosophical uses that clustered with it. If that’s right, then our survey of ordinary use provides the resources for a novel rebuttal that the traditional philosopher can give to the ordinary language philosopher: philosophical use is an extension of one particular sub-type of ordinary use, and it is meaningful to the extent that that sub-type of ordinary use is meaningful. The authors of this chapter are divided over how successful they think this reply to the ordinary language philosopher is. Some of us find it significant that the operative way of using these words is infrequent (only one use in our entire sample for ‘voluntary’, none for ‘involuntarily’, and only a small group each for ‘involuntary’ and ‘voluntarily’), and also that it only appears in connection with a narrow range of descriptions (for discussion, see Schwenkler, Forthcoming). But this is not the place to settle that debate. The essential lesson we wish to draw is the methodological point that the only way to make progress on resolving this kind of debate is by following Wittgenstein’s (1953/2009, §66) advice: if one wishes to see what is common and different among the ways that we use our words, then “don’t think, but look!”

References
Anscombe, G. E. M. (1963). Intention. Cornell University Press.
Austin, J. L. (1957). A plea for excuses: The presidential address. Proceedings of the Aristotelian Society, 57, 1–30.
Baz, A. (2017). The crisis of method in analytic philosophy. Oxford University Press.
Bluhm, R. (2016). Corpus analysis in philosophy. In M. Hinton (Ed.), Evidence, experiment and argument in linguistics and philosophy of language (pp. 91–109). Peter Lang.
BNC Consortium. (2007). The British National Corpus, XML edition. Oxford Text Archive. http://hdl.handle.net/20.500.12024/2554
Caton, J. N. (2020). Using linguistic corpora as a philosophical tool. Metaphilosophy, 51(1), 51–70.
Cavell, S. (1958). Must we mean what we say? Inquiry, 1(1–4), 172–212.
Fischer, E. (2019). Linguistic legislation and psycholinguistic experiments: Redeveloping Waismann’s approach. In D. Makovec & S. Shapiro (Eds.), Friedrich Waismann: The open texture of analytic philosophy (pp. 211–241). Palgrave Macmillan.
Fischer, E., & Sytsma, J. (2021). Zombie intuitions. Cognition, 215, 104807.
Fischer, E., Engelhardt, P. E., Horvath, J., & Ohtani, H. (2021). Experimental ordinary language philosophy: A cross-linguistic study of defeasible default inferences. Synthese, 198, 1029–1070.
Grice, H. P. (1961). The causal theory of perception. Proceedings of the Aristotelian Society, Supplementary Volumes, 35, 121–152.
Hacker, P. M. S. (1996). Wittgenstein’s place in twentieth-century analytic philosophy (Vol. 9, pp. 243–268). Blackwell.
Hanfling, O. (2000). Philosophy and ordinary language: The bent and genius of our tongue. Routledge.
Hansen, N. (2017). Must we measure what we mean? Inquiry, 60(8), 785–815.
Hansen, N., Porter, J. D., & Francis, K. (2021). A corpus study of “know”: On the verification of philosophers’ frequency claims about language. Episteme, 18(2), 242–268.
Levine, S., Rottman, J., Davis, T., O’Neil, E., Stich, S., & Machery, E. (2021). Religious affiliation and conceptions of the moral domain. Social Cognition, 39(1), 139–165.
Liao, S.-Y., & Hansen, N. (2022). “Extremely racist” and “incredibly sexist”: An empirical response to the charge of conceptual inflation. Journal of the American Philosophical Association, 1–23. https://doi.org/10.1017/apa.2021.46
Mates, B. (1958). On the verification of statements about ordinary language. Inquiry, 1(1–4), 161–171.
Norris, A. (2017). Becoming who we are: Politics and practical philosophy in the work of Stanley Cavell. Oxford University Press.
Reuter, K., Baumgarten, L., & Willemsen, P. (manuscript). Tracing thick and thin concepts through corpora. Unpublished manuscript. http://philsci-archive.pitt.edu/20584/
Russell, B. (1953). The cult of “common usage”. The British Journal for the Philosophy of Science, 3(12), 303–307.
Ryle, G. (1949/2009). The concept of mind: 60th anniversary edition. Routledge.
Schwenkler, J. (Forthcoming). Knowledge of language as self-knowledge. Inquiry, 1–25. https://doi.org/10.1080/0020174X.2022.2074888
Stroud, B. (1984). The significance of philosophical scepticism. Clarendon Press.
Sytsma, J., & Snater, M. (Forthcoming). Consciousness, phenomenal consciousness, and free will. In P. Henne & S. Murray (Eds.), Advances in experimental philosophy of action. Bloomsbury.
Sytsma, J., Bluhm, R., Willemsen, P., & Reuter, K. (2019). Causal attributions and corpus analysis. In E. Fischer & M. Curtis (Eds.), Methodological advances in experimental philosophy (pp. 209–238). Bloomsbury.
Ulatowski, J., Weijers, D., & Sytsma, J. (2020). Cognitive science of philosophy symposium: Corpus analysis. The Brains Blog. https://philosophyofbrains.com/2020/12/15/cognitive-science-of-philosophy-symposium-corpus-analysis.aspx
Waismann, F. (1997). The principles of linguistic philosophy. Springer.
Wittgenstein, L. (1953/2009). Philosophical investigations (4th ed.). (Hacker & Schulte, Trans.). Wiley-Blackwell.
Woike, J. K., Collard, P., & Hood, B. (2020). Putting your money where your self is: Connecting dimensions of closeness and theories of personal identity. PLoS One, 15(2). https://doi.org/10.1371/journal.pone.0228271

Michael Zahorec is a graduate student at Florida State University, where he is pursuing a PhD in Philosophy and an MS in Computer Science. He is interested in traditional theoretical problems in most areas of philosophy, especially in philosophy of language and logic. He is also interested in applying empirical methods to advance philosophical discussions, especially when applying those methods gives him a good excuse to learn more about computer science. Robert (Bob) Bishop received his Ph.D. in philosophy at Florida State University. He is now an assistant professor at California State University, San Bernardino, specializing in ancient Greek philosophy and ethics. Nat Hansen is Associate Professor of Philosophy and co-director of the Centre for Cognition Research at the University of Reading. His research concerns contextualism, experimental semantics and pragmatics, the meaning of color terms, and ordinary language philosophy.

John Schwenkler is Professor of Philosophy at Florida State University. His research is in the philosophy of mind and action. Justin Sytsma is Associate Professor of Philosophy at Victoria University of Wellington. He does this and that.

Chapter 8

Light in Assessing Color Quality: An Arabic-Spanish Cross-Linguistic Study

David Bordonaba-Plou and Laila M. Jreis-Navarro

Abstract The debate about the meaning of color terms in the philosophy of language has been dominated by two main issues. Firstly, there is the discussion about the context-dependency of color terms, specifically, quantity, the degree to which the object is of the color, and one of the dimensions of color quality, hue. Secondly, there is the question of how indexical contextualism can account for these elements of context-dependence. The aim of this chapter is twofold. First, to examine brightness, one of the dimensions of color quality that has been neglected in the literature. For this purpose, we will examine how the equivalent of “white” in Arabic and Spanish interacts with brightness modifiers. The analysis will allow us to distinguish three different usage patterns, light source, light reflection, and contrast between white and black, pointing to three new ways of contextual incompleteness of color terms. Second, to assess indexical theories of color terms considering these new empirical findings. We will argue that the results of our study offer support for indexical contextualist theories by distinguishing new dimensions of contextual incompleteness of color terms. However, the consequences of applying the standard indexical contextualist explanation–hidden-variable analysis– are too strong.

D. Bordonaba-Plou
Departamento de Lógica y Filosofía Teórica, Universidad Complutense de Madrid, Madrid, Spain

L. M. Jreis-Navarro
Departamento de Lingüística y Literaturas Hispánicas, Universidad de Zaragoza, Zaragoza, Spain

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_8

8.1 Introduction

Indexical proposals have significantly dominated the debate on the meaning and use of color terms1 in the philosophy of language. The starting point of the discussion is, no doubt, the famous and contentious example found in Travis (1997). There, Charles Travis upholds the idea that those sentences that include color terms can be associated with different truth values. Consider the following scenario:

Pia’s Japanese maple is full of russet leaves. Believing that green is the color of leaves, she paints them. Returning, she reports, ‘That’s better. The leaves are green now.’ She speaks truth. A botanist friend then phones, seeking green leaves for a study of green-leaf chemistry. ‘The leaves (on my tree) are green,’ Pia says. ‘You can have those.’ But now Pia speaks falsehood (Travis, 1997, p. 89).

As Travis explains, when Pia utters the sentence “The leaves are green” in the painted-leaves scenario, the sentence is true. However, the sentence is false when she utters it in the botanist scenario. Following one of the central tenets of truth-conditional pragmatics (see Travis, 1997; Bezuidenhout, 2002; Recanati, 2004, 2010), we can say that the content of “The leaves are green” is semantically underdetermined (see Bach, 1999). In short, there are occasions when the linguistic meaning of an expression or sentence is not sufficient to determine the content that the speaker wants to communicate because the content “depends on an indefinite number of unstated background assumptions, not all of which can be made explicit.” (Bezuidenhout, 1997, p. 105). Indexical theories of color terms (see Szabó, 2001; Rothschild & Segal, 2009; Hansen, 2011) have answered truth-conditional pragmatics, in general, and Travis’ challenge, in particular, by treating color terms as indexical expressions. To accommodate the semantic underdetermination, they posit hidden variables in the logical form of color terms to articulate all the circumstances or elements of the conversational setting necessary to determine the truth value of propositions that include color terms. Some of the most cited variables are standards of comparison and the part of the object that possesses the color in question. However, according to Clapp (2012), the explanatory power of indexical contextualism is undermined by the fact that color terms are absolute degree adjectives. In other words, because they are not relative degree adjectives like “tall” or “rich,” they lose one of the strongest sources of contextual variation for explaining truth-value variability: the standard of comparison. For this reason, one of the most widely used indexical contextualist strategies has been to distinguish new dimensions of contextual variation, for example, frames of reference or observation conditions.
The aim of this chapter is twofold. First, to examine one of the dimensions associated with color quality: brightness. Kennedy and McNally (2010) distinguish

1 Although the term most used is “color adjectives,” we will use the more general label “color terms.” We analyze color terms in Arabic, and, in this language, color terms do not always function as adjectives. Moreover, on certain occasions, it is also necessary to consider derived nouns in the analyses.

two readings of color terms, color quantity, i.e., how much the object is the color, and color quality, i.e., how close the object’s color is to a prototype. Color quality comprises three dimensions: hue, brightness, and saturation. Virtually all studies in the philosophy of language are focused on the hue dimension. However, this work explores one of the other two dimensions of color quality that have been ignored: brightness. To fill this gap, we will examine in Arabic and Spanish how a specific color term, “white,”2 interacts with brightness modifiers. Analyses allow us to distinguish three different usage patterns with their own contextual characteristics: light source (something is described as pure white or shining white because it is a light source), light reflection (something is described as pure white or shining white because it reflects light), and contrast between white and black (the most relevant is the contrast between what is described as pure white or shining white and the dark background). These three patterns point to three new ways of contextual incompleteness of color terms,3 i.e., three new features that can make a sentence that includes the term “white” semantically underdetermined. Second, to assess indexical theories of color terms considering these new empirical findings. The indexical contextualist could postulate new hidden variables to accommodate these new ways of contextual incompleteness, but we contend that the price of this maneuver is too costly. By examining one of the three dimensions of color quality, brightness, in two languages, Arabic and Spanish, we have identified three new elements of contextual incompleteness. In considering the other dimensions, hue and saturation, or in considering brightness in other languages, new elements could emerge, which would lead the indexical contextualist to postulate new hidden variables. 
The problem is that it is impossible to know how many variables should be included in the logical form of color terms to account for their contextual incompleteness, which, at the very least, leaves indexical contextualism in need of additional explanations.

Most studies that deal with color terms in the philosophy of language do so by studying them in English. However, our study will examine color terms in both Arabic and Spanish. We have decided to adopt a cross-linguistic perspective for the following reasons. Firstly, a broader range of data will better structure the findings. Secondly, a more comprehensive insight into the meaning and use of color terms can be achieved by applying this perspective. If we are interested in analyzing a set of terms and we do so by using only one language, but it turns out that these terms behave radically differently in other languages, then the scope of our findings may be somewhat limited. However, through a cross-linguistic perspective, it is possible to assess and compare the behavior of color terms in two or more languages, thus ascertaining how widespread our findings are. Moreover, the results of a cross-linguistic study can be extrapolated to other languages, in our case, for example, by checking if the patterns found are also present in English or any other language.

The plan for the chapter is as follows. Section 8.2 reviews indexical contextualist theories on color terms, showing that they all use the same line of argument, the hidden-variable analysis. Section 8.3 is devoted to justifying why we have chosen to focus our analysis on brightness (and brightness modifiers) rather than on the other dimensions, hue and saturation. Section 8.4 explains the methods and materials used in the study. Section 8.5 presents the results of the analyses, emphasizing common usage patterns in both languages. It describes through corpora examples how the color term “white” interacts with different brightness modifiers. Section 8.6 assesses the results of the analyses and discusses the implications of these findings for indexical contextualist theories on color terms.

2 We chose this color because of its proximity to light and brightness, which is why it offered more instances for conducting a far-reaching analysis.
3 Our results apply only to the term “white,” but new studies could reveal that these three dimensions are related to other color terms.

8.2 Indexical Theories on Color Terms

In the literature, we find different indexical proposals on color terms.4 For example, Szabó (2001) argues that color terms are contextually incomplete predicates, distinguishing two dimensions of contextual incompleteness. First, since color terms are gradable adjectives (Paradis, 2001; Kennedy & McNally, 2005; Kennedy, 2007), it is necessary to know the contextually salient standard of comparison to determine the truth value of any sentence that includes them. In other words, just as to say that John is tall is to say that John is at least as tall as a contextually-determined point on a height scale, to say that John’s shirt is green is to say that John’s shirt is at least as green as a contextually-determined point on a greenness scale. Second, the contextually specifiable part of an object is relevant for assessing the object’s color. For example, if a car is painted black on the outside, but its interior is red, what color is the car, red or black? Depending on the part contextually specified, an utterance of “The car is red” would express a true or a false proposition. To account for these two elements of contextual incompleteness, Szabó argues that the logical form of color terms contains two hidden variables, one for the standard of comparison and one for the contextually specifiable part, since he states that the “different dimensions

4 We will not discuss Rothschild and Segal (2009). The authors maintain that color terms are indexical expressions. Their strategy is built, like the indexical theories we will review, on expanding the set of deictic expressions by treating color terms as simple indexicals like “I” or “that.” However, as Clapp (2012) highlights, Rothschild and Segal’s (2009) proposal cannot be considered an indexical theory. In a nutshell, they do not treat color terms like typical indexical expressions because, within their theory, the meanings of color terms are not characters. The extension of characters varies systematically regarding different aspects of the context, but nothing changes extension across contexts according to the axioms defining the meaning of color terms they provide. Besides, “they never describe any sort of character, or ‘recipe,’” (Clapp, 2012, p. 91) of color terms, as is always the case with indexical expressions like “I” or “that.”

8 Light in Assessing Color Quality: An Arabic-Spanish Cross-Linguistic Study


of incompleteness correspond to different sorts of variables in the logical form” (Szabó, 2001, p. 136). Hansen (2011) offers a more refined hidden-variable indexical proposal. In addition to considering the two variables distinguished in Szabó (2001), he discusses three types of contextual variation, corresponding to three variable types. Firstly, frames of reference, the relevant perspective for evaluating the color. This perspective or frame of reference can be that of the object or the stimulus. In the former, what is essential is the “range of conditions that need to be taken into consideration when identifying an object’s object color” (Hansen, 2011, p. 212), while the latter “requires focusing on how objects look here and now, rather than seeing the color of objects as a function of how they would look in a variety of circumstances” (Hansen, 2011, p. 213). Secondly, observation conditions, for example, “the distance at which an object is observed” (Hansen, 2011, p. 213). Thirdly, there are observer conditions; for example, “variations in macular pigment” (Hansen, 2011, p. 214) can produce disagreements between different observers about the color of an object. Against these two theories, Clapp (2012) argues that color adjectives cannot be treated as indexicals. According to him, color terms are semantically context-invariant because they are absolute gradable adjectives, that is, gradable adjectives whose scales are closed at both ends. Unlike what happens with relative gradable adjectives such as “tall” or “rich,” where the context determines the relevant standard of comparison, the conventional meaning of absolute gradable adjectives like “full,” “open” or “red” determines the relevant standard of comparison. In other words, just as a glass that contains no liquid at all cannot be emptier, something that is barely red cannot be less red. So, the scale of redness semantically associated with the color term “red” is closed at its negative endpoint.
In the same way, just as a glass that is completely full cannot be any fuller, something that is totally red cannot be redder. In other words, the scale of redness semantically associated with “red” is closed at its positive endpoint. Clapp presents empirical evidence of this by indicating that color adjectives can occur felicitously with degree modifiers that are appropriate only for adjectives with negative endpoints, e.g., “barely,” “slightly,” “partially;” with degree modifiers that are appropriate only for adjectives with positive endpoints, e.g., “totally,” “completely,” “perfectly;” and with proportional modifiers that are appropriate for adjectives associated with scales that are closed at both ends, e.g., “mostly” or “two-thirds.” Clapp then argues that color terms are minimum-standard absolute gradable adjectives, i.e., absolute gradable adjectives whose standards are fixed at the low endpoint of the scale. For example, to be able to say that a door is open, it is not necessary for the door to be open to a high or medium degree, but simply that it is not completely closed. All that is required is that the door is open to a minimal degree. Following the presupposition accommodation experiments of Syrett et al. (2010), Clapp concludes that competent speakers would reject sentences like “Please hand me the red one” in the presence of two red objects because the two objects would be judged to be red, and thus they would not be able to accommodate the uniqueness presupposition. In the end, this would show that “red” is a minimum-standard absolute gradable adjective.
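The contrast between relative, minimum-standard, and maximum-standard adjectives, and the way it bears on the uniqueness presupposition of “the red one,” can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not part of Clapp’s or Kennedy and McNally’s formal apparatus: the numeric degrees, the 0-to-1 redness scale, and all function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    """A scale closed at both ends (numeric endpoints are an assumption)."""
    minimum: float
    maximum: float


def relative(degree: float, contextual_standard: float) -> bool:
    """Relative adjective ("tall"): the standard is supplied by context."""
    return degree >= contextual_standard


def minimum_standard(degree: float, scale: Scale) -> bool:
    """Minimum-standard adjective ("open"): any non-minimal degree suffices."""
    return degree > scale.minimum


def maximum_standard(degree: float, scale: Scale) -> bool:
    """Maximum-standard adjective ("full"): only the maximal degree suffices."""
    return degree >= scale.maximum


def unique_referent(objects: dict, predicate):
    """Return the one object satisfying the predicate, or None when the
    uniqueness presupposition of "the ... one" cannot be satisfied."""
    matches = [name for name, degree in objects.items() if predicate(degree)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical degrees of redness for two objects that are both red to some extent.
REDNESS = Scale(minimum=0.0, maximum=1.0)
objects = {"object_a": 0.9, "object_b": 0.3}

# If "red" has a minimum standard, both objects count as red, so
# "Please hand me the red one" has no unique referent.
red_minimum = unique_referent(objects, lambda d: minimum_standard(d, REDNESS))

# If "red" were relative, context could set a standard that separates them.
red_relative = unique_referent(objects, lambda d: relative(d, 0.5))

print(red_minimum)   # None
print(red_relative)  # object_a
```

On this toy model, the predicted rejection of “Please hand me the red one” corresponds to the `None` result in the minimum-standard case.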


To test Clapp’s (2012) conclusion, Hansen and Chemla (2017) examine empirically whether color terms are relative or absolute gradable adjectives. They ran two different experiments to test it: one based on entailment patterns (see Kennedy, 2007; Kennedy & McNally, 2005) and another on presupposition accommodation (Syrett et al., 2010). The first experiment’s results confirm that color adjectives are more like absolute gradable adjectives than relative gradable adjectives. The results of the second experiment are inconclusive. Hansen and Chemla examine presupposition accommodation considering both a quantitative and a qualitative reading of color adjectives. Concerning the quantitative reading, the participants’ answers suggest that color adjectives can be classified into three different groups: minimum-standard, medium-standard, and maximum-standard absolute gradable adjectives. Concerning the qualitative reading, most responses identify color adjectives as minimum-standard absolute gradable adjectives. In summary, the authors’ findings are consistent to a great extent with Clapp’s (2012) conclusion that color terms are absolute gradable adjectives. Whether color terms are absolute or relative gradable adjectives is of the utmost importance. As we have said in the Introduction, Clapp (2012) emphasizes that the explanatory power of indexical contextualism is undermined by the fact that color terms behave as absolute gradable adjectives because the indexical contextualist loses one of the strongest sources of contextual variation in explaining truth-value variability: the standard of comparison. However, as Hansen and Chemla (2017, p. 268) emphasize, “showing that color adjectives lack a context sensitive standard would not show that color adjectives are not semantically context sensitive,” because it will be possible to find other sources of contextual variation.
In this regard, our analyses will show that a source of contextual variation in color terms not previously distinguished in the literature has to do with one of the dimensions of color quality: brightness. However, we will first discuss the reasons that led us to investigate this dimension specifically.

8.3 What’s with the Light?

The principal constituents or dimensions of color are hue, saturation, tone, and brightness. Hue refers to the spectrum of visible light; saturation, to the purity of a hue in relation to the amount of gray; tone, to the admixture of white or black, with the fully saturated hue in the middle of a scale that ranges from pale to dark; and brightness, to the amount of light (Biggam, 2012, pp. 3–5).5 Philosophy of language studies investigating color terms have only considered three of these four

5 In addition to these dimensions, others, such as texture, moistness/humidity or edibility, have been identified as constituents of the color systems in cultural contexts other than the English-speaking ones (see, for example, Conklin, 1955).


dimensions, hue, brightness, and saturation, referring to them with the expression “COLOR QUALITY” (Kennedy & McNally, 2010, p. 90). Most of the works have considered color quantity, articulating this element of contextual incompleteness in their explanations. Only a few of them (Kennedy & McNally, 2010; Hansen, 2011; Hansen & Chemla, 2017) explicitly mention color quality. However, when they analyze color quality, they tend to focus on hue, overlooking saturation and brightness. For example, Hansen and Chemla (2017) test competent speakers’ intuitions regarding what types of gradable adjectives color adjectives are, taking color quality into account. However, their examples only consider hue and overlook the other two dimensions, brightness and saturation (see Hansen & Chemla, 2017, p. 52). It therefore seems reasonable to suppose that the philosophy of language has concentrated on only two features relevant for explaining the meaning and use of color terms: color quantity and one of the three dimensions that characterize color quality, hue.6 To cover part of the gap left by previous works, we will take brightness as the central dimension guiding our analysis. In the remainder of this section, we will consider some of the possible causes that have led philosophers of language to overlook brightness. Specifically, we will present two reasons that make it difficult to assess light in color use. First, light has been a constituent of particular ambiguity in color studies (Biggam, 2007). On the one hand, it remains difficult to determine whether this part of color depends on the amount of white of a hue, its paleness/whiteness, or on that of the light on the visualized surface of the objects, its “brightness.” In anthropological studies, based on the Munsell color system, pure light or brightness seems to be equivalent to pure white (MacLaury, 1992, p. 138).
In more linguistic approaches, the term “brightness,” understood as the luminosity or reflectivity of a shining referent, is intertwined with its hue meaning, with no clear boundaries (Casson, 1997, p. 227). In Biggam’s (2007) proposal for a standardized terminology, “white” does not refer to a hue but to an achromatic tone, equivalent to the chromatic tone “pale,” while “brightness” refers to light-emission, reflectivity, and surface or space illumination. On the other hand, in cross-linguistic and comparative approaches, it has been suggested that “lightness” could be a more appropriate term than “brightness,” since the latter, along with other terms such as “saturation,” “hue,” and “color” itself, is treated as a universal conceptual construct when these terms are in fact Anglocentric and obscure. Levisen (2019, p. 85) calls for “paying attention to new aspects of visual semantics which are not (exclusively) chromatic.” Moreover, the distinction might be a question of discipline as, according to MacLaury (1992, p. 139), what

6 Here is a list of the various degree modifiers that appear in the literature: “barely,” “slightly,” “partially,” “totally,” “completely,” “perfectly,” “mostly,” “two-thirds,” “pretty,” “really,” “not so,” “too,” “as x as,” “less x than,” “not x enough,” “half,” “part.” As can be seen, most of them are quantity degree modifiers; only one refers to color quality, “perfectly,” and the others, “pretty,” “really,” “not so,” “too,” “as x as,” “less x than,” “not x enough,” could refer to color quantity or quality.


anthropologists refer to as the “brightness” dimension of the Munsell colors is what psychologists call “lightness.” Secondly, and more importantly, from a diachronic perspective, there seems to be a tendency toward a fundamentally chromatic approach to the detriment of brightness. MacLaury (1992) proposes a middle-ground approach between universalist and relativist views of color categorization by pointing to the role of brightness in the construction of the hue sequence. In most languages, color categorization is predominantly based on hue. However, many languages emphasize brightness over hue, which poses a problem for the universalist view. According to him, the increasing preponderance of hue over brightness in the cognitive process is related to an increased attention to distinctiveness:

Where attention to similarity is strong, color categories may be constructed primarily in reference to brightness. As distinctiveness is more strongly emphasized, people will elevate hue to the highest focus and will pay less attention to brightness (MacLaury, 1992, p. 161).

Following MacLaury’s anthropological lead, Ronald W. Casson studies the linguistic evolution of color terms from Old English (600–1150) to Modern English, stating that hue became salient in conceptualizations of color in the Middle English period (1150–1500) and that, until then, brightness was the predominant sense of color words. This shift from brightness to hue continued in the Modern English period. He explains this shift as “a response to an increasingly complex color world in the Middle English period,” which translated into a “cognitive refocusing” (borrowing MacLaury’s term) where “culture members restructured their systems of color categorization” (Casson, 1997, p. 238). Biggam (2007) studies the ambiguity in the use of the English words “bright” and “brightness” by tracking a list of twentieth-century publications, including Casson (1997), on Old English and its previously mentioned preponderance of brightness over hue. She concludes that the argument for this preponderance has been backed by incomplete evidence, with poor attention to the contexts in which the color terms’ references are found. Although her improved research methodology is adequate to cast doubt on the accuracy of the previous research, she only focuses on the gray category, which was the weakest argument in Casson’s study, and does not thoroughly address his evidence on the other colors. In fact, she states that the intention of her study is not “to disagree with the proposal that hue has become the most important aspect of color in English over the centuries.” She even speculates that “chromatic tone forms a logical semantic ‘bridge’ between brightness and hue, and this would be an interesting area of investigation in future research.” However, she maintains that “it is still debatable whether a brightness-dominance remained in Old English. The evidence brought forward to support such a view is fragile” (Biggam, 2007, p. 185).
She does not question the progress from brightness to hue, but she believes that “the process began before Old English, perhaps long before [...]” (Biggam, 2007, p. 186). From these three studies, it seems that the argument for the increased predominance of hue over brightness, whether considered in terms of the evolution of our cognitive processes or from a historical-linguistic perspective, still holds up, but


we must avoid broad generalizations when drawing conclusions in our approaches to the matter; using extensive and representative corpora instead of limited data is also important. To sum up, when investigating the meaning and use of color terms, philosophy of language has focused on examining only one of the three dimensions of color quality, hue. This is not surprising since, as we have said, anthropological and linguistic studies attest to a shift in importance from brightness toward hue. As stated in the Introduction, we aim to explore brightness by analyzing brightness degree modifiers in both Arabic and Spanish. It is true that we will expose ourselves to problems given the great complexity of brightness. However, our analysis will not be based on limited data, but on large amounts of data extracted from very large corpora. For this reason, although we will not be able to eliminate the problems arising from the complexity inherent in any study of brightness, we will be able to reduce them in a way that allows us to examine this unexplored dimension effectively. In the next section, we will set out the methods and materials used in this work.

8.4 Methods and Materials

Several authors have recently upheld the use of linguistic corpora to carry out research in philosophy of language (Bluhm, 2013, 2016; Hansen & Chemla, 2015; Hansen et al., 2019; Bordonaba-Plou & Torices, 2020; Caton, 2020; Hinton, 2021; Tallant & Andow, 2020). For the inquiries, we have used two corpora of the TenTen family in Sketch Engine: the Spanish corpus esTenTen18 (Kilgarriff & Renau, 2013), and the Arabic corpus arTenTen12 (Arts et al., 2014), which are web-based corpora compiled following the methodology described in Sharoff (2006) and Baroni et al. (2009). We have also adopted a cross-linguistic perspective. Cross-linguistic analyses are common in disciplines like corpus-assisted discourse studies (see, for example, Freake et al., 2011; Taylor, 2014; Nardone, 2018). However, they are an underrepresented methodology in the philosophy of language. Although some works exist (Fischer et al., 2021), they represent just a tiny fraction of the total number, as most use only one language in their analyses and examples, English. We will present some advantages and problems of adopting such a perspective. First, let us look at the advantages. Cross-linguistic approaches are relevant to determining the relationship between language form and language meaning, distinguishing “universal” from language-specific patterns (Raffaelli et al., 2019). In this sense, a cross-linguistic study will help us discover common patterns in the use of “white” with brightness modifiers and discard patterns that could be specific to only one language. As Fischer et al. (2021, p. 1031) highlight, by taking a cross-linguistic perspective, it is possible to show that our findings do not depend on the idiosyncrasies of a particular language. Besides, cross-linguistic studies on a particular set of terms represent a more comprehensive analysis of the meaning and use of those terms. When only one language is considered, it is easy to overlook the


differences in how color terms behave in different languages. However, as Freake et al. (2011, p. 29) emphasize, by employing two different languages in the analysis, it is possible to uncover the similarities and differences. Now, let us see the problems. The most significant difficulties arose when using the Spanish corpus esTenTen18 and the Arabic corpus arTenTen12 in Sketch Engine. Two main issues emerged in our inquiries. On the one hand, the lists of collocations provided by Sketch Engine differ in type and usefulness. For example, in Spanish, the tool provides lists that are similar to those in English. However, in Arabic, we have fewer lists available, reflecting only grammatical categories and the collocation position (left or right), i.e., there is no modifiers list. Because of this shortcoming, the tool does not provide a complete perspective on the linguistic behavior of the term. On the other hand, the analyses conducted by the tool show different degrees of accuracy in Part of Speech (POS) tagging, failing to identify certain grammatical categories and failing to strip some affixes. These inaccuracies imply that the statistical scores are not as reliable as desired. This was one of the reasons we decided to carry out an analysis that was not entirely quantitative, and to complement it with qualitative analysis, specifically a close reading of examples. Our analysis focuses on two main topics: brightness degree modifiers, and the objects or substances to which the color terms refer. We used Sketch Engine’s Word Sketch tool to investigate the brightness degree modifiers.7 We chose the following from the different lists provided by the tool:8

1. nāṣiʿ (pure):

• abyaḍ nāṣiʿ (abyaḍ: adj. m. white): absolute frequency = 639; Log Dice9 = 6.9.
• bayḍāʾ nāṣiʿa (bayḍāʾ: adj. f. white): absolute frequency = 1231; Log Dice = 7.6.

7 The search was conducted from September 1 to 7, 2021. Due to the abovementioned problems, we had to manually select the brightness modifiers for Arabic from all lists provided by the Word Sketch tool.
8 The search criteria were different in Arabic and Spanish because of differences in the behavior of the tool for each language. In Spanish, it was not necessary to specify the gender when searching: for blanco (m.) (white), Word Sketch searches for occurrences of both blanco (m.) and blanca (f.). To find instances of blanco (white) with brightness modifiers we had to search for the term when it functions as a noun because when searching for it as an adjective the tool only finds quantity modifiers. In Arabic, the performance of the tool is much poorer, so we had to search for the adjectives abyaḍ (m.) and bayḍāʾ (f.) (white) separately because the tool does not return the occurrences of both when you search for either of them. We also had to search for bayāḍ (whiteness), because some Spanish uses of the color term are equivalent to the noun in Arabic, as in bayāḍ al-malābis/el blanco de la ropa (the whiteness of the clothes).
9 Log Dice expresses “the tendency of two words to co-occur relative to the frequency of these words in the corpus.” It is a “standardized measure operating on a scale with a fixed maximum value of 14, which makes Log Dice directly comparable across different corpora and somewhat preferable to the MI-score and MI2, neither of which have a fixed maximum value.” (Gablasova et al., 2017, p. 164).


• bayāḍ nāṣiʿ (bayāḍ: n. white): absolute frequency = 156; Log Dice = 5.6.

2. mušriq, sāṭiʿ (shining):

• bayḍāʾ mušriqa: absolute frequency = 151; Log Dice = 3.9.
• abyaḍ sāṭiʿ: absolute frequency = 47; Log Dice = 2.7.

3. puro (pure): absolute frequency = 2604; Log Dice = 6.5.
4. reluciente (shining): absolute frequency = 150; Log Dice = 5.7.
5. resplandeciente (dazzling): absolute frequency = 142; Log Dice = 5.6.

Then, we conducted a concordance analysis for each modifier, also known as KWIC (keywords in context) analysis. A KWIC analysis obtains a “list of all of the occurrences of a particular search term in a corpus, presented within the context that they occur in” (Baker, 2006, p. 71). In this way, the researcher can visualize and examine in detail all the occurrences of a given term together with the words to the left and the right surrounding the term. In a nutshell, a KWIC analysis facilitates the detection of recurring usage patterns, as well as other specific usage characteristics of the searched term. As said above, during the KWIC analysis, we paid special attention to the objects or substances to which the color terms refer because analyzing them yields additional information for determining which aspects of the visual semantics related to the brightness dimension are most common. For example, we noted whether the object to which “white” plus the brightness modifier applies has a reflective surface or a specific texture, is living matter, or constitutes a light source. In the next section, we will present the analysis carried out in Arabic and Spanish for the color term “white.”
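The two measures used in this section can be made concrete with a short Python sketch. It is a toy illustration under stated assumptions, not Sketch Engine’s implementation: `log_dice` follows the standard definition of the score, 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency and f_x and f_y are the frequencies of the two words; `kwic` is a naive concordancer with a hypothetical `window` parameter and a simple regular-expression tokenizer. The frequencies passed to `log_dice` are invented, since the list above reports only co-occurrence frequencies and the resulting scores.

```python
import math
import re


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Log Dice association score; its theoretical maximum is 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def kwic(text: str, node: str, window: int = 3):
    """Return (left context, node, right context) for each hit of `node`."""
    tokens = re.findall(r"\w+", text.lower())  # naive tokenizer (assumption)
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# One Spanish sentence from the corpus examples discussed in this chapter.
sentence = ("Al entrar te envuelve el blanco puro de sus paredes "
            "que deslumbra cuando entra la magnífica luz.")

for left, node, right in kwic(sentence, "puro"):
    print(f"{left:>25} [{node}] {right}")

# Invented frequencies: two words that only ever occur together reach the maximum.
print(log_dice(10, 10, 10))  # 14.0
```

Reading the concordance line for the node word together with its left and right context is what allows the analyst to check which object the modified color term applies to.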

8.5 Analyses

We searched for blanco/abyaḍ, bayḍāʾ, bayāḍ (white), and we chose and analyzed the brightness modifiers described in the previous section. The KWIC analysis shows that the most common references in Arabic and Spanish are:

• with nāṣiʿ/puro (pure): structures (furniture, walls), precious stones/minerals (pearl, crystal, ivory), natural objects (clouds, snow, ice, flower, sand, moon), light and light sources, human/animal body (eyes, face,10 bald head/shoulder), animals (pigeon),11 clothing (cloth), food (bread).
• with sāṭiʿ, mušriq/resplandeciente, reluciente (shining): structures (buildings, walls, furniture, hall, city), natural objects (snow, sand, pools, fur, teeth, skin, flower), light sources (Sun), clothing (dress, buttons), precious stones/minerals

10 This term only appeared in Arabic.
11 This term only appeared in Arabic.


(gems, quartz), spiritual experience (angels, auras), human/animal body (woman, man, beard, teeth, face/horse), liquids (wine),12 paper.

More importantly, the KWIC analysis allowed us to identify three recurrent usage patterns: first, light reflection, pointing out that the object is white because it reflects light; second, light source, which indicates that the object is white because it is a source of light; and third, contrast between light and darkness, showing that the contrast between the object described as pure or shining white and the surrounding darkness is salient in the context. These semantic patterns can be observed with the two sets of modifiers studied, nāṣiʿ/puro (pure) and sāṭiʿ, mušriq/resplandeciente, reluciente (shining). However, differences can be observed between the two languages. While the first two patterns can be noted in both languages, the third pattern is exclusive to Spanish. It is worth noting that, in each sentence, we required the presence of an additional term that explicitly indicates the semantic pattern. Below are examples for each of the modifiers and each of the patterns. Let us begin with the examples of the first modifier, nāṣiʿ/puro (pure), and the first semantic pattern, light reflection:

1) Al-madīna muġaṭṭāt bi-ridāʾ abyaḍ nāṣiʿ, yabruqu barīq-an ǧamīl-an maʿa ašiʿʿat al-šams. (The city is covered in a pure white robe, shining beautifully in the sunshine).
2) Al entrar te envuelve el blanco puro de sus paredes que deslumbra cuando entra la magnífica luz. (The pure white of its walls surrounds you as you enter, and dazzles when the magnificent light enters).

In 1), “pure white” refers to the snow cover that surrounds the city and which is responsible for the sunlight reflection. Similarly, in 2), the speaker talks of the pure white of the walls; as the light enters the room, they reflect the light and illuminate the room. Note the importance of the objects’ surfaces. In both cases, the objects in question, the snow and the walls, are light-reflecting materials. Now, two examples of the first group of modifiers, nāṣiʿ/puro (pure), and the second semantic pattern, light source:

3) Inbaṯaqa l-nūr bi-buṭʾ muntašir-an bi-bayāḍ nāṣiʿ, kašafa ʿan kull mā mawǧūd. (Slowly, the light emerged, spreading in pure whiteness, revealing all that existed).
4) Damos un gris pálido como primer color, para luego iluminar con blanco puro. (We apply a pale gray as the first color, and then illuminate with pure white).

In 3), light and pure white are the same. One extends to the extent that the other does. In 4), the pure white color that the painter applies constitutes the very light of the painting.

12 This term only appeared in Spanish.


Finally, we present a Spanish example of puro (pure), and the third semantic pattern, contrast between light and darkness: 5) Hola, me encanta la luminosidad que se consigue con el blanco puro contrastado con la madera de color natural. (Hi, I love the luminosity achieved with the pure white contrasted with the natural colored wood).

As can be seen, the most relevant feature of the context in 5) is the contrast between the object described as pure white and the colored wood. In this example, the extra element refers directly to the idea of contrast. Now, let us look at the examples of the second group of modifiers, sāṭiʿ, mušriq/resplandeciente, reluciente (shining). Let us begin with the first semantic pattern, light reflection:

6) Kānat azhār al-karaz bayḍāʾ mušriqa ka-l-fiḍḍa l-mutalaʾliʾa taḥta ašiʿʿat al-šams al-dāfiʾa. (The cherry blossoms were shining white like dazzling silver in the warm sunlight).
7) Una construcción gótica decorada en un resplandeciente blanco, que parece brillar gracias al precioso mármol. (A Gothic building decorated in shining white, which seems to shine due to the precious marble).

In 6), the cherry blossoms are shining white, like silver, when bathed in sunlight. In this case, the extra element marking the presence of the light reflection pattern is “dazzling silver in the warm sunlight.” In 7), the walls of the building are shining white owing, as indicated, to the reflection caused by the material they are made of, marble. Now, examples of the second group of modifiers, sāṭiʿ, mušriq/resplandeciente, reluciente (shining), and the second pattern, light source:

8) Daḫalat al-qāʿa fa-iḏā bi-hā qāʿa bayḍāʾ mušriqa munīra wa-ǧamīla. (She entered the hall, and there was a white hall, shining, luminous, and beautiful).
9) reemplazamos los tonos grises de la ropa de cama por un blanco reluciente. La cama ilumina toda la estancia. (We replaced the gray tones of the bedding with a shining white. The bed lights up the entire room).

In 8), the white walls of the hall are painted in such a way that they emit light. Note that, unlike in 1), here we have no reference to another light source. In 1), the expression “in the sunshine” indicates that the city shines because of an external light source. However, in 8), it is the room itself that is luminous. In the same way, in 9), the shining white of the sheets lights up the whole room, i.e., it is the source of light.


The examples so far show the presence of the first two semantic patterns, light reflection and light source, for the two groups of modifiers in the two languages. Besides, they show that the third pattern, contrast between light and darkness, can be observed only in Spanish for the first group of modifiers. As we will see below, the same is true for the second group of modifiers. Although we found an example in Arabic that potentially looked like an instance of the pattern, as we shall see, we cannot consider it as such. Consider the following two examples:

10) Kānat la-hum asnān-an wa-ʿuyūn-an tabdū bayḍāʾ bi-lamʿa sāṭiʿa ʿalà ḫalfiyyat wuǧūhi-him al-sawdāʾ maʿa malāmiḥ ifrīqiyya. (They had teeth and eyes that looked white, glittering and shining against the background of their black faces, bearing African features).
11) [Lorca] exigía “la simplicidad del escenario y que fuera un blanco resplandeciente”. El blanco contrasta con el negro del luto que visten. ([Lorca] demanded “a stage that was simple and shining white”. The white contrasts with the black of the mourning they wear).

As can be seen, in 11), the stage’s shining white contrasts with the actors’ black costumes. In this example, the extra element that allows us to confirm the presence of the semantic pattern is “contrasts with the black of the mourning they wear.” However, in 10), although we can find the extra element that alludes to the semantic pattern in question, “against the background of their black faces,” the brightness modifier is not applied to the object of which whiteness is predicated, the teeth, but to the brightness they have. Moreover, when analyzing the examples of this second group of modifiers, we were able to find examples where blanco/abyaḍ, bayḍāʾ, bayāḍ (white) plus the brightness modifier refer to the hue dimension, not to the brightness dimension. Although these cases are much less frequent than the patterns, for the sake of exhaustiveness, it is worth attesting their existence. Consider the following examples:

12) Al-aǧānib yaḏhabūna ilà l-baḥr wa-yatasaṭṭaḥūna ʿalà l-šāṭiʾ li-l-ḥuṣūl ʿalà bašara samrāʾ li-anna bašarata-hum bayḍāʾ sāṭiʿa. Ilà ayna yaḏhabu l-saʿūdiyyūn li-l-ḥuṣūl ʿalà bašara bayḍāʾ nāʿima ka-l-ḥarīr? (Foreigners go to the sea and flatten themselves on the beach to get tan skin because their skin is shining white. Where can Saudis go to get white skin as smooth as silk?).

13) Todos los vestidos destacan por tener un blanco reluciente, sin ningún cambio de tonalidad. (All the dresses stand out for their shining white, without any change of shade).

In (13), the dresses referred to are shining white because there is no change of shade in their fabrics. In (12), the speaker talks about how foreigners turn brown on the beach, how their skins turn from shining white to brown, and how the Saudis cannot do the opposite. Although the idea of tonality change is not explicitly expressed as
in (13), it is implicit. In this sense, in both cases, the expression “shining white” receives a quality reading, referring to the hue dimension, not to that of brightness.

To conclude, we present an example where the second group of modifiers receives not a quality reading but a quantity reading. We could find this type of case only in Spanish.

14) Si nos fijamos bien, hay una toma en la que vemos el interior de la nave, y parece recién sacada del concesionario. Un blanco reluciente, sin manchas ni rayadas . . . (If we look closely, there is a shot where we see the interior of the ship, and it looks like it just came out of the factory. A shining white, with no stains or scratches . . . ).

In (14), the speaker is talking about a shot from a movie showing the interior of a spaceship, which she describes as shining white, with no spots or scratches. That is, the fact that the interior of the spaceship can be characterized as shining white has to do, not with the fact that it reflects or emits light, that there is a contrast with dark elements in the scene, or that there are no changes in hue, but rather with the surface of the object being completely white.

Having presented all the cases of the interaction between the two sets of modifiers and the three different semantic patterns, in the next section, we will discuss the findings of the analyses and what the implications are for indexical contextualist theories.

8.6 Discussion

Arabic and Spanish seem to behave similarly in two of the three patterns, light reflection and light source, across all modifiers. That is, QUALITY-brightness-light-reflection and QUALITY-brightness-light-source readings are available in Arabic and Spanish for nāṣiʿ/puro (pure) and sāṭiʿ, mušriq/resplandeciente, reluciente (shining). However, instances of the third pattern can be found only in Spanish. In other words, QUALITY-brightness-contrast readings are available in Spanish for puro (pure) and resplandeciente, reluciente (shining). Besides, sāṭiʿ, mušriq/resplandeciente, reluciente (shining white) has QUALITY-hue readings both in Arabic and Spanish, although in Arabic this is less clear because the idea is expressed implicitly. In Spanish, blanco reluciente (shining white) also has QUANTITY readings.

The cross-linguistic perspective adopted in this work has proved helpful given the similarities and differences between the two languages. The similarities in the use of brightness modifiers with blanco/abyaḍ, bayḍāʾ, bayāḍ (white) have allowed us to establish the existence of three recurring patterns of use. The first two patterns, light reflection and light source, are present in the two languages across all the modifiers investigated. Thus, we can conclude that these two patterns do not correspond to the
particularities of a single language. On the contrary, since they are present in two languages that belong to different families, we can say that they are central features for understanding the meaning of “white.” However, the third pattern, contrast between light and darkness, is present only in Spanish, which leads us to adopt a cautious attitude in concluding the relevance of the pattern as a constitutive element of the meaning of “white.” The need for studies in other languages to corroborate the lack or presence of the three patterns is self-evident.

Our analyses confirm the intuition shared by linguists and anthropologists regarding the blurred boundaries between whiteness and brightness. As the existence of the first two patterns shows, whiteness and brightness are strongly linked.

Now, what is the relevance of the results of our analysis for contextualist theories? If we recall, one of the problems with these theories was that, as pointed out by Clapp (2012) and confirmed by Hansen and Chemla (2017), color terms are absolute rather than relative degree adjectives. That is, the standard of comparison, one of the elements most often used to explain contextual incompleteness and the consequent variability in truth value, was not available to the indexical contextualist. However, one of the possible ways out for the indexical contextualist was to identify new forms of contextual incompleteness of color terms that could account for the variability in the truth value of propositions that include them.

We have investigated how “white” interacts with brightness modifiers. We have distinguished three recurrent usage patterns. Of these three patterns, two are common to Arabic and Spanish, so there are at least two new ways an utterance including “white” (or its equivalents in other languages) can be semantically underdetermined. Suppose that a person, who has just arrived at a meeting with her friends, hears someone say: hāḏā innahu abyaḍ sāṭiʿ! (It’s shining white!). What does the proposition expressed by this person’s utterance mean? Considering the results of our analysis, it could mean several things. First, that the object to which the person refers, for example, her new fridge, is of that color when light enters through the window. Second, that the object, let us say that it is her new sheets, brightens up her bedroom, making it a perfect place to rest.

Now, recall that, in Spanish, blanco (white), when appearing with a brightness modifier like reluciente (shining), not only has QUALITY-brightness-light-reflection and QUALITY-brightness-light-source readings but also QUALITY-brightness-contrast and, to a lesser degree, QUALITY-hue and QUANTITY readings. So, if the person were talking in Spanish, she could also be saying that the suit she wore to the last family celebration shines in the photo they took because she was surrounded by people in black clothes, that the handkerchief she just washed is shining white because it has no trace of any other hue, or that the surface of her new car is totally white, without any stains or scratches. This adds further elements of contextual incompleteness to those identified above. Thus, the results of our study offer support for indexical contextualist theories by distinguishing new dimensions of contextual incompleteness of color terms.

However, we may ask, at what cost? In our study, we investigated the color term blanco/abyaḍ, bayḍāʾ, bayāḍ (white), and two sets of brightness modifiers. We have discovered two recurrent patterns representing two new forms of contextual
incompleteness, light reflection and light source. In addition, we have also seen that blanco puro (pure white) and blanco reluciente (shining white) can have QUALITY-brightness-contrast readings, and that the latter can also have QUALITY-hue and QUANTITY readings. That is, there are five different ways in which a proposition can be semantically underdetermined. If we were to investigate other brightness modifiers in Arabic and Spanish, it is possible that new patterns would emerge and, with them, new forms of contextual incompleteness. We could also investigate the same patterns in other languages or other colors in the same languages studied to see if new patterns emerge and then examine whether they exist in Arabic and Spanish. In addition, new forms of contextual incompleteness might emerge if we were to investigate not the dimensions of hue or brightness but that of saturation, a dimension not explored so far. For each of these new forms of contextual incompleteness, the indexical contextualist would be forced to postulate a variable gap in the logical form of “white.” This seems to us too demanding a requirement, because the number of variable gaps would depend on the number of forms of contextual incompleteness, and it would increase as we investigate more and more modifiers in more and more languages. In sum, the indexical contextualist is forced to postulate an indefinite number of variables in the logical form of color terms, and this is not desirable because, as Cappelen and Lepore (2002, p. 274) highlight, postulating an indefinite number of variables seems too extreme.

8.7 Conclusions

In this paper, we have investigated one of the dimensions of color quality not examined so far, brightness. To this end, we have examined how the term blanco/abyaḍ, bayḍāʾ, bayāḍ (white) interacts with brightness modifiers, paying special attention to the objects to which the speaker refers. We used the Sketch Engine Word Sketch tool to select the brightness modifiers that most frequently accompany blanco/abyaḍ, bayḍāʾ, bayāḍ (white). We also conducted a KWIC analysis to determine the objects that most frequently appear as referents of the expression “blanco/abyaḍ, bayḍāʾ, bayāḍ (white) + brightness modifier” and, more importantly, to detect recurring usage patterns. The results of the KWIC analysis have allowed us to distinguish three patterns of use that appear with most of the modifiers considered: first, light reflection, pointing out that the object is white because it reflects light; second, light source, which indicates that the object is white because it is a source of light; and third, contrast between light and darkness, showing that the contrast between the object described as pure or shining white and the surrounding darkness is salient in the context. These three patterns allude to three forms of contextual incompleteness, i.e., three ways in which a proposition that includes “white” (or its equivalents in other languages) can be semantically underdetermined.

Finally, we have assessed the impact of these findings on indexical contextualist explanations of the meaning of color terms. The identification of new forms of contextual incompleteness gives support to indexical contextualists since, in this way, they have additional explanatory elements on which to make truth value variability depend. However, the price to be paid is too high. Indexical contextualist explanations are based on hidden-variable analysis. In other words, they articulate these forms of contextual incompleteness as variable gaps in the logical form of color terms. Our analyses identify two new forms by examining “blanco/abyaḍ, bayḍāʾ, bayāḍ (white) + brightness modifiers” in two languages, Arabic and Spanish: QUALITY-brightness-light-reflection and QUALITY-brightness-light-source readings. Therefore, the indexical contextualist would be required to postulate two new variable gaps for these new forms of contextual incompleteness. In addition, we have found a third recurrent pattern in Spanish, QUALITY-brightness-contrast readings, and two readings for blanco reluciente (shining white), QUALITY-hue and QUANTITY readings, that is, three new forms of contextual incompleteness. In summary, the indexical contextualists will be forced to postulate a new variable gap in the logical form of color terms whenever we discover a new form of contextual incompleteness. This requirement seems very strong because the number of variable gaps seems to depend on how deep or new our analyses are.

It only remains to point out that COLOR is a complex category of analysis which is difficult to standardize because it is a subjective experience that depends on various factors, e.g., conditions of observation or cultural conceptualizations. However, by performing cross-linguistic studies that make use of large amounts of data that collect speakers’ use of color terms, we can overcome this enormous obstacle.

References

Arts, T., Belinkov, Y., Habash, N., Kilgarriff, A., & Suchomel, V. (2014). arTenTen: Arabic corpus and word sketches. Journal of King Saud University – Computer and Information Sciences, 26(4), 357–371.
Bach, K. (1999). The semantics–pragmatics distinction: What it is and why it matters. In K. Turner (Ed.), The semantics/pragmatics interface from different points of view (pp. 65–84). Elsevier Science.
Baker, P. (2006). Using corpora in discourse analysis. Continuum.
Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43, 209–226.
Bezuidenhout, A. (1997). Pragmatics, semantic underdetermination and the referential/attributive distinction. Mind, 106(423), 375–409.
Bezuidenhout, A. (2002). Truth-conditional pragmatics. Philosophical Perspectives, 16, 105–134.
Biggam, C. P. (2007). The ambiguity of brightness (with special reference to old English) and a new model for color description in semantics. In R. E. MacLaury, G. V. Paramei, & D. Dedrick (Eds.), Anthropology of color. Interdisciplinary multilevel modeling (pp. 171–187). John Benjamins.
Biggam, C. P. (2012). The semantics of colour. A historical approach. Cambridge University Press.
Bluhm, R. (2013). Don’t ask, look! Linguistic corpora as a tool for conceptual analysis. In M. Hoeltje, T. Spitzley, & W. Spohn (Eds.), Was dürfen wir glauben? Was sollen wir tun? Sektionsbeiträge des achten internationalen Kongresses der Gesellschaft für Analytische Philosophie (pp. 7–15). e.V. DuEPublico.
Bluhm, R. (2016). Corpus analysis in philosophy. In M. Hinton (Ed.), Evidence, experiment and argument in linguistics and philosophy of language (pp. 91–109). Peter Lang.
Bordonaba-Plou, D., & Torices, J. R. (2020). Paving the road to hell: The Spanish word menas as a case study. Daimon. Revista Internacional de Filosofía, 84, 47–62.
Cappelen, H., & Lepore, E. (2002). Indexicality, binding, anaphora and a-priori truth. Analysis, 62, 271–281.
Casson, R. W. (1997). Color shift: Evolution of English color terms from brightness to hue. In C. L. Hardin & L. Maffi (Eds.), Color categories in thought and language (pp. 224–239). Cambridge University Press.
Caton, J. N. (2020). Using linguistic corpora as a philosophical tool. Metaphilosophy, 51(1), 51–70.
Clapp, L. (2012). Indexical color predicates: Truth conditional semantics vs. truth conditional pragmatics. Canadian Journal of Philosophy, 42(2), 71–100.
Conklin, H. C. (1955). Hanunóo color categories. Southwestern Journal of Anthropology, 11(4), 339–344.
Fischer, E., Engelhardt, P. E., Horvath, J., & Ohtani, H. (2021). Experimental ordinary language philosophy: A cross-linguistic study of defeasible default inferences. Synthese, 198, 1029–1070.
Freake, R., Gentil, G., & Sheyholislami, J. (2011). A bilingual corpus-assisted discourse study of the construction of nationhood and belonging in Quebec. Discourse & Society, 22(1), 21–47.
Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus-based language learning research: Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155–179.
Hansen, N. (2011). Color adjectives and radical contextualism. Linguistics and Philosophy, 34, 201–221.
Hansen, N., & Chemla, E. (2015). Linguistic experiments and ordinary language philosophy. Ratio, 28(4), 422–445.
Hansen, N., & Chemla, E. (2017). Color adjectives, standards, and thresholds: An experimental investigation. Linguistics and Philosophy, 40(3), 239–278.
Hansen, N., Porter, J. D., & Francis, K. (2019). A corpus study of “know”: On the verification of philosophers’ frequency claims about language. Episteme, 18(2), 242–268.
Hinton, M. (2021). Corpus linguistics methods in the study of (meta)argumentation. Argumentation, 35, 435–455.
Kennedy, C. (2007). Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and Philosophy, 30(1), 1–45.
Kennedy, C., & McNally, L. (2005). Scale structure, degree modification, and the semantics of gradable adjectives. Language, 81(2), 345–381.
Kennedy, C., & McNally, L. (2010). Color, context, and compositionality. Synthese, 174, 79–98.
Kilgarriff, A., & Renau, I. (2013). esTenTen, a vast web corpus of peninsular and American Spanish. Procedia – Social and Behavioral Sciences, 95, 12–19.
Levisen, C. (2019). “Brightness” in color linguistics. New light from Danish visual semantics. In I. Raffaelli, D. Katunar, & B. Kerovec (Eds.), Lexicalization patterns in color naming. A cross-linguistic perspective (pp. 83–108). John Benjamins.
MacLaury, R. E. (1992). From brightness to hue. Current Anthropology, 33, 137–186.
Nardone, C. (2018). ‘Women and work’: A cross-linguistic corpus-assisted discourse study in German and in Italian. Critical Approaches to Discourse Analysis across Disciplines, 10(1), 167–186.
Paradis, C. (2001). Adjectives and boundedness. Cognitive Linguistics, 12(1), 47–65.
Raffaelli, I., Katunar, D., & Kerovec, B. (2019). Introduction. In I. Raffaelli, D. Katunar, & B. Kerovec (Eds.), Lexicalization patterns in color naming. A cross-linguistic perspective (pp. 1–19). John Benjamins.
Recanati, F. (2004). Literal meaning. Cambridge University Press.
Recanati, F. (2010). Truth-conditional pragmatics. Oxford University Press.
Rothschild, D., & Segal, G. (2009). Indexical predicates. Mind and Language, 24, 467–493.
Sharoff, S. (2006). Creating general-purpose corpora using automated search engine queries. In M. Baroni & S. Bernardini (Eds.), WaCky! Working papers on the web as corpus (pp. 63–98). Gedit.
Syrett, K., Kennedy, C., & Lidz, J. (2010). Meaning and context in children’s understanding of gradable adjectives. Journal of Semantics, 27(1), 1–35.
Szabó, Z. G. (2001). Adjectives in context. In I. Kenesei & R. M. Harnish (Eds.), Perspectives on semantics, pragmatics and discourse: A festschrift for Ferenc Kiefer (pp. 119–146). John Benjamins.
Tallant, J., & Andow, J. (2020). English language and philosophy. In S. Adolphs & D. Knight (Eds.), The Routledge handbook of English language and digital humanities (pp. 440–455). Routledge.
Taylor, C. (2014). Investigating the representation of migrants in the UK and Italian press. A cross-linguistic corpus-assisted discourse analysis. International Journal of Corpus Linguistics, 19(3), 368–400.
Travis, C. (1997). Pragmatics. In B. Hale & C. Wright (Eds.), A companion to the philosophy of language (pp. 87–106). Blackwell.

David Bordonaba-Plou is an Assistant Professor at the Universidad Complutense de Madrid (Spain). He holds a PhD in Philosophy from the University of Granada. His research focuses on experimental philosophy of language, the role of intuitions, Digital Humanities, and political issues like analysis of parliamentary debates or polarization. Some of his most recent publications are “An Analysis of the Centrality of Intuition Talk in the Discussion on Taste Disagreements,” in Filozofia Nauki, or “Disagreement is Said in Many Ways: An Experimental Philosophy of Language Study on Taste Discussions” in Philosophical Approaches to Language and Communication, vol. 2.

Laila M. Jreis-Navarro is a lecturer in Arabic language and literature at the University of Zaragoza (Spain). She holds a BA in Arabic Philology and a PhD in Languages, Texts, and Contexts from the University of Granada. She is the author of the book Entre dos orillas (UCOPress, 2021). Her research focuses on pre-modern Iberian culture and discursive analysis of subjective expression in classical Arabic. Her most recent publication is “La codificación lingüística de la subjetividad en la Nufāḍat al-ǧirāb de Ibn al-Ḫaṭīb” (Al-Qanṭara).

Part III

Politically-Engaged Experimental Philosophy of Language

Chapter 9

Experimentally-Informed Philosophy of Hate Speech

Bianca Cepollaro

Abstract The past 20 years have witnessed a growing interest, within philosophy of language and linguistics, in expressives and, in particular, in slurs – terms that target people and groups on account of their belonging to a certain category (typically having to do with ethnic origins, gender, sexual orientation, religion, and so on). This lively debate often relies on empirical claims – “these terms are not derogatory in this context”, “their use affects the audience’s beliefs and attitudes in this and that way”, “reporting a slur backfires”, and so on and so forth. Some scholars have tried to back up their claims with existing empirical data from psychology and psycholinguistics, while a few others went and investigated their questions on experimental grounds. In this chapter, I offer an overview of this experimental literature on slurs and derogatory labels, and I illustrate an array of ways in which the philosophical issues that slurs raise for the proverbial armchair benefit from empirical studies in psychology, psycholinguistics, and experimental philosophy.

B. Cepollaro
Faculty of Philosophy, Vita-Salute San Raffaele University, Milan, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_9

9.1 Introduction

The past 20 years have witnessed a growing interest, within philosophy of language and linguistics, in expressives and, in particular, in slurs – terms that target people and groups on account of their belonging to a certain category (typically related to ethnic origins, gender, sexual orientation, religion, and so on). This lively debate often relies on a range of empirical claims, grounded in the philosopher’s intuitions: “slurs are derogatory in this context (or not)”, “epithets¹ have a distinctive power to affect people’s beliefs and attitudes”, “reporting a slurring utterance backfires”, and so on and so forth. Many of these questions can and should be addressed on empirical grounds too. Yet, very few philosophers went for an experimental approach. In this chapter, I illustrate an array of ways in which the philosophical issues that slurs raise for the proverbial armchair benefit from empirical studies in psychology, psycholinguistics, and experimental philosophy.

The works I consider in this overview are of great interest to experimental philosophy of language – but do they constitute experimental philosophy of language? As many² have pointed out, we can contrast a narrow understanding of experimental philosophy with a broader one. According to the narrow view, experimental philosophy empirically investigates lay people’s intuitions about philosophically interesting scenarios (typically thought experiments). These studies elicit participants’ intuitions about hypothetical cases that are relevant to philosophical questions, in order to test hypotheses about the main concepts (or mechanisms) involved. In cases where armchair philosophers tended to converge on an idea, this experimental inquiry could reveal whether or not they were on the right track. Alternatively, when there is little or no consensus on a given matter, an experimental approach could get us out of the impasse where mutually exclusive claims are juxtaposed in a rather sterile way (“I have the intuition that p” vs. “I have the intuition that non-p”). However, this narrow definition is often seen as inaccurate and disappointing, for it doesn’t exactly capture the kind of experimental philosophy that has been done, nor does it describe the sole kind of experimental philosophy that ought to be done. This is why many authors argue in favor of a broader and inclusive understanding whereby experimental philosophy is a way of pursuing exquisitely philosophical inquiries by resorting to the empirical methods and tools of experimental sciences. The underlying idea is that the practices we associate with science are a legitimate and fruitful way to do philosophy, i.e., to achieve progress in solving the disputes that the philosophical investigation raises. According to this definition, only a few of the works I include in this chapter would be characterized as experimental philosophy of language, in the narrow or broad sense.³

What I gather here are the main empirical works in psychology, neuroscience, psycholinguistics, and experimental linguistics that are compellingly informative to the debate on slurs and hate speech in philosophy of language (be it from the armchair or in the lab). This overview encourages an integrated approach that brings together theoretical and empirical lines of investigation, resulting in what I name “experimentally-informed philosophy of language”. Experimentally-informed philosophy can be both armchair and experimentally driven. The research program to which I’m pointing is one where philosophers systematically engage in a fruitful and critical dialogue with existing and prospective experiments that have the potential to back up or defuse, test, and enrich their theoretical claims with some empirical data. Such a dialogue is not without its pitfalls and challenges. For instance, studies in psychology or linguistics may have overlooked those distinctions that are crucial to philosophers of language. Similarly, the available methodologies may just be inapt to capture the subtleties that a philosophical investigation requires. Nevertheless, the experimentally-informed approach to philosophy of language tries to integrate these strategies. Paraphrasing Sytsma and Livengood (2015, p. 5), this research program should systematically collect and analyze not only empirical data, but also existing empirical studies, “in attempting to answer philosophical questions or solve philosophical problems”. This notion is thus even broader than the broad understanding of experimental philosophy, for it looks at the multiple ways in which empirical works from many fields can and should inform philosophical inquiries.

This survey of the existing experimental studies helps to illuminate the philosophical debate on slurs, and it encourages this experimentally-informed approach to philosophy of language more generally. The philosophical investigation of epithets has only begun to design its own studies to test its armchair hypotheses, but what it can and should already do is appreciate how the existing experimental literature on slurs and derogatory labels helps to tackle the puzzles that these phenomena raise.

The chapter is articulated into independent thematic sections that correspond to different questions in the debate: for each of them, I’ll illustrate the main results in the experimental literature. In Sect. 9.2, I discuss the expressive nature of slurs, and what distinguishes them from neutral words and from other insults. In Sect. 9.3, I focus on the effects of epithets, i.e., on the consequences that slurring brings about with respect to people’s feelings, beliefs, and attitudes. In Sect. 9.4, I explore the context of reported speech, whereby a speaker reports someone else’s slurring utterances. Finally, in Sect. 9.5, I consider the phenomenon of reclamation, in virtue of which speakers can employ a slur in a subversive way that does not seem to derogate the target class, but expresses pride and solidarity instead.

In what follows, I’ll refer to the members of the group targeted by a given slur as “targets” or “ingroups”, and to everyone else as “non-target” or “outgroup”.⁴ This terminology is relative to a particular slur, and does not correspond to a more general distinction between oppressed and oppressors, for our social reality is too complex and intersectional to draw such a clear-cut line.

¹ It is common in the literature to use “slurs” and “derogatory epithets” as synonymous. Quite a misleading move, for “epithets” seems to narrow down the range of syntactic positions that the term at stake can take, while we know that slurs can occur in many different syntactic positions. However, since this terminology is so widespread in the literature, I will follow suit.
² See i.a. Rose and Danks (2013), Sytsma and Livengood (2015), Sytsma and Buckwalter (2016), Williamson (2016), Sytsma (2017).
³ The best candidates for this label are, in particular, the studies designed to solve the impasse between alternative intuitions on the projective patterns of slurs’ derogatory content, which is a crucial step to understand the semantics and pragmatics of these expressions (see Sect. 9.4).
⁴ The terminology in this case often changes between philosophy and psychology, and even within the same discipline.

9.2 The Expressive Nature of Slurs

Philosophers of language and linguists have tried to account for the peculiar hyper-projectivity of the derogatory content of slurs, that is, the fact that slurs seem to convey their emotionally loaded content across contexts and uses. Embedding a slur under negation, in the antecedent of a conditional, in a question, under a modal, etc. will not seal off its pejorative content. Take the Italophobic slur “wop”; not only does it convey derogatory content when it is predicated of a subject (like in “Lenù is a wop”), but also in complex utterances like “Lenù is not a wop”, “If Lenù is a wop, her daughters are too”, “Is Lenù a wop?”, and so on. Intuitively, all the resulting utterances derogate Italians. How should we explain this feature? Many approaches appeal to common linguistic devices that are known to survive across a variety of contexts and embeddings, like conventional implicature (Potts, 2005; Williamson, 2009; McCready, 2010; Whiting, 2013; Gutzmann, 2019), or presupposition (Macià, 2002; Schlenker, 2007; Cepollaro, 2015, 2020; Marques & García-Carpintero, 2020). According to these theories, the derogatory content of slurs survives because of how it is recorded in the meaning of these terms. However, experimental studies on expressives in neuropsychology suggest that this need not be the end of the story – that is, that further factors may contribute to slurs’ hyper-projectivity, besides the varieties of meaning that they encode. In fact, curse words in general are shown to display a very peculiar property: Bowers and Pleydell-Pearce (2011) found that the distinctive emotional response that a taboo word elicits does not entirely depend on what the term means but (also) on its phonetic realization. In their experiment, participants were presented with a few terms – one after the other – and read them aloud.
They saw two curse words (“cunt”, “fuck”), two neutral terms (“glue”, “drum”), and their respective euphemisms (“C-word”, “F-word”, “G-word”, “D-word”). They were told in advance that the full-fledged terms and their euphemistic counterparts meant the very same thing. The study measured the participants’ emotional response by tracking their electrodermal activity. Bowers and Pleydell-Pearce showed that the participants’ emotional responses to swear words were larger than to euphemisms and neutral words. This suggests that even if a curse term and its euphemistic counterpart are synonymous (they mean one and the same thing by hypothesis), the emotional response they elicit at least partly depends on something other than their meaning, which, according to Bowers and Pleydell-Pearce (2011), is their phonological realization. In addition to this study, many neuropsychological works convincingly suggest that swear words have a distinctive power to trigger emotional responses, independently of what they mean or refer to. The idea that taboo language is processed differently than non-taboo speech, namely via the brain’s emotional centers, is supported by a range of empirical observations, for instance the fact that patients whose ability to produce propositional speech is impaired – due to aphasia, Alzheimer’s disease, dementia – can still preserve the ability to swear (see Van Lancker & Cummings, 1999; Jay, 2000, 2009 for a survey). In some cases, they slur: Singer (1997), quoted in Rappaport (2020, p. 196), mentions the production of racial epithets.

9 Experimentally-Informed Philosophy of Hate Speech

What about slurs? We can gather some indirect evidence from many studies in psychology that contrast the effects of derogatory epithets (like “faggot”) with those of non-slurring category labels (like “gay”) or non-slurring insults (like “asshole”) (see e.g., Carnaghi & Maass, 2008; Fasoli et al., 2015). I’ll discuss the effects of slurs in the next section, but what is interesting for now is that such works contrast these categories of expressions by employing isolated terms with no linguistic context (i.e., as lexical items among others), and yet find differences between slurs and non-slurring words (labels or insults), as we will see. We could understand the experimental stimuli as featuring mentions rather than uses of these expressions. But if this is right,5 it means that slurs bring about their associated effects also when they are merely mentioned rather than fully used. In other words, the distinctive properties of slurs may not just depend on their encoding derogatory content about the target class (however we analyze it), but also on further features that they seem to share with expressive language more generally.6 Without relying on these empirical results, Predelli (2021) focuses on the intuition that something in the expressive character of slurs and swear words seems to survive even when these terms are merely mentioned. On this basis, he argues that the study of truth-conditional and non-truth-conditional meaning does not suffice to account for slurs and coarseness. These expressions involve a peculiar phenomenon – that of linguistic taboo. In his view, taboo is a meaning-related conventional feature (that’s why philosophers and linguists have attempted to account for slurs’ hyper-projectivity by resorting to well-known dimensions of meaning), but one that is strictly associated with the tokening process: whatever taboo words do depends on their mere display, even within quotation marks.
But what is taboo exactly, and how does this notion relate to the expressive nature of slurs? This line of research raises interesting challenges concerning the relation between slurs and taboo words: are slurs taboo words? What are the conditions to count as a taboo word? How would the expressive character of taboo words interact with the content that they lexically encode? What are the factors that drive our intuitions about the effects and projection patterns of slurs? Could it be that some but not all slurs are taboo words? Can philosophers theorize about slurs as if they were a uniform category? A full-fledged account of derogatory epithets should be sensitive to these findings in the neuropsychological literature on emotionally loaded language, both when it comes to explaining the distinctive features of slurs and when choosing the material on which intuitions are tested, as different slurs may have different emotional import.

5 See Moreno and Pérez-Navarro (2021).

6 Note that this also sounds a note of caution vis-à-vis the practice of mentioning slurs in academic papers (or including them in experimental material!). See Moreno and Pérez-Navarro (2021) for a discussion of this problem.

B. Cepollaro

9.3 The Effects of Slurs

Philosophers claim that in addition to expressing the speaker’s prejudiced attitude, slurs have a distinctive power to spread bigotry. Most accounts see this as a core feature of epithets. According to the presuppositional view, for instance, slur-users presuppose that the target class is despicable as such; by acting as if such an evaluative assumption were common ground, the speaker assumes that everyone shares their bigoted perspective, and, at the same time, imposes a conversational pressure on them to align, i.e., to accommodate such a presupposition (Lewis, 1979; Sbisà, 1999; Cepollaro, 2020). In a similar spirit, according to the speech act account of slurs (Langton, 2012), derogatory epithets are employed to perform subordinating speech acts, enacting oppressive systems that rank people and groups as inferior and, in so doing, legitimize violence and discrimination against them (McGowan, 2019). Both accounts (as well as many others) predict that, because of how slurs work, they play a role in how prejudice gets spread. Interestingly, the idea that derogatory epithets have detrimental effects has received ample support from studies in psychology, where scholars meritoriously distinguished between different audiences: targets (qua addressees or bystanders) and non-targets. As far as targets are concerned, studies established that derogatory speech is associated with a range of negative emotional responses, like anger and discomfort (Swim et al., 2001, 2003), but also long-term negative outcomes in one’s cognitive, emotional, and psychological health such as depression, fear, low self-esteem, stress, and drastic behavioral changes (Garnets et al., 1990; D’Augelli, 1992; Herek et al., 1999; Cowan & Mettrick, 2002). These effects do not merely depend on targets being directly addressed with slurs, for similar effects were detected when targets merely witnessed prejudiced discourse that was not directed at them.
When non-targets are exposed to discriminatory contents, they can experience anger, depression, and lower self-esteem too (Swim et al., 2001; Dickter et al., 2012, p. 112; Dickter & Newton, 2013). However, when philosophers claim that slurs are “words that wound” they do not merely have in mind personal suffering. Their claim is stronger than that: most research on slurs is driven by the idea that epithets and prejudiced language in general are powerful tools capable of shaping the social reality, by legitimizing and even encouraging discrimination (Langton, 2012; McGowan, 2019; Caponetto & Cepollaro, 2021). The literature in social psychology supports such views with empirical evidence (see Cervone et al., 2021 for an overview). First of all, when outgroup bystanders are exposed to hate speech, they get desensitized to its oppressive power, and their own prejudice is raised. Soral et al. (2018) exposed participants to a “desensitization” training consisting in going through five webpages of discussion forums where hate speech occurred (while no hate speech occurred in the control condition). Afterwards, they tested how sensitive participants were to hate speech and how prejudiced towards the targets of hate speech (for the former, participants were asked to rate the offensiveness of further instances of hate speech; for the latter, they had to rate how willing they would be to accept a target as
a colleague/neighbor/family member). Soral and colleagues found that participants in the experimental condition, after the exposure to the hateful webpages, were both less sensitive to new instances of hate speech and more prejudiced towards the target group than participants in the control condition (see also Winiewski et al., 2017). This suggests that hate speech (of which slurs are a paradigmatic example) plays a role in spreading prejudice and legitimizing violence and discrimination. Further studies show how the detrimental effects of hate-speech exposure can be detected in very concrete and measurable behaviors such as how one directs their charity donation (Ford et al., 2008), or how favorably they will evaluate the abilities of hate speech targets (Goodman et al., 2008). Fasoli et al. (2015), for instance, found that exposure to slurs fosters the avoidance of targets. They showed that when participants were given the chance to choose where to sit relative to a gay man of their same age, they opted for a greater physical distance if they had been exposed to a homophobic slur rather than a non-slurring category label (like “gay”) or non-homophobic insults (like “asshole”). Note that not all expressives bring about such effects. Most philosophers of language operate under the premise that slurs have a distinctive character that makes them different from other pejoratives. The former derogate socially relevant classes while the latter do not. Consequently, philosophers treat slurs and insults along different lines, and while slurs count as hate speech (or oppressive, dangerous, toxic, etc.), insults do not. Saka (2007), for instance, introduces the label “particularistic insult” to distinguish non-slurring expressions like “bastard” or “jerk” from slurs. In support of the philosophical distinction between the two categories of expressives, Fasoli et al.
(2015) found that even if particularistic insults trigger negative contents just like slurs, only the latter lead to greater physical distancing. Moreover, slurs but not insults lead to dehumanization (see i.a. Bar-Tal & Hammack, 2012). In the philosophical literature (Tirrell, 1999; Jeshion, 2013, 2018), slurs are often associated with dehumanization, which in turn legitimizes violence and discrimination: “Dehumanizing (...) [is] treating humans or human groups as inferior qua persons; treating humans or human groups as having lesser standing in the moral domain, as unworthy of equal standing or full respect as persons” (Jeshion, 2018, p. 79). Fasoli et al. (2015) ran two studies in Italian where they exposed participants to a homophobic slur (like “faggot”), a non-slurring category label (like “gay”), or a particularistic insult (like “asshole”) (supraliminally in one study, and subliminally in another). Participants then had to associate human-related and animal-related words with gay and straight people. They found that after being exposed to a slur, participants associated fewer human-related words with gay people than after being exposed to non-slurring labels or particularistic insults.7 These lines of experimental investigation in social psychology enrich the philosophical debate on slurs by providing empirical support for many premises on which philosophers routinely rely. Note, however, that these studies – at least

7 See Polakof in this volume for a discussion of how both slurs and swearwords have descriptive and expressive components.

for the time being – won’t allow us to tackle finer-grained questions that are crucial to the philosophical theories mentioned at the beginning of this section – e.g., Do slurs constitute or instead merely cause oppression? How should we understand the presupposition triggered by these terms? Nonetheless, they still provide experimental support to the armchair claim that slurs (unlike cognate negative expressions) have the distinctive power to shape our moral values and social reality by reinforcing and spreading prejudice.

9.4 Reporting Slurs

As we saw, the derogatory content of slurs displays a distinctive power to survive under all sorts of embeddings. Scholars became particularly interested in the case of reported speech, where the pejorative content apparently gets ascribed to the reporter and not just to the reportee (the phenomenon that goes under the label of non-displaceability). Suppose that Lenù says: “Lila is a dyke”. In this case, no report is involved8 and Lenù is conveying homophobic attitudes (unless her use is reclamatory, but we shall leave this to the next section). Imagine now that Marcello reports what happened, and says: “Lenù said that Lila is a dyke”. Is Marcello being homophobic? Is he just accurately reporting Lenù’s words? The question is whether the derogatory content of slurs survives under report – and is therefore ascribed to the reporter, Marcello – or whether the verbum dicendi (“said that”) can succeed in ascribing the prejudiced attitude to the reportee only (Lenù). There are many reasons why philosophers became interested in this question. From a purely theoretical point of view, whether or not slurs’ derogatory content survives under report provides a clue as to how it is encoded, for different kinds of content survive or fail to survive in reported speech (for instance, presuppositions are typically blocked, while conventional implicatures survive). Moreover, whether the reporter conveys derogatory content – whether they slur – has significant practical consequences: is it OK to report slurring utterances? Shall we systematically refrain from it? Shall we regulate and constrain reported uses without banning them? On many occasions it seems important to accurately report one’s words, and the fact that someone slurred might be a relevant thing to underline. Armchair philosophers and linguists have reached an impasse where alternative claims are juxtaposed in a way that risks becoming sterile (the derogatory content survives vs. it doesn’t survive).
Testing lay speakers seems a promising strategy to gain further insights.

8 Cepollaro et al. (2019) call these cases of non-reported speech “direct speech” (as opposed to indirect or reported speech). This may be misleading, though, as the label “direct speech” also applies to a particular way of reporting a speaker’s words in literary writing, as in: “‘Hello!’, said he”. At any rate, the debate on reported slurs is not concerned with this kind of literary style of reporting speech, but is merely interested in contrasting reported and non-reported speech. Thanks to Isidora Stojanovic for raising this point.

Two teams investigated how Italian slurs are perceived in reported speech and obtained similar results. The first study by Panzeri and Carrus (2016) compared the perceived offensiveness of isolated slurs in a list with slurring utterances under a range of embeddings (negation, antecedent of conditionals, questions, and – crucially for the matter at stake – verba dicendi). They found that the offensiveness rates decrease in the case of reported speech compared to slurs in isolation, but perceived offensiveness does not entirely disappear. Slurring utterances embedded in questions or antecedents of conditionals were perceived to be as offensive as slurs in isolation. Note that Panzeri and Carrus (2016) didn’t compare reported and non-reported slurring utterances, but instead contrasted reported slurring utterances with isolated slurs. One may worry that the perceived offense of slurs in isolation is not the same as that of a full-fledged slurring utterance where the epithet is ascribed to an individual.9 If the worry is justified, then their results on how reported and non-reported slurring utterances are perceived cannot be taken to be conclusive. Cepollaro et al. (2019), however, ran a study where they asked participants to rate the perceived offensiveness of reported and non-reported slurring utterances. They found that reported speech of the form “Z: Y said that X is an S” (where X, Y, and Z are names of agents, and S is a slur) decreases (without deleting) the perceived offensiveness of slurring utterances of the form “Y: X is an S”, much in line with Panzeri and Carrus (2016). How can these studies inform the philosophical debate on reporting slurs? Cepollaro et al. (2019) take their own results and Panzeri and Carrus’ as speaking against prohibitionist theories, according to which slurs’ derogatory power does not depend on their meaning (semantic or pragmatic), but simply on their being prohibited words, and “(...) as such, their uses are offensive to whomever these prohibitions matter” (Anderson & Lepore, 2013, p. 43). The fact that reporting slurring utterances decreases their perceived offensiveness suggests that their pejorative content does not merely depend on their use being banned. Apart from speaking against prohibitionist approaches, these results do not align with clear-cut claims as to whether the derogatory content of slurs survives under report (for instance, they do not immediately support the presuppositional or conventional implicature views). What is interesting about these studies (and their non-clear-cut results) is that philosophers of language (armchair and experimental ones alike) will need to take this evidence into account to develop adequate theories, or accommodate existing ones. Note however that this kind of investigation does not let us distinguish between different dimensions that in the philosophical literature are taken to contribute to slurs’ pejorative power. For instance, philosophers of language distinguish between the pejorative content that slurs encode – derogation – and the psychological effects that they produce in non-prejudiced hearers – offense (see Hom, 2012; Hom & May,

9 As Cepollaro et al. (2019) in fact find.

2013).10 Since these experiments investigate the rates of perceived offensiveness, they may fail to track derogation and only detect offense, or mix the two. If this were the case, then we couldn’t legitimately infer any conclusive verdict on whether slurs’ derogatory content survives under report (and therefore, any conclusion as to how it is encoded). Similarly, this empirical evidence helps us illuminate the practical question as to whether slurring utterances can or should be accurately reported. Even if it doesn’t provide an absolute answer (reporting slurring speech is perfectly fine vs. reporting slurring speech is just as bad as slurring), at least it tells us that report decreases the perceived offensiveness of these expressions, without entirely deleting it. In conclusion, note that cultural variations are to be expected. There are contexts and societies where slurs in general (or certain slurs in particular) are so tabooed that speakers can be punished for using words that merely resemble slurs (see Jeshion, 2013). Before generalizing these observations on Italian slurs and participants to slurs more broadly, cross-linguistic and cross-cultural evidence is needed. In this respect, it is worth noting that Polakof (this volume) found both similarities and dissimilarities in the way in which Rioplatense Spanish speakers rated the perceived offensiveness of slurs in isolation, compared to Cepollaro et al.’s (2019) Italian speakers.

9.5 Reclaiming Slurs

As soon as slurs caught the attention of scholars in philosophy of language, linguistics, and psychology, the phenomenon of reclamation appeared to be challenging and fascinating. Reclamation is typically defined as the practice whereby speakers (typically ingroups) employ a slur in a way that subverts its standard derogatory use. Reclamatory uses challenge default negative associations, and are often taken to express pride and solidarity (see e.g., Tirrell, 1999; Brontsema, 2004; Bianchi, 2014; Croom, 2014; Ritchie, 2017; Anderson, 2018; Jeshion, 2020). They raise many different questions: does the literal meaning of slurs change when they are reclaimed? If so, how? What are speakers trying to achieve through reclamation? What can they achieve? What are the conditions for reclaiming a slur? What is the role of speakers’ intentions? What normative requisites should constrain reclamation? How does reclamation relate to cognate linguistic and non-linguistic practices? Is the meaning of reclaimed slurs positive or merely non-derogatory? Can

10 Lamentably, scholars’ terminology is not always consistent. For instance, Hom (2012) and Hom and May (2013) characterize derogation as the predication of negative properties and offense as the negative psychological effect of slurs. Davis and McCready (2020), on the other hand, understand offense as the intrinsic negative properties that slurs have, and derogation as the attitudes of the slurring speaker that can be inferred by the audience.

reclamation change the meaning of slurs for good? Can it change the way we relate to stigma? And so on and so forth. Many of these questions benefit from the empirical research that has been mainly conducted in social psychology. Among the most interesting studies are Galinsky et al. (2003) and Galinsky et al. (2013), who investigated the relation between labeling and power. Galinsky et al. (2013) showed that self-applying a slur positively affects how powerful participants feel and are perceived by outgroup observers. This cannot be due to the fact that performing any action (and reclaiming would be one of these) presents subjects as more powerful than those who do nothing (Magee, 2009). Even if any performed action makes the agent appear more powerful than the non-agent, only self-labeling increases the perceived power of the stigmatized group. Moreover, recalling a situation where one’s group was in a position of power increased participants’ willingness to self-ascribe a slur (but not recalling a context where participants themselves were in a position of power). In other words, the relation between self-ascribing slurs and perceived group power (as opposed to individual power) appears to be reciprocal. Galinsky et al. (2013) also showed that self-ascribing slurs attenuates the negativity of the label for the self-ascriber. Similarly, being exposed to acts of slur self-ascription decreases the perceived negativity of the term in the eyes of non-targets. Whitson et al. (2017) found similar results both for self-ascribing targets and outgroup witnesses.11 These studies support the claim that reclamation has the potential to ameliorate slurs’ meaning by gradually deleting their negative content until they are not derogatory anymore (see Bianchi, 2014).12 Note that Galinsky et al. (2003, 2013), as well as Whitson et al. (2017), aimed to investigate the reclamation of slurs, but what they tested is, strictly speaking, the phenomenon of self-labeling.
Self-labeling is not the only way to engage in reclamatory practices, albeit the paradigmatic one. Slurs have a variety of reclaimed uses that need not be self-ascriptions: “These faggots kill fascists”, “Queer is beautiful”, “Dykes on bikes”, “Fags in support of dykes”, and so on. So the generalization from these very informative and seminal works on the self-ascription of derogatory labels to the reclamation of slurs in general still requires some caution. However, further studies empirically investigated reclamatory uses of slurs

11 Whitson et al. (2017) found that self-labeling is connected with group identification, that is, the tendency to conceive of oneself as part of a group, and categorize the self at the group level. Group identification is found to be both a cause and a consequence of self-labeling, and when people self-ascribe slurs, they are also perceived by non-targets as being more identified with their group. For Whitson et al. (2017), the relation between self-ascribing slurs and the attenuated perceived negativity of such labels depends on group identification.

12 However, the same finding could also confirm Kleinman et al.’s (2009) worry that, by reclaiming a slur, speakers make the expression permissible in the public eye (which, in their view, would undermine rather than encourage the fight against discrimination) (in a similar spirit, see Herbert, 2015). On the other hand, Henry et al. (2014) showed that slurs directed at lower (as opposed to higher) status groups are perceived as more negative. So the fact that self-ascribing slurs decreases the perceived negativity of these labels might indirectly suggest that this reclamatory practice can ultimately enhance the perceived status of the target group.

without relying on self-ascription. Take for instance Gaucher et al. (2015). Their participants – all women – had to imagine themselves in a situation in which they would hear a bystander shout the slur “slut”: in one condition, the story took place in a non-political situation (a “typical context”, walking down the street); in the other, in a militant situation (a “supportive context”, a feminist march), and in both contexts the slurring speaker could either be an ingroup (a woman) or an outgroup (a man). Participants then filled out a series of questionnaires investigating their feelings (e.g., how scared or empowered they felt), and their beliefs (in particular, they had to rate their endorsement of utterances expressing rape myths, e.g., “When girls go to parties wearing slutty clothes, they are asking for trouble”). Gaucher and colleagues found that being exposed to the sexist slur in a supportive rather than typical context raised women’s feelings of empowerment and self-assurance. Moreover, being exposed to slurs uttered in a reclamatory context rather than in a non-militant one affected participants’ beliefs and attitudes regarding the negative stereotypes associated with slurs: participants in the supportive conditions were less willing to endorse rape myths. This suggests that witnessing reclamation has a positive effect in lowering ingroups’ own prejudice, which is especially interesting vis-à-vis the worry that reclamation could backfire by camouflaging self-loathing (discussed for instance in Anderson, 2018). Interestingly, all these observations empirically support what many armchair philosophers have assumed about reclamation.
Note before concluding that the reclaimed slur had the same effect regardless of the sex of the speaker.13 This result is a very fruitful contribution to the philosophical debate on who reclaims slurs, for it speaks against targetism, the view – often taken for granted in philosophy – according to which only ingroups can reclaim slurs (for a criticism of targetism, see Cepollaro & López de Sa, 2022). In contrast, these studies indirectly suggest that participants interpret the use of “slut” in the militant marches as reclamatory, regardless of the sex of the speaker. They show that once the context clarifies that a given use of the slur is supportive, it can achieve (at least some of) its positive effects, independently of who the speaker is.

9.6 Conclusion

In conclusion, in this chapter I provided a heterogeneous survey of theoretical and empirical questions around slurs and hate speech, ranging from the puzzle of their meaning and expressive nature, to the effects that these terms bring about, to issues about how to report them, and to their reclamatory uses. The heterogeneity of the questions goes hand in hand with the heterogeneity of the methodologies involved,

13 Speaker’s sex only made a difference in the typical context, where participants reported being more scared when the misogynistic slur was shouted by a man rather than by a woman.

resorting to the tools and apparatus of psychology, psycholinguistics, experimental pragmatics, and experimental philosophy more generally. In so doing, I illustrated how experimental works can contribute to philosophical questions in a distinctive way, or – put differently – how the philosophical investigation of slurs cannot do without some empirical support. This overview will hopefully help to pave the way for an experimentally-informed approach in the philosophical study of slurs and expressive language.

References

Anderson, L. (2018). Calling, addressing, and appropriation. In D. Sosa (Ed.), Bad words (pp. 6–28). Oxford University Press.
Anderson, L., & Lepore, E. (2013). Slurring words. Noûs, 47(1), 25–48.
Bar-Tal, D., & Hammack, P. L. (2012). Conflict, delegitimization, and violence. In L. R. Tropp (Ed.), Oxford handbook of intergroup conflict (pp. 29–52). Oxford University Press.
Bianchi, C. (2014). Slurs and appropriation: An echoic account. Journal of Pragmatics, 66, 35–44.
Bowers, J. S., & Pleydell-Pearce, C. W. (2011). Swearing, euphemisms, and linguistic relativity. PLoS One, 6(7), e22341.
Brontsema, R. (2004). A queer revolution: Reconceptualizing the debate over linguistic reclamation. Colorado Research in Linguistics, 17, 1–17.
Caponetto, L., & Cepollaro, B. (2021). Discrimination preferred: How ordinary verbal bigotry harms. Australasian Philosophical Review, 5(2), 189–195.
Carnaghi, A., & Maass, A. (2008). Derogatory language in intergroup context: Are “gay” and “fag” synonymous? In Stereotype dynamics: Language-based approaches to the formation, maintenance, and transformation of stereotypes (pp. 117–134). Lawrence Erlbaum Associates Publishers.
Cepollaro, B. (2015). In defense of a presuppositional account of slurs. Language Sciences, 52, 36–45.
Cepollaro, B. (2020). Slurs and thick terms. When language encodes values. Rowman and Littlefield.
Cepollaro, B., Sulpizio, S., & Bianchi, C. (2019). How bad is it to report a slur? An empirical investigation. Journal of Pragmatics, 146, 32–42.
Cervone, C., Augoustinos, M., & Maass, A. (2021). The language of derogation and hate: Functions, consequences, and reappropriation. Journal of Language and Social Psychology, 40(1), 80–101.
Cowan, G., & Mettrick, J. (2002). The effects of target variables and setting on perceptions of hate speech. Journal of Applied Social Psychology, 32(2), 277–299.
Croom, A. (2014). Spanish slurs and stereotypes for Mexican-Americans in the USA: A context-sensitive account of derogation and appropriation. Sociocultural Pragmatics, 2, 1–35.
D’Augelli, A. R. (1992). Lesbian and gay male undergraduates’ experiences of harassment and fear on campus. Journal of Interpersonal Violence, 7(3), 383–395.
Davis, C., & McCready, E. (2020). The instability of slurs. Grazer Philosophische Studien, 97(1), 63–85.
Dickter, C. L., & Newton, V. A. (2013). To confront or not confront: Non-targets’ evaluations of and responses to racist comments. Journal of Applied Social Psychology, 43, E262–E275.
Dickter, C. L., Kittel, J. A., & Gyurovski, I. I. (2012). Perceptions of non-target confronters in response to racist and heterosexist remarks. European Journal of Social Psychology, 42(1), 112–119.
Fasoli, F., Maass, A., & Carnaghi, A. (2015). Labelling and discrimination: Do homophobic epithets undermine fair distribution of resources? British Journal of Social Psychology, 54(2), 383–393.

Ford, T. E., Boxer, C. F., Armstrong, J., & Edel, J. R. (2008). More than ‘just a joke’: The prejudice-releasing function of sexist humor. Personality and Social Psychology Bulletin, 34(2), 159–170.
Galinsky, A. D., Hugenberg, K., Groom, C., & Bodenhausen, G. V. (2003). The reappropriation of stigmatizing labels: Implications for social identity. In J. Polzer (Ed.), Identity issues in groups (pp. 21–56). Emerald Group Publishing Limited.
Galinsky, A. D., Wang, C. S., Whitson, J. A., Anicich, E. M., Hugenberg, K., & Bodenhausen, G. V. (2013). The reappropriation of stigmatizing labels: The reciprocal relationship between power and self-labeling. Psychological Science, 24, 2020–2029.
Garnets, L., Herek, G. M., & Levy, B. (1990). Violence and victimization of lesbians and gay men: Mental health consequences. Journal of Interpersonal Violence, 5(3), 366–383.
Gaucher, D., Hunt, B., & Sinclair, L. (2015). Can pejorative terms ever lead to positive social consequences? The case of SlutWalk. Language Sciences, 52, 121–130.
Goodman, J. A., Schell, J., Alexander, M. G., & Eidelman, S. (2008). The impact of a derogatory remark on prejudice toward a gay male leader. Journal of Applied Social Psychology, 38(2), 542–555.
Gutzmann, D. (2019). The grammar of expressivity. Oxford University Press.
Henry, P. J., Butler, S. E., & Brandt, M. J. (2014). The influence of target group status on the perception of the offensiveness of group-based slurs. Journal of Experimental Social Psychology, 53, 185–192.
Herbert, C. (2015). Precarious projects: The performative structure of reclamation. Language Sciences, 52, 131–138.
Herek, G. M., Roy Gillis, J., & Cogan, J. C. (1999). Psychological sequelae of hate-crime victimization among lesbian, gay, and bisexual adults. Journal of Consulting and Clinical Psychology, 67(6), 945–951.
Hom, C. (2012). A puzzle about pejoratives. Philosophical Studies, 159, 383–405.
Hom, C., & May, R. (2013). Moral and semantic innocence. Analytic Philosophy, 54(3), 293–313.
Jay, T. (2000). Why we curse: A neuro-psycho-social theory of speech. John Benjamins Publishing Company.
Jay, T. (2009). The utility and ubiquity of taboo words. Perspectives on Psychological Science, 4(2), 153–161.
Jeshion, R. (2013). Expressivism and the offensiveness of slurs. Philosophical Perspectives, 27(1), 231–259.
Jeshion, R. (2018). Slurs, dehumanization, and the expression of contempt. In D. Sosa (Ed.), Bad words (pp. 77–107). Oxford University Press.
Jeshion, R. (2020). Pride and prejudiced: On the reclamation of slurs. In Grazer Philosophische Studien (pp. 106–137). Special Issue: Non-Derogatory Uses of Slurs.
Kleinman, S., Ezzell, M., & Frost, C. (2009). Reclaiming critical analysis: The social harms of ‘bitch’. Sociological Analysis, 3(1), 46–68.
Langton, R. (2012). Beyond belief: Pragmatics in hate speech and pornography. In I. Maitra & M. K. McGowan (Eds.), What speech does (pp. 72–93). Oxford University Press.
Lewis, D. (1979). Scorekeeping in a language game. In Semantics from different points of view (pp. 172–187). Springer.
Macià, J. (2002). Presuposición y significado expresivo. Theoria: Revista de Teoria, Historia y Fundamentos de la Ciencia, 3(45), 499–513.
Magee, J. C. (2009). Seeing power in action: The roles of deliberation, implementation, and action in inferences of power. Journal of Experimental Social Psychology, 45, 1–14.
Marques, T., & García-Carpintero, M. (2020). Really expressive presuppositions and how to block them. Grazer Philosophische Studien, 97(1), 138–158.
McCready, E. (2010). Varieties of conventional implicature. Semantics and Pragmatics, 3(8), 1–57.
McGowan, M. K. (2019). Just words: On speech and hidden harm. Oxford University Press.
Moreno, A., & Pérez-Navarro, E. (2021). Beyond the conversation: The pervasive danger of slurs. Organon F, 28(3), 708–725.
Panzeri, F., & Carrus, S. (2016). Slurs and negation. Phenomenology and Mind, 11, 170–180.
Potts, C. (2005). The logic of conventional implicatures. Oxford University Press.


Predelli, S. (2021). Unmentionables: Some remarks on taboo. Organon F, 28(3), 726–744. Rappaport, J. (2020). Slurs and toxicity: It’s not about meaning. Grazer Philosophische Studien, 97(1), 177–202. Ritchie, K. (2017). Social identity, indexicality, and the appropriation of slurs. Croatian Journal of Philosophy, 17(2), 155–180. Rose, D., & Danks, D. (2013). In defense of a broad conception of experimental philosophy. Metaphilosophy, 44(4), 512–532. Saka, P. (2007). How to think about meaning. Springer. Sbisà, M. (1999). Ideology and the persuasive use of presupposition. In J. Verschueren (Ed.), Language and ideology. Selected papers from the 6th international pragmatics conference (pp. 492–509). International Pragmatics Association. Schlenker, P. (2007). Expressive presuppositions. Theoretical Linguistics, 33(2), 237–245. Singer, C. (1997). Coprolalia and other coprophenomena. Neurologic Clinics, 15(2), 299–308. Soral, W., Bilewicz, M., & Winiewski, M. (2018). Exposure to hate speech increases prejudice through desensitization. Aggressive Behavior, 44(2), 136–146. Swim, J. K., Hyers, L. L., Cohen, L. L., & Ferguson, M. J. (2001). Everyday sexism: Evidence for its incidence, nature, and psychological impact from three daily diary studies. Journal of Social Issues, 57, 31–53. Swim, J. K., Hyers, L. L., Cohen, L. L., Fitzgerald, D. C., & Bylsma, W. H. (2003). African American college students’ experiences with everyday racism: Characteristics of and responses to these incidents. Journal of Black Psychology, 29(1), 38–67. Sytsma, J. (2017). Two origin stories for experimental philosophy. Teorema: Revista Internacional de Filosofía, 36(3), 23–43. Sytsma, J., & Buckwalter, W. (Eds.). (2016). A companion to experimental philosophy. Wiley. Sytsma, J., & Livengood, J. (2015). The theory and practice of experimental philosophy. Broadview Press. Tirrell, L. (1999). Derogatory terms: Racism, sexism, and the inferential role theory of meaning. In C. Hendricks & K. 
Oliver (Eds.), Language and liberation: Feminism, philosophy, and language (pp. 41–79). SUNY Press. Van Lancker, D., & Cummings, J. L. (1999). Expletives: Neurolinguistic and neurobehavioral perspectives on swearing. Brain Research Reviews, 31, 83–104. Whiting, D. (2013). It’s not what you said, It’s the way you said it: Slurs and conventional implicatures. Analytic Philosophy, 54(3), 364–377. Whitson, J., Anicich, E. M., Wang, C. S., & Galinsky, A. D. (2017). Navigating stigma and group conflict: Group identification as a cause and consequence of self-labeling. Negotiation and Conflict Management Research, 10(2), 88–106. Williamson, T. (2009). Reference, inference, and the semantics of pejoratives. In J. Almog & P. Leonardi (Eds.), The philosophy of David Kaplan (pp. 137–158). Oxford University Press. Williamson, T. (2016). Philosophical criticisms of experimental philosophy. In J. Sytsma & W. Buckwalter (Eds.), A companion to experimental philosophy (pp. 22–36). Blackwell. Winiewski, M., Hansen, K., Bilewicz, M., Soral, W., Świderska, A., & Bulska, D. (2017). Contempt speech, hate speech. Report from research on verbal violence against minority groups. Stefan Batory Foundation.

Bianca Cepollaro is assistant professor in Philosophy of Language at the Faculty of Philosophy, Vita-Salute San Raffaele University (2017, PhD in Linguistics, Scuola Normale, Pisa; PhD in Philosophy, École Normale Supérieure, Institut Jean Nicod, Paris). Her main research is in social philosophy of language, both on theoretical and experimental grounds. She is the author of Slurs and Thick Terms – How Language Encodes Values (Rowman and Littlefield, 2020), as well as of numerous articles that appeared in journals such as Pacific Philosophical Quarterly, Synthese, Linguistics and Philosophy, Journal of Pragmatics.

Chapter 10

Slurs in the Rio de la Plata

Ana Clara Polakof

Abstract The goal of this paper is to explore the use of slurs in Rioplatense Spanish (RpS, the variety spoken around the Rio de la Plata Basin of Uruguay and Argentina). Specifically, we will analyze gender and sexual identity related slurs, which involve the way in which we identify ourselves with relation to gender and sex, as well as how we are identified by others. We chose them because our first approach to the corpus showed that they are more frequent than racial slurs in RpS, and that they are easily interpreted as slurs. While a Rioplatense speaker could argue that negro (black) is not a slur, she would certainly argue that tortillera (dyke) is one. In order to analyze their use, we will first conduct a pilot which tests the offensiveness of slurs and swearwords, such as estúpido (stupid) and imbécil (imbecile). Second, we will conduct an RpS corpus analysis (at https://www.corpusdelespanol.org/webdial/). Third, we will set up an experiment (an off-line test with a multiple choice questionnaire). Overall, we will argue that the use of linguistic methods can help us to test the hypothesis that both slurs and swearwords have descriptive and expressive components.

10.1 Introduction

Pejoration, as a global phenomenon, is understudied. It is associated with a cognitive attitude that can be expressed through language (see Finkbeiner et al., 2016). Slurring, which may be considered as a part of pejoration (Meibauer, 2016), has

A. C. Polakof
Universidad de la República, Montevideo, Uruguay
Sistema Nacional de Investigadores, Montevideo, Uruguay
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_10


an extensive line of study (Hom, 2012; Croom, 2013; Anderson & Lepore, 2013; Pullum, 2018, to name some). Experimental approaches are less common, although their number is growing (Cepollaro et al., 2019; Spotorno & Bianchi, 2015; Gutzmann, 2019; Cepollaro & Zeman, 2020, to name some). Our research is a further attempt to understand how slurs (linguistic expressions used to convey a pejorative attitude towards a group of people) work, combining a corpus approach with an experimental one. We deal with the syntactic-semantic behavior of slurs and compare it to that of swearwords (linguistic expressions used to convey an offensive attitude towards a person). Our work is inspired by Cepollaro et al. (2019), who argue that the difference between them is that slurs are used to describe and to offend, while swearwords are used mainly to offend. Accordingly, slurs would have descriptive and expressive content, while swearwords would have mainly expressive content.1 Our hypothesis, by contrast, is that both slurs and swearwords have descriptive and expressive components, and that swearwords, in particular, present no difference between their expressive and descriptive content. To show this, we use empirical methodology that is common in linguistics and in experimental philosophy of language.

To decide which slurs to analyze, we first did a preliminary, non-systematic search: we searched the web, consulted other native speakers, and used the dictionary Marchetti (2014) to find different slurs. Then, we did a corpus search at Corpus DAVIS web/dialects, restricted to Argentina and Uruguay, which gave us a lax notion of the Rio de la Plata. That first corpus search showed us that racial slurs are used less frequently than non-racial slurs, and that they involve fewer processes of word formation. Thus, we decided to analyze gender and sex related slurs. 
We then created, with the help of different corpus data and Marchetti (2014), a list of 50 gender and sex related slurs, from which we selected 30 slurs for our pilot study. We did not do a corpus search of swearwords, since they were intended as the control items in our research. Thus, we selected 30 swearwords from our knowledge as native speakers. Once the pilot study was completed, we selected, from the 40 most offensive insults (which include slurs and swearwords), 10 slurs and 10 swearwords for the corpus search and the experiment. This chapter is organized as follows: first, we present our pilot study, its design, results, and discussion; second, we present our corpus analysis, in which we compare frequencies of slurs and swearwords, and then perform a qualitative analysis of the syntactic and semantic contexts in which slurs appear; third, we

1 In footnote 9, Cepollaro et al. (2019) argue that swearwords have a vague descriptive content that does not provide precise information about the person’s identity, as slurs do. Even though they do claim that swearwords have some vague descriptive content, they hold that swearwords are mainly expressive, as may be seen in the following quote, which precedes footnote 9: “a non-slurring insult like ‘asshole’ does not provide precise descriptive information about the subject; rather, it merely expresses a negative attitude” (Cepollaro et al., 2019, p. 36). We want to show that swearwords do not present, from a statistical point of view, any significant difference between their expressive and descriptive content.


present our experiment, its design, results, and discussion; finally, we present some final remarks.

10.2 Pilot Study

Our pilot study is inspired by the work of Cepollaro et al. (2019), and it is our first approach to understanding the use of slurs in Rioplatense Spanish. Cepollaro et al. (2019) experimented on the offensiveness of slurs, and compared it to that of swearwords in Italian.2 Their results showed that, even though slurs were found to be more offensive than swearwords without any context, swearwords were found to be more offensive in predicative contexts such as ‘X is Y’. Furthermore, use in reported speech made both slurs and swearwords be considered less offensive than in the other analyzed contexts. They argue that the differences in how they are interpreted are due to the fact that (Cepollaro et al., 2019, p. 37):

Insults are perceived as being meant only to offend and the perceived offensiveness increases when the negative attitude of the speaker targets a particular person; slurs, on the other hand, are perceived as means to not only insult, but also describe, and it is for this reason that they are perceived as less offensive when they are predicated of a particular individual.

That is, they hold that slurs are perceived as descriptive plus offensive, while insults or swearwords are perceived only as offensive. In our pilot study we compared how Rioplatense speakers perceive the offensiveness of slurs and swearwords when presented in isolation.

10.2.1 Design of the Pilot Study

Even though our pilot study is inspired by Cepollaro et al. (2019), it only contains gender and sex related slurs, because there are more gender and sex related lexical items available.

2 The notion of ‘offensiveness’ is not defined by Cepollaro et al. (2019), nor will it be defined in this chapter. The notion involves subjective evaluations, and the participants were not given an explanation of what offensiveness is. It was assumed that, though subjective, participants had an understanding of what it means to rate something with regard to offensiveness. We chose not to work in our main study with offensiveness, because the evaluation may change depending on who rates it, who is being offended, who you are trying to offend, etc. We did use it in our pilot, because it was inspired by Cepollaro et al. (2019).


10.2.1.1 Participants

We recruited the participants via email or social networks, where we provided a link to the online survey (done with SoSci Survey). The data of 152 participants entered the final study. We did not take the demographic data into account. To ensure that all the data belonged to Rioplatense speakers, only inhabitants of Montevideo and Buenos Aires could evaluate the slurs.

10.2.1.2 Design

The pilot involves 30 critical items (slurs, such as tortillera (dyke)), 30 control items (swearwords, such as imbécil (imbecile)), and 30 fillers (neutral words such as profesor (teacher)).3 No ‘non-slurring labels’ were used, because only a very small number of default labels correspond to gender and sex related slurs.4 Participants were asked to rate the offensiveness of the isolated words on a 7-point scale ranging from non-offensive to extremely offensive, and to provide some demographic information: gender, age, and place of residence.5

10.2.2 Results

Even though it is a pilot study, the data show that Rioplatense speakers also evaluate slurs as being more offensive than swearwords in isolation. The mean of slurs was 4.3, while the mean of swearwords was 3.3. On average, they were not rated as extremely offensive. We ran a Mann-Whitney test comparing the values of slurs and swearwords, for which we report median values instead of means (Fig. 10.1, where S = slurs and SW = swearwords).6 The Mann-Whitney test indicated that the offensiveness value was greater for slurs (Mdn = 4.5) than for swearwords (Mdn = 3.4), U(S) = 5867.5, z = 7.4179,

3 For a complete list of the terms, see Polakof (2021).
4 ‘Non-slurring labels’ is the term used by Cepollaro et al. (2019) to account for what is usually called the ‘neutral counterpart’, such as hombre homosexual (homosexual man), which is the term we will use in the rest of the paper.
5 Even though we did not take into account the demographic information, it could be used in future research, which is why we asked for it.
6 Since a t-test for results which use a Likert-type scale may be questioned, we decided to use a Mann-Whitney test. However, some have argued that a t-test can be used in these conditions (see Warachan, 2011). So, we also performed a t-test, just in case. The results were: t(151) = 8.1 and p < 0.00001, where the mean of slurs is 4.3, and the mean of swearwords is 3.3. The standard deviations were 1.068 and 0.964, respectively. Thus, the results are very similar and both tests confirm that we may reject H0.


[Figure: two-sample Mann-Whitney U bar chart; S median = 4.483, SW median = 3.367]
Fig. 10.1 Median values of offensiveness for slurs and swearwords

and p < 0.00001. Since z is greater than 1.96 and p is below alpha, we can reject the null hypothesis H0. That is to say, slurs differ from swearwords with regard to offensiveness. These values are similar to the ones found by Cepollaro et al. (2019).
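The step from the reported U value to the reported z score can be retraced with the standard normal approximation to the Mann-Whitney U statistic. The following is only a minimal sketch: the group sizes (152 values per group, matching the 152 participants and the t(151) in the footnote) are an assumption, since the chapter does not state them, and no tie or continuity correction is applied. The function names are illustrative, not taken from any analysis script of the study.

```python
import math


def mann_whitney_z(u: float, n1: int, n2: int) -> float:
    """Normal approximation to U (no tie correction, no continuity correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))


# U(S) = 5867.5 is reported in the text.
# ASSUMPTION: 152 values per group (not stated explicitly in the chapter).
z = mann_whitney_z(5867.5, 152, 152)
p = two_sided_p(z)

print(f"|z| = {abs(z):.3f}")
print("p below 0.00001:", p < 0.00001)
print("reject H0 at alpha = 0.05:", abs(z) > 1.96 and p < 0.05)
```

Under these assumptions the absolute z value comes out at roughly 7.42, in line with the reported 7.4179, which suggests that the test compared 152 values per category.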

10.2.3 Discussion

Rioplatense Spanish speakers consider slurs to be more offensive than swearwords. These results agree with the results found in Cepollaro et al. (2019), and with what has been defended by scholars such as Nunberg (2018). Italians, however, seem to consider both swearwords and slurs to be more offensive than Rioplatense speakers do. The former seem to rate them on average between 4 and 5 on the scale of offensiveness, while the latter seem to rate them between 3 and 4. Thus, even though the relative offensiveness of slurs and swearwords (in isolation) does not seem to be idiosyncratic, how offensive both types of insults are rated by native speakers does seem to be idiosyncratic. We agree with Cepollaro et al. (2019) that further experiments in other languages are needed to confirm both hypotheses, and that the pilot alone is not enough to conclude anything about the behavior of slurs. However, instead of testing offensiveness further, our research will focus on the syntactic-semantic behavior of slurs and swearwords. We will use the results of the pilot to choose the lexical items for the corpus analysis and for our experiment.

10.3 A Corpus Analysis of Slurs and Swearwords

To do our corpus analysis we selected 10 offensive slurs and 10 offensive swearwords from the 40 most offensive insults in our pilot experiment, which may be seen in Fig. 10.2.

[Figure: bar chart of mean offensiveness per insult; y-axis: Mean (0 to 6); x-axis: Insult. Insults in the order plotted: tragaleche (S), Chupapija (S), puta (S), hijo de puta (SW), puto (S), malcogido (SW), trabuco (S), putón (S), culorroto (S), marimacho (S), maricón (S), tortillera (S), putona (S), zorra (S), trava (S), trola (S), marica (S), mariquita (S), trolo (S), machona (S), chango (S), cagador (SW), imbécil (SW), maraca (S), atorranta (SW), torta (SW), basura (SW), desviado (S), idiota (SW), cagón (S), carolo (S), estúpido (SW), cagona (SW), pija (SW), tarado (SW), tarada (SW), varonera (S), baboso (SW), chanta (SW), pelotuda (SW)]
Fig. 10.2 Mean offensiveness of the most offensive insults, where S = slur and SW = swearword

We did not select the slurs and swearwords based only on their offensiveness, because in our first pass through the corpus we found that not all of them had the intended reading. In addition, in the pilot study we used the same swearword with gender variation (for instance, tarado (idiot-male) and tarada (idiot-female)), which we avoided in the corpus analysis, as well as in the experiment. We eliminated slurs which were ambiguous between a gender related slur reading and a way of living (profession) slur reading, such as puta, zorra, trola, which may be read as (bitch) or (prostitute). We also eliminated chupapija (cocksucker), which is used as a slur, but also as a swearword that offends both men and women, as in examples (1) and (2):

(1) Tengo nostalgia de ir a la tribuna Olímpica, sentar me en el cemento, recostar mi espalda en la fila de atrás y disfrutar de una tarde de sol en invierno sin que nadie a mi alrededor gritara puto, puta y chupapija durante los condenados 90 minutos.7
‘I am nostalgic for going to the Olímpica stand, sitting on the cement, leaning my back against the row behind mine and enjoying an afternoon of sun in the winter

7 The examples from the corpus maintain the exact spelling in which they appear in the corpus.

Table 10.1 Frequency of slurs vs. frequency of swearwords

Slur         Occurrences   Swearword      Occurrences
machona      21            hijo de puta   1421
trolo        97            malcogido      9
puto         1368          cagador        44
trabuco      21            imbécil        806
culorroto    56            basura         8130
marimacho    20            idiota         1945
maricón      194           cagón          563
tortillera   16            pija           487
tragaleche   14            tarado         495
trava        53            estúpido       1754
Total:       1860          Total:         15,654

without anybody around me screaming faggot, whore and cocksucker for the damned 90 minutes.’
(2) Seré la Eva Perón de las no chupa pija.8
‘I might be the Eva Perón of the no dick suckers.’

We also discarded from the analysis slurs that shared a lexical root, keeping only the most offensive one. Thus, we analyzed puto (faggot), but not putón or putona; we analyzed maricón, but not marica or mariquita; and so on. Since the pilot was not controlled, we had to make a finer-grained selection of the slurs and swearwords for the corpus analysis. We analyzed the slurs machona (tomboy), trolo (faggot), puto (faggot), trabuco (transvestite), culorroto (faggot), marimacho (tomboy), maricón (faggot), tortillera (dyke), trava (transvestite), and tragaleche (faggot), and the swearwords hijo de puta (son of a bitch), malcogido (motherfucker), cagador (shithead), imbécil (imbecile), basura (trash), idiota (idiot), cagón (fearful), estúpido (stupid), pija (dickhead), and tarado (asshole). The quantitative analysis showed that swearwords have a much higher frequency of use than slurs, as may be seen in Table 10.1 (the words appear in no particular order).

Even without calculating relative frequencies, it is evident that, in the Rio de la Plata, we use swearwords more frequently than slurs. Even if we eliminated the swearword basura (trash) because of its evidently higher frequency, leaving 10 slurs and 9 swearwords (with 7524 total occurrences), the latter would have a frequency of use about 4 times higher (calculated by dividing the total occurrences of the 9 swearwords by the total occurrences of the 10 slurs). Thus, we seem to be able to use swearwords more liberally than slurs. This could be seen as evidence in favor of a deflationary or prohibitionist account, such as that defended by Anderson and

8 The word chupapija (cocksucker) may be found written as a single word or as separate lexical items, as in this example.


Lepore (2013) and Anderson et al. (2012). However, the data tell us that we do use slurs. The lower frequency of use of slurs could be explained by assuming that, for Rioplatense inhabitants, slurs are more taboo than swearwords. This makes sense, since taboos prohibit certain people from performing certain actions (Allan, 2019, p. 1). Thus, we may think that since slurs offend someone for being a part of a group, they are more taboo because they are harmful to the individual and the group, not just to the individual alone (see also Allan & Burridge, 2006). And, perhaps, if we take both these results and the pilot results into account, they may be seen as evidence in favor of a global approach to slurs in which prohibition explains the fewer occurrences and the higher level of offensiveness of slurs in isolation, but not the overall behavior of slurs, because (even if they are taboo) we still use them. To better understand slurs, we have to analyze, in addition to offensiveness and frequencies, the syntactic-semantic contexts in which they are used, which is what we will do in the next section.
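The frequency comparison drawn from Table 10.1 can be retraced with a few lines of arithmetic. The sketch below simply re-enters the occurrence counts from the table; the variable names are illustrative, and the ratio is the one described in the text (total occurrences of the nine swearwords without basura divided by the total occurrences of the ten slurs).

```python
# Occurrence counts as printed in Table 10.1.
slurs = {
    "machona": 21, "trolo": 97, "puto": 1368, "trabuco": 21, "culorroto": 56,
    "marimacho": 20, "maricón": 194, "tortillera": 16, "tragaleche": 14, "trava": 53,
}
swearwords = {
    "hijo de puta": 1421, "malcogido": 9, "cagador": 44, "imbécil": 806,
    "basura": 8130, "idiota": 1945, "cagón": 563, "pija": 487,
    "tarado": 495, "estúpido": 1754,
}

total_slurs = sum(slurs.values())
total_swearwords = sum(swearwords.values())

# Drop the outlier "basura", as the text does, before comparing totals.
total_without_basura = total_swearwords - swearwords["basura"]
ratio_without_basura = total_without_basura / total_slurs

print("slurs:", total_slurs)
print("swearwords:", total_swearwords)
print("swearwords without basura:", total_without_basura)
print(f"ratio (9 swearwords / 10 slurs): {ratio_without_basura:.2f}")
```

The totals match the table (1860 and 15,654), and the ratio without basura is slightly above 4, which is the "4 times higher" figure cited above. These are raw counts; a comparison normalized by corpus size would require token totals that the chapter does not report.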

10.3.1 Slurs and Swearwords Are Not Pure Expressives

In this section, we will follow the grammar proposed by Gutzmann (2019) for pure expressives and show that neither slurs nor swearwords behave as pure expressives, i.e. both of them have descriptive content.

10.3.1.1 Differences Between Pure Expressive Adjectives and Descriptive Adjectives

According to Gutzmann (2019), pure expressive adjectives (EA) license non-local readings, as in (3), while descriptive adjectives (DA) do not, as in (4).9 In (3) the negative attitude may be interpreted in relation to the watch, but it may also be interpreted in relation to the event of losing the watch. In (4), the descriptive adjective can only be interpreted locally, with regard to the watch. EA cannot be interpreted intersectively, as shown in (5), while descriptive adjectives can be so, as in (6).10

(3) γ I lost my damn watch and I need to buy another one soon! (Gutzmann, 2019, p. 71)
(4) γ I lost my gold watch the same day we had a cleaning woman for the first time. [https://lessonsonasmalllake.com/tag/absent-minded/]
(5) the damn watch ≉ the x, which is damned and a watch
(6) the gold watch ≈ the x, which is gold and a watch

9 Gutzmann (2019) provides an extensive characterization of the differences between EA and DA. In this article, we will only exemplify some of those characteristics.
10 I follow Gutzmann (2019) and Horn (2013) in using γ to mark examples found using Google. Examples in Rioplatense Spanish will be from Corpus Davis web/dialects with the selection of “Argentina” and “Uruguay”.

EA cannot appear in comparative constructions, as in (7), while DA can, as in (8). Degree expressions cannot be combined with EA, but can select DA, see (9) and (10):11

(7) *The {damn-er, more damn} dog howled the whole night.
(8) der aggressiv-er-e Hund
‘the more aggressive dog’
(9) The {very, extremely, utterly} aggressive dog barked the whole night.
(10) *The {very, extremely, utterly} damn dog barked the whole night.

EA cannot be the target of adverbial modification, while DA can; see (11) and (12). Finally, EA cannot combine with other adjectives, while DA can; see (13) and (14):12

(11) *The {presumably, probably, actually} damn dog barked the whole night.
(12) The {presumably, probably, actually} aggressive dog barked the whole night.
(13) The young and aggressive dog barked the whole night.
(14) *The young and damn dog barked the whole night.

With these and more contexts at hand, Gutzmann (2019) argues that EA behave differently from DA, and proposes a different syntactic and semantic analysis for them. In the next section, we will show that slurs and swearwords pattern with both EA and DA, which is further evidence that they are mixed expressives (see McCready, 2010; Gutzmann, 2015; Meibauer, 2016, among others).

10.3.1.2 Slurs and Swearwords

Slurs and swearwords can be classified as what Gutzmann (2019) calls mixed expressive adjectives (MEA), because they contribute both descriptive and expressive content.13 They share some characteristics with EA, and some with DA. Like EA, they have expressive content, and they may also have a non-local interpretation. However, like DA, they may have an intersective interpretation, they may be graded, they can appear in comparative constructions, may involve adverbial modification,

11 These examples were taken from Gutzmann (2019, p. 79).
12 Examples taken from Gutzmann (2019).
13 Mixed expressive adjectives seem to coincide with thick concepts (Väyrynen, 2021). We will restrict ourselves to the use of the linguistic terminology. For a thorough discussion of thick concepts see Kirchin (2013), and for the relationship between slurs and thick concepts see Cepollaro (2020).


and combine with other adjectives. In addition to this, when an MEA such as beschissen appears in predicative position, the expressive content can only have a sentential-level interpretation, which accounts for the non-local interpretation of MEA (Gutzmann, 2019, p. 122), as in (15):

(15) Das Auto ist beschissen.
‘The car is shitty.’

In the rest of the section, we will show that both slurs and swearwords in Rioplatense Spanish behave as MEA. We will start by showing that both share the characteristics of DA presented in the previous section. First, we show that both can have intersective interpretations, as in (16) and (17):14

(16) La vida le esta dando demasiado a este tecnico cagón
‘Life is giving way too much to this scaryhead trainer.’
a. este técnico cagón ≈ this x, which is a trainer and scaryhead in the same context (see Kratzer and Heim, 1998, p. 71)
(17) Ellos odian las mujeres marimacho.
‘They hate tomboy women.’
a. las mujeres marimacho ≈ these x, which are women and tomboys

Second, we will show that they may be graded and appear in comparative constructions, as in the following examples:15

(18) Fui muy machona de niña.
‘I was very tomboy as a child.’
(19) Es muy estúpido.
‘He is very stupid.’
(20) Vos sos más puto que Flavio Mendoza.
‘You are more faggot than Flavio Mendoza.’
(21) A ver cuál es el más hijo de puta.
‘Let’s see who is the most son of a bitch.’

Third, we will show that they may involve adverbial modification, and combine with other adjectives:

(22) Solo el del medio es medio trolo.
‘Only the one in between is half faggot.’
(23) Yo soy medio tarado.
‘I am half dumb.’

14 Most of these MEA appear in highly offensive contexts, and in hateful speech contexts.
15 Graded contexts are problematic. It could be argued that it is expressivity that is being graded, and not the descriptive content of the adjectives (see, for instance, Geurts, 2007). Nonetheless, a pure expressive could not appear graded in predicative position, because it cannot appear in a predicative position alone: *La canción es muy puta. / ‘*The song is very damn.’ [my own example]. Thus, it still involves a DA context, since we are focusing on the predicative context alone.


(24) Megan, la gordita machona y ordinaria.
‘Megan, the fatty tomboy and ordinary’
(25) No aceptan argumentos que lastimen a su novio cagador y veleta.
‘They do not accept arguments that hurt their shitty and flying boyfriend.’

These data show that both slurs and swearwords share characteristics with DA. They are not the same kind of adjective: the first may be characterized as intersective, and the second as non-intersective (see Demonte, 2011). Nonetheless, as several authors have noted (Morzycki, 2016; McNally, 2016; Kennedy & McNally, 2005, to name some), none of the traditional classifications is problem free. What matters for our paper is that both slurs and swearwords have been said to be non-gradable, but (as the data have shown) both can be graded (see Morzycki, 2016, for a further development).16 Thus, they both share characteristics with DA. Now, we move to the expressive dimension, which involves non-local interpretations for slurs and swearwords, which must be interpreted at the sentence/propositional level.

(26) Les tengo malas noticias, yo no soy puto, (que no es un insulto y por eso lo digo con todas las palabras).
‘I have bad news for you, I am not a faggot (which is not an insult and that is why I say it).’
(27) Soy un tarado; es verdad.
‘I am dumb; that is true.’

Both examples involve a sentence-level reading. Example (26) shows that it is not possible for the speaker to remove the derogation the slur itself provides, not even when he tries to defend that puto is not an insult, which is something noted by Nunberg (2018, p. 297):

But speakers do bear moral responsibility when they manifest an intention to affiliate with the provenance of a slur in the knowledge that it is not the default term for a group, even when they disclaim any derogatory intent and insist that the word itself is not a derogation at all.

Example (27) shows that the speaker acknowledges the fact that he is un tarado, and assigns the utterance a truth value, which is clearly related to the fact that it does have descriptive content. Thus, both slurs and swearwords (both MEA) are selected by degree expressions that project Degree Phrases, and they are not EA. Overall, if

16 In the case of swearwords, they have been classified as ‘non-dimensional adjectives’ by Bierwisch (1988). They lack a clear antonym, which is why he argued that they are essentially non-gradable. This is problematic, as Morzycki (2016) shows, but it could explain why they behave similarly to other supposedly non-gradable adjectives, such as slurs (for more on the issue, see Morzycki, 2016, pp. 133–135).


we take the qualitative corpus analysis into consideration, we may defend that both slurs and swearwords have descriptive and expressive content.17
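The diagnostics used in this section can be summarized as a small decision table. The sketch below is only an illustration of the reasoning, not part of Gutzmann's (2019) formal system: the field names, the `classify` function, and the three-way decision rule are hypothetical simplifications, and the profiles restate the class-level claims made in the text (each diagnostic was illustrated for slurs and swearwords as classes, not attested separately for every word).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostics:
    """Outcome of the tests discussed in the text (hypothetical encoding)."""
    non_local_reading: bool            # expressive content read at sentence level
    intersective: bool                 # 'the x, which is A and N'
    gradable: bool                     # degree words and comparatives
    adverbial_modification: bool       # can be targeted by adverbs
    coordinates_with_adjectives: bool  # 'A and B N'


def classify(d: Diagnostics) -> str:
    """Simplified rule: EA-like and DA-like behaviour together yield MEA."""
    descriptive = (
        d.intersective,
        d.gradable,
        d.adverbial_modification,
        d.coordinates_with_adjectives,
    )
    if d.non_local_reading and all(descriptive):
        return "MEA"
    if d.non_local_reading and not any(descriptive):
        return "EA"
    if not d.non_local_reading and all(descriptive):
        return "DA"
    return "unclear"


# Profiles restating the chapter's claims (class-level, simplified).
PROFILES = {
    "damn": Diagnostics(True, False, False, False, False),
    "aggressive": Diagnostics(False, True, True, True, True),
    "puto": Diagnostics(True, True, True, True, True),    # slur
    "tarado": Diagnostics(True, True, True, True, True),  # swearword
}

labels = {word: classify(profile) for word, profile in PROFILES.items()}
for word, label in labels.items():
    print(f"{word}: {label}")
```

On this encoding, slurs and swearwords receive the same profile, which is the point of the section: the corpus data do not separate them along the descriptive/expressive diagnostics.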

10.3.1.3 A Note on Some Non-syntactic Behaviour of Slurs

In the quantitative corpus analysis, we showed that swearwords are used more than slurs, and argued that the data could be seen as evidence that Rioplatense speakers consider slurs more taboo than swearwords. In this section, we will show some linguistic contexts, which are not syntactic-semantic, that were found only in relation to slurs. They involve reappropriation, and acknowledgment of pejorative content, as in the following examples:

(28) Unidas entre sí, por haber transformado un insulto (marica o tortillera de mierda) en un término reivindicativo.
‘United among each other, because they had transformed an insult (fuckin faggot or dyke) into a reivindicative term.’
(29) Yo no tengo una tendencia, soy puto, asi con todas las letras,
‘I do not have a tendency, I am a faggot, with all the letters.’
(30) Se ríe de hablar en femenino y utiliza expresiones como marica, puto, trolo para referir se a otros gays.
‘She laughs when she speaks in feminine and uses expressions such as faggot, faggot, gay to refer to other gays.’
(31) Unos chicos me gritaron ¡tortillera!. No sabía lo que significaba esa palabra, pero sonaba a insulto.
‘Some kids screamed at me dyke! I did not know what that word meant, but it sounded like an insult.’
(32) La utilización de la palabra puto, maricón, gay, tortillera o trava para descalificar a otra persona es parte de un discurso y una práctica social que reproduce estereotipos que conllevan invariablemente a la negación de derechos.
‘The use of the word puto, maricón, gay, tortillera or trava to disqualify another person is a part of a discourse and a social practice that reproduces stereotypes which involve the negation of civil rights.’
(33) Preguntale a cualquiera que el adjetivo puto, trolo y demás, que no estoy de acuerdo que se lo digan a un compañero
‘Ask anyone that the adjective puto, trolo and others, that I do not agree with their use towards a partner.’

With regard to reappropriation, some more context is needed to understand (28) and (30).18 In (28), they are talking about the LGBT and queer community, which unites to transform into a reivindicative term slurs such as tortillera (dyke), marica

17 See the Appendix for the syntactic and semantic representation of slurs and swearwords at the DP level.
18 Example (29) is self-explanatory.

10 Slurs in the Rio de la Plata


(faggot). In (30), there is a gay character in the book “Sólo te quiero como amigo” by Dani Umpi who behaves in that manner. All three examples involve the reappropriation of a slur, but (30) also allows us to see that speakers recognize the vindicatory purpose of the slur. Thus, there are pragmatic and social consequences when a slur is reappropriated, and they are not a part of the syntax-semantics of the language. We could, in fact, replace the slurs with swearwords and still have perfectly grammatical and understandable sentences. We did not find such cases in the corpus, because swearwords are not used in reappropriated contexts: there is no need to reappropriate them. We also did not find occurrences in which the offensive nature of swearwords was acknowledged, and this is not because they are not offensive. As the pilot showed, they are. However, we do not need to distance ourselves from the people who use them, because we all use them. In the case of slurs, we do need to distance ourselves from those who use them. Example (31) clearly shows the offensiveness being recognized by the receiver of the slur, who says that she did not know what the word meant. In (32), which involves a more formal register, the writer states that using gender-related slurs involves the denial of human rights, which makes explicit the social role slurs bear. Finally, in (33), the speaker distances him- or herself from those who use slurs targeting homosexual men, and says that s/he does not agree with those who direct them at a co-worker. Thus, these examples involve linguistic attitudes that have nothing to do with the syntax-semantics interface, and everything to do with the social life of slurs (Nunberg, 2018).

10.4 The Experiment

The experiment was designed to test the descriptiveness and expressiveness of slurs and swearwords in the Spanish spoken in the Rio de la Plata, to confirm that they are both mixed expressive adjectives, and to provide further evidence against the view that swearwords are mainly expressive (as defended by Cepollaro et al., 2019).

10.4.1 Design of the Study

The experiment contains only gender- and sex-related slurs, which were selected from the pilot study. We used the same slurs and swearwords that had been selected for the corpus study; that is, they were not selected independently.

10.4.1.1 Participants

We recruited the participants via email or social networks, where we provided a link to the online survey (built with SoSci Survey). The data of 200 participants entered the final study. We did not take the demographic data into account.


Fig. 10.3 Example given to participants to reinforce the multiple-choice format

To ensure that all the data belonged to Rioplatense speakers, only inhabitants of Montevideo and Buenos Aires could fill in the questionnaire.

10.4.1.2 Design

The experiment involves 10 critical items (slurs, such as tortillera/dyke), 10 control items (swearwords, such as imbécil/imbecile), and 10 fillers (neutral words such as profesor/teacher). Participants were asked to choose, from three options, the one that would best explain what we would want to express if we used a sentence like the one given, and to provide some demographic information (gender, age, and place of residence). They could choose more than one option (“neither of them” was an exclusive option), as may be seen in Fig. 10.3.19 We provided them with the example to reinforce the fact that they could select more than one option, and to eliminate any doubt about how the questionnaire should be filled in.

10.4.2 Results

The results showed that the descriptive content of slurs is more salient or prominent for the audience than that of swearwords, t(199) = 7.186, p = 1.311e−11, and that the expressive content of swearwords is more salient than that of slurs, t(199) = −9.6173, p < 2.2e−16. Since we were interested in the descriptive and expressive behavior of slurs and swearwords, we did not take the “Neither of them” choice into account. The results, as well as the means, are shown in Fig. 10.4 (where D = descriptive content, E = expressive content, G = slurs, and I = swearwords).

19 If someone says that Manuel is a child, she implies that:
1. Manuel is a young person.
2. She feels appreciation towards Manuel.
3. Neither of them.


Fig. 10.4 Results of descriptiveness vs. expressiveness in slurs and swearwords in the two categories (error bars represent the standard error of the mean). [Panels: D, E; x-axis: group (G, I); y-axis: m]

Fig. 10.5 Results of descriptiveness vs. expressiveness in slurs and swearwords within each category (error bars represent the standard error of the mean). [Panels: G, I; x-axis: desc (D, E); y-axis: m]

The results were also interesting with regard to the comparison of the descriptive and expressive content within the same category. Slurs do not test as having the same degree of descriptive and expressive content: t(199) = 11.587, p < 2.2e−16. Descriptive statistics show that they are more descriptive than expressive. Swearwords test as showing no difference between the descriptive and the expressive content: t(199) = 1.19, p = 0.2355. Thus there is no significant difference, as may be seen in Fig. 10.5 (where D = descriptive content, E = expressive content, G = slurs, and I = swearwords).
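All four tests above are reported with 199 degrees of freedom for 200 participants, which is what a paired (within-participant) t-test would give. The following is a minimal sketch of that computation, assuming (the text does not state this) that each participant contributes one proportion per condition. The function name `paired_t` and the toy data are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # One difference score per participant.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference: sample SD divided by sqrt(n).
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical toy data: per-participant proportion of items for which the
# descriptive option was chosen, for slurs and for swearwords.
slurs_descriptive = [0.9, 0.8, 0.7, 0.6]
swearwords_descriptive = [0.5, 0.6, 0.4, 0.3]

t, df = paired_t(slurs_descriptive, swearwords_descriptive)
print(f"t({df}) = {t:.3f}")
```

With 200 participants the same function would return df = 199, matching the form t(199) used above; the p-value would then be read from the t distribution with that many degrees of freedom.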


10.4.3 Discussion

These results show that, even though we find differences when comparing how descriptive and expressive slurs and swearwords are, both have descriptive and expressive content, which is what we showed in the corpus analysis. These results come as no surprise: slurs are evaluated as more descriptive than swearwords, and swearwords as more expressive than slurs. What is interesting is the comparison within each category. Slurs are more descriptive than expressive, but swearwords are not more expressive than they are descriptive.20 The previous sections showed that, with regard to syntactic and semantic constraints, both slurs and swearwords behave in RpS as mixed expressive adjectives. The results obtained in the experiment can be used to further defend this idea. Even though slurs are more descriptive than swearwords when the two are compared, swearwords are descriptive as well. Thus, the experiment can be used as further evidence that swearwords, on the one hand, should be considered the prototype of mixed expressive adjectives, since they are as expressive as they are descriptive, while slurs, on the other hand, should be further analyzed, since they are more descriptive than expressive.21 We conclude this section by arguing that there have to be further differences between slurs and swearwords. We think that the differences could lie in how the descriptive content relates to the expressive content that both slurs and swearwords have. And, perhaps, a closer look at the formation of epithets might help us to better understand where those differences reside.

10.5 A Remark on Epithets

Though there has been extensive analysis of epithets, from both a linguistic and a philosophical perspective (see for instance Potts, 2007; Hom, 2008), we will focus on the fact that not all slurs can be used as complex epithets (those that have the structure Det+Epithet+NP, such as that bastard Paul), which has attracted the attention of other RpS researchers (Saab & Carranza, 2021; Orlando & Saab, 2020a; Di Tullio & Saab, 2006; Orlando & Saab, 2020b, for instance).22

20 With regard to how the descriptive content was construed both for slurs and swearwords, see Polakof (2021).
21 Note that we are not trying to explain how offensive slurs and swearwords are, or how the offensiveness rate may be affected by their syntactic position. To explain offensiveness, other factors need to be considered.
22 We will not focus on the syntax and semantics of epithets, since it is a complicated matter. Saab and Carranza (2021) and Di Tullio and Saab (2006), among others, have defended that they are third person pronouns. However, this is not problem free, and others have defended that when there is a proper name they stand for the proper name (Gutzmann, 2019). Saab (2004) has extensively argued in favor of the third person pronoun interpretation. Nonetheless, we need further studies with regard to the


Saab and Carranza (2021), based on Orlando and Saab (2020b), defend that the use of an epithet is not only expressive, but also communicates a certain worldtype view which they propose to model as a stereotype. They propose to add the stereotype associated with a slur such as sudaca (South American-pejorative) to its meaning, in a McCready (2010) $\mathcal{L}^{+}_{CI}$,23 as in:

(34) $[\![\textit{sudaca}]\!]^{g,w} = \lambda w.\lambda x.\ x \text{ es sudamericano en } w \;\blacklozenge\; \lambda x.\lambda p.\ \exists P\,[P \in C \wedge p = [\lambda w.\ P({}^{\cap}\text{Sudamericano})(w)] \wedge x \leq {}^{\cap}\text{Sudamericano}]$

Their idea is that, after the diamond, we have a function that takes an individual and predicates that it belongs to a given class, and a given theory which they call the stereotype. The stereotype denotes a set of propositions of the form $P({}^{\cap}X)$, where P denotes a predicate of type , and ${}^{\cap}X$ denotes the plural individual who is the target of the stigmatization. Finally, the last conjunct of the formula tells us that x is a part of the plural individual in question. According to their proposal, it is the context C that determines how the stereotype will be constituted (Saab and Carranza, 2021, p. 516). What allows RpS slurs to function as complex epithets is their departure from the meaning of their neutral counterpart (in the case of sudaca (South American-pejorative), it is sudamericano (South American)), while still maintaining the stereotype content which is a part of the expressive dimension. That would be why we could form a complex epithet with an insult like nazi (Nazi) but not with bolita (Bolivian-pejorative), as may be seen in the following (examples and translations from Orlando and Saab, 2020b):

(35) El nazi de Juan lo dijo otra vez.
‘That nazi Juan said that again.’
(36) ??? El bolita de Juan llegó tarde otra vez.
‘That bolita Juan arrived late again.’

We want to test whether it is descriptiveness, expressiveness, or both that is responsible for the availability of slurs as complex epithets.
To test this, we searched the corpus for how many epithets there were for each of the slurs analyzed; the results can be seen in Table 10.2.24 The table gives, after the slur, the total number of occurrences the slur had in corpus Davis. Then, it gives the number of participants who rated the slur as being expressive (from

syntax and semantics of epithets to defend one alternative or the other. Since this paper is not focused on the syntax and semantics of epithets, we will leave this issue for future research.
23 Where $[\![\textit{sudaca}]\!]^{g,w}$ provides us with the interpretation function of sudaca in an assignment g and a world w; x is a variable for individuals, and p for propositions.
24 We should note that culorroto (faggot) tested low on descriptiveness. We think that this is due to the fact that its meaning has shifted from the one provided by Marchetti (2014). In that dictionary, which helped us to decide which slurs we were going to analyse, culorroto (faggot) is defined as a masculine and pejorative word that means effeminate, and it is given as a synonym of homosexual man. However, we think that the reason why it tested so low in descriptiveness is that it has shifted in meaning towards a slur such as malcogido (motherfucker). This hypothesis should be tested in the future.


Table 10.2 Slurs expressiveness and descriptiveness vs. epithets

Slur        Total  Expressiveness ratings  Descriptiveness ratings  Epithets  E/t
machona     21     38                      195                      0         0
trolo       97     78                      174                      6         0.06
puto        1368   136                     145                      50        0.04
trabuco     21     74                      177                      0         0
culorroto   56     174                     22                       5         0.09
marimacho   20     66                      193                      0         0
maricón     194    97                      151                      4         0.02
tortillera  16     59                      189                      0         0
tragaleche  14     136                     110                      2         0.14
trava       53     59                      184                      0         0

Fig. 10.6 Ep/tot vs. expressiveness in slurs. [y-axis: ep/tot]

a total of 200 answers). Fourth, it gives the number of participants who rated the slur as being descriptive (from a total of 200). Fifth, it gives the number of occurrences of the slur as a complex epithet (we searched for the form slur + de to ensure it was a complex epithet). Finally, it gives the relative frequency of the epithet (E/t, epithets divided by total occurrences), which allows a valid calculation of the correlation between expressiveness and complex epithetic uses. These data allowed us to calculate a Pearson correlation coefficient. The value of r(8) is 0.7773. This is a strong positive correlation, which means that high X variable scores go with high Y variable scores (and vice versa). The p-value is 0.008144, so the result is significant at p < 0.05. The value of R², the coefficient of determination, is 0.6042. The graph in Fig. 10.6 helps to visualize the correlation between expressiveness (x) and the appearance of epithets (y).
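The Pearson coefficient reported here can be re-derived from Table 10.2. The sketch below is an illustration rather than the author's script: it assumes that x is the expressiveness rating and y is the E/t column as printed (rounded to two decimals), and the names `TABLE_10_2` and `pearson_r` are hypothetical.

```python
from math import sqrt

# Values copied from Table 10.2:
# (slur, expressiveness ratings out of 200, E/t as printed)
TABLE_10_2 = [
    ("machona", 38, 0.00),
    ("trolo", 78, 0.06),
    ("puto", 136, 0.04),
    ("trabuco", 74, 0.00),
    ("culorroto", 174, 0.09),
    ("marimacho", 66, 0.00),
    ("maricón", 97, 0.02),
    ("tortillera", 59, 0.00),
    ("tragaleche", 136, 0.14),
    ("trava", 59, 0.00),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


expressiveness = [row[1] for row in TABLE_10_2]
rel_freq = [row[2] for row in TABLE_10_2]

r = pearson_r(expressiveness, rel_freq)
r_squared = r ** 2  # coefficient of determination
print(f"r({len(TABLE_10_2) - 2}) = {r:.4f}, R^2 = {r_squared:.4f}")
```

Run on the printed values, this gives r ≈ 0.777 and R² ≈ 0.604, in line with the figures in the text. Recomputing the ratios from the Epithets and Total columns instead of using the rounded E/t column would give a slightly different value.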


Fig. 10.7 Ep/tot vs. descriptiveness in slurs. [y-axis: ep/tot]

The results are also significant if we compute Spearman’s rho: rs = 0.87162, p (2-tailed) = 0.00101. By normal standards, the association between the two variables would be considered statistically significant. These results show an evident correlation between expressiveness and the formation of full epithets for slurs, which is interesting because it brings them closer to swearwords, with which we can undoubtedly form epithets. However, we have not yet said anything about the descriptive content of slurs, and we also want to know whether descriptiveness has something to do with epithetic constructions. The graph in Fig. 10.7 compares descriptiveness (x) and the appearance of epithets (y), where the correlation seems to be the opposite of the one found for the expressive content. We calculated the Pearson correlation coefficient, and the value of r(8) is −0.742. This shows a strong negative correlation, which means there is a tendency for high X variable scores to go with low Y variable scores (and vice versa). The value of R², the coefficient of determination, is 0.5506. The p-value is 0.014005, so the result is significant at p < 0.05. We also calculated Spearman’s rho, and the result is statistically significant: rs = −0.88572, p (2-tailed) = 0.00065. Thus, both descriptiveness and expressiveness seem to be involved in the possibility of forming complex epithets with slurs. What these results tell us is that for a slur to be able to form a complex epithet it must have a lower descriptive content and a higher expressive content. These results seem to agree with the idea set out by Orlando and Saab (2020b): the study shows that those slurs whose descriptive content is less prominent are more likely to be used as epithets. It also shows that those slurs whose expressive content is more prominent


can be used as epithets. However, it says nothing with regard to stereotypes, because no stereotypes were given in the study: participants only evaluated slurs with regard to an expression of contempt. Thus, it allows us to question the role that stereotypes have as a part of the expressive dimension, and it leaves open the question of whether we could maintain a simpler expressive dimension, such as the one proposed by Gutzmann (2019) or even Potts (2007). Obviously, these results should be taken with caution, because our data set is very small, and many issues still need to be considered: how do the descriptive content and expressive content interact with the formation of epithets? What is relevant there? To answer these questions, more experiments are needed. Nonetheless, the results point towards a very interesting line of research that uses experimental methods to understand how slurs work in predicative contexts and in typically expressive contexts such as complex epithetic constructions.
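Because Table 10.2 contains ties (five slurs have an E/t of 0, and two pairs of slurs share an expressiveness rating), the rank correlations in this section are sensitive to how ties are handled. The sketch below assumes the common convention of assigning tied values their average rank and then taking the Pearson correlation of the ranks. The helper names are hypothetical, and this is not presented as the procedure actually used in the study.

```python
from math import sqrt

# Values copied from Table 10.2:
# (slur, expressiveness ratings, descriptiveness ratings, E/t as printed)
ROWS = [
    ("machona", 38, 195, 0.00),
    ("trolo", 78, 174, 0.06),
    ("puto", 136, 145, 0.04),
    ("trabuco", 74, 177, 0.00),
    ("culorroto", 174, 22, 0.09),
    ("marimacho", 66, 193, 0.00),
    ("maricón", 97, 151, 0.02),
    ("tortillera", 59, 189, 0.00),
    ("tragaleche", 136, 110, 0.14),
    ("trava", 59, 184, 0.00),
]


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


rel_freq = [row[3] for row in ROWS]
rho_expr = spearman_rho([row[1] for row in ROWS], rel_freq)
rho_desc = spearman_rho([row[2] for row in ROWS], rel_freq)
print(f"expressiveness: rs = {rho_expr:.5f}")
print(f"descriptiveness: rs = {rho_desc:.5f}")
```

Under these assumptions the two coefficients come out at roughly 0.872 and −0.886, which agrees with the values reported in this section. With only ten slurs, a single item such as tragaleche or culorroto can shift both coefficients noticeably, which fits the caution expressed above about the small data set.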

10.6 Final Remarks

Both slurs and swearwords have descriptive content. We have shown that a proposal that analyses swearwords as being mainly offensive can be maintained neither with a corpus analysis nor with an experimental approach. Swearwords are judged not to differ in their descriptive and expressive content, and perhaps that is why they can all form complex epithets. Slurs, on the other hand, are considered by RpS speakers to be more descriptive than expressive, and perhaps that is why they cannot all form complex epithets. We have shown that there seems to be a correlation between a lower rate of descriptiveness and the formation of complex epithets, and between a higher rate of expressiveness and the formation of complex epithets. This calls for a global theory of pejoration, which in turn calls for further experimentation; this is what we will try to do in the future.

Acknowledgments

This research was possible thanks to the grant FCE-3-2018-1-148810 of the Agencia Nacional de Investigación e Innovación (Uruguay). I am extremely grateful to Camila Zugarramurdi for all her help with the statistical analysis, and with the reading of my first draft. I am also grateful to the students of my research group, and to my Colombian colleagues, for listening to me and for their useful suggestions.

Appendix: The Syntactic and Semantic Representation of Slurs and Swearwords

We follow Gutzmann (2019, p. 120) and assume that an MEA such as beschissen has descriptive content, and can receive the ordinary semantic interpretation of other degree adjectives. Thus, it may be represented as (37), where bad stands for the


evaluative (descriptive) content they have:

(37) beschissen (‘shitty’) λd.λe.bad :

Both slurs and swearwords may have roughly the following simplified syntactic and semantic representation at the DP level. In the syntactic representation, iEx is the interpretable expressive feature, and uEx is the uninterpretable expressive feature (as in Gutzmann, 2019). In the semantic representation, u is the use-conditional content that “is carried to the top” (as in Gutzmann, 2019, p. 32). We will exemplify with ese técnico cagón, where scared stands for the evaluative (descriptive) content of cagón, and la gorda marimacho in example (30), where tomboy stands for the evaluative (descriptive) content of marimacho:25

25 Both the syntactic and the semantic representations involve several simplifications. There is no DivP and no ClassP (see Harbour, 2008). Both slurs and swearwords are treated as simple intersective adjectives, and DegP is simply analyzed as in Gutzmann (2019). Nonetheless, for the purposes of this paper, the structures are sufficient.


40. No fue tambien la gorda machona.26

26 This example includes an epithet that involves an evaluative noun, which we will not analyze; see Di Tullio and Saab (2006) for epithets in Spanish, and Masiá (2017) for evaluative nouns, as well as Gutzmann (2019).


Our syntactic and semantic representations are based on Gutzmann (2019) for MEAs, and on Saab and Carranza (2021) for the introduction of NumP into the DP.27 We did have to introduce the MEA in postnominal position, though no type-shifting was involved.28 Both the syntactic and the semantic representations reflect the fact that slurs and swearwords have descriptive and expressive components, as well as their nondisplaceability (Gutzmann, 2019, p. 33).

References

Allan, K. (2019). Taboo words and language: An overview. The Oxford handbook of taboo words and language (pp. 1–27). Oxford University Press.
Allan, K., & Burridge, K. (2006). Forbidden words: Taboo and the censoring of language. Cambridge University Press.
Anderson, L., & Lepore, E. (2013). What did you call me? Slurs as prohibited words. Analytic Philosophy, 54(3), 350–363.
Anderson, L., Haslanger, S., & Langton, R. (2012). Language and race. Routledge Companion to the Philosophy of Language (pp. 753–767). Routledge.
Bierwisch, M. (1988). Tools and explanations of comparison-part 1. Journal of Semantics, 6(1), 57–93.
Cepollaro, B. (2020). Slurs and thick terms: When language encodes values. Lexington Books.

27 The introduction of NumP would allow us to deal with plural examples such as “las mujeres marimacho” in example (17).
28 For other alternatives of syntactic analysis of expressives in Spanish see Saab and Carranza (2021). We will simplify the analysis and not take into account the evaluative component of the noun.


Cepollaro, B., & Zeman, D. (2020). Editors’ introduction: The challenge from non-derogatory uses of slurs. Grazer Philosophische Studien, 97(1), 1–10.
Cepollaro, B., Sulpizio, S., & Bianchi, C. (2019). How bad is it to report a slur? An empirical investigation. Journal of Pragmatics, 146, 32–42.
Croom, A. M. (2013). How to do things with slurs: Studies in the way of derogatory words. Language and Communication, 33(3), 177–204.
Demonte, V. (2011). Adjectives. In Handbücher zur Sprach-und Kommunikationswissenschaft [Handbooks of linguistics and communication science] (pp. 1314–1340). De Gruyter Mouton.
Di Tullio, Á., & Saab, A. (2006). Dos clases de epítetos en el español: Sus propiedades referenciales y distribución sintáctica. In Proceedings of the XIV Congreso Internacional de la Asociación de Lingüística y Filología de América Latina (ALFAL).
Finkbeiner, R., Meibauer, J., & Wiese, H. (2016). What is pejoration, and how can it be expressed in language. Pejoration (pp. 1–18). John Benjamins Publishing Company.
Geurts, B. (2007). Really fucking brilliant. Theoretical Linguistics, 33(2), 209–214.
Gutzmann, D. (2015). Use-conditional meaning: Studies in multidimensional semantics. Oxford Studies in Semantics and Pragmatics (Vol. 6, 1st ed.). Oxford University Press.
Gutzmann, D. (2019). The grammar of expressivity. Oxford studies in theoretical linguistics (Vol. 72, 1st ed.). Oxford University Press.
Harbour, D. (2008). Mass, non-singularity, and augmentation. MIT Working Papers in Linguistics, 49, 239–266.
Hom, C. (2008). The semantics of racial epithets. The Journal of Philosophy, 105(8), 416–440.
Hom, C. (2012). A puzzle about pejoratives. Philosophical Studies, 159(3), 383–405.
Horn, L. R. (2013). I love me some datives: Expressive meaning, free datives, and f-implicature. In Beyond expressives: Explorations in use-conditional meaning (pp. 151–199). Brill.
Kennedy, C., & McNally, L. (2005). Scale structure, degree modification, and the semantics of gradable predicates. Language, 81(2), 345–381.
Kirchin, S. (2013). Thick concepts. OUP Oxford.
Kratzer, A., & Heim, I. (1998). Semantics in generative grammar (Vol. 1185). Blackwell Oxford.
Marchetti, P. (2014). Puto el que lee: Diccionario argentino de insultos, injurias e improperios. Ediciones Granica.
Masiá, M. S. (2017). Adverbial adjectives and nominal scalarity. Ph.D Thesis, Universitat Autònoma de Barcelona.
McCready, E. (2010). Varieties of conventional implicature. Semantics and Pragmatics, 3, 1–57.
McNally, L. (2016). Modification. In M. Aloni, & P. Dekker (Eds.), The Cambridge handbook of formal semantics (pp. 442–464). Cambridge University Press.
Meibauer, J. (2016). Slurring as insulting. Pejoration (pp. 145–167). John Benjamins.
Morzycki, M. (2016). Modification. Cambridge University Press.
Nunberg, G. (2018). The social life of slurs. In D. Fogal, D. W. Harris, & M. Moss (Eds.), New work on speech acts (1st ed., pp. 237–295). Oxford University Press.
Orlando, E., & Saab, A. (2020a). Slurs, stereotypes and insults. Acta Analytica, 35(4), 599–621.
Orlando, E., & Saab, A. (2020b). A stereotype semantics for syntactically ambiguous slurs. Analytic Philosophy, 61(2), 101–129.
Polakof, A. C. (2021). An experimental approach to slurs and swearwords in the Río de la Plata. In Proceedings of the eighteenth international workshop of logic and engineering of natural language semantics 18 (LENLS18) (pp. 335–344). JSAI.
Potts, C. (2007). The expressive dimension. Theoretical Linguistics, 33(2), 165–198.
Pullum, G. (2018). Slurs and obscenities: Lexicography, semantics, and philosophy. In D. Sosa (Ed.), Bad words: Philosophical perspectives on slurs. Engaging Philosophy (1st ed., pp. 168–192). Oxford University Press.
Saab, A. (2004). Epítetos y elipsis nominal en español. Revista de la Sociedad Argentina de Lingüística, 1, 31–51.
Saab, A., & Carranza, F. (2021). Dimensiones del significado. Buenos Aires, SADAF.


Spotorno, N., & Bianchi, C. (2015). A plea for an experimental approach on slurs. Language Sciences, 52, 241–250.
Väyrynen, P. (2021). Thick ethical concepts. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Spring 2021 ed.). Metaphysics Research Lab, Stanford University.
Warachan, B. (2011). Appropriate statistical analysis for two independent groups of Likert-type data. American University.

Ana Clara Polakof Ana Clara Polakof is Adjunct Professor in the Linguistics Institute at Universidad de la República (Uruguay). Her work lies at the intersection of philosophy of language, linguistics, semantics and pragmatics.

Chapter 11

Who Has a Free Speech Problem? Motivated Censorship Across the Ideological Divide

Manuel Almagro, Ivar R. Hannikainen, and Neftalí Villanueva

Abstract Recent years have seen recurring episodes of tension between proponents of freedom of speech and advocates of the disenfranchised. Recent survey research attests to the ideological division in attitudes toward free speech, whereby conservatives report greater support for free speech than progressives do. Intrigued by the question of whether “canceling” is indeed a uniquely progressive tendency, we conducted a vignette-based experiment examining judgments of offensiveness among progressives and conservatives. Contrary to the dominant portrayal of progressives and conservatives, our study documented ideological symmetry in their evaluations of offensive speech. When faced with utterances whose content matters to them, both conservatives and progressives viewed outgroup speakers as more offensive than ingroup speakers. A second contribution of this chapter is to provide a deeper understanding of the cognitive mechanism implicated in evaluating outgroup speech as more offensive than ingroup speech. Our results suggest that perception of offensiveness is mediated by ascriptions of intent: we tend to attribute negative intent to the speaker whenever we deem their utterances to be offensive, even against the explicitly stated speaker’s background attitudes.

M. Almagro
Department of Philosophy, Faculty of Philosophy and Educational Sciences, University of Valencia, Valencia, Spain

I. R. Hannikainen · N. Villanueva
Department of Philosophy I, Faculty of Psychology, Campus Universitario de Cartuja, Granada, Spain

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_11


M. Almagro et al.

11.1 Introduction

Recent years have seen recurring episodes of tension between proponents of freedom of speech and advocates of the disenfranchised (see Meesala, 2020; Ramsay, 2021). This tension has also acquired a political tint: Voices on the left assume a duty to protect underprivileged groups from insidious forms of discrimination that sometimes take the form of humor, science, or campaign messaging. Meanwhile, the right reacts with resentment toward a growing climate of censorship, the so-called “cancel culture”, which, from its perspective, has degraded intellectual life and stifled public discourse (Romano, 2021). This phenomenon has also fractured academia: In 2020, a long list of writers signed the Harper’s Letter, in which they denounced creeping restrictions on intellectual freedom throughout many contemporary democracies (Harper, 2020). Shortly after, The Objective published a critical response in which a second group of journalists and academics, emphasizing the signatories’ position of privilege, questioned how a group of primarily cisgender white members of the cultural elite could expound on their experience of intellectual repression while drawing attention away from the deeper and persistent problems that divergent, minority voices face (The Objective, 2020). Recent survey research attests to the ideological division in attitudes toward free speech, whereby conservatives report greater support for free speech than progressives do (Pew Research Center, 2016, 2021). This alignment between ideology and endorsement of free speech has fostered distinct stereotypes about liberals and conservatives: for instance, the idea that progressives are overly critical of generalizations and differentiation involving sociodemographic groups, and unduly sensitive to purportedly offensive language, and the idea that conservatives are callous in their treatment of marginalized groups and even lack a moral compass.
Intrigued by the question of whether “canceling” is indeed a uniquely progressive tendency, we conducted a vignette-based experiment examining judgments of offensiveness among progressives and conservatives. Our study also builds on previous work in which we identified contextual factors that render speech offensive, providing a clearer picture of the circumstances in which these effects arise and why. Before presenting this research, we survey evidence from the published literature elucidating the factors that shape our evaluations of offensive speech.

11.1.1 Background

Whether a given statement is offensive or not evidently depends on what is being said, i.e., the content of speech. Yet experimental research has established that a host of contextual elements also impact our judgments of whether speech is offensive. For instance, the offensive meaning of an utterance shifts between professional and personal contexts (Fasoli et al., 2015; O’Dea et al., 2015; O’Dea & Saucier, 2016). Speakers with malicious intent are treated as more offensive than speakers with neutral intent, even when making the same remark; and whether the audience reacts

11 Motivated Censorship Across the Ideological Divide


adversely or indifferently can retrospectively determine whether a statement was offensive (Swim et al., 2003).1 Other studies have uncovered an influence of the speaker’s identity: Listeners rapidly form an impression of the speaker from their voice (Fourcart & Hartsuiker, 2021), accent (Cai et al., 2017), and gender (Fasoli et al., 2015; O’Dea et al., 2015; O’Dea & Saucier, 2016), which informs their interpretations of the speaker’s words. In our own research, we have focused primarily on whether the offensiveness of a statement about some target group depends on the speaker’s relationship to that same group (Almagro et al., 2021). Our inspiration came from the observation that remarks that can be acceptable (merely informative) when made by members of non-dominant groups are sometimes seen as offensive and censurable when made by members of socially advantaged outgroups, a phenomenon to which our studies attested. For example, in one of the scenarios we employed, a politician from the Spanish city of Ceuta, an underprivileged region located in North Africa, asserts that “For Spain, losing Catalonia is not the same as losing Ceuta”, a statement that can readily be understood as informative and/or offensive. We then compared participants’ reactions in this control condition to two other experimental conditions in which the same statement was made by an outgroup member. Our results showed that the statement was judged more offensive and more worthy of censorship when uttered by an outgroup member, e.g., a politician from a different region in Spain, than by an ingroup member. Additionally, this effect held whether the speaker had a higher social status or a comparable social status (e.g., belonging to an equally disadvantaged outgroup), which suggests that the offensiveness results from outgroup membership and not from elevated social status.
We refer to this finding as the speaker membership effect, and in the present study we seek to understand whether it arises among conservatives’ and progressives’ assessments of offensive speech. The speaker membership effect is a context-dependent phenomenon that seemingly affects the meaning of our utterances, making them susceptible to be interpreted as merely descriptive or as potentially offensive. If this affected only certain political groups one could easily wonder whether meaning is actually modulated by this feature of the context, or we’re simply in front of a mistake, a confusion of sorts. Speaker membership could be interpreted as a distraction, something that makes it difficult, for some people, to properly understand the meaning of certain utterances. But if the effect were present on liberals and conservatives alike, this would be a reason to resist that line of thought. The systematic presence of the speaker membership effect would certainly not dissuade those willing to explore the idea that we’re simply confused when we judge an utterance to be offensive due to the identity of the speaker, but it would be a new uncomfortable piece of evidence to accommodate within their narrative.

[1] There is at least one other possibility here: the offensive content of speech is determined by contextual factors. This alternative formulation implies that the nature of offensiveness is semantic and not exclusively pragmatic. In other words, it is what we say, rather than how we say what we say, that is apt to be offensive.


M. Almagro et al.

From a more theoretical perspective, even if it were assumed that the speaker membership effect modulates the meaning of our utterances, it would still remain undecided whether this affects what is said, or some other layer of meaning. Over the past few years, several options have been pushed forward to accommodate the evaluative content of our utterances. For some, it belongs to our presuppositions (see for example Cepollaro, 2017a, b; Cepollaro & Stojanovic, 2016), for others it’s part of the content that we conventionally implicate (see for example Copp, 2009; Gutzmann, 2011; McCready, 2010; Potts, 2005, 2007), conversationally implicate (see for example Väyrynen, 2013), etc. Our results, again, will not adjudicate this dispute, but they are not completely neutral with respect to it either. If the speaker membership effect is, as we think, a widespread phenomenon, if it’s not contained within a particular group of speakers, we have a reason to place this sort of evaluative information close to the more stable, less sensitive, range of contextual features. This, in turn, should be taken into consideration when thinking about the layer of meaning where this contextual information gets placed.

11.1.2 Moral Conviction and the Contours of Speaker Identity Norms

Throughout this broad base of experimental research, differences between progressives’ and conservatives’ response patterns have been left unexamined, examined only in limited scope, or treated as secondary to the researchers’ objectives. In the present study, we direct our emphasis onto this exact question: Do conservatives and progressives differ in their tendency to oppose discordant messages and deem them offensive? In recent years, research in political psychology has systematically examined whether various psychological dispositions are symmetrical (i.e., manifest equally among progressives and conservatives) or asymmetrical (i.e., emerge selectively or more strongly in one ideological group; see e.g., Crawford, 2014; Viciana et al., 2019). A historical perspective calls into question the view that freedom of speech is a fundamentally conservative principle. Early appeals to freedom of expression were made from the left, in an effort to protect labor unions’, abolitionists’ and suffragists’ right to protest from government suppression. These considerations provide some intuitive support for the symmetry view, and point toward a particular explanation: perhaps listeners oppose free speech that threatens their moral convictions (Epstein et al., 2018). When the exercise of free speech violated tradition and undermined authority, characteristically conservative values (Graham et al., 2009), it was progressives who defended free speech and conservatives who called for its containment. Nowadays, as free speech seemingly challenges the progressive values of justice and equality, conservatives have come to adopt a more favorable attitude, and progressives a more restrictive attitude, toward free speech (see Epstein et al., 2018).
From this perspective, the contemporary politicization of free speech is primarily a product of opportunism and not principle (see identity-protective cognition, Kahan, 2017; Kahan et al., 2011), i.e., a reflection of the shifting issues at hand and not of stable political values of the left versus the right.


In the present chapter, we home in on this question: namely, whether progressives and conservatives selectively “cancel” speakers who threaten their worldview while upholding the freedom of speech of those who promulgate it. This makes a further prediction about the evolution of historical debates as ideological division wanes (Hernández et al., 2021). To assess the latter hypothesis, we also contrast attitudes toward contemporary or active political disputes (regarding Covid-19, immigration and gender inequality) with attitudes toward historical or inactive debates (regarding divorce, same sex marriage, and the Spanish national anthem). Conservative and progressive views about the permissibility of divorce or same sex marriage, though once in stark conflict, have largely aligned over time in the direction of the historically progressive stance. So we reasoned that a closer look at judgments of offensiveness in the context of historical issues for which ideological division has waned would help dissociate the contributions of ideological conviction and political identity to evaluations of offensive speech.

11.1.3 Speaker Identity Norms as Effects of Inverse Planning

A second contribution of this chapter is to provide a deeper understanding of the cognitive mechanism implicated in evaluating outgroup speech as more offensive than ingroup speech. In our previous studies, participants often denied the relevance of speaker identity when asked in abstract terms, despite the fact that it exerted a robust influence on their case-based evaluations (Almagro et al., 2021). This helps rule out a particularly straightforward explanation, namely that people selectively censor outgroup speakers because they endorse a corresponding explicit norm. Partly inspired by previous research on the reappropriation of certain slurs (Gibson et al., 2019), we considered a rather different explanation: i.e., that contextual cues regarding the speaker’s identity and prior attitudes could help establish whether what they said was offensive by informing listeners’ representations of the speaker’s intent.[2] For instance, outgroup speakers may be ascribed more harmful intent (i.e., the desire to offend the target group) than ingroup speakers, who might be ascribed a neutral intent (such as the goal of describing the target group), when making the exact same statement. This intention ascription could then help explain the speaker membership effect on offensiveness, since mental state inferences are known to exert downstream effects on prescriptive evaluations, such as blame and wrongness (Cushman, 2008; Monroe & Malle, 2017). On this view, a speaker’s identity has an immediate effect on representations of their intention, and indirectly on the degree to which their utterances are deemed offensive.

[2] This hypothesis could also link people’s judgment patterns to their abstract principles. If indeed speaker identity influences ascriptions of intent, then the endorsement of speaker intent as the most relevant criterion could help account for the large impact of speaker identity on perceptions of offensiveness.


11.2 Methods

Materials, data and analysis scripts are publicly available on the Open Science Framework at: https://osf.io/u83d4/

11.2.1 Power Analysis

To estimate our required sample size, we conducted a power analysis (f² = .10, α = .05, power = .90) for a linear regression model with 7 numerator degrees of freedom, corresponding to the three main effects (statement orientation, speaker membership, and speaker attitude) and all their possible interactions. A target N of 182 provided 90% power to observe a significant effect of magnitude f² = .10. Since we were interested in conducting two separate studies (one among progressives and one among conservatives), our target sample size was 364 participants.
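The power figure above can be sanity-checked with a short Monte Carlo sketch. This is an illustration, not the authors' procedure: it assumes the common convention that the noncentrality parameter equals f² × N and that the denominator degrees of freedom equal N − 7 − 1. The function name `simulated_power` and its defaults are hypothetical.

```python
import random


def simulated_power(n, f2=0.10, num_df=7, alpha=0.05, sims=20000, seed=1):
    """Monte Carlo estimate of the power of an F test.

    Assumptions (not stated in the chapter): noncentrality = f2 * n and
    denominator df = n - num_df - 1 (seven tested terms plus an intercept).
    """
    rng = random.Random(seed)
    den_df = n - num_df - 1
    noncentrality = f2 * n

    def f_stat(lam):
        # Noncentral chi-square: shift one of the num_df standard normals.
        shift = lam ** 0.5
        num = (rng.gauss(0, 1) + shift) ** 2
        num += sum(rng.gauss(0, 1) ** 2 for _ in range(num_df - 1))
        # Central chi-square with den_df df is Gamma(shape=den_df/2, scale=2).
        den = rng.gammavariate(den_df / 2, 2)
        return (num / num_df) / (den / den_df)

    # Empirical critical value from the simulated null distribution.
    null = sorted(f_stat(0.0) for _ in range(sims))
    critical = null[int((1 - alpha) * sims)]
    # Share of simulated alternative statistics that exceed the critical value.
    hits = sum(f_stat(noncentrality) > critical for _ in range(sims))
    return hits / sims


if __name__ == "__main__":
    print(simulated_power(182))  # per-sample target N
    print(2 * 182)               # combined target across both samples
```

Under these assumptions the estimate for N = 182 should fall close to the reported 90%, and doubling the per-sample target gives the combined target of 364.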

11.2.2 Participants

We recruited 400 Spanish natives in partnership with a survey research firm. The sample was divided into four strata according to political orientation and sex, with 100 participants in each stratum: Men-Left, Women-Left, Men-Right, and Women-Right. Mean sample age was 49 years (M = 50 among left-leaning participants, M = 48 among right-leaning participants). In addition to sex, the sample was also nationally representative for age group (six strata: 18–24, 25–34, 35–44, 45–54, 55–64, 65–99) and geographical region (nine regions: northeast (Catalonia and Balears), east, south (Andalusia), center, northwest, north center, Canary Islands, Barcelona metropolitan area, Madrid metropolitan area). Furthermore, the sample included self-declared supporters of each of the five primary political parties in Spain (Unidas Podemos: n = 36, PSOE: n = 73, Ciudadanos: n = 37, PP: n = 68, Vox: n = 42, Other: n = 66, No answer: n = 78). Compared with recent nationally representative polls of voting intention (Centro de Investigaciones Sociológicas, 2022), our sample appeared to overrepresent conservative voters (i.e., Ciudadanos, PP and Vox), which could be explained by our 1:1 left/right stratification.


Table 11.1 Example stimuli

| Element | Example text |
|---|---|
| Context | During a TV show, several guests discuss the consequences of the law that regularized divorce in Spain, approved in 1981 |
| Membership | Ana, a woman who is the spokesperson for a feminist group (/Ángel, a bishop) |
| Attitude | And who in the past has always been uncomfortable (/comfortable) in the presence of married people, says the following: |
| Orientation | “Most divorced people feel freer, safer, happier and more dignified than during marriage” (/“Children of divorced families are, statistically, more prone to delinquency and antisocial conduct”) |

11.2.3 Materials

The materials for this study were based on three historic or ‘closed’ issues (divorce, same sex marriage, and the Spanish national anthem), and three contemporary or ‘open’ issues (Covid-19, immigration and gender inequality). For each issue, we wrote a total of eight variants, in order to complete the 2 (orientation: whether the remark affronts progressive or conservative values) × 2 (identity: whether the speaker was a member/non-member of the target group) × 2 (attitude: whether the speaker has a negative/positive attitude toward the target group) matrix. These manipulations are illustrated in Table 11.1. In each scenario, a verbal statement was made that was defined by its target group (i.e., the political group that could potentially take offense). For the sake of our present example, let us suppose that the speaker states that “Children of divorced families are, statistically, more prone to delinquency and antisocial conduct”; this statement would have progressives as its target group. After the opening sentence, we provided information about the speaker’s identity, by randomly presenting either a characteristically conservative (e.g., a bishop[3]) or progressive (e.g., a spokesperson for a feminist group) speaker. This manipulation allowed us to code for two further properties of the scenario: whether the speaker belongs to the target group or not (which we refer to as membership), and whether they belong to the participant’s ideological ingroup or outgroup (which we refer to as ingroup status). Next, we conveyed the speaker’s background attitude toward the target group, which could be positive or negative (depending on condition assignment). And lastly, we displayed the statement (which, as previously stated, could be geared toward offending progressives or conservatives).

[3] It is important to note that, in Spain, the Catholic religion is associated with conservatives.


11.2.4 Procedure

In a balanced incomplete block design, participants were randomly assigned to one of eight groups and viewed a battery of six consecutive scenarios in a random order. In each group, participants viewed six of the eight factorial combinations in the 2 (speaker membership: ingroup, outgroup) × 2 (speaker’s background attitude: offensive, neutral) × 2 (statement orientation: left, right) matrix, each paired with a different scenario. Thus, no participant viewed the same scenario or factorial combination twice. Collapsing across groups, we achieved balance in the scenario-by-condition matrix (n per cell: Mdn = 52, Min = 41, Max = 64). After the six scenarios, participants self-reported the relevance of four considerations in determining whether speech is offensive: the speaker’s identity, the speaker’s intent, the statement’s orientation, and the salience of the issue in question.
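One way to picture such an allocation is the sketch below. It is a hypothetical cyclic construction, not the authors' actual assignment scheme (which is documented in their OSF materials): each of eight groups skips two adjacent cells of the 2 × 2 × 2 matrix, so every cell is shown to six of the eight groups. It guarantees equal replication of cells only, and it leaves out the pairing of cells with scenarios. The names `CELLS`, `cyclic_blocks`, `BLOCKS` and `REPLICATION` are illustrative.

```python
from collections import Counter
from itertools import product

# The eight cells of the 2 x 2 x 2 factorial (level labels taken from the text).
CELLS = list(product(
    ("ingroup", "outgroup"),    # speaker membership
    ("offensive", "neutral"),   # speaker's background attitude
    ("left", "right"),          # statement orientation
))


def cyclic_blocks(cells, omit=2):
    """Build one block per cell; block g leaves out cells g, g+1, ... (mod n)."""
    n = len(cells)
    blocks = []
    for g in range(n):
        skipped = {(g + k) % n for k in range(omit)}
        blocks.append([c for i, c in enumerate(cells) if i not in skipped])
    return blocks


BLOCKS = cyclic_blocks(CELLS)

# How many of the eight groups see each cell.
REPLICATION = Counter(cell for block in BLOCKS for cell in block)

if __name__ == "__main__":
    for g, block in enumerate(BLOCKS, start=1):
        print(g, block)
    print(REPLICATION)
```

With eight blocks of six cells, a strictly pairwise-balanced design is not possible, so the sketch balances only how often each cell appears.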

11.2.5 Measures

Offensiveness and Intent Ratings. On each trial, participants evaluated the statement’s offensiveness and the speaker’s intent. Offensiveness was assessed through four items:

• [O1] the speaker “is offensive,”
• [O2, reverse-scored] the speaker “simply provides information about” the situation,
• [O3] the speaker “should not say that kind of thing”, and
• [O4, reverse-scored] “I see no problem with the speaker saying that kind of thing”.

Attributions of intent were made in a pair of items:

• [I1, reverse-scored] the speaker “merely wanted to provide information about” the situation, and
• [I2] the speaker “believed that her statement would be offensive”.

Every item was assessed on a continuous scale from 1: “Strongly disagree” to 7: “Strongly agree” in decimal increments. The four offensiveness items formed a composite score with good reliability (Cronbach’s α = .77). The composite intent score, however, exhibited questionable reliability (Cronbach’s α = .60), perhaps due to the low number of items.

Self-Reported Criteria. Participants reported their emphasis on (1) the speaker’s identity (“who the speaker is”), (2) the speaker’s background attitude (“whether or not the speaker holds a pejorative attitude toward the group he/she talks about”), (3) the statement’s orientation (“whether the topic under discussion is relevant to the left or to the right”), and (4) the salience of the issue in question (“whether or not the topic is politically salient”). These ratings were provided using single items and recorded on a continuous scale from 1: “No importance/relevance” to 7: “Absolute importance/relevance”.

Demographics. We also recorded participants’ age, sex, political orientation, and voting preference.
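The scoring of the offensiveness and intent items can be sketched as follows. The sketch assumes that each composite is the mean of its items after reverse-scoring, which the chapter does not state explicitly, and it uses the standard formula for Cronbach's α. All function names are hypothetical.

```python
from statistics import mean, pvariance

SCALE_MIN, SCALE_MAX = 1, 7


def reverse(score):
    """Reverse-score a rating on the 1-7 scale (1 <-> 7, 4 stays 4)."""
    return SCALE_MIN + SCALE_MAX - score


def offensiveness(o1, o2, o3, o4):
    """Composite of O1-O4 with O2 and O4 reverse-scored (assumed: item mean)."""
    return mean([o1, reverse(o2), o3, reverse(o4)])


def intent(i1, i2):
    """Composite of I1-I2 with I1 reverse-scored (assumed: item mean)."""
    return mean([reverse(i1), i2])


def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one respondent's already-reversed items."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    print(offensiveness(6.5, 2.0, 5.5, 1.5))
    print(intent(2.0, 6.0))
```

Because α depends on the number of items k, a two-item scale such as the intent composite tends to yield lower values than a four-item scale with similar inter-item correlations, in line with the authors' remark about the low number of items.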

11.3 Results

The results section is divided into four subsections: In Sect. 11.3.1 (on offensiveness) and 11.3.2 (on intent), we analyze the responses of left-leaning and right-leaning participants separately. These subgroup analyses are followed by moderation analyses on the full dataset, to assess whether judgments of offensiveness and intent depend on participants’ ideology. In Sect. 11.3.3, we introduce a plausible causal model of intent and offensiveness and conduct mediation analyses to evaluate its fit to the data. In Sect. 11.3.4, we report supplementary analyses of differences in reactions toward active debates (such as Covid-19, immigration, or gender inequality) versus inactive or historical issues (such as divorce, same sex marriage, or the Spanish national anthem).

11.3.1 Part 1: Offensiveness

Progressive Sample. As shown in Table 11.2, in the left-leaning sample, we observed main effects of statement orientation, speaker membership, and speaker attitude. Statement orientation interacted with speaker membership, whereas there were no interactions involving the background attitude factor. Replicating previous research, the main effect of attitude revealed that negative background attitudes rendered speech more offensive (M = 4.07, 95% CI = [3.35, 4.80]) than positive background attitudes (M = 3.81, 95% CI = [3.08, 4.53]), B = 0.26, t = 3.02, p = .003. We followed up on the membership × orientation interaction by examining the pattern of marginal effects: Affronts to a progressive worldview were seen as more offensive when coming from non-members (M = 4.61, 95% CI = [3.89, 5.32]) than members (M = 3.95, 95% CI = [3.23, 4.66]), B = 0.66, t = 5.35, p < .001. The pattern reversed for affronts to a conservative ideology: They were seen as more innocuous when coming from progressive non-members (M = 3.44, 95% CI = [2.73, 4.16]) than conservative members (M = 3.75, 95% CI = [3.04, 4.47]), B = −0.31, t = −2.55, p = .011.

Conservative Sample. Among right-leaning participants, we found a main effect of statement orientation, which was qualified by a two-way interaction with speaker


Table 11.2 Model comparisons: offensiveness

| Term | Left (n = 200) | Right (n = 200) |
|---|---|---|
| Model 1 | Model 1L (AIC = 4509) | Model 1R (AIC = 4470) |
| M | F(1, 983) = 4.08, p = .044 | F(1, 996) = 2.31, p = .13 |
| A | F(1, 1060) = 8.68, p = .003 | F(1, 1068) = 2.06, p = .15 |
| O | F(1, 1087) = 58.17, p < .001 | F(1, 1100) = 7.51, p = .006 |
| Model 2 | Model 2L (AIC = 4480) | Model 2R (AIC = 4469) |
| M×A | F(1, 1050) = 0.00, p = .93 | F(1, 1076) = 0.66, p = .42 |
| M×O | F(1, 1023) = 31.90, p < .001 | F(1, 1042) = 6.73, p = .010 |
| A×O | F(1, 1122) = 0.25, p = .62 | F(1, 1133) = 0.01, p = .91 |
| Model 3 | Model 3L (AIC = 4479) | Model 3R (AIC = 4470) |
| M×A×O | F(1, 1043) = 4.15, p = .042 | F(1, 1038) = 0.29, p = .59 |

Fig. 11.1 Marginal effect of background attitudes (top) and speaker membership (bottom) on offensiveness and intent ascriptions. In the top panel, we display the main effect of attitudes, collapsed across statement orientation. In the bottom panel, we display the membership effect separately for progressive (red) and conservative (blue) statements

membership–as in the progressive sample (see Table 11.2). No main effects of speaker membership or attitude were observed. The two-way interaction mirrored the result obtained among progressives (see Fig. 11.1): Affronts to a conservative ideology were seen as more offensive when stemming from non-members (M = 4.14, 95% CI = [3.36, 4.92]) than members (M = 3.79, 95% CI = [3.01, 4.56]), B = 0.35, t = 2.97, p = .003. In contrast, affronts to a progressive ideology were evaluated comparably whether coming from a member (M = 3.79, 95% CI = [3.01, 4.57]) or non-member (M = 3.71, 95% CI = [2.93, 4.49]) of the target group, B = −0.08, t = −0.69, p = .49.


In Table 11.2, “M” stands for speaker membership, “A” stands for speaker attitude, and “O” stands for statement orientation. “AIC” stands for the Akaike information criterion. Model 1 analyzed the main effects of M, A and O. Model 2 analyzed the interaction effects of M×A, M×O, and A×O. Model 3 analyzed the three-way interaction M×A×O. In each model, “L” stands for left-leaning participants and “R” stands for right-leaning participants.

Summary and Comparative Analyses. Statement orientation interacted with speaker membership in both progressives’ and conservatives’ ratings of offensiveness. In both groups, outgroup speakers were seen as more offensive than ingroup speakers when making a statement hostile to participants’ own ideology. To jointly analyze progressive and conservative responses, we recoded the independent variables: (1) ideological fit, which takes the participant’s political orientation and the statement’s content and codes whether these are concordant or discordant, and (2) ingroup/outgroup, which takes the participant’s political orientation and the speaker’s identity and codes whether the speaker belongs to the participant’s ingroup or outgroup. We conducted separate analyses for concordant and discordant statements. In a comparative analysis of concordant messages, the membership effect was significant, F(1, 885) = 4.85, p = .028, and was not moderated by participants’ political ideology, F(1, 876) = 1.72, p = .19. For discordant messages, the effect of speaker membership was highly significant, F(1, 885) = 35.29, p < .001, and was moderated by participants’ political ideology, F(1, 830) = 4.29, p = .039. Specifically, outgroup speakers were perceived as more offensive by liberals than by conservatives, B = 0.47, t = 3.61, p < .001, while ingroup speakers were evaluated similarly in both groups, B = 0.12, t = 0.91, p = .36 (see Fig. 11.1).
Thus, while we observed the membership effect among both progressives and conservatives (suggesting ideological symmetry), the magnitude of the effects was significantly different (suggesting asymmetry). When looking separately at progressives and conservatives, the effect of speaker attitude attained statistical significance only in the left-leaning subsample. Yet a comparative analysis revealed a main effect of attitude, F(1, 2143) = 8.60, p = .003, that was not moderated by participants’ political orientation, F(1, 2143) = 0.61, p = .44–suggesting that the discrepancy in the subgroup analyses was not meaningful and, in turn, that the effect of background attitudes on offensiveness is symmetrical across the political divide.
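The recoding used for the joint analyses can be sketched in a few lines. The labels 'left' and 'right', the function name `recode`, and the convention that `statement` names the ideology a remark coheres with are assumptions made for illustration; the authors' analysis scripts on OSF may code these variables differently.

```python
def recode(participant, statement, speaker):
    """Recode one trial relative to the participant's own ideology.

    All three arguments are 'left' or 'right' (hypothetical labels):
      participant -- the participant's political orientation
      statement   -- the ideology the statement coheres with
      speaker     -- the ideological identity of the speaker
    Returns (ideological fit, ingroup/outgroup status).
    """
    fit = "concordant" if statement == participant else "discordant"
    group = "ingroup" if speaker == participant else "outgroup"
    return fit, group


if __name__ == "__main__":
    # A progressive participant reading a conservative remark by a conservative speaker.
    print(recode("left", "right", "right"))
```

After this recoding, a progressive reading a conservative remark and a conservative reading a progressive remark fall into the same "discordant" category, which is what allows the two samples to be analyzed in a single model.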

11.3.2 Part 2: Attributions of Intent

Progressive Sample. Among left-leaning participants, we observed main effects of statement orientation, speaker membership, and speaker attitude. The effects of statement orientation and speaker membership were qualified by a two-way interaction (see Table 11.3). Once again, speaker attitude exerted a main effect, such that speakers with a negative attitude (M = 4.07, 95% CI = [3.54, 4.61]) were ascribed more harmful intent than speakers with a positive attitude (M = 3.86, 95% CI = [3.33, 4.40]), B = 0.21, t = 2.44, p = .014. Regarding the two-way interaction between statement orientation and speaker membership, marginal effects revealed a strong influence of speaker membership for conservative remarks. Outgroup speakers (M = 4.61, 95% CI = [4.08, 5.14]) were ascribed more harmful intent than were ingroup speakers (M = 3.78, 95% CI = [3.24, 4.31]) when making the same statements hostile to a progressive worldview, B = 0.83, t = 6.89, p < .001. Interestingly, the effect of speaker membership appeared to reverse for progressive remarks: In other words, outgroup/conservative speakers (M = 3.86, 95% CI = [3.33, 4.39]) were ascribed marginally more harmful intent than ingroup/progressive speakers (M = 3.63, 95% CI = [3.10, 4.16]), t = 1.95, p = .051, despite being members of the target group which the remark could ostensibly offend.

Conservative Sample. Right-leaning participants exhibited a similar pattern of effects (see Table 11.3): We found that the speaker’s identity and attitude (but not statement orientation) affected attributions of harmful intent. The two-way interaction between statement orientation and speaker membership (observed among progressive participants) also emerged in the conservative sample. The main effect of attitude revealed that speakers with a negative attitude (M = 3.93, 95% CI = [3.43, 4.43]) were ascribed more harmful intent than speakers with a positive attitude (M = 3.74, 95% CI = [3.25, 4.24]), B = 0.19, t = −2.27, p = .023.
To probe the two-way interaction, we examined the marginal effects of speaker membership separately for each statement orientation: When uttering statements hostile to a conservative worldview, outgroup speakers (M = 4.14, 95% CI = [3.65, 4.64]) were ascribed more harmful intent than were ingroup speakers (M = 3.63, 95% CI = [3.14, 4.13]), B = 0.51, t = 4.44, p < .001. No difference between ingroup and outgroup speakers arose for statements concordant with a conservative worldview, B = 0.06, t = 0.53, p = .60 (Table 11.3).

Summary and Comparative Analyses. Both conservatives and progressives tended to ascribe more harmful intent to outgroup speakers than ingroup speakers when interpreting statements that defy their own ideology. In a comparative analysis, this membership effect on discordant statements was highly significant, F(1, 824) = 65.70, p < .001, and appeared to differ when comparing progressives to conservatives, F(1, 824) = 4.32, p = .038. Specifically, when making ideologically discordant statements, outgroup speakers were ascribed more harmful intent by liberals than by conservatives, B = 0.44, t = 3.44, p < .001, while ingroup speakers were ascribed comparable intent, B = 0.10, t = 0.81, p = .42 (see Fig. 11.1). The corresponding effect on concordant statements was non-significant, F(1, 881) = 2.62, p = .11. Meanwhile, the effect of background attitudes, F(1, 2111) = 10.91, p < .001, did not vary by participants’ political orientation, F(1, 2112) = 0.01, p = .94.


Table 11.3 Model comparisons: intent

| Term | Left (n = 200) | Right (n = 200) |
|---|---|---|
| Model 1 | Model 1L (AIC = 4472) | Model 1R (AIC = 4428) |
| M | F(1, 983) = 12.34, p < .001 | F(1, 995) = 7.10, p = .008 |
| A | F(1, 1053) = 5.76, p = .017 | F(1, 1047) = 4.98, p = .026 |
| O | F(1, 1078) = 25.80, p < .001 | F(1, 1072) = 2.25, p = .13 |
| Model 2 | Model 2L (AIC = 4436) | Model 2R (AIC = 4421) |
| M×A | F(1, 1044) = 0.37, p = .55 | F(1, 1055) = 0.46, p = .50 |
| M×O | F(1, 1021) = 39.89, p < .001 | F(1, 1035) = 12.48, p < .001 |
| A×O | F(1, 1119) = 0.51, p = .48 | F(1, 1124) = 0.11, p = .74 |
| Model 3 | Model 3L (AIC = 4436) | Model 3R (AIC = 4422) |
| M×A×O | F(1, 1041) = 2.41, p = .12 | F(1, 1031) = 0.49, p = .49 |

11.3.3 Part 3: Mediation Analyses

The previous sections documented two consistent effects across intent and offensiveness models: what we refer to as the effect of background attitudes, and of speaker-orientation congruence (Fig. 11.2). These effects on offensiveness can be parsimoniously explained by positing a common mechanism: When observing an ambiguous intentional action (in our present case, a remark that could be construed as either merely informative or offensive), individuals may seek contextual or environmental cues to help interpret the behavior. In particular, individuals may be asking themselves why the behavior was performed: To this end, information about the speaker’s relation to the social group (in terms of their prior attitudes toward the group, as well as their membership or non-membership) might help to infer the agent’s goal or intent (Baker et al., 2009). Then, perceptions of whether the speaker had malicious or neutral intent can be expected to derivatively impact evaluations of their offensiveness, as documented in abundant previous research on the downstream effects of culpable mental states (Cushman, 2008; Malle & Knobe, 1997; Young et al., 2011). This mechanism can be depicted as a causal diagram in which ascribed intent mediates the effects of attitudes and speaker-orientation congruence (see Fig. 11.3).

To evaluate this model, we aggregated the data from progressives and conservatives and employed the recoded factors as independent variables: ideological fit (whether the statement coheres with the participant’s political orientation or not), ingroup/outgroup (whether the speaker belongs to the participant’s ingroup or outgroup), and attitude. Thus, the fit×outgroup interaction captures the selective effect of outgroup derision, i.e., when outgroup members make ideologically discordant remarks.


Fig. 11.2 Scatter Plot of Offensiveness by Intent. The panels display assessments of progressive (left) and conservative (right) speakers issuing statements that cohere with progressive (top) or conservative (bottom) values. Sample means and 95% confidence ellipses, revealing discrepancies between progressive and conservative participants’ assessments, are overlaid

As a preliminary step in our mediation analysis, we ask whether entering intentionality ascriptions into a model of offensiveness ‘blocks’ the experimental effects of attitudes and (fit×outgroup) congruence. In a model without intentionality ratings, the effects of attitudes, F(1, 2149) = 9.33, p = .002, and fit×outgroup congruence, F(1, 2005) = 6.73, p = .010, on offensiveness were both significant. Then, entering intentionality ratings into the model rendered both effects nonsignificant (i.e., attitudes: F(1, 2141) = 0.93, p = .33, congruence: F(1, 2010) = 0.28, p = .59). In this same model, the effect of intentionality attributions was highly significant, F(1, 2391) = 2051, p < .001. We then conducted two separate mediation analyses with 5000 quasi-Bayesian simulations to evaluate the direct and indirect effects of each experimental treatment. (1) The effect of background attitudes on offensiveness was mediated by perceived intent (mediated/total = .76), ACME = 0.14, 95% CI [0.06, 0.22], p < .001, leaving no direct effect from attitudes to offensiveness, p = .32. Similarly, (2) the effect of congruence on offensiveness was mediated by perceived intent (mediated/total = 1.15), ACME = 0.36, 95% CI [0.20, 0.52], p < .001, with no remaining direct effect, p = .57.[4] These results add plausibility to the model depicted in Fig. 11.3, and lend support to the broader thesis that contextual elements drive perceived offensiveness as listeners spontaneously mentalize about the speaker’s intent.

Fig. 11.3 Causal model of verbal offense: the experimental effects of speaker-orientation congruence and background attitudes on offensiveness are mediated by ascriptions of intent

Moderation by Explicit Norms. In our previous research, multiple approaches revealed that participants endorsed the relevance of the speaker’s background attitude and denied the relevance of the speaker’s identity (Almagro et al., 2021) when asked to report on their reasoning processes. Additionally, our analyses repeatedly found no relationship between participants’ overt principles regarding offensive speech and the influence that the corresponding factors had on their judgments of offensiveness in particular cases. In the present study, participants viewed the speaker’s background attitude as the most relevant principle (replicating Almagro et al., 2021); its rated relevance significantly exceeded that of every other principle in pairwise comparisons: identity, salience, and orientation.[5] Furthermore, the rank order of these principles was the same among conservatives and progressives.
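The mediated proportions reported for the two mediation analyses can be unpacked with a small back-of-the-envelope calculation. The sketch assumes that the reported proportion is approximately ACME divided by the total effect; simulation-based software may compute it somewhat differently, and the inputs are rounded. The implied totals and direct effects below are therefore illustrative, not values reported by the authors. The function name `implied_effects` is hypothetical.

```python
def implied_effects(acme, proportion_mediated):
    """Back out approximate total and direct effects.

    Assumes proportion_mediated ~= acme / total and total = acme + direct.
    """
    total = acme / proportion_mediated
    direct = total - acme
    return total, direct


# Rounded inputs as reported in the text.
attitude_total, attitude_direct = implied_effects(0.14, 0.76)
congruence_total, congruence_direct = implied_effects(0.36, 1.15)

if __name__ == "__main__":
    print(round(attitude_total, 3), round(attitude_direct, 3))
    print(round(congruence_total, 3), round(congruence_direct, 3))
```

On this reading, a mediated proportion above 1 (as for congruence) arises when the estimated direct effect has the opposite sign to the indirect effect, so the indirect effect exceeds the total. Here the direct effect is small and not statistically significant.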

[4] Both the outcome and mediator models also included the main effects of fit and membership, enabling us to test whether there were indirect and direct effects of these two variables. The effect of ideological fit was direct (p < .001) not indirect (p = .87), and the main effects of membership were weak and inconclusive (direct p = .12, indirect p = .085).

[5] Speaker intent was more important for progressives (M = 5.51, SD = 1.34) than for conservatives (M = 5.18, SD = 1.51). Speaker identity (M = 4.75, SD = 1.78; M = 4.53, SD = 1.79), topic salience (M = 4.44, SD = 1.69; M = 4.44, SD = 1.78), and statement orientation (M = 4.38, SD = 1.85; M = 4.40, SD = 1.76) mattered equally to progressives and conservatives.


Table 11.4 Moderated mediation results: indirect and direct effects at low (first quartile) and high (third quartile) values of the moderator

| Moderator | Treatment | Effect | Low (Q1) | High (Q3) |
|---|---|---|---|---|
| Intent relevance | Attitude | Indirect | 0.03 [−0.07, 0.12] | 0.31 [0.18, 0.44]** |
| | | Direct | 0.10 [−0.01, 0.21] | −0.04 [−0.17, 0.09] |
| Intent relevance | Congruence | Indirect | 0.19 [−0.00, 0.37] | 0.57 [0.34, 0.80]** |
| | | Direct | −0.06 [−0.29, 0.17] | −0.07 [−0.35, 0.20] |
| Identity relevance | Congruence | Indirect | 0.40 [0.23, 0.58]** | 0.26 [0.06, 0.45]** |
| | | Direct | −0.04 [−0.24, 0.16] | −0.10 [−0.34, 0.15] |

** denotes statistical significance where p < 0.1

Motivated by the finding that intent ascriptions fully mediated the effects of both attitudes and congruence, we reasoned that participants’ beliefs about the importance of speaker intent, and not identity, might moderate these effects. In other words, the effects of attitudes and congruence on offensiveness might be stronger among participants who consider a speaker’s intent (and not identity) relevant in deciding whether a statement is offensive. To evaluate this hypothesis, we estimated the mediation model at two different levels of the moderator, low relevance (the first quartile) and high relevance (the third quartile), and qualitatively examined the overlap in the confidence intervals. This exercise revealed that participants’ beliefs about the relevance of speaker intent moderated the magnitude of both indirect effects, i.e., those of background attitudes and of congruence (as shown in Table 11.4).
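
The quartile comparison described above can be illustrated with a short sketch. The chapter does not report the software or the exact model specification, so what follows is only one plausible regression-based reading, not the authors' procedure: the indirect effect is computed as a product of coefficients, evaluated at the first and third quartile of the moderator, with percentile bootstrap intervals. All names (`treatment`, `relevance`, `intent`, `offense`), the helper functions, the coefficients, and the synthetic data are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def conditional_effects(t, m, y, w, w_value):
    """Indirect and direct effect of t on y through m, evaluated at moderator value w_value."""
    ones = np.ones_like(t)
    # Mediator model (assumed form): m ~ 1 + t + w + t*w
    a = ols(np.column_stack([ones, t, w, t * w]), m)
    # Outcome model (assumed form): y ~ 1 + t + m + w + t*w + m*w
    b = ols(np.column_stack([ones, t, m, w, t * w, m * w]), y)
    t_to_m = a[1] + a[3] * w_value  # treatment -> mediator path at w_value
    m_to_y = b[2] + b[5] * w_value  # mediator -> outcome path at w_value
    direct = b[1] + b[4] * w_value  # remaining treatment effect at w_value
    return t_to_m * m_to_y, direct


def bootstrap_ci(t, m, y, w, w_value, n_boot=200, seed=1):
    """Percentile 95% interval for the conditional indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(t)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample participants with replacement
        draws.append(conditional_effects(t[idx], m[idx], y[idx], w[idx], w_value)[0])
    return tuple(np.percentile(draws, [2.5, 97.5]))


# Synthetic data (hypothetical, NOT the study's data): the treatment shifts
# perceived intent more strongly for participants who rate intent as relevant,
# and perceived intent in turn drives offensiveness.
rng = np.random.default_rng(0)
n = 4000
treatment = rng.integers(0, 2, n).astype(float)
relevance = rng.uniform(1, 7, n)
intent = 0.3 * treatment * relevance + rng.normal(0, 1, n)
offense = 0.5 * intent + rng.normal(0, 1, n)

q1, q3 = np.percentile(relevance, [25, 75])
indirect_low, direct_low = conditional_effects(treatment, intent, offense, relevance, q1)
indirect_high, direct_high = conditional_effects(treatment, intent, offense, relevance, q3)
ci_low = bootstrap_ci(treatment, intent, offense, relevance, q1)
ci_high = bootstrap_ci(treatment, intent, offense, relevance, q3)

print("Q1 indirect:", round(indirect_low, 2), "95% CI:", ci_low)
print("Q3 indirect:", round(indirect_high, 2), "95% CI:", ci_high)
```

Comparing the two printed intervals mirrors the qualitative overlap check described in the text: if the interval at the third quartile lies clearly above the interval at the first quartile, the indirect effect plausibly depends on the moderator. A formal test of moderated mediation would require more than this visual comparison.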

11.3.4 Part 4: Active Versus Inactive Debates

Finally, we analyze the differences in reactions toward the three active debates vs. the three inactive issues. Statement orientation significantly interacted with topic salience in both progressive, F(1, 897) = 14.38, and conservative groups, F(1, 810) = 17.02, both ps < .001. When looking at contemporary disputes, the effect of ideological fit was present among both progressives and conservatives: Progressives viewed conservative remarks (M = 3.77, 95% CI = [2.74, 4.79]) as more offensive than progressive remarks (M = 3.46, 95% CI = [2.44, 4.49]), B = 0.30, t = 2.34, p = .020. Conservatives deemed progressive statements more offensive (M = 3.78, 95% CI = [2.67, 4.89]) than conservative statements (M = 3.20, 95% CI = [2.09, 4.31]), B = 0.59, t = 4.70, p < .001. Thus, we observed symmetry in relation to active debates, whereby both ideological groups viewed discordant remarks as significantly more offensive than concordant remarks. When considering historical or inactive debates, the pattern was distinctly asymmetrical: Progressives viewed regressive statements (M = 4.79, 95% CI = [3.76, 5.81]) as much more offensive than progressive statements (M = 3.73, 95% CI = [2.71, 4.76]), B = 1.05, t = 8.18, p < .001. The effect of ideological fit was absent among conservatives. If anything, the difference trended toward greater
opposition to regressive (M = 4.30, 95% CI = [3.19, 5.41]) than progressive (M = 4.14, 95% CI = [3.03, 5.25]) statements, B = 0.16, t = 1.30, p = .19. In other words, progressives found reactionary statements more offensive than progressive statements, while conservatives judged the two equally offensive. One explanation for this is that contemporary conservatives may not view regressive views on historical debates as any more concordant with their ideology than progressive views on that set of issues. It is in the nature of the historical debates included here that they have yielded a general consensus in favor of the progressive view. For example, what was once the progressive stance toward divorce (i.e., pro-divorce) may now be the norm, embraced by progressives and conservatives alike. This line of reasoning could explain why conservatives are not selectively offended by the defense of progressive values on these historical disputes: what was once a paradigmatically conservative position is no longer identified as a stance associated with their political ideology.

11.4 Discussion

Recent calls to limit freedom of speech have reified certain stereotypes about conservative and progressive worldviews: Progressives express greater reservations about the exercise of free speech, and conservatives report greater support (Pew Research Center, 2016). This represents a striking reversal of the attitudes conservatives and progressives held historically, when free speech rights emerged as a tool for government criticism and dissidence. In this chapter, we sought to contribute to our understanding of verbal offense by examining who sees speech as offensive (i.e., whether progressives and conservatives exhibit similar reactions), when or under what circumstances, and why (i.e., through which specific mechanism).

11.4.1 Symmetry and Asymmetry in Offensive Speech Norms

Participants in our study were confronted with relatively subtle acts of verbal discrimination, of the kind that blur the boundary between innocuous and offensive speech. In one case, a hypothetical speaker stated that “many women claim that they do not suffer any harassment or discrimination”. This statement, uncongenial to a progressive worldview, was, according to conservative participants, protected by freedom of speech. In contrast, progressives considered statements of this kind offensive and censurable, particularly when made by an outgroup member. In this way, our study was able to recreate the attitudes that prevail in today’s cancel culture debate. We experimentally reversed the orientation of several such offenses so they would constitute affronts to a conservative worldview instead, e.g., “the results of
implicit bias tests show that men have more gender biases than they are aware of”. In a demonstration of ideological symmetry, progressives saw this type of statement as an innocuous exercise of free speech, while conservatives perceived it as offensive and deserving of censorship (once again, especially when coming from an outgroup member). Recent surveys have found that the right purports to uphold freedom of expression, while the left expresses greater reservations (Pew Research Center, 2021). Contrary to the idea that this constitutes a principled difference between conservatives’ and liberals’ beliefs, our findings documented a motivated appeal to free speech rights among both conservatives and liberals: Participants upheld the freedom to express messages concordant with their own worldview, but took offense at outgroup speakers whose utterances could be construed as targeting their ideological convictions. These results can be fruitfully explained by theories of identity-protective or cultural cognition (Kahan, 2017; Kahan et al., 2011), and suggest that the current politicization of free speech reflects a contingent fact, i.e., that most recent episodes of intergroup disagreement have involved conservative affronts to progressive values (and not vice versa). Another possibility, compatible with the previous one, is that most judgments within the culture war are a way of expressing adherence to a political ideology (Funkhouser, 2020; Ganapini, 2021; Williams, 2021). Maybe those who complain about “cancel culture” are mainly signaling their political identity. This would mean that the central issues in the culture war are issues for which the parties are highly affectively, but not ideologically, polarized (Bordonaba-Plou & Villanueva, 2018; Mason, 2018; Iyengar et al., 2019).
It is inconsistent for somebody to complain about “cancel culture” and yet be willing to “cancel” those who, from a different ideology, make potentially offensive statements about something that matters to her. One way to test the signaling hypothesis might be to examine the correlation between the degree of disapproval produced by an abstract case of “cancellation” and the degree of endorsement of the core tenets of right-wing political parties on “cancel culture”, in participants who are in fact willing to “cancel” those who, from a different ideology, make potentially offensive statements about something they care about. If the correlation were high, then it would suggest that participants’ complaint about “cancel culture” is just a way of expressing adherence to right-wing political parties’ ideology.

Lastly, although the presence of offensive speech norms was symmetrical, their magnitude was asymmetrical. For instance, the tendency to view outgroup speakers as more offensive when making ideologically discordant remarks was significantly stronger among progressives than among conservatives. Our final analyses indicated that this discrepancy was driven by the inclusion of historical issues. These historical issues have, over time, been settled in favor of a liberal stance. From this observation, two assumptions may follow: (1) that modern-day conservatives may not align ideologically with the historically conservative stance, and (2) that modern-day progressives may imbue these historical victories with much stronger moral conviction than their attitudes regarding ongoing culture wars.

In this regard, a model according to which a difference in ideological conviction is not accompanied by an actual difference in how freedom of expression mediates our particular judgments of offensiveness can accommodate evidence of asymmetry regarding historical social issues.

11.4.2 Cognitive Mechanism: The Role of Intent

In our previous research, we observed that a speaker’s identity impacted participants’ perceptions of offensiveness, though they then espoused the speaker’s intent as the primary guiding principle behind their judgments (Almagro et al., 2021). The present work replicated both these findings, and offered a tentative explanation of the seeming discrepancy. When judging third-party behavior, we spontaneously form representations of the agent’s intent (Knobe, 2003; Kneer & Bourgeois-Gironde, 2017), and these representations have downstream effects on attitudes of blame (Cushman, 2008; Kirfel & Hannikainen, 2022; Malle & Knobe, 1997). Inspired by this literature, we recorded participants’ ascriptions of intent and learned that speaker intent played an important mediating role. When considering an ideologically discordant statement, participants appeared to avail themselves of contextual cues with which to inform their representations of the speaker’s intent (see Baker et al., 2009; Gibson et al., 2019). For instance, the statement that children of divorced families are “statistically more prone to delinquency and antisocial behavior” could be made either with neutral intent in order to inform and describe, or with a harmful intent. In this predicament, information about who the speaker is played a decisive role in disambiguating perceived intent among the target group (in this case, progressives): The statement was seen as intended to harm when uttered by a conservative individual, but intended to convey information when made by a fellow liberal; and this phenomenon too was symmetrical across the political divide. In turn, perceived intent appeared to fully account for the effect of identity (and background attitudes) on the degree to which such remarks were considered offensive.
These results could dissolve the apparent incongruity between participants’ behavioral focus on identity and their self-reported emphasis on intent (and disregard of speaker identity; Almagro et al., 2021). Moreover, the process by which variation in speaker identity is encoded as differences in intent (and derivatively in offensiveness) appeared to be within executive control, since participants’ self-reported emphasis on intent did in fact determine the extent to which outgroup derogation was seen as offensive (via ascriptions of intent).

11.5 Conclusion

Contrary to the dominant portrayal of progressives and conservatives, our study documented ideological symmetry in their evaluations of offensive speech. When faced with utterances whose content matters to us, and which somehow threaten our ideological stance, both conservatives and progressives react as if the utterances were in fact offensive. Support for free speech does not seem to have an impact on our actual judgments of offensiveness. These processes can be interpreted as evidence of identity-protective or cultural cognition in response to outgroup affronts to one’s values. To explain these results, we advanced an exploratory model of offensive speech judgments in which representations of the speaker’s guiding intent occupy a central role. Even against explicitly stated background attitudes, we tend to attribute negative intent to the speaker whenever we deem their utterances to be offensive. In future work, this model should be confirmed and extended to a broader range of linguistic phenomena.

Practical consequences of our study can be explored in different ways. We will mention two of those possible paths. Firstly, we could ponder what the conclusions of our study imply for regulation aimed at expanding or protecting freedom of speech. Regulators should take into account the fact that abstract support for freedom of expression is seemingly disconnected from actual concrete judgments of offensiveness. This might lead to unexpected backlash from the public, once the consequences of such putative policies appear. Secondly, this difference between what we are willing to endorse on an abstract level and our more concrete judgments might be indicative of a situation of social conflict. Those situations in which our concrete judgments seem to part ways with the general principles that we take ourselves to endorse might be associated with a polarized public opinion, with a strong partisan divide.
This could be the seed for new and promising ways to assess the rise of polarization.

References

Almagro, M., Hannikainen, I. R., & Villanueva, N. (2021). Whose words hurt? Contextual determinants of offensive speech. Personality and Social Psychology Bulletin, 48(6), 937–953.
Baker, C. L., Saxe, R., & Tenenbaum, J. B. (2009). Action understanding as inverse planning. Cognition, 113(3), 329–349.
Bordonaba-Plou, D., & Villanueva, N. (2018). Affective polarization as impervious reasoning. In Philosophical perspectives. The 13th conference of the Italian Society for Analytic Philosophy: Italian Society for Analytic Philosophy.
Cai, Z. G., Gilbert, R. A., Davis, M. H., Gareth Gaskell, M., Farrar, L., Adler, S., & Rodd, J. M. (2017). Accent modulates access to word meaning: Evidence for a speaker-model account of spoken word recognition. Cognitive Psychology, 98, 73–101.
Centro de Investigaciones Sociológicas. (2022). CIS Survey Number 3363: May Barometer. https://www.analisis.cis.es/cisdb.jsp
Cepollaro, B. (2017a). Slurs as the shortcut of discrimination. Rivista di Estetica, 64, 53–65.
Cepollaro, B. (2017b). The semantics and pragmatics of slurs and thick terms. PhD thesis, PSL Research University.
Cepollaro, B., & Stojanovic, I. (2016). Hybrid evaluatives: In defense of a presuppositional account. Grazer Philosophische Studien, 93(3), 458–488.
Copp, D. (2009). Realist-expressivism and conventional implicature. In R. Shafer-Landau (Ed.), Oxford studies in metaethics (Vol. 4, pp. 167–202). Oxford University Press.
Crawford, J. T. (2014). Ideological symmetries and asymmetries in political intolerance and prejudice toward political activist groups. Journal of Experimental Social Psychology, 55, 284–298.
Cushman, F. (2008). Crime and punishment: Distinguishing the roles of causal and intentional analyses in moral judgment. Cognition, 108(2), 353–380.
Epstein, L., Parker, C. M., & Segal, J. A. (2018). Do justices defend the speech they hate? Journal of Law and Courts, 6(2), 237–262.
Fasoli, F., Carnaghi, A., & Paladino, M. P. (2015). Social acceptability of sexist derogatory and sexist objectifying slurs across contexts. Language Science, 52, 98–107.
Fourcart, A., & Hartsuiker, R. J. (2021). Are foreign-accented speakers that ‘incredible’? The impact of the speaker’s indexical properties on sentence processing. Neuropsychologia, 158. https://doi.org/10.1016/j.neuropsychologia.2021.107902
Funkhouser, E. (2020). A tribal mind: Beliefs that signal group identity or commitment. Mind & Language, 37(3), 444–464.
Ganapini, M. B. (2021). The signaling function of sharing fake stories. Mind & Language. https://doi.org/10.1111/mila.12373
Gibson, J. L., Epstein, L., & Magarian, G. P. (2019). Taming uncivil discourse. Political Psychology, 41, 383–401.
Graham, J., Haidt, J., & Nosek, B. B. (2009). Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology, 96(5), 1029–1046. https://doi.org/10.1037/a0015141
Gutzmann, D. (2011). Expressive modifiers & mixed expressives. In O. Bonami & P. C. Hofherr (Eds.), Empirical issues in syntax and semantics (Vol. 8, pp. 123–141). CSSP.
Harper. (2020). A letter on justice and open debate. Harper’s Magazine. https://harpers.org/a-letter-on-justice-and-open-debate/. Accessed 2 Mar 2022.
Hernández, E., Anduiza, E., & Rico, G. (2021). Affective polarization and the salience of elections. Electoral Studies, 69. https://doi.org/10.1016/j.electstud.2020.102203
Iyengar, S., Lelkes, Y., Levendusky, M., Malhotra, N., & Westwood, S. J. (2019). The origins and consequences of affective polarization in the United States. Annual Review of Political Science, 22, 129–146.
Kahan, D. M. (2017). The expressive rationality of inaccurate perceptions. Behavioral and Brain Sciences, 40, e6. https://doi.org/10.1017/S0140525X15002332
Kahan, D. M., Jenkins-Smith, H., & Braman, D. (2011). Cultural cognition of scientific consensus. Journal of Risk Research, 14(2), 147–174.
Kirfel, L., & Hannikainen, I. R. (2022). Why blame the ostrich? Understanding culpability for willful ignorance. In S. Magen & C. Prochownik (Eds.), Advances in experimental philosophy of law. Bloomsbury Press.
Kneer, M., & Bourgeois-Gironde, S. (2017). Mens rea ascription, expertise and outcome effects: Professional judges surveyed. Cognition, 169, 139–146.
Knobe, J. (2003). Intentional action and side effects in ordinary language. Analysis, 63, 190–194. https://doi.org/10.1111/1467-8284.00419
Malle, B. F., & Knobe, J. (1997). The folk concept of intentionality. Journal of Experimental Social Psychology, 33(2), 101–121.
Mason, L. (2018). Uncivil agreement: How politics became our identity. University of Chicago Press.
McCready, E. (2010). Varieties of conventional implicature. Semantics and Pragmatics, 3(8), 1–57.
Meesala, S. (2020). Cancel culture: A societal obligation or infringement on free speech? UAB Institute for Human Rights Blog. https://sites.uab.edu/humanrights/2020/12/04/cancel-culture-a-societal-obligation-or-infringement-on-free-speech/. Accessed 2 Mar 2022.
Monroe, A. E., & Malle, B. F. (2017). Two paths to blame: Intentionality directs moral information processing along two distinct tracks. Journal of Experimental Psychology: General, 146(1), 123–133.
O’Dea, C. J., & Saucier, D. A. (2016). Negative emotions versus target descriptions: Examining perceptions of racial slurs as expressive and descriptive. Group Processes & Intergroup Relations, 20(6), 813–830.
O’Dea, C. J., Miller, S. S., Andres, E. B., Ray, M. H., Till, D. F., & Saucier, D. A. (2015). Out of bounds: Factors affecting the perceived offensiveness of racial slurs. Language Science, 52, 155–164.
Pew Research Center. (2016). In ‘political correctness’ debate, most Americans think too many people are easily offended. Pew Research Center. https://www.pewresearch.org/fact-tank/2016/07/20/in-political-correctness-debate-most-americans-think-too-many-people-are-easily-offended/. Accessed 15 Mar 2022.
Pew Research Center. (2021). How Americans feel about ‘cancel culture’ and offensive speech in 6 charts. Pew Research Center. https://www.pewresearch.org/fact-tank/2021/08/17/how-americans-feel-about-cancel-culture-and-offensive-speech-in-6-charts/. Accessed 15 Mar 2022.
Potts, C. (2005). The logic of conventional implicatures. Oxford University Press.
Potts, C. (2007). Into the conventional-implicature dimension. Philosophy Compass, 2(4), 665–679.
Ramsay, A. (2021). Culture wars: It’s the right that is trying to cancel free speech. Open democracy. https://www.opendemocracy.net/en/opendemocracyuk/culture-wars-its-the-right-that-is-trying-to-cancel-free-speech/. Accessed 2 Mar 2022.
Romano, A. (2021). The second wave of “cancel culture”. Vox. https://www.vox.com/22384308/cancel-culture-free-speech-accountability-debate. Accessed 2 Mar 2022.
Swim, J. K., Scott, E. D., Sechrist, G. B., Campbell, B., & Stangor, C. (2003). The role of intent and harm in judgments of prejudice and discrimination. Journal of Personality and Social Psychology, 84(5), 944–959.
The Objective. (2020). A more specific letter on justice and open debate. The Objective. https://objectivejournalism.org/2020/07/a-more-specific-letter-on-justice-and-open-debate/. Accessed 2 Mar 2022.
Väyrynen, P. (2013). The lewd, the rude and the nasty: A study of thick concepts in ethics. Oxford University Press.
Viciana, H., Hannikainen, I. R., & Gaitán, A. (2019). The dual nature of partisan prejudice: Morality and identity in a multiparty system. PLoS One, 14(7), e0219509. https://doi.org/10.1371/journal.pone.0219509
Williams, D. (2021). Signalling, commitment, and strategic absurdities. Mind & Language. https://doi.org/10.1111/mila.12392
Young, L., Jonathan, S., & Saxe, R. (2011). Neural evidence for “intuitive prosecution”: The use of mental state information for negative moral verdicts. Social Neuroscience, 6(3), 302–315.

Manuel Almagro is a Juan de la Cierva Fellow in the Department of Philosophy at the University of Valencia. His research focuses on political epistemology, philosophy of language, and experimental philosophy. He is also keenly interested in philosophy of mind, philosophy of psychiatry, and Wittgenstein’s philosophy.

Ivar R. Hannikainen obtained a PhD in philosophy from the University of Sheffield, and went on to work in the Law Department at the Pontifical Catholic University of Rio de Janeiro. He is currently a Ramón y Cajal fellow in the Department of Philosophy I at the University of Granada. His research is in cognitive science and moral psychology, with an emphasis on applications to legal and medical decision-making.

Neftalí Villanueva is Profesor Titular at the Department of Philosophy I, University of Granada, Spain. Most of his work focuses on applying the philosophy of language to classical questions
in the history of philosophy and to political and social problems. He is Principal Investigator on several different research projects exploring the connection between polarization and disagreement funded by the Spanish Ministry for Science and Innovation, the Comunidad Autónoma de Andalucía, and the BBVA Foundation.

Part IV

Experimental Philosophy of Language and Psychology

Chapter 12

How Understanding Shapes Reasoning: Experimental Argument Analysis with Methods from Psycholinguistics and Computational Linguistics

Eugen Fischer and Aurélie Herbelot

Abstract Empirical insights into language processing have a philosophical relevance that extends well beyond philosophical questions about language. This chapter will discuss this wider relevance: We will consider how experimental philosophers can examine language processing in order to address questions in several different areas of philosophy. To do so, we will present the emerging research program of experimental argument analysis (EAA) that examines how automatic language processing shapes verbal reasoning – including philosophical arguments. The evidential strand of experimental philosophy uses mainly questionnaire-based methods to assess the evidentiary value of intuitive judgments that are adduced as evidence for philosophical theories and as premises for philosophical arguments. Extending this prominent strand of experimental philosophy, EAA underpins such assessments, extends the scope of the assessments, and expands the range of the empirical methods employed: EAA examines how automatic inferences that are continually made in language comprehension and production shape verbal reasoning, and draws on findings about comprehension biases that affect the contextualisation of such default inferences, in order to explain and expose fallacies. It deploys findings to assess premises and inferences from premises to conclusions, in philosophical arguments. To do so, it adapts methods from psycholinguistics and recruits methods from computational linguistics.

E. Fischer, Norwich, UK
A. Herbelot, Rovereto, TN, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Bordonaba-Plou (ed.), Experimental Philosophy of Language: Perspectives, Methods, and Prospects, Logic, Argumentation & Reasoning 33, https://doi.org/10.1007/978-3-031-28908-8_12

12.1 Experimental Argument Analysis: Motivation and Key Ideas

Our chapter will present the emerging research program of experimental argument analysis.¹ The present section will introduce the motivation and guiding ideas of EAA by outlining how the research program emerged from empirical engagement with the critical strand of Oxford ordinary language philosophy. Sections 12.2 and 12.3 will present a relevant example: a body of research that (i) documents a previously unrecognised comprehension bias that affects the processing of polysemous words (i.e., words with several distinct but related senses) and (ii) deploys findings to explain and expose a fallacy of equivocation in a key argument from the philosophy of perception (viz., the argument ‘from hallucination’). Sections 12.4 and 12.5 will explain the empirical methods employed. In this way, the chapter will illustrate how experimental philosophers can study language processing to address issues from different areas of philosophy.

As practiced in the mid-twentieth century, ordinary language philosophy (OLP) was analytic philosophy’s first attempt to overcome limitations of armchair reflection through the use of (informal) experiments (Hansen & Chemla, 2015), (peer-based) focus groups (Urmson, 1969), and empirical surveys (Murphy, 2014). This makes OLP an important historical precursor of current experimental philosophy.² OLP’s ‘critical’ strand popularized the idea that many characteristically philosophical problems arise from conceptual confusions or verbal fallacies and can be ‘dissolved’ by exposing these confusions or fallacies in the underlying arguments (e.g., Austin, 1962; Ryle, 1949, 1954; Waismann, 1968; for a review, see Schroeder, 2006). Prima facie, this idea makes most sense where philosophical problems are generally regarded as arising from antinomies, like sceptical problems and the problems of free will, mental causation, and perception.

¹ The program’s proponents – who include the present authors – have cheekily appropriated a simple but broad label for a quite specific research program that focuses on how automatic default inferences from words influence verbal reasoning. Important work that does not share this specific focus but could well also be described as ‘experimental argument analysis’ includes work on fallacies in reasoning with conditionals (Pfeifer, 2012; Pfeifer & Tulkki, 2017; cf. Skovgaard-Olsen et al., 2019) and with metaphors (e.g., Ervas et al., 2015, 2018).

² For critical discussion of this claim, see Longworth (2018). For helpful discussion of how corpus methods can be used to empirically implement OLP, see Chap. 7 of this volume.

As typically conceived, these problems arise from persuasive arguments that lead to conclusions that appear to rule out familiar facts, as recognised by common sense (cf. Fischer, 2011; Papineau, 2009). The resulting problems are often articulated by questions that ask how familiar facts are even possible. They occasion the kind of wonder that Plato notoriously regarded as the starting-point of all philosophizing (Theaetetus 155b-d). Let’s call them ‘Platonic puzzles’. A case in point is the ‘problem of perception’ (Smith, 2002). As typically conceived (for a review, see Crane & French, 2021), it arises from an antinomy
developed by arguments ‘from illusion’ and ‘from hallucination’. These arguments proceed from the uncontroversial assumptions that illusions and hallucinations occur (or, in more cautious versions, that these phenomena are at least possible). In a first step, the arguments conclude that in the cases considered – i.e., illusions or hallucinations – viewers are aware, or directly aware, of subjective and immaterial objects (perceptions or sense-data) in their minds, rather than physical objects in their environment (see below, Sect. 12.2). In a second step, the arguments then generalise to all cases of visual perception. These arguments challenge what philosophers regard as the common-sense view of vision, which grants viewers direct access to physical objects, without detour via any immaterial objects of sight. The arguments raise the problem ‘that if illusions and hallucinations are possible, then perception, as we ordinarily understand it, is impossible’ (Crane & French, 2021, §2; cf. Hume, 1777) and motivate the question at the centre of many debates about the nature of perception: How is perception, as we ordinarily understand it, even possible? (Robinson, 1994; Smith, 2002). If conceptual confusions or verbal fallacies prevent the underlying arguments from getting off the ground, this question is ill-motivated and needs to be rejected rather than answered. It is hence plausible to try to ‘dissolve’ problems of this kind by exposing such fallacies in the underlying arguments. J.L. Austin sought to ‘dissolve’ the problem of perception by exposing ‘seductive (mainly verbal) fallacies’ that act as ‘concealed motives’ for formulating the problem (Austin, 1962, p. 5). That the fallacious inferences are ‘concealed’ means, on a charitable interpretation, that thinkers are not conscious of making the inferences and presupposing their conclusions, in the relevant arguments (Fischer, 2014). 
Those inferences are automatic, i.e., they require no attention, are unconscious, and insensitive to the thinkers’ goals (Bargh et al., 2012; Evans & Stanovich, 2013). Austin sought to clarify ‘the root ideas behind the uses’ of key words that are employed in the targeted arguments (Austin, 1962, p. 37). He went on to discuss inferences which are supported by those ‘root ideas’ but have subtle contextual defeaters (Austin, 1962, pp. 37–43). He seems to suggest that the targeted arguments involve automatic inferences which go through even where they are defeated by context.

While multiple further applications offer themselves, efforts to develop EAA are animated by an interest in ‘dissolving’ philosophical problems like the problem of perception and draw on psycholinguistic findings to develop Austin’s suggestions (Fischer & Engelhardt, 2016, 2017a, b, 2020; Fischer et al., 2021a, b). EAA identifies what Austin called ‘root ideas behind the uses of words’ as stereotypes associated with words. As standardly conceived, stereotypes are implicit knowledge structures in semantic memory that encode information about statistical regularities observed in the physical or discourse environment (e.g., tomatoes are typically red and juicy) (McRae & Jones, 2013). They thus capture what psychologists call ‘world knowledge’ and philosophers regard as empirical knowledge. They encode statistical information about typical and diagnostic properties of category members, which may be objects, people, or events (Hampton, 2006). Complex stereotypes (known as ‘situation schemas’) encode information about typical features of events or actions, agents, ‘patients’ acted on, and typical relations
between them (Ferretti et al., 2001; Hare et al., 2009; McRae et al., 1997). This knowledge about the world plays a key role in language processing (Elman, 2009): Stereotypes can be associated with individual nouns and verbs. These words (like ‘tomato’) activate stereotypical information rapidly (within 250 ms) (for a review, see Engelhardt & Ferreira, 2016), automatically, and largely irrespective of context, i.e., ‘by default’ (Machery, 2015).3 Activated stereotypes support stereotypical inferences to attributions of stereotypical features (the tomato talked about will be red) (Levinson, 2000). These automatic inferences are unavoidable, get things right more often than not, but are defeasible (‘the tomato was still green’). The crucial Austinian suggestion then becomes that the targeted philosophical arguments involve defeasible stereotypical inferences that are contextually defeated – but whose conclusions are presupposed in further reasoning, anyway. Since verbal stimuli trigger these inferences by default, these inferences provide the first materials from which language users construct the situation model, i.e., the representation of the situation talked about that provides the basis for further reasoning about that situation (Zwaan, 2016). Default inferences are therefore bound to shape verbal reasoning profoundly. Indeed, Levinson (2000, p. 28) suggested these inferences are instrumental in facilitating effective communication in the face of the ‘articulation bottleneck’: Pre-articulation processes in speech production are 3–4 times faster than normal speech (Wheeldon & Levelt, 1995), as are parsing processes and comprehension inferences in speech comprehension (Mehler et al., 1993). Default inferences that deploy our statistical knowledge about the world allow hearers to rapidly fill in detail. Anticipating such inferences allows speakers to skip mention of typical features and use fewer words. Default inferences thus facilitate effective communication. 
The first key idea EAA derives from Austin is to examine how default inferences shape verbal reasoning – for a start, in philosophical arguments and with a view to resolving Platonic puzzles. Considerable care is required, however, to develop Austin’s more specific suggestion that contextually inappropriate default inferences might be an important source of reasoning fallacies. Psycholinguistic work on sentence comprehension suggests that language users are good at contextualising default information. For a start, nouns and verbs together (‘The mechanic checked . . . ’) can swiftly activate complex stereotypes that encode information about recurrent situations (car inspections) and are not activated by individual words on their own (Bicknell et al., 2010; Matsuki et al., 2011). Activation for the less specific stereotypes initially activated by individual words then decays where they lack contextual support (Oden & Spira, 1983) and the contextually more appropriate schema enters into the situation model.

3 That information is ‘activated’ means that it is made more readily available for use in further cognitive processes. Information activated by a verbal stimulus thereby becomes more readily available for processes ranging from word recognition (e.g., recognising the next word) to sentence parsing (e.g., assigning thematic roles like agent and patient) and verbal reasoning.

12 How Understanding Shapes Reasoning: Experimental Argument Analysis. . .


In a neo-Gricean framework, stereotypical inferences have accordingly been conceptualised as governed by a heuristic, viz., Levinson’s (2000) I-heuristic, that tells hearers that, in the absence of explicit indications to the contrary, they should assume that the situation talked about conforms to the relevant stereotypes, and should treat the most specific stereotypes activated (say, about car inspections) as the most relevant. Moreover, stereotypical inferences that clash with contextual information or background knowledge can be suppressed within 1 s (Fischer & Engelhardt, 2017b; cf. Faust & Gernsbacher, 1996). Diagnosing fallacies in philosophical arguments requires caution at the best of times. The interpretation of philosophical texts is governed by widely accepted principles of charity. These principles tell us to credit authors with linguistic competence and rationality. This requirement creates a tension with the attribution of fallacies to authors (Adler, 1994; Lewinski, 2012). Medium-strength principles of charity resolve the tension by allowing interpreters to attribute fallacies to authors only if the attribution is backed up by an empirically supported explanation that explains when and why even competent thinkers commit fallacies of the relevant kind (Thagard & Nisbett, 1983). Given that competent language users are generally good at contextualising default information, the Austinian suggestion that influential philosophical arguments rely on contextually inappropriate stereotypical inferences is in particularly acute need of such an explanation. Philosophical argument analysis is often regarded as the epitome of an armchair activity. However, the Austinian suggestion that fallacious automatic inferences drive philosophical arguments requires empirical support. The need for empirical support arises from the facts that the posited inferences are fallacious and automatic. 
An a priori reconstruction of fallacious verbal reasoning can only specify inference chains that could have led thinkers from a premise to a conclusion it does not entail. Thinkers have no privileged access to automatic inferences. Their self-reports or acceptance of a proposed reconstruction therefore cannot provide a justified answer to the question of which inference chain – of many potentially relevant chains – actually led them from premise to conclusion. To support the hypothesis that a particular automatic inference drives an argument, we need to document the posited inference experimentally. Moreover, the attribution of fallacious inferences to competent thinkers like philosophers is constrained by principles of charity that ask interpreters to support such attributions with empirical error theories that explain when and why such fallacies occur. Hence, we need experimental evidence not only of the specific inferences posited, but also for accounts that explain them. Inspired by these sources, EAA examines how default inferences that go on continually in language comprehension and production drive verbal reasoning. It focuses on how stereotypical inferences shape philosophical arguments and seeks to expose contextually inappropriate stereotypical inferences in such arguments. This requires developing psycholinguistic explanations of these fallacies and conducting experiments (i) to examine these explanations and (ii) to document the specific fallacies posited in arguments. EAA seeks to explain why inappropriate stereotypical inferences influence further reasoning by reference to comprehension biases. We now present the


approach through a case study on the argument from hallucination: We present a reconstruction of the argument that takes it to rely on contextually cancelled stereotypical inferences from polysemous perception verbs (Sect. 12.2). We then outline a psycholinguistic explanation of when and why such inferences are made from polysemous words (Sect. 12.3). We finally explain how hypotheses about inappropriate stereotypical inferences have been examined with methods from psycholinguistics (Sect. 12.4) and computational linguistics (Sect. 12.5).

12.2 Example: A Philosophical Argument

Consider a classic statement of the argument from hallucination, by the influential British mid-twentieth-century philosopher A.J. Ayer. This statement carefully distinguishes between a perceptual sense of the verb ‘to see’ and a phenomenal sense that serves purely to describe the viewer’s subjective experience and thus lacks all factive, spatial, etc., implications:

Let us take as an example Macbeth’s visionary dagger [ . . . ] There is an obvious [perceptual] sense in which Macbeth did not see the dagger; he did not see the dagger for the sufficient reason that there was no dagger there for him to see. There is another [viz., phenomenal] sense, however, in which it may quite properly be said that he did see a dagger; to say that he saw a dagger is quite a natural way of describing his experience. But still not a real dagger; not a physical object . . . If we are to say that he saw anything, it must have been something that was accessible to him alone . . . a sense-datum. (Ayer, 1956, p. 90, emphasis added)

The second half of the argument then generalises from this special case to all cases of visual perception (Macpherson, 2013; Smith, 2002). The argument is commonly intended as a deductive argument. The following reconstruction remains as close to the text as possible and builds a deductive argument from the bits highlighted in italics above (explicit assumptions and conclusions numbered in round brackets, implicit assumptions in square brackets):

(1) ‘There was no [real] dagger there.’
(2) ‘Macbeth did see a dagger.’

To deductively infer that Macbeth did not see a real dagger (‘But still not a real dagger’), we need an implicit assumption:

[3] If Macbeth saw a real dagger, there was a real dagger there.

By (1) & [3] with modus tollens:

(4) ‘Macbeth did not see a real dagger.’
[5] Macbeth did not see any other physical object, either.

By (4) & [5]:

(6) ‘Macbeth did not see a physical object.’

Hence:


(7) ‘If Macbeth saw any object, he saw a non-physical object, i.e., a “sense-datum”’.4

By (7) & (2):

(8) ‘Macbeth saw a sense-datum.’

This reconstruction posits a previously little noted fallacy of equivocation: The implicit assumption [3] uses ‘see’ in the perceptual sense that has factive implications – if S sees an F (say, a dagger), then an F is there. Hence the conclusions derived from it, directly or indirectly, need to use the verb in the same perceptual sense (highlighted in italics). This includes (7). But Ayer then derives the crucial conclusion (8) from (7) and (2) – even though (2) explicitly uses the verb in the phenomenal sense (underlined) that lacks factive implications. Pace [3], that Macbeth ‘saw’ a real dagger in this sense does not imply there was a real dagger. While this criticism applies regardless of the specific explication of the phenomenal sense used, the following illustration may help to bring out the fallacy. On one interpretation of Ayer’s explanation of the phenomenal sense (Fischer & Engelhardt, 2020), ‘S seesPHEN an F’ means ‘S has an experience like that of seeing an F’. Macbeth is meant to have an experience just like that of seeing a physical dagger. In the phenomenal sense, he can therefore be said to ‘see a physical dagger’, because that is exactly what his experience is like. In this phenomenal sense, he cannot be said, e.g., to see a translucent non-physical dagger (his experience is not like that). In Ayer’s text, the move from ‘Macbeth saw a dagger’ (in the phenomenal sense) to ‘but still not a real dagger’ is hence fallacious.

This reconstruction faces the challenge from the principle of charity: Ayer explains the two senses of perception verbs, before setting out the argument, and flags their uses, in the argument. Our reconstruction suggests he made an inference from the phenomenal use that is licensed only by the perceptual sense.
This violates Ayer’s own explanation of the phenomenal sense, i.e., a self-imposed semantic rule. Analytic philosophers are competent speakers. Our reconstruction thus implies that a competent speaker violated a semantic rule he explained himself a few lines up, in an inference from a premise where the special use of the word was explicitly marked. The principle of charity hence requires us to explain why such a competent thinker would commit the relevant fallacy under the circumstances. We explain the fallacy of equivocation by reference to a comprehension bias that occurs in polysemy processing. This bias asserts itself under conditions that frequently arise in philosophical reflection.
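The equivocation diagnosed in this section can be made explicit by indexing the verb with its sense at each step. The following first-order rendering is only a sketch: the predicate letters, the sense subscripts, and the simplification of (2) to a bare existential claim are our assumptions for illustration, not notation used by Ayer or in the reconstruction itself.

```latex
% Assumed notation (hypothetical, for illustration only):
%   m = Macbeth;  D(x) = x is a real dagger;  P(x) = x is a physical object;
%   T(x) = x is there;  See_perc / See_phen = 'sees' in the perceptual /
%   phenomenal sense.  Premise (2) is simplified to a bare existential claim.
\begin{align*}
{(1)}\quad & \neg\exists x\,\bigl(D(x) \wedge T(x)\bigr) \\
{(2)}\quad & \exists x\,\mathrm{See}_{\mathrm{phen}}(m,x) \\
{[3]}\quad & \forall x\,\bigl(D(x) \wedge \mathrm{See}_{\mathrm{perc}}(m,x) \rightarrow T(x)\bigr) \\
{(4)}\quad & \neg\exists x\,\bigl(D(x) \wedge \mathrm{See}_{\mathrm{perc}}(m,x)\bigr)
  && \text{from (1), [3]} \\
{[5]}\quad & \neg\exists x\,\bigl(P(x) \wedge \neg D(x) \wedge \mathrm{See}_{\mathrm{perc}}(m,x)\bigr) \\
{(6)}\quad & \neg\exists x\,\bigl(P(x) \wedge \mathrm{See}_{\mathrm{perc}}(m,x)\bigr)
  && \text{from (4), [5]} \\
{(7)}\quad & \exists x\,\mathrm{See}_{\mathrm{perc}}(m,x) \rightarrow
  \exists x\,\bigl(\neg P(x) \wedge \mathrm{See}_{\mathrm{perc}}(m,x)\bigr)
  && \text{from (6)} \\
{(8)}\quad & \exists x\,\bigl(\neg P(x) \wedge \mathrm{See}_{\mathrm{perc}}(m,x)\bigr)
  && \text{requires } \exists x\,\mathrm{See}_{\mathrm{perc}}(m,x)
\end{align*}
```

On this rendering, steps (4), (6) and (7) inherit the perceptual subscript from [3]. Detaching the consequent of (7) to obtain (8) requires its perceptual antecedent, whereas (2) supplies only the phenomenal claim, so the final step does not go through unless the two senses are conflated.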

4 Arguably, this step assumes a dichotomous distinction between ‘physical objects’ and ‘sense-data’, whereby any non-physical object of vision is a private sense-datum.


12.3 Example: A Comprehension Bias

Polysemes activate a unitary representation of semantic information that is deployed to interpret utterances which use the word in different senses (Macgregor et al., 2015; Pylkkänen et al., 2006). The findings we reviewed above (Sect. 12.1) about how words cue world knowledge for rapid deployment in utterance interpretation suggest a unitary representation is typically built around stereotypes associated with the word. Different senses can sometimes be generated by rules (as in metonymy) and sometimes not (as in metaphor). In the latter case, of ‘irregular polysemy’, the unitary representation consists in overlapping clusters of features (Brocher et al., 2016; Klepousniotou et al., 2012), and may include overlapping stereotypes. Different components of these unitary representations are activated with different strength by the verbal stimulus. The stimulus activates the features shared by related senses quickly and strongly, regardless of context. By contrast, the activation of unshared features is a function of their relative exposure frequency (Brocher et al., 2018): The more often the language user encounters the word in one sense, rather than another, the more strongly the (unshared) features associated with (only) that sense are activated, when the user encounters the word. This is consistent with a sensible predictive strategy: The use frequencies observed to date provide the baseline probability that the word is being used in this sense, on this occasion. This baseline activation may be boosted by context (op. cit.). Another factor influencing strength of activation is prototypicality: Features deemed to make for particularly good examples of the relevant category are activated more rapidly and strongly (Hampton, 2006). Strength of activation thus depends on linguistic ‘salience’ (Giora, 2003).
Unlike the contextual salience involved in familiar salience biases (see Taylor & Fiske, 1978 for a review), this is not a contextual magnitude, but a function of relative exposure frequency over time modulated by prototypicality. Interpreting any particular use of a polyseme then requires activating all contextually relevant, but unshared features, and suppressing all contextually irrelevant, but activated features. Consider a simple case where the features relevant for interpreting a subordinate use are a subset of the features that make up the stereotype associated with the dominant sense: the verb ‘to see’ is associated with a situation schema (the ‘seeing-schema’) that includes the typical agent features S has eyes, S looks at X, S knows X is there, and S knows what X is; patients typically are medium-sized dry goods; and typical relations between patients and agents include X is in front of S and X is near S. To interpret the purely epistemic use illustrated by ‘Jack saw Jane’s point’, precisely the last two agent features are relevant: Jack knows there is a point of Jane’s and he knows what it is. These need to be retained, while the other features need to be suppressed, applying the ‘Retention/Suppression strategy’ (Giora, 2003). Two circumstances may prevent complete suppression of contextually irrelevant features: First, suppose features irrelevant for the subordinate sense (as, e.g., X is in front of S is irrelevant for the epistemic sense of ‘see’) are associated with a clearly dominant sense (e.g., the visual sense of ‘see’) that is far more frequent than all


other senses. Then these irrelevant features will receive very strong initial activation (Brocher et al., 2018). Second, frequently co-instantiated component features of a stereotype exchange lateral co-activation (Hare et al., 2009; McRae et al., 2005). Where only some, but not all of the components associated with the dominant stereotype are relevant for interpreting a subordinate use, the contextually relevant features will continue to pass on activation to the contextually irrelevant features. Where these two factors come together, strong initial activation of contextually irrelevant features is followed by their continued cross-activation. This makes complete suppression impossible. When merely partially suppressed, irrelevant features continue to support stereotypical inferences. This creates a linguistic salience bias (Fischer & Engelhardt, 2019, 2020; Fischer & Sytsma, 2021): When

(i) one sense of an irregular polyseme is much more salient than all others,
(ii) interpretation of utterances with a subordinate sense requires suppression of features associated with that dominant sense, and
(iii) some, but not all, of the features strongly associated with the dominant sense are contextually relevant,

then

1. contextually inappropriate stereotypical inferences supported by the dominant sense will be triggered by the subordinate use as well, and
2. these automatic inferences will influence further judgment and reasoning.

I.e.: When an irregular polyseme is seriously unbalanced and the Retention/Suppression strategy is used to interpret subordinate uses, even competent thinkers cannot help being influenced by automatic inferences that are cancelled by contextual information. Thinkers are then swept along by defeasible inferences, even when these are defeated by the context. The relevant conditions are often met in philosophy: Philosophers often give special but related uses to familiar words that have clearly dominant senses from ordinary discourse.
Arguably, the use of such polysemes is an important source of fallacies in philosophical reasoning. Studies to date have provided evidence that linguistic salience bias affects inferences from subordinate uses of perception verbs (Fischer & Engelhardt, 2017a, b, 2019, 2020; Fischer et al., 2022), from phenomenal uses of appearance verbs that are involved in arguments from illusion (Fischer & Engelhardt, 2016; Fischer et al., 2021a, b), from philosophical uses of ‘zombie’ (Fischer & Sytsma, 2021), and from purely descriptive uses of the verb ‘to cause’ in morally valenced cases (Livengood et al., 2017; Livengood & Sytsma, 2020). Crucially, a study with academic philosophers revealed that they are no less susceptible to the bias than laypeople (psychology undergraduates) (Fischer et al., 2022). This finding allows us to invoke linguistic salience bias to explain fallacies of equivocation in philosophical arguments.
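The triggering conditions (i) to (iii) stated in this section can be read as a simple check over a polyseme's sense frequencies and stereotype features. The sketch below is illustrative only. It uses the seeing-schema features and the BNC occurrence shares reported in this chapter, but the function names, the use of occurrence share alone as a proxy for salience (ignoring prototypicality), and the numerical cut-off for 'much more salient' are our assumptions; the chapter specifies no such threshold.

```python
# Features of the 'seeing-schema' as listed in the text (agent and relation
# features only; the typical-patient feature is omitted for brevity).
SEEING_SCHEMA = {
    "S has eyes", "S looks at X", "S knows X is there", "S knows what X is",
    "X is in front of S", "X is near S",
}

# Features the text treats as relevant to the purely epistemic use
# ('Jack saw Jane's point').
EPISTEMIC_RELEVANT = {"S knows X is there", "S knows what X is"}

# Share of BNC occurrences per sense, in percent (Table 12.1).
BNC_SHARE = {"visual": 68.0, "epistemic": 12.4, "doxastic": 9.7, "phenomenal": 1.1}

# ASSUMPTION: hypothetical cut-off for 'much more salient than all others'.
DOMINANCE_RATIO = 3.0


def dominant_sense(shares, ratio=DOMINANCE_RATIO):
    """Return the sense whose share is at least `ratio` times the runner-up's, else None."""
    ranked = sorted(shares, key=shares.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return top if shares[top] >= ratio * shares[runner_up] else None


def at_risk_features(dominant_features, relevant_features):
    """Dominant-sense features that are contextually irrelevant and must be suppressed."""
    return dominant_features - relevant_features


def bias_predicted(shares, dominant_features, relevant_features, ratio=DOMINANCE_RATIO):
    """Check conditions (i)-(iii) for linguistic salience bias."""
    cond_i = dominant_sense(shares, ratio) is not None           # (i) unbalanced polyseme
    to_suppress = at_risk_features(dominant_features, relevant_features)
    retained = dominant_features & relevant_features
    cond_ii = bool(to_suppress)                                   # (ii) suppression needed
    cond_iii = bool(retained) and bool(to_suppress)               # (iii) some, but not all, relevant
    return cond_i and cond_ii and cond_iii


if __name__ == "__main__":
    print(dominant_sense(BNC_SHARE))
    print(bias_predicted(BNC_SHARE, SEEING_SCHEMA, EPISTEMIC_RELEVANT))
    print(sorted(at_risk_features(SEEING_SCHEMA, EPISTEMIC_RELEVANT)))
```

For the epistemic use of ‘see’, the features returned by `at_risk_features` are those whose partial suppression would, on the account above, keep supporting inappropriate inferences such as the spatial inference to X is in front of S. The check is binary by design; a graded model of activation strength would need the frequency and prototypicality data discussed in Sect. 12.4.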


12.4 Methods from Psycholinguistics

Most of these studies have adapted the cancellation paradigm that psycholinguists developed to study automatic comprehension inferences. In this paradigm, participants read or hear sentences where the expression of interest is followed by text that defeats or ‘cancels’ the inference that is by hypothesis triggered by that expression. To examine, for example, whether participants make automatic inferences from ‘S sees X’ to X is in front of S, we can ask them to read sentences like:

Sheryl sees the picture on the wall behind her.

If the inference is made, the resulting clash of the conclusion with the sequel causes comprehension difficulties which require cognitive effort to overcome. When we expend cognitive effort, our pupils dilate (Kahneman, 1973; Laeng et al., 2012). When we struggle to integrate new information with information inferred from previous text, we need longer to read the cancellation phrase (e.g., ‘behind her’) and make more backwards eye movements from that phrase (Patson & Warren, 2010). Finally, perceived conflicts prompt signature electrophysiological responses (‘N400s’) (Kutas & Federmeier, 2011). These ‘online’ measures (which tap into cognitive processes as they unfold) can be used to examine whether specific automatic inferences are triggered by words, as people read or hear them. As noted above, however, initially activated stereotypical information may simply decay in the absence of contextual support (Oden & Spira, 1983), and stereotypical inferences that clash with contextual information or background knowledge can be suppressed within 1 s (Fischer & Engelhardt, 2017b; cf. Faust & Gernsbacher, 1996). Either way, initially triggered automatic inferences fail to influence further judgment and reasoning. To study whether automatic inferences influence further cognition, we therefore complement online measures with subsequent plausibility ratings: Where inferences are not suppressed, perceived clashes with sequels will persist and lead to lower ratings. This paradigm is illustrated by three studies on spatial inferences from subordinate uses of perception verbs ‘see’ and ‘aware of’ (Fischer & Engelhardt, 2017b, 2019, 2020). Prior corpus analyses revealed that the purely epistemic sense (‘I see your point’) is the most salient of the subordinate senses of ‘see’. 
In one paradigmatic study, occurrence frequencies in a random 1000-sentence sample from the British National Corpus (BNC) served as a proxy measure for exposure frequency, and frequencies from a sentence completion task measured prototypicality (see Table 12.1). A pre-study revealed that members of our participant pool reject spatial inferences from purely epistemic uses even more strongly than spatial inferences from other subordinate uses (like the phenomenal use) (Fischer & Engelhardt, 2020). Our studies therefore considered spatial inferences from purely epistemic uses of ‘see’. We now consider in some detail the fixation-times study (Fischer & Engelhardt, 2019), whose methodology is the most subtle and allows us to examine both automatic inferences and the mechanism of polysemy processing. In reading, the eye moves in stops and starts. Readers tend to fixate most, but not all words,


Table 12.1 Occurrence and completion frequencies for ‘see’

Sense        Example                                  % of BNC occurrences
Visual       ‘I saw him daily’                        68
Epistemic    ‘I see your point’                       12.4
Doxastic     ‘As he saw fit’                          9.7
Phenomenal   ‘Hallucinating, Macbeth saw a dagger’    1.1
Remainder