Notes on Knowledge, Indifference and Redundancy

By Esteban Céspedes
Notes on Knowledge, Indifference and Redundancy
By Esteban Céspedes

This book first published 2017

Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Copyright © 2017 by Esteban Céspedes

All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN (10): 1-4438-5064-0
ISBN (13): 978-1-4438-5064-3
TABLE OF CONTENTS
Introduction .......................................................... vii
Acknowledgements ..................................................... xiii

Chapter One ............................................................. 1
Emergent Properties, Abnormality and Underdescription
    Levels of Emergence and Overdetermination ........................... 2
    Complexity and Two Features of Emergence ............................ 6

Chapter Two ............................................................ 15
Cognition, Enactivism and Differentiation
    Cognitivism ........................................................ 15
    Emergence and Connectionism ........................................ 17
    The Enactive Approach .............................................. 18
    Radical Enactivism ................................................. 20
    Knowledge and Differentiation ...................................... 24

Chapter Three .......................................................... 27
Conceptual Spaces and a Note on Disagreement
    Interpretation, Representation and the Emergence of Concepts ....... 28
    Cognitive Semantics ................................................ 34
    Agreement and Indifference ......................................... 36

Chapter Four ........................................................... 41
Signals and Redundant Information
    Signalling Games and Information ................................... 41
    Redundant Information .............................................. 46
    Consciousness, Experience and Signals .............................. 48

Chapter Five ........................................................... 53
Epistemic Difference-making in Context

Chapter Six ............................................................ 65
Epistemic Contrastivism and Concept Distinction

Chapter Seven .......................................................... 77
The Principle of Indifference
    Formulation of the Principle and Multiple Partitions ............... 77
    The Principal Principle ............................................ 81
    Ignorance .......................................................... 85

Chapter Eight .......................................................... 89
Causal Redundancy
    The Causal Relation ................................................ 89
    The General Notion of Causal Redundancy ............................ 90
    Symmetric Overdetermination ........................................ 92
    Asymmetric Overdetermination ....................................... 92
    Early Preemption ................................................... 93
    Late Preemption .................................................... 94

Chapter Nine ........................................................... 97
Further Examples of Overdetermination
    Multi-level Causation .............................................. 97
    Mathematical Overdetermination ..................................... 98
    Constitutive Overdetermination .................................... 100
    Overdetermination in Physics and Biology .......................... 104
    The Problem of Many Hands ......................................... 105

Chapter Ten ........................................................... 109
Causation, Variable Choice and Proportionality
    Principles for Variable Choice .................................... 109
    Proportionality and Context ....................................... 116

Chapter Eleven ........................................................ 121
Decision Theory and Indifference
    Evidential Decision Theory ........................................ 121
    Causal Decision Theory ............................................ 122
    Decisional Overdetermination ...................................... 125
    Newcomb’s Problem and the Prisoners’ Dilemma ...................... 128

Bibliography .......................................................... 135
Index ................................................................. 139
INTRODUCTION
Suppose that you are looking for a blue pen inside a messy drawer. You see a couple of pencil erasers, some paper clips and an old calendar, but you immediately dismiss those things and keep looking. They are entirely irrelevant to your search; that is, you are indifferent with regard to them. Since you think that the pen might be inside the drawer, objects outside the drawer are also irrelevant to you for the moment. Suppose now that someone asks you whether the blue pen is inside the drawer and you answer that you really do not know. If that person told you that there was a red marker inside the drawer, which you did not know, you would have an attitude of indifference regarding the value of that information. This means that if she asked you again whether the blue pen was inside the drawer, you would still answer that you do not know. Thus, the relevance of a piece of information does not simply depend on its novelty, but on your interests. Your indifference might also be related to your actions. Whether you decide to get some water or not will not change the place of the blue pen. You are indifferent with regard to that decision. Thus, you may get some water; that would not affect your search for the blue pen. These ideas show in what sense the notion of indifference is fundamental to our understanding of what knowledge is. They also show the importance of actions with regard to epistemic processes. These concepts constitute the main topic of this work. Knowledge is not simply based on singular cognitive states involving pieces of information. It depends on how agents interact with their environment and reconsider their beliefs in a dynamic way. Basic interactions are involved in perception, particularly when an organism is capable of selecting and distinguishing an object against a background. There is no perception without interaction and differentiation. On this basis, cognitive processes can be considered as emergent processes.
The categories according to which we search for objects or attribute different characteristics to them are also emergent phenomena. They emerge from inter-subjective interaction, that is, from communication. Whether a concept is formed and remains part of a language depends on how expressions involving that concept can make a difference on the basis of the interests of the speakers in particular situations.
Regarding propositions or beliefs, difference-making is also crucial. For instance, suppose that you saw the blue pen inside the drawer a couple of days ago and you are sure that no one has been near your drawer until now. Those facts make a difference to your knowledge that the blue pen is inside the messy drawer. If you were not aware of them, perhaps you would not be so sure that the pen is inside the drawer. Knowledge also involves contrasts between different alternatives. You believe that the blue pen is inside the drawer rather than in the refrigerator. In order to believe this, you have to be able to distinguish between a drawer and a refrigerator. This is related to the topic of concept formation already mentioned. Suppose now that, reconsidering your reasons, you are not quite sure whether the blue pen is actually inside the messy drawer or somewhere outside it. In that case, according to the principle of indifference, you should consider both alternatives as equiprobable. Regarding causation, indifference is closely related to overdetermination. Suppose that a certain event is overdetermined by two causes. The absence of one of the overdetermining causes would not make any difference with regard to the occurrence of the effect. As will be shown, this depends on how the scenario is described and how the variables are chosen. The indifference associated with causal overdetermination can also be observed in decision scenarios involving overdetermination. Suppose that you do not need to find the blue pen urgently and that a friend of yours would find it for you tomorrow if you do not find it today. In that case, you might be indifferent with regard to the options of searching for the pen; that is, stopping the search would be as rational as continuing it. The first chapter is focused on the notion of an emergent property.
Roughly, a property is emergent if it is formed as a result of the increasing complexity of the interactions between the constituents of a system. Irreducibility is also a key feature of emergent properties. In principle, descriptions of emergent phenomena are not easily reduced to descriptions of the constituent elements of the system in which they arise. This may depend on the complexity and stability of the given property, on how the constituents of the system interact, on whether the emergent states are able to affect the constituents of the system causally and on whether the emergent states are cognitive states or not. It is shown that irreducibility and complexity are two features that must be present in an appropriate characterisation of the concept of an emergent property. Cognition is the main topic of Chapter 2. As mentioned in the first chapter, cognitive processes can be understood in terms of emergence. The
enactive account of cognition, developed by Francisco Varela, Evan Thompson and Eleanor Rosch, is introduced, showing its differences from cognitivism and connectionism. It is shown how the account characterises distinct aspects of knowledge, such as perception and categorical thinking, in a way that is clearly describable in terms of emergence: Knowledge is more than the result of a correct representation or of a set of neural processes. It is an activity that consists in the constant relationship between an organism and its environment. Now, in order to interact with the environment, an organism must be able to distinguish between the many features that manifest themselves. Considering this, differentiation is just as important in perception as it is in categorisation. Now, in order to understand categorisation, we have to understand what a concept is. The account of conceptual spaces proposed by Peter Gärdenfors is the main topic of the third chapter. Knowledge can be characterised in terms of geometrical structures formed on the basis of quality dimensions. One can give different interpretations to these structures, depending on one's epistemic interests. An interesting application of this account concerns cases of miscommunication, that is, cases in which there is a relevant mismatch between the concepts used by two or more speakers referring to the same topic. A conversation may end in disagreement or proceed to meaning negotiation, depending on the importance that each party attributes to the issue. In this sense, the state of indifference of one of the speakers may be an obstacle to the second option. In Chapter Four, the notion of a signal is introduced in order to consider the question of how expressions obtain their meanings. Following David Lewis and Brian Skyrms, it is shown how the meaning of a signal is formed on the basis of the interactions between the players of a signalling game.
Whether a signal carries relevant information for an agent depends on how it changes the probabilities that the agent assigns to a state. One of the main goals of this chapter is to show that redundancy may be informationally valuable. The last section of this chapter is focused on the notion of experience, which can be characterised as a feature according to which an organism changes its ways of acting and perceiving its environment as a result of the influences of the environment itself. On this basis, one can think of processes that are indifferent to the experience of a particular system. The topic of the fifth chapter is the thesis of epistemic difference-making, which states that an agent knows that P only if there is a set of facts described as F, such that, if F had not obtained, she would not have believed that P. Juan Comesaña and Carolina Sartorio criticise a similar
condition for knowledge ascription, the sensitivity condition. According to it, an agent S knows that P only if the following holds: If P had not obtained, S would not have believed that P. As they argue, there are cases in which an agent would still have believed that P even if P were not actually true. I argue that a version of the sensitivity condition based on the notion of a concept is not affected by those cases. In Chapter Six, the account of epistemic contrastivism is discussed. According to this account, knowledge is not simply a binary relation between someone who knows something and something that is known, but a ternary relation, such that knowledge attributions have the following form: S knows that P rather than Q. Peter Baumann argues that not all types of knowledge can be described in a contrastivist manner. I consider a way to support epistemic contrastivism against this kind of criticism on the basis of contextualist assumptions. The principle of indifference is the main topic of Chapter Seven. According to its classic formulation, if there is a set of mutually exclusive hypotheses and the available evidence does not provide more reason to believe one of them rather than another, one should assign each hypothesis the same probability. The multiple partitions argument, which seems to show a potential weakness of the principle, is briefly discussed. Chapter Eight focuses on causation and, particularly, on cases of causal redundancy. These are cases in which a single effect is associated with two or more sufficient causes. A simple counterfactual account of causation faces problems describing scenarios that involve causal redundancy. According to such an account, we say that some event A causes another event B just in case B would not have occurred if A had not occurred. Here is the general structure of a case involving causal redundancy. Suppose that two events, A and B, can independently cause a third event, C.
According to a simple counterfactual account of causation, C would still have occurred if A had not occurred. This result is uncomfortable, given that A is assumed to be a cause of C. The same problematic result holds for B, of course. Different particular cases involving causal redundancy are reviewed in this chapter, such as cases of symmetric overdetermination, early preemption and late preemption. The physical possibility of symmetric overdetermination is briefly questioned and the importance of the description of events is also considered. Chapter Nine can be considered as a complementary chapter. It presents different cases of overdetermination, such as cases of multi-level causation, mathematical overdetermination, constitutive overdetermination, as well as overdetermination scenarios in the natural sciences.
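The structure just described can be made concrete in a toy model. The sketch below is an illustration only, not an account defended in this book: A and B are independent sufficient causes of C, and a naive counterfactual test fails to count either of them as a cause.

```python
# Toy model of symmetric overdetermination:
# C occurs iff A occurs or B occurs (each alone is sufficient).
def effect(a: bool, b: bool) -> bool:
    return a or b

def counterfactual_cause(candidate: str, world: dict) -> bool:
    """Naive counterfactual test: X causes C iff C occurred and
    C would not have occurred had X not occurred, all else fixed."""
    altered = dict(world)
    altered[candidate] = False  # suppose the candidate cause had not occurred
    actual = effect(world["A"], world["B"])
    counterfactual = effect(altered["A"], altered["B"])
    return actual and not counterfactual

world = {"A": True, "B": True}  # both sufficient causes occur
print(counterfactual_cause("A", world))  # False: C still occurs without A
print(counterfactual_cause("B", world))  # False: the same holds for B
```

With only one sufficient cause present (say, A alone), the same test returns True, which shows that the trouble arises specifically from the redundancy, not from the test as such.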
In the tenth chapter, I consider the importance of describing the events involved in a causal scenario appropriately. James Woodward proposes and discusses a group of criteria for choosing the variables of a causal model. It is crucial, for instance, that variables have unambiguous effects and that relations between the variables of a model be stable. The principle of proportionality for causal claims is then briefly discussed. Roughly put, it states that the description of a cause should not say more than what is necessary to produce its effect. The final chapter is about decision theory. First, causal decision theory and its differences from evidential decision theory are presented. Then, cases of decisional overdetermination are considered. In these cases, an agent is confronted with options that, as assumed, would make no difference regarding some expected effect. That is, the considered outcome will occur no matter what the agent does. In principle, causal decision theory and evidential decision theory make the same recommendation in cases of decisional overdetermination: Indifference seems to be the most rational attitude. However, it is shown that this might not be the intuitive answer when the stakes are considerably high. I address these topics in a very general way, based on notes, reflections, discussions and on how the relevant issues have been tackled by some philosophers and thinkers. I present nothing near an account of the notion of indifference, but expect to show how it might be considered in order to develop a clear theory of knowledge.

Valparaíso, December 2016
ACKNOWLEDGEMENTS
I would like to thank Peter Baumann, José Luis Bermúdez, Juan Comesaña, Miguel Ángel Fuentes, Claire Petitmengin, Fabian Seitz, Lawrence Shapiro, Brian Skyrms, Russell Standish and Ibo van de Poel for valuable clarifications on some of the topics discussed in this work. I also thank Stephanie Brantner for corrections and suggestions during the editing process. Additionally, I am grateful to the Chilean Commission for Scientific and Technological Research for financial support (CONICYT / FONDECYT post-doctoral fellowship / project No: 3160180).
CHAPTER ONE EMERGENT PROPERTIES, ABNORMALITY AND UNDERDESCRIPTION
Emergence is, in a broad sense, a relation between a property of a system and its constituents. We might say that some property emerged from a given system and that descriptions involving such a new property cannot, in many cases, be reduced to descriptions involving the constituent parts of the system. Consider Paul Davies's characterisation of the concept of emergence:

Roughly speaking, it recognizes that in physical systems the whole is often more than the sum of its parts. That is to say, at each level of complexity, new and often surprising qualities emerge that cannot, at least in any straightforward manner, be attributed to known properties of the constituents. In some cases, the emergent quality simply makes no sense when applied to the parts. Thus water may be described as wet, but it would be meaningless to ask whether a molecule of H2O is wet. (Davies 2006, x)
Of course, just by assuming that the considered emergent property cannot be attributed to the parts of the system, we are not able to conclude that descriptions involving such a property are not reducible to some description involving certain properties attributed to the parts. For instance, a description of the drawing of a triangle on a piece of paper might involve the property of having a certain area, while, geometrically, none of the lines that constitute the triangle have such a property. Nevertheless, a description of the drawing might be reduced to some set of descriptions involving the lines that constitute it. Anyhow, if a description of a system’s property is irreducible to any set of descriptions involving its constituents, it follows that such a property cannot be attributed to each constituent part. In this chapter, I argue that emergent properties are better described if one focuses on the effects of their instantiation instead of focusing on the constituting parts from which they emerge. According to this idea, whenever there is emergence, one can determine an overdetermination
structure within the system in which it takes place. Thus, as should be clear later, the notion of epistemic indifference is associated with emergent properties.
Levels of Emergence and Overdetermination

In order to have a clearer idea of emergence and of its relation to reduction, let us consider a characterisation of different levels of emergence proposed by Terrence Deacon (2003) and discussed by George Ellis (2006).

Level 1 Emergence: A property of the system emerges which is, in principle, reducible to the constituents of the system.

When this type of emergence occurs, the constituent parts of a system form generic properties. An example is the relation between the molecules of a system and the properties of a liquid that those molecules constitute. In principle, it is possible to reduce descriptions of one state of the system to descriptions of the other. In this type of emergence, bottom-up action may lead to higher-level generic properties but not to higher-level complex structures or functions. Suppose that the property of a liquid causally influences a surface. For instance, let us say that the surface of the table got colder because someone dropped water on it. According to the point of view that I favour, the effect, the fact that the temperature of the table's surface decreased, might help us to understand the relation between a particular emergent property of the liquid and its constitutive elements. A description of the process by which the surface of the table got wet could involve information about its temperature. Under the assumption that the wetting process occurs given a certain property that a liquid instantiates, one can arrive at a description of the relation between that property and the molecular interactions of the liquid. That property might be, for example, the surface tension. Thus, the fact that the liquid has a particular surface tension is considered as an overdetermining cause of the decreasing temperature on the table's surface.
Consider now another kind of emergence:

Level 2 Emergence: The emergent property is a result of the activity involved among the constituent parts and a set of boundary conditions.
Changes in the forms of sand piles are a good example of this type of emergence. The interactions between sand particles, as well as the given conditions in which they interact, produce a level of complexity from which a new property of the system emerges. Given the drastic change in the system's degree of complexity, when this kind of emergence occurs, descriptions of some resulting structures are not reducible to descriptions of the basic constituents. According to the idea I am arguing for, the relation between emergent properties of a system and its constituent parts can be specified by the effects they produce. How can this idea be applied to level 2 emergence? Suppose that after adding a single grain of sand to a sand pile, a sand avalanche occurs. This event can be considered as an emerging property of the system constituted by the whole set of grains and the interactions between them. The effects of the sand avalanche might help us to specify the information about the system's constituents. For instance, we might be able to obtain information about the number of sand grains, about their form, or about the ways in which they toppled over each other. Furthermore, the effects might help us to derive information about the conditions in which the sand pile was formed, which is crucial for understanding the special character of level 2 emergence. For instance, the intensity of the avalanche might help us to get information about the surface on which the pile was formed. Consider the following distinction characterised by Per Bak:

In a noncritical world nothing dramatic ever happens. It is easy to be a weather (sand) forecaster in the flatland of a noncritical system. Not only can he predict what will happen, but he can also understand it, to the limited extent that there is something to understand. The action at some place does not depend on events happening long before at faraway places. Contingency is irrelevant.
Once the pile has reached the stationary critical state, though, the situation is entirely different. A single grain of sand might cause an avalanche involving the entire pile. A small change in the configuration might cause what would otherwise be an insignificant event to become a catastrophe. (Bak 1996, 59)
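The contrast Bak draws can be simulated directly. The following sketch is a deliberately simplified one-dimensional toppling model, not Bak's own model: a site holding two or more grains topples, sending one grain to each neighbour, and a single added grain can trigger a cascade through a pile that is already near its critical state.

```python
# Minimal one-dimensional sandpile in the spirit of Bak, Tang and
# Wiesenfeld: a site with 2 or more grains topples, sending one grain
# to each neighbour; grains fall off the ends of the pile.
def drop_grain(pile: list[int], site: int) -> int:
    """Add one grain at `site`, relax the pile until stable, and
    return the avalanche size (the total number of topplings)."""
    pile[site] += 1
    topplings = 0
    unstable = [i for i, h in enumerate(pile) if h >= 2]
    while unstable:
        i = unstable.pop()
        if pile[i] < 2:
            continue
        pile[i] -= 2
        topplings += 1
        for j in (i - 1, i + 1):
            if 0 <= j < len(pile):
                pile[j] += 1
                if pile[j] >= 2:
                    unstable.append(j)
    return topplings

flat = [0, 0, 0]
print(drop_grain(flat, 1))   # 0: in a subcritical pile, nothing happens

critical = [1, 1, 1, 1, 1]   # every site one grain below the threshold
print(drop_grain(critical, 2))  # a single grain triggers a pile-wide avalanche
```

The same added grain produces no event at all in the flat pile and a system-wide cascade in the critical one, which is exactly the dependence on global configuration that makes the emergent description indispensable.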
This characterisation shows in what way the conditions under which a system evolves are crucial in order to understand the emerging phenomena produced in it. Mere information about the sand grains and their interactions does not suffice to arrive at relevant information about possible avalanches (Bak 1996, 60). Of course, the single sand grains and their interactions are somehow causally relevant for the effects of a sand avalanche. Such effects might involve, for instance, the destruction of a solid structure. A set of facts involving the sand grains can be considered as an overdetermining cause of the destruction. Thus, we might derive
relevant information about the sand grains from a description of the avalanche, the emergent property of the system, and its destructive effects. In this sense, if someone would like to explain the destruction of the solid structure at a certain degree of specificity, she might be indifferent between two kinds of explanations, one involving a description of the avalanche and the other involving a description of the interactions between the sand grains. Depending on her epistemic standards, both explanations could be satisfactory to her. We should emphasise here that whenever one considers the overdetermination structure of a system involving emergent processes, one should keep in mind the differences between the constituent events and the emergent states. The properties of a sand avalanche are very different from the properties of the sand grains. We may even talk about general (or macroscopic) properties and physical (or microscopic) properties. In this sense, we are not dealing with cases in which overdetermination occurs within the same level of description. Rather, we confront cases that may initially be seen as overdetermination cases, but can then be reduced to sets of physical processes or sets of causal facts described with different levels of specificity. Let us now consider a third type of emergence:

Level 3 Emergence: The emergent property and the constituents are connected in such a way that the system involves coordination and top-down causation.

When level 3 emergence occurs, the system can be described in a teleonomic way, that is, in terms of tendencies or goals. Let us take phototropism in order to consider how an overdetermining structure may help to specify the relation between an emergent property of a system and its constituent parts.
The phototropic movement of a plant can be considered as a property emerging from its cellular constitution, from the interaction between cells and, of course, from the interaction between the plant and the environment. The different reactions of the cells depending on the intensity in which light is received might be considered as a coordinated action, also an emergent property of the system. While shadow causes some cellular walls to expand, by the increase of auxin, a plant hormone, light inhibits expansion and, as a result, the plant as a whole exhibits a curved shape towards the light source (Sakai & Haga 2012). We may consider the plant’s shape in order to infer some information about the overdetermining causes. That is, given the phototropic movement of the plant during a given period of time, we
might obtain some information about the plant's cellular constitution. Alternatively, we might infer information about the plant's phototropic movement on the basis of information about the flux of auxin between the cells. An epistemic agent might be indifferent, in a certain context of explanation, as to whether the plant's shape is explained in terms of its auxin flux or in terms of its phototropic movement. Both explanations could be equally satisfactory to that agent. Note that the assumption about the effects produced by the overdetermining causes, which are the emergent property and the constitutive, interacting elements of the system, is crucial for inferring information about the relations between them. Without information about the plant's final shape, acquiring information about the auxin flux between the plant's cells may not lead to the same description of the plant's phototropic movement. Consider another kind of emergence:

Level 4 Emergence: Emergent properties are involved in coordination and top-down causation that is based on memory.

This kind of emergence occurs in animals, for instance when decisions or reactions are supported by information that an agent remembers. Consider a domestic dog's reaction after seeing and smelling food. The dog's reaction is not only based on the information he is receiving at that very moment, but also on information that he remembers from other occasions on which he saw and smelled something similar. The dog's particular behaviour might be considered as an effect of both his mental state and a set of neural processes from which such a mental state emerges. The relation between the dog's neural and mental processes might be hard to describe. However, considering information about the effects of these processes, such as the dog's behaviour, could make the task of arriving at a rich description of how that relation works easier.
Forms of communication that depend on the interpretation of bodily behaviour are based on this kind of emergence. Let us now focus on a fifth level of emergence:

Level 5 Emergence: Emergent properties involve goals that can be expressed in a language system.

In this type of emergence, the goals of an agent might be expressed symbolically, which could help to develop models of the environment. A general idea of meaning might originate through level 5 emergent
properties. The development of a symbolic language might be considered as an emerging property originating in a community. It can hardly be reduced to a mere description of the members of that community and the interactions between them. Anyhow, by considering the effects involved in the overdetermination structure, we might arrive at a clearer understanding of the relation instantiated between descriptions of the members of the community and descriptions of the language. A political conflict might be a good example of such an effect. Some political conflicts cannot occur if the language involved is not sufficiently elaborated. Thus, from information about the kind of political conflict, and given a set of descriptions of the members of the society involved in the conflict, we might arrive at valuable information about the language that they have been able to develop.
Complexity and Two Features of Emergence

As characterised generally, emergent properties are generated according to a certain level of complexity. Thus, understanding the notion of complexity should help us to understand the notion of emergence. A clear notion of complexity based on computability is the notion of Kolmogorov complexity (Kolmogorov 1965, Chaitin 1969), which can be broadly defined as follows:

Kolmogorov complexity: The complexity of a description is the length of the shortest possible computer program that can generate that description, given a universal language.

In this sense, the Kolmogorov complexity of a description is its algorithmic information content. A description might also be considered as a string of characters, a piece of data or simply as an object. Take, for instance, the string "ABABAB". The shortest computer program that can generate that string might be based on the following rule: "Print AB three times". Thus, the algorithmic information content of the string "ABABAB" is quite simple; it has a relatively low Kolmogorov complexity compared to the randomly generated string of characters "NDYOSNKFK". Perhaps the shortest program that can generate such a piece of information is simply: "Print NDYOSNKFK". The string has a high Kolmogorov complexity. Two interesting aspects of the notion of Kolmogorov complexity may be identified (Standish 2001, 1). The first aspect is the context (or
language) dependence of the determination of complexity:

Context dependence: For any two objects, x and y, there are two description languages U and V, such that: a) either x is more complex than y according to U and y is more complex than x according to V, or b) y is more complex than x according to U and x is more complex than y according to V.

Of course, this is only problematic if one supposes that complexity should be described in absolute terms, that is, if one assumes that there is a single language according to which we can really determine the complexity of a given piece of information. The second interesting aspect of Kolmogorov complexity that we might consider is the fact that randomly generated strings, such as "NDYOSNKFK", have a maximum degree of complexity. This might be problematic if one assumes that random sequences should be considered as containing no information. One might characterise the information content of a description as the degree to which it helps us to predict other information. For instance, the information contained in the weather forecast at a given moment for a given region is valuable; it might help me to decide, in case I am in that region, whether or not I should take my umbrella when I go out later. In this sense, one may suppose that a randomly generated piece of information, such as the one considered above, contains no information, since it would not help us to obtain information about anything else. That is, from the string "NDYOSNKFK" one cannot state anything other than "NDYOSNKFK". A different notion of complexity, effective complexity, is based on the distinction between two parts of a description. Any string of characters has a part that is associated with the regularities that one extracts from it and a part exhibiting randomness. How should we understand the regularities involved in a description?
Gell-Mann and Lloyd (1996) characterise the regularity of an entity as follows: Recognizing certain regularities of an entity can be regarded as equivalent to embedding it in a set of similar entities and specifying a probability for each member of the set. The entities differ in various respects but share the regularities. (Gell-Mann & Lloyd 1996, 49)
In this sense, one may obtain the regularities from a piece of information by comparing it with similar sequences of information. The comparison is carried out by determining a set containing the considered sequence of
information and the similar sequences. The regularity is the information shared by the sequences contained in that set. Additionally, a probability is assigned to every member of the set. Consider, for instance, a group of ten books standing on a shelf. Two of them are about physics and eight of them are about art. We can form a set with all the possible titles that the group of ten books could have included, under the assumption that two of them should be about physics and the rest about art. This assumption represents the regularity according to which we form the similarity set, also called an ensemble by Gell-Mann and Lloyd. Let us now focus again on the strings considered above. The string of characters "ABABAB" exhibits a clear regularity, namely, the repetition of "AB". Indeed, the shortest program that may produce that string is based on that regularity. By contrast, the randomly generated string of characters "NDYOSNKFK" exhibits no regularity; its Kolmogorov complexity is essentially the length of the string itself. While Kolmogorov complexity is a measure assigned to a single sequence of information, the effective complexity of a sequence is assigned to the ensemble containing similar sequences. In other words, effective complexity is a measure assigned to regularities. It should be noted that the original sequence must be a typical member of the ensemble, that is, it should not be comparatively improbable. Roughly, we might define effective complexity as follows (Gell-Mann & Lloyd 1996, 49):

Effective complexity: The complexity of an entity is the length of the shortest possible computer program that can generate a description of its regularities, given a universal language.

This notion of complexity differs crucially from the notion of Kolmogorov complexity regarding randomly generated strings of characters.
For instance, while we would assign to the string "NDYOSNKFK" a very high degree of Kolmogorov complexity, its degree of effective complexity is zero, since it contains no regularities. As should be clear, the notion of effective complexity clarifies the second interesting aspect associated with Kolmogorov complexity. According to Kolmogorov complexity, the information content of a randomly generated sequence is high, although we might like to say that it contains no information. On the basis of effective complexity, we can satisfy that intuition: randomly generated sequences contain no information; their effective complexity is zero.
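The contrast between the two measures can be made vivid with a small computational sketch. Kolmogorov complexity is uncomputable, so the sketch below uses compressed length (via Python's zlib) as a rough stand-in, and a naive period-finder as a stand-in for extracting regularities; both choices, and the example strings, are illustrative assumptions rather than anything in Gell-Mann and Lloyd's account.

```python
import random
import string
import zlib

def compression_complexity(s: str) -> int:
    """Length in bytes of the zlib-compressed string: a rough,
    computable stand-in for Kolmogorov complexity (which is itself
    uncomputable)."""
    return len(zlib.compress(s.encode()))

def repeating_unit(s: str) -> str:
    """Naive regularity extractor: return the shortest exactly
    repeating unit of s, or the empty string if there is none."""
    for n in range(1, len(s) // 2 + 1):
        if len(s) % n == 0 and s[:n] * (len(s) // n) == s:
            return s[:n]
    return ""

regular = "AB" * 50                   # "ABAB...": highly regular
random.seed(0)                        # fixed seed for reproducibility
irregular = "".join(random.choices(string.ascii_uppercase, k=100))

# The regular string admits a much shorter description than the
# random one of the same length.
print(compression_complexity(regular), compression_complexity(irregular))

# Effective complexity looks only at the extracted regularity: the
# random string yields none, so its effective complexity is (near) zero.
print(repr(repeating_unit(regular)), repr(repeating_unit(irregular)))
```

The compression proxy captures the relevant asymmetry: the regular string is compressible, the random one is not, while only the regular string contributes a regularity at all.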
The other interesting aspect associated with the notion of Kolmogorov complexity, namely context dependence, is also involved in the account of effective complexity. In the definition of the notion of effective complexity, the restrictions of a language are also crucial. Thus, for any two objects, x and y, there might be two description languages U and V such that either x is more complex than y according to U and y is more complex than x according to V, or y is more complex than x according to U and x is more complex than y according to V. Let us now turn to emergence. On the basis of effective complexity, Miguel Fuentes (2014, 6) defines the concept of an entity's emergent property as follows:

Emergence as abnormality: The property of an entity, represented by a string of characters, is an emergent property if its effective complexity increases abnormally, given a set of control parameters.

The set of control parameters may describe some relevant features of the system, as well as the ways in which the system interacts with the environment. The notion of emergence based on effective complexity is also context dependent. On the one hand, the set of control parameters depends on the language in which the entity is described, which is, of course, the language in which the complexity of a given piece of information is determined. On the other hand, since effective complexity is context dependent, whether or not the complexity of a description increases abnormally is also context dependent. Thus, it might be that, according to a language U, a property is an emergent property, while, according to a different language V, it is not. Russell Standish (2001) also proposes to accept the context dependence of complexity and to consider it a fundamental feature of an account of emergent properties. In his account, emergence is characterised not merely as the property of a system, but as a binary relation between languages that describe that system.
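The abnormality criterion can be given a toy operational reading: track a complexity proxy for the system's description as a control parameter varies, and flag increases that are large relative to the history. The compression proxy, the threshold rule and the example descriptions below are all illustrative assumptions, not part of Fuentes' own formalism.

```python
import zlib

def complexity(description: str) -> int:
    # Compressed length as a crude, computable complexity proxy.
    return len(zlib.compress(description.encode()))

def abnormal_jumps(descriptions, factor=3.0):
    """Return indices where the complexity proxy jumps by more than
    `factor` times the mean of the previous jumps (with a floor of 1).
    The threshold is an illustrative choice."""
    c = [complexity(d) for d in descriptions]
    flagged = []
    for i in range(2, len(c)):
        previous = [abs(c[j] - c[j - 1]) for j in range(1, i)]
        baseline = max(sum(previous) / len(previous), 1.0)
        if abs(c[i] - c[i - 1]) > factor * baseline:
            flagged.append(i)
    return flagged

# A system whose description stays regular for a while, then
# abruptly acquires a richer structure at step 5.
history = ["AB" * 20] * 5 + ["ABXQZKLMPR" * 8]
print(abnormal_jumps(history))
```

Whether a jump counts as abnormal here depends on the chosen proxy and threshold, mirroring the context dependence discussed above.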
Let the microdescription of a certain system describe the system in a fundamental way and let its macrodescription describe the system in more general terms. The microdescription is based on a microlanguage and the macrodescription on a macrolanguage. For example, we might describe a certain liquid substance in terms of its constituent molecules, according to a microdescription, and also in terms of its transparency, according to a macrodescription. Following Standish (2001, 4), we might define an emergent property as follows:
Emergent property as underdescribed property: A phenomenon is an emergent property of a system if it can be described by atomic concepts of the macrolanguage associated with the system but cannot be described in the microlanguage associated with it.

Consider again the example of the liquid substance. Its transparency is regarded as an emergent property because it is a feature that cannot be described in terms of the molecules that constitute the substance. Comparing the two notions of emergence just considered, it is interesting to note that when the effective complexity of a system increases abnormally, there is an underdescribed phenomenon. Also, since in order to determine a system's effective complexity one has to embed its description in a set containing similar possible systems, one already has the elements that may constitute the macrolanguage's concepts. These concepts may originate from the regularities extracted from the original description of the system. The interesting link here is associated with the idea that emergence characterises a manner in which a theory fails to account appropriately for the facts that are available. We may have a case of underdescription when the theory's microlanguage is not expressive enough. We may also have abnormal descriptions of the facts that should be described, involving abrupt changes in informational content. We may focus on a state of underdescription of the system, with a macrolanguage describing a phenomenon that cannot be described in the microlanguage. This might occur because some concepts of the macrolanguage are not translatable in any form into the microlanguage. If this underdescription occurs due to an abnormal increase of effective complexity, we can change the microlanguage by constructing a new ensemble based on one of the untranslatable concepts of the macrolanguage. Of course, the new ensemble is conceptually and theoretically constrained.
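Standish's definition lends itself to a very simple formal toy: model each language as a set of concepts together with a partial translation map, and call a phenomenon emergent when it is atomic in the macrolanguage but has no translation into the microlanguage. The concept names and the translations below are invented for illustration only.

```python
# Languages modelled as plain sets of concepts; the translation map
# records which macrolanguage concepts can be re-expressed microscopically.
micro_language = {"molecule", "position", "velocity", "intermolecular bond"}
macro_language = {"transparency", "temperature", "viscosity"}

# Illustrative (and deliberately partial) translations into the
# microlanguage; "transparency" is left untranslatable on purpose.
translations = {
    "temperature": "mean kinetic energy of the molecules",
    "viscosity": "momentum transfer between molecular layers",
}

def underdescribed(phenomenon: str) -> bool:
    """Emergent in Standish's sense: describable by an atomic concept
    of the macrolanguage, but not translatable into the microlanguage."""
    return phenomenon in macro_language and phenomenon not in translations

print(sorted(p for p in macro_language if underdescribed(p)))  # ['transparency']
```

On this toy model, enriching the microlanguage (adding a translation) removes the emergent status of the phenomenon, which matches the point about more expressive microlanguages made below.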
Although considering the relations between the notion of emergence as abnormality and the notion of emergent property as underdescribed property might be very helpful in order to understand how we discover and explain emergent phenomena, I think that it is appropriate to consider emergent properties as the features of a system that instantiate both notions. Consider the following characterisation of the notion of an emergent property:

Emergent property: Given a microlanguage U and a macrolanguage V, if a phenomenon x is an emergent property of a system, then:
a) the effective complexity of x increases abnormally, according to U, and b) x is underdescribed, according to U and V.

The two conditions required in this definition of the notion of an emergent property are usually met when we talk about a property that has emerged in a system and is already an object of inquiry. For if we only required the condition of underdescription, we might not grasp how a given property emerged from the development of the system. This may occur if the microlanguage is not expressive enough to describe the property that is present in the macrolanguage. Suppose, for instance, that the mass of an object involved in a system is a non-emergent property. Thus, we can assign a value to it within a macrolanguage and within a microlanguage. Assume also that, on the basis of the microlanguage that we use to describe the system, we cannot explain how that object obtained its mass. Now, according to emergence as underdescription, mass would be identified as an emergent property, although there might be a more expressive microlanguage relative to which it is not. The condition of emergence as abnormality helps us to find more expressive microlanguages. If we have, for example, a case in which the change of an object's mass is underdescribed, we can still check whether, according to our microlanguage, there is an abnormal increase of complexity in the description of the system involving that object. If there is no abnormality, the considered change is not an emergent process. Again, this depends on the theoretical framework on the basis of which this is considered. It may also be that, on the grounds of the condition of emergence as abnormality alone, one does not have the appropriate language to describe an anomaly. Suppose that a peaceful community exhibits abrupt changes in the behaviour of its members that could be expressed in terms of an abnormal increase in effective complexity.
Suppose further that the changes in the community are not due to an internal organisation of the system, but to its coupling to another, invading community. The invading community violently imposes actions on the individual members of the peaceful community. In such a case, it would be hard to identify the abrupt change of the peaceful community as an emergent property. This shows the sense in which emergence as abnormality can be considered as a sign that the theory on the basis of which one is describing the data is not appropriate. A better language is called for. If we decided to include the invading community in the description, the behaviour of the members of the peaceful community would not appear abnormal.
Of course, the criterion of abnormality is a sufficient condition for identifying that a property is originating. However, within a given context, it is not enough to distinguish that property from others. Furthermore, it is not enough to determine whether one should attribute the property to the system originally described or whether it would be better to attribute it to a coupled system involving the originally described system as a part. It may be that, according to a macrodescription involving both communities as interacting systems, the property is not underdescribed and that there is a clear explanation of the community's abrupt change in terms of the interactions between the members of both communities. As already mentioned, it is also possible that, within the same time span considered initially, the effective complexity never increases abnormally, regarding a description of the system that includes both communities. Of course, this depends on whether there is a possible microlanguage according to which we can explain such an abnormality. In other words, when an incomplete theory is assumed, abnormality and underdescription are present. With the examples just mentioned, the characterisation of the notion of an emergent property involving both conditions, abnormality and underdescription, should be justified. The main point is the following: emergence can be understood both as a relation between sets of states corresponding to different levels of description and as a resulting state of a system described in terms of its constituents. According to the first interpretation, one may claim, for instance, that a given state E emerged from (and is not reducible to) a set B of basic processes. According to the second interpretation, one may claim that a particular state of the system being studied does not involve enough regularities to provide a simple description of it.
So emergence cannot be fully understood without recognising that we can describe a single system with different degrees of specificity and that some of its descriptions are more complex than others. According to this way of understanding emergence, we might distinguish the different steps that are required to arrive at a satisfactory account of level 5 emergent phenomena. Consider, for instance, the cultural state of a community within a given historical period. On the one hand, it may be described, according to a macrolanguage, in statistical and sociological terms. On the other hand, it might be described, according to a microlanguage, in terms of linguistic heritage between groups of people. We could then confirm whether abnormal changes occur in the effective complexity involved in the microdescription. Also, among some descriptions of the macrolanguage, there might be some regularities that are not related to any linguistic phenomena described in the
microlanguage. For instance, there might be certain regular, historical phenomena that are better associated with psychological descriptions. This might force the inquiry in another theoretical direction, in which further languages are required. Such an extension might finally help us to develop a better description of the cultural state considered originally. It should be clear that whether a property is identified as an emergent property depends on the features of the epistemic frameworks according to which the property is being studied. Epistemic features, and cognition in general, may also be considered as emergent phenomena.
CHAPTER TWO

COGNITION, ENACTIVISM AND DIFFERENTIATION
We can understand cognition as the group of processes occurring in the life of an individual organism or system that lead to knowledge processing of some kind. In this sense, I do not take cognition and knowledge to be equivalent characteristics of a system, but to be related in a way that explains how some knowledge states of a system are produced by other states or processes. We might think of knowledge states as emergent properties of a system that are produced according to the several ways in which some parts of the system interact with each other and with the environment. A state of symbolic knowledge constitutes one kind of state that might emerge in that form, but it is not the only kind. Habits and skills can also be considered as emergent states of knowledge. This chapter is based on and motivated by an account of cognition developed by Francisco Varela, Evan Thompson and Eleanor Rosch (1991) and by one of its more relevant concepts, the concept of enaction.
Cognitivism

When Varela, Thompson and Rosch characterise the cognitivist perspective of knowledge, they note that, according to such an account, representation is crucial:

The cognitivist argument is that intelligent behavior presupposes the ability to represent the world as being certain ways. We therefore cannot explain cognitive behavior unless we assume that an agent acts by representing relevant features of her situations. To the extent that her representation of a situation is accurate, the agent’s behavior will be successful (all other things being equal). (Varela, Thompson & Rosch 1991, 40)
In this sense, the basic idea of cognitivism is that we cannot explain cognitive processes without using the notion of representation. As Varela,
Thompson and Rosch recognise, the problem with this account is not the notion of representation itself, which might be very helpful if we want to explain certain types of cognitive processes, but the assumption that cognition should be defined in terms of symbolic representations. For instance, consider the process in which a person draws a house on a piece of paper. According to this way of understanding cognitivism, one can only determine whether or not this process is a cognitive process by describing the person’s representations. Also, if we think that her drawing behaviour was in a sense successful, one may assume that she had the appropriate concepts of a house, of a pencil and of how a pencil might produce lines on the paper. On these grounds, the problem that cognitivism tries to solve is how the occurrence of a representational state can be explained in terms of physical states. According to Varela, Thompson and Rosch, the main hypothesis that this account establishes in order to solve the mentioned problem is based on the notion of symbolic computation:

Here is where the notion of symbolic computation comes in. Symbols are both physical and have semantic values. Computations are operations on symbols that respect or are constrained by those semantic values. In other words, a computation is fundamentally semantic or representational. (Varela, Thompson & Rosch 1991, 41)
The main idea is that semantics is a necessary element for computation. For instance, a logically programmed computer can compute the value of some variable A, depending on the values of other variables B and C. This will suffice to consider that such a computer works on the basis of a formal semantics. If the computer is programmed to detect objects in its environment that determine, say, the values of variable B, then we might consider that its behaviour is based on representations. This consideration may also be appropriate if, for instance, the values of B are determined by a certain statistical analysis of a particular piece of data. The other crucial point in the cognitivist hypothesis is the assumption that symbolic computational processes are constituted by physical processes. This relation occurs in the programming process itself: A digital computer, however, operates only on the physical form of the symbols it computes; it has no access to their semantic value. Its operations are nonetheless semantically constrained because every semantic distinction relevant to its program has been encoded in the syntax of its symbolic language by the programmers. In a computer, that is, syntax mirrors or is parallel to the (ascribed) semantics. The cognitivist claim, then, is that this parallelism shows us how intelligence and intentionality
(semantics) are physically and mechanically possible. (Varela, Thompson & Rosch 1991, 41)
Thus, the syntax, including the rules according to which symbols can be related to each other by a computer, depends on the programmers’ goals. Of course, these goals are always physically constrained by the computer’s architecture. Symbols mirror representations. The set of values that a given variable can take can be considered as information about what the computer is able to represent. Following this way of considering cognition, one should focus on the physical constituents of symbols in order to arrive at an appropriate account of how the occurrence of representational states can be explained in terms of physical states.
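The point about syntax mirroring ascribed semantics can be sketched with a deliberately trivial example: the machine applies a purely formal rule to symbols, while the semantic reading of those symbols exists only in the programmer's gloss. The rule and its interpretation below are invented for illustration.

```python
def compute_A(B: bool, C: bool) -> bool:
    # Purely syntactic rule: A := B and not C. The machine has no
    # access to what the symbols mean; the ascribed semantics lives
    # entirely in the programmer's reading of them, for example:
    #   B = "an object of the relevant kind has been detected"
    #   C = "the object has already been processed"
    #   A = "act on the object"
    return B and not C

print(compute_A(True, False))  # True
print(compute_A(True, True))   # False
```

Nothing in the rule itself distinguishes these readings from any others; that is the sense in which the computation is semantically constrained only through the encoding chosen by the programmer.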
Emergence and Connectionism

As Varela, Thompson and Rosch point out (1991, 85), one of the problems with the cognitivist perspective is the fact that brains do not seem to process and store information on the grounds of explicit symbolic rules as computers do. According to an alternative point of view, called connectionism, one can address this issue by describing cognitive processes in terms of neural connections. The connectionist account is characterised by Varela, Thompson and Rosch in the following way:

The strategy, as we said, is to build a cognitive system not by starting with symbols and rules but by starting with simple components that would dynamically connect to each other in dense ways. In this approach, each component operates only in its local environment, so that there is no external agent that, as it were, turns the system’s axle. But because of the system’s network constitution, there is a global cooperation that spontaneously emerges when the states of all participating “neurons” reach a mutually satisfactory state. In such a system, then, there is no need for a central processing unit to guide the entire operation. (Varela, Thompson & Rosch 1991, 99)
Thus, one of the main differences between cognitivism and connectionism is that while the former proposes to explain cognition in terms of symbolic computations, the latter proposes to explain it in terms of the interactions between its constituent biological parts. A further difference is that, according to connectionism, cognitive processes are understood on the grounds of properties that emerge from the interactions between neurons, and not on the grounds of symbols and representations. If, according to connectionism, cognitive processes are not understood as operations involving symbols, what, then, is the importance of symbols?
According to Varela, Thompson and Rosch, they are not fundamental: One of the most interesting aspects of this alternative approach in cognitive science is that symbols, in their conventional sense, play no role. In the connectionist approach, symbolic computations are replaced by numerical operations—for example, the differential equations that govern a dynamical system. These operations are more fine-grained than those performed using symbols; in other words, a single, discrete symbolic computation would, in a connectionist model, be performed as a result of a large number of numerical operations that govern a network of simple units. In such a system, the meaningful items are not symbols; they are complex patterns of activity among the numerous units that make up the network. (Varela, Thompson & Rosch 1991, 99)
According to Varela, Thompson and Rosch, one could explain the occurrence of a given cognitive process within the connectionist framework by describing the interactions of the singular constituents of the cognitive system and the new properties that emerge from those interactions. Although symbols do not play the fundamental role that they play in the cognitivist account, a symbolic operation could be described numerically in terms of the network’s operations. The mathematical descriptions of the system involve more specificity than the descriptions of symbolic operations. Furthermore, a cognitive state described as the result of symbolic operations could be considered as an emergent property grounded in the development of the neural network.
The Enactive Approach

One could try to explain how mental, symbolic representations arise from neural processes and show that, after all, connectionism and cognitivism can be part of a single theory of cognition. The problem with doing this can be clearly understood in the light of the distinction between two notions of representation considered by Varela, Thompson and Rosch (1991, 135). On the one hand, according to a weak notion of representation, a cognitive system may construct images or experiences in a simple way. There are neither strong epistemological nor ontological commitments associated with this idea, in the sense that one does not need to specify the conditions under which a representation is correct. On the other hand, according to a strong notion of representation, there is a pre-given world and cognitive processes are part of it. Cognition consists in representing features of the world correctly and acting appropriately on the basis of correct representations. While the idea of weak representation is, according to Varela, Thompson and Rosch, uncontroversial, the notion of strong
representation turns out to be problematic, and too simple if one wants to explain cognitive processes on the basis of representational operations in this sense. They propose an approach to cognition that is more elaborate than cognitivism and connectionism, or that at least does not present cognitive processes as processes that one can simply reduce to or model as a single kind of entity. Their point of view, the enactive approach, which we can also call enactivism, establishes the following about cognitive structures and their basis, perception (Varela, Thompson & Rosch 1991, 173):

Enactive perception: Perception consists in perceptually guided action.

As an example of how perception can be understood as guided action, consider the following description of an experiment designed by Bach-y-Rita (1972):

Bach-y-Rita has designed a video camera for blind persons that can stimulate multiple points in the skin by electrically activated vibration. Using this technique, images formed with the camera were made to correspond to patterns of skin stimulation, thereby substituting for the visual loss. Patterns projected on to the skin have no “visual” content unless the individual is behaviorally active by directing the video camera using head, hand, or body movements. When the blind person does actively behave in this way, after a few hours of experience a remarkable emergence takes place: the person no longer interprets the skin sensations as body related but as images projected into the space being explored by the bodily directed “gaze” of the video camera. Thus to experience “real objects out there,” the person must actively direct the camera (by head or hand). (Varela, Thompson & Rosch 1991, 175)
There are two main ideas involved in Bach-y-Rita’s experiment. The first key idea is that the perceptual process depends on the person’s body movements, with which the patterns can be explored and traced. The second key idea is that the representation of a spatial, three-dimensional object in front of the person who uses the camera emerges from several interactions and perceptual instances. This notion of perception, understood as perceptually guided action, is the basis of a more general level of cognition. The notion of enactive cognition can be characterised as follows, based on how Varela, Thompson & Rosch understand the enactive approach (1991, 173):
Enactive cognitive structure: Cognitive structures emerge from the recurrent sensorimotor patterns that enable action to be perceptually guided.

As an example of this idea, Varela, Thompson and Rosch (1991, 177) consider categorisation, one of the central processes in which cognitive structures are formed. As they argue, there are several levels of categorisation. The basic level of categorisation arises from direct interactions between the cognitive system and the category members, as well as from social convention:

The basic level of categorization, thus, appears to be the point at which cognition and environment become simultaneously enacted. The object appears to the perceiver as affording certain kinds of interactions, and the perceiver uses the objects with his body and mind in the afforded manner. Form and function, normally investigated as opposing properties, are aspects of the same process, and organisms are highly sensitive to their coordination. And the activities performed by the perceiver/actor with basic-level objects are part of the cultural, consensually validated forms of the life of the community in which the human and the object are situated—they are basic-level activities. (Varela, Thompson & Rosch 1991, 177)
One should emphasise that this notion of cognitive structure depends partly on the notion of enactive perception described above. At the same time, the characterisation of both notions, perception and cognitive structure, is rooted in the idea that knowledge implies action.
Radical Enactivism

Daniel Hutto and Erik Myin (2012) study the enactive approach presented above, focusing on the idea that cognition is not merely based on processes involving representations and conditions of correspondence with reality. On the grounds of this emphasis, they develop the account of radical enactivism. In order to understand the basic ideas of radical enactivism, I would like to consider a distinction made by Hutto and Myin that contrasts their account with classical enactivism and cognitivism. They distinguish three kinds of theories about cognition: content involving cognition (CIC), conservative enactive cognition (CEC) and radically enactive cognition (REC). Consider the characterisation of the viewpoint based on content involving cognition:

CIC assumes that cognition requires the existence of contents of some kind or other. Unrestricted CIC takes this to be true of all mentality, always and
everywhere. Its intellectualist credo is “no mentality without content.” As we have already noted, for the staunchest backers of CIC the widespread influence of the enactive turn and of the associated ideas that minds are Embodied, Embedded, and Extensive is perceived as unwelcome, faddish, unfortunate, and retrograde. (Hutto & Myin 2012, 9).
Hutto and Myin present this approach in such a way that we could easily compare it to the characterisation that Varela, Thompson and Rosch make of cognitivism, especially if we consider the distinction between weak and strong representation mentioned above. As discussed earlier, cognitivism establishes that every cognitive process involves some sort of symbolic representation. This thesis is more general than the thesis of content involving cognition, which can be stated as follows: Content involving cognition: Representational content is a necessary element of any cognitive process. This account might be based on the strong notion of representation, that is, on a definition of the notion of representation according to which representation necessarily involves content. The content of a cognitive process might be characterised in the following way (cf. Hutto & Myin 2012, x): Cognitive content: The content of a cognitive process is its referential meaning and is associated with a particular set of conditions of satisfaction. Accuracy of content: The content of a cognitive process is accurate if and only if the conditions of satisfaction are met. For instance, suppose that Sarah has the experience of seeing a black dot on the wall. The content of her experience is the object of her representation, the black dot on the wall, considered as something given outside her cognitive state. According to the account based on content involving cognition, Sarah’s cognitive process of seeing a black dot on the wall is taken to be accurate just in case a particular set of conditions is instantiated. For example, such a set of conditions might involve the fact that there is a black dot on the wall in front of her or that she is not hallucinating. The point of view developed by Varela, Thompson and Rosch can be considered as a case of conservative enactive cognition. Although their approach is presented as an alternative to cognitivism, it is not
incompatible with the perspective about content involving cognition. This account is based on the Embodiment Thesis, which might be characterised in the following way (Hutto & Myin 2012, 11): Embodiment Thesis: Some sort of representational content is essentially embodied, that is, it depends on bodily actions. This thesis is one of the bases of the enactive approach developed by Varela, Thompson and Rosch, according to which cognitive structures emerge from sensorimotor patterns. Clearly, this account is also compatible with the cognitive thesis that content is a necessary element of cognition, if one understands cognition as the group of symbolic processes that emerge from bodily action. According to Varela, Thompson and Rosch, perception is a fundamental aspect of cognition and involves processes that might not have cognitive content. However, they do not completely reject the idea that some cognitive processes can involve representational content. According to their approach, cognitive structures based on content emerge from enactive processes. Nevertheless, as Hutto and Myin argue, this does not support the fundamental notion of enactive cognition that is opposed to the thesis that cognition involves content necessarily: CEC does not break faith with unrestricted CIC. Though such conservative renderings are possible, they obviously go against the spirit of an enactivism that is serious about its rejection of content and representation. REC presses for the strongest reading of the Embodiment Thesis—one that uncompromisingly maintains that basic cognition is literally constituted by, and to be understood in terms of, concrete patterns of environmentally situated organismic activity, nothing more or less. (Hutto and Myin 2012, 11)
It is important to notice that Hutto and Myin focus here on a notion of basic cognition and not of cognition in general, as Varela, Thompson and Rosch do. We can understand basic cognition as follows (Hutto & Myin 2012, x): Basic cognition: A basic cognitive process is a mental process that involves intentional directedness but does not involve content. According to cognitivism, all cognitive processes involving intentionality should be considered to be representational and be associated with a particular set of accuracy conditions. Hutto and Myin challenge this idea.
From the perspective of radical enactivism, intentionality does not necessarily imply content. Some cognitive processes are intentional and do not involve representational content. These are the basic cognitive processes. Thus, we might characterise the notion of cognition on which radical enactivism is based as follows: Radically enactive cognition: Every basic cognitive process is constituted only by concrete patterns of environmentally situated organismic activity. Clearly, the approach of radical enactivism does not establish that every cognitive process is constituted only by enactive processes. Only basic cognition is. Perceptual processes are good examples of basic cognition. In this sense, the notion of perception considered by Varela, Thompson and Rosch (1991) is an instance of the notion of radically enactive cognition. A relevant consequence is, then, that perception does not involve representational content necessarily: A truly radical enactivism—REC—holds that it is possible to explain a creature’s capacity to perceive, keep track of, and act appropriately with respect to some object or property without positing internal structures that function to represent, refer to, or stand for the object or property in question. Our basic ways of responding to worldly offerings are not semantically contentful. (Hutto & Myin 2012, 82)
As Hutto and Myin argue, in order to explain a perceptual process (and action in general) as an intentional activity, one does not have to consider descriptions involving representational content. Basic cognition is intentional in the sense that it involves interactions between an agent and its environment, focusing on salient objects of perception, for instance. This does not mean that the agent has a symbolic representation of that object. Although radical enactivism does not establish that cognition never involves representational content, it supports very clearly the idea that cognition does not need to be representational. Thus, one may think that at least concept formation and logical reasoning can be considered as cognitive processes that are based on representational content. In any case, content involving cognition does not depend only on a cognitive structure that is isolated from the environment: REC assumes that acquiring capacities for engaging in such sophisticated CIC is possible only with the help of environmental scaffolding. Hence, it assumes that such capacities neither stem from nor have a wholly internal,
neural basis. (Hutto & Myin 2012, 138)
Cognitive processes that involve content are not simply symbolic computations as classical cognitivism assumes. I think that this is compatible with the second aspect of cognition considered by Varela, Thompson and Rosch, namely, the fact that cognitive structures emerge from sensorimotor patterns related to the ability of perceptual action. According to Hutto and Myin, we have to understand mental processes as capacities instead of considering them as series of steps involving relations and mediations between symbols. Their account of the emergence of cognitive structures proposes a focus on abilities that depend on communication and interactions with the environment: Within a capacity-oriented framework it is possible to understand how basic minds are augmented through scaffolding in a different light. The capacity to think using contentful representations is an example of a late-developing, scaffolded, and socially supported achievement. It originates from and exists, in part, in virtue of social practices that make use of external public resources, such as pen, paper, signs, and symbols. (Hutto & Myin 2012, 152)
Thus, although representational content is relevant for many kinds of cognitive processes, one cannot explain its role within cognition without considering that it is a property acquired as a capacity thanks to social interaction. It is not only direct communication with other individuals that turns out to be relevant for the formation of cognitive and representational structures, but also the material products of social interaction. Of course, these material resources modify the environment physically, expanding the possible ways in which cognitive abilities can be applied and developed.
Knowledge and Differentiation

I would like to consider a fundamental aspect of cognition in connection with the ideas of enactivism that I have presented so far. This fundamental aspect is differentiation. Differentiation is involved in perception, considered as perceptually guided action, as well as in cognitive processes involving contentful representations. Consider the following distinction, explained by Adler and Orprecio, about how visual attention depends on whether there are differences among the stimuli perceived: [A]s a consequence of the initial decomposition of stimuli into their basic features, a stimulus unique for a particular feature is indicated at a particular location on that feature map and attentive processes are then selectively allocated to that stimulus. As a result, regardless of the number
of stimuli in the array, the amount of time it takes for an individual to detect the stimulus with the unique feature remains relatively stable. In contrast, when there is no stimulus that consists of a single unique identifying feature or the stimulus is defined by a unique combination of features, it does not pop out. (Adler & Orprecio 2006, 190)
At an initial stage of the perceptual process, stimuli are decomposed according to basic features, such as objects, colour, orientation, width, length or motion. If a stimulus can be distinguished from others according to some of its basic features, the agent will probably perceive the stimulus with attention. By contrast, if no stimulus can be distinguished from the others in this way, attention may occur through the selection of a conjunction of features. In other words, there is no attention without differentiation. Of course, the kind of mechanism just described could be considered from the perspective of cognitivism. One may assume that the cognitive system is in front of a world involving fixed, independent features. Perception and attention can be considered as a result of symbolic processes that depend on the reception and symbolic organisation of information according to a set of rules that is established from the beginning. Nevertheless, this would not be incompatible with the enactive approach to perception. For, in this kind of mechanism, the contents of at least two relevant parameters emerge from enactive perception. The first of these two parameters is the set of basic features. What determines which of the basic features is relevant? And more importantly, how does the agent distinguish between two or more different basic features? The distinction of basic features depends on how an agent interacts with the environment and how the objects are useful to the perceiver. Thus, it depends on the way in which the agent enacts within the environment: The basic level of categorization, thus, appears to be the point at which cognition and environment become simultaneously enacted. The object appears to the perceiver as affording certain kinds of interactions, and the perceiver uses the objects with his body and mind in the afforded manner. (Varela, Thompson and Rosch 1991, 177)
In this sense, the initial perception of features like width and motion is determined by the ways in which the agent has interacted with objects in the past and in which interaction seems possible. Considering this, the initial decomposition of stimuli into their basic features does not depend on a fixed list of features. Such a set, as well as the ability to focus on each feature, is something that emerges from the enactive interaction between perceiver and environment.
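The contrast between pop-out and conjunction search described by Adler and Orprecio can be sketched as a toy model. Everything here is an illustrative assumption: the function `search_time`, the feature names and the time units are invented, and real detection times are continuous, not counts.

```python
# Toy model of visual search: a target with a unique basic feature "pops
# out" in roughly constant time, while a target defined only by a unique
# combination of features requires serial inspection of the array.

def search_time(stimuli, target):
    """Return a schematic detection time for `target` among `stimuli`.

    Each stimulus is a dict of basic features, e.g.
    {"colour": "red", "orientation": "vertical"}.
    """
    distractors = [s for s in stimuli if s != target]
    # Pop-out: some single feature of the target is shared by no distractor.
    for feature, value in target.items():
        if all(d.get(feature) != value for d in distractors):
            return 1  # parallel search: time independent of set size
    # Conjunction search: only the combination of features is unique,
    # so attention must inspect the items serially.
    return len(stimuli)  # serial search: time grows with set size

red_vertical = {"colour": "red", "orientation": "vertical"}
pop_out_array = [{"colour": "green", "orientation": "vertical"}] * 9 + [red_vertical]
conjunction_array = ([{"colour": "red", "orientation": "horizontal"}] * 5
                     + [{"colour": "green", "orientation": "vertical"}] * 4
                     + [red_vertical])

print(search_time(pop_out_array, red_vertical))      # constant, regardless of array size
print(search_time(conjunction_array, red_vertical))  # grows with the number of stimuli
```

The model only reproduces the qualitative asymmetry in the quotation; it says nothing about how the feature maps themselves arise, which is precisely the point at which the enactive reading enters.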
The second parameter involved in attention that also depends on enaction is the uniqueness of the feature selected from the initial feature map. The way in which a feature is selected from the others might depend on the agent’s particular interests, intentions and present, ongoing interactions with the environment. The selection time of information from stimuli depends on whether a relevant feature is highlighted within the initial set of features. Serences and Kastner explain this as follows: [S]election is neither early nor late. Instead, the locus of selection, both in terms of anatomy and time, flexibly depends on the demands placed on sensory processing machinery by the behavioural goals of an observer. Tasks that require highly focused attention on a specific location or feature will encourage early selection, whereas less demanding tasks that can be performed with a more diffuse attentional focus will accommodate late selection. (Serences & Kastner 2014, 97)
This characterisation of the dependence between feature selection and the tasks that the perceiver is performing clearly supports the enactive account of perception. Visual attention, which is a fundamental differentiation action, depends not only on the stimulus and on how it is processed, but also on the agent’s goals and on her interactions with the environment. As pointed out above, perception is not the only part of cognition in which differentiation plays a fundamental role. Differentiation is also fundamental for cognitive processes that involve representational content. This issue will guide different stages of this book. In order to achieve a better understanding of how differentiation is present in content involving cognition, an inquiry into the notion of a concept will be of great help. This is the topic of the following chapter.
CHAPTER THREE

CONCEPTUAL SPACES AND A NOTE ON DISAGREEMENT
Concepts are fundamental to cognition in general, especially regarding symbolic cognitive processes, but also regarding fundamental perceptual activity. Like the proponents of enactivism, Peter Gärdenfors develops an alternative account of cognition, different from a simple cognitivist or a connectionist point of view: Here, I advocate a third form of representing information that is based on using geometrical structures rather than symbols or connections among neurons. On the basis of these structures, similarity relations can be modeled in a natural way. I call my way of representing information the conceptual form because I believe that the essential aspects of concept formation are best described using this kind of representation. (Gärdenfors 2000, 2)
The basic features of Gärdenfors’s account of cognition are clear. It is an account that is based on the notion of concept, understood in a geometrical sense. Since concepts build geometrical structures, the notion of knowledge is understood in his theory with regard to spatial representations. The conceptual space is structured, fundamentally, by quality dimensions, with which the space is divided into distinct domains. A dimension is simply the geometrical representation of a quality, which may be directly connected to perception or may be formed as a theoretical abstraction. A domain is a general category to which a concept may belong. For instance, spatial concepts belong to a spatial domain, while sound concepts belong to another domain. Similarity between concepts is understood as the distance between the spaces that represent them. A conceptual space can also be understood as a collection of domains (Gärdenfors 2000, 26).
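The geometrical reading of concepts just described can be given a minimal sketch. The quality dimensions and all coordinates below are assumptions chosen for illustration, not empirical values, and Euclidean distance is only one possible metric.

```python
import math

# A minimal sketch of a conceptual space: concepts are points in a space
# of quality dimensions (here hue, chromaticness and brightness, each
# scaled to [0, 1]), and similarity is modelled by distance, so that
# nearby points stand for similar concepts.

def distance(p, q):
    """Euclidean distance between two points in the quality space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Illustrative coordinates (hue, chromaticness, brightness).
concepts = {
    "red":    (0.00, 0.9, 0.5),
    "orange": (0.08, 0.9, 0.6),
    "blue":   (0.60, 0.9, 0.4),
}

# "red" lies closer to "orange" than to "blue", i.e. is more similar to it.
print(distance(concepts["red"], concepts["orange"]) <
      distance(concepts["red"], concepts["blue"]))
```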
Interpretation, Representation and the Emergence of Concepts

In order to apply the basic ideas of the conceptual spaces account, Gärdenfors proposes to distinguish the aim of the inquiry first: Depending on whether the explanatory or the constructive goal of cognitive science is in focus, two different interpretations of the quality dimensions will be relevant. One is phenomenal, aimed at describing the psychological structure of the perceptions and memories of humans and animals. Under this interpretation the theory of conceptual space will be seen as a theory with testable consequences in human and animal behavior. (Gärdenfors 2000, 5)
Thus, a first way in which we can interpret the quality dimensions is phenomenal. Phenomenal interpretation: Conceptual spaces can describe cognitive experiences. We may also want to focus the inquiry on the development of theories that help us to explain and predict events. This is the scientific interpretation: Scientific interpretation: Conceptual spaces can be applied to construct theories and artificial systems. Gärdenfors characterises this interpretation as follows: The other interpretation is scientific where the structure of the dimensions used is often taken from some scientific theory. Under this interpretation the dimensions are not assumed to have any psychological validity but are seen as instruments for predictions. This interpretation is oriented more toward the constructive goals of cognitive science. (Gärdenfors 2000, 5)
Having in mind these two interpretations of the theory of conceptual spaces, it is easier to develop descriptions on the basis of the appropriate distinctions, which helps to avoid misconceptions. Suppose, for example, that two persons are arguing about whether colours are real properties of objects. They stop arguing after they realise that they are basing their descriptions on different interpretations. One of them argues from a phenomenal interpretation of conceptual spaces. Within a phenomenal interpretation, it would be appropriate to divide the quality space according to colours in order to represent visual perceptions.
Thus, whether an object is red or not would depend on whether the object is a member of the category “red” or not. By contrast, the other person argues from a scientific interpretation. Within this interpretation of the theory, we might appropriately divide the quality space as an electromagnetic spectrum. In this sense, whether an object is red or not does not simply depend on whether it is a member of some segment of the quality space. In order to develop an explanation, the spectrum could be related to a space representing reflection and to a structure in which we may distinguish the sensitivity of the retina to different wavelengths. The way in which we determine the quality space depends on the aims of inquiry, that is, on the interpretation of the theory. In other words, it depends on the goals to which we want to apply the theory. A further distinction is also interesting. Some quality dimensions are integral, while others are separable. Within a phenomenal dimension, whenever an object is considered as a member of a certain hue category, it also has a brightness value (Gärdenfors 2000, 24). This means that hue and brightness are integral dimensions. Let us characterise these two ways in which we can relate dimensions as follows: Integrality: Two dimensions A and B are integral if, whenever an object is a member of A, it will be a member of B and whenever an object is a member of B, it will be a member of A. Separability: Two dimensions A and B are separable if they are not integral. Hue and loudness constitute a good example of two separable dimensions, according to a phenomenal point of view. Of course, the separability or integrality of two dimensions is not an absolute matter. It may depend on the interpretation of the conceptual spaces account, as well as on how the quality space is determined.
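The definitions of integrality and separability just stated can be sketched directly. The representation of objects as dictionaries of dimension values, and the sample objects themselves, are assumptions for illustration only.

```python
# Sketch of the integrality test: two dimensions are integral over a set
# of objects when every object assigned a value on one is also assigned a
# value on the other, in both directions; otherwise they are separable.

def integral(dim_a, dim_b, objects):
    return all((dim_a in obj) == (dim_b in obj) for obj in objects)

def separable(dim_a, dim_b, objects):
    return not integral(dim_a, dim_b, objects)

# Hypothetical sample: anything with a hue also has a brightness, but a
# silent coloured object carries no loudness value.
objects = [
    {"hue": 0.1, "brightness": 0.7, "loudness": 0.3},
    {"hue": 0.5, "brightness": 0.2},
    {"loudness": 0.9},
]

print(integral("hue", "brightness", objects))   # True
print(separable("hue", "loudness", objects))    # True
```

As the text notes, the verdicts are relative to the sample of objects and to how the quality space is determined, which the model makes explicit: change `objects` and the classification may change.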
We may characterise the notion of a domain on the grounds of the concept of a dimension, together with integrality and separability (Gärdenfors 2000, 26): Domain: A domain is a set of integral dimensions that are separable from all other dimensions. Consider, for instance, the colour domain, involving the dimensions of hue, chromaticness and brightness. These dimensions are integral. Whenever we assign a certain hue to an object, we should assign a certain
brightness. The dimensions involved in the colour domain are separable from all other dimensions, such as weight or loudness. Another helpful distinction considered by Gärdenfors is the distinction between three kinds of representation. The first kind of representation is symbolic representation, according to which a cognitive process is based on the manipulation of symbols depending on prefixed rules (Gärdenfors 2000, 35). The notion of symbolic representation is fundamental within the cognitivist point of view, as considered in the preceding chapter. Cognition might be modelled on the basis of propositions and the inferential relations between them or on the basis of the generation of strings of symbols, according to syntactical rules. A general problem for the symbolic perspective on representation is explained by Gärdenfors in the following way: [N]ot only is there a problem of describing the genesis of predicates, but their development in a cognitive system is not easily modeled on the symbolic level. Even after an agent has learned a concept, the meaning of the concept very often changes as a result of new experiences. In the symbolic mode of representation, there has been no satisfactory way of modeling the dynamics of concepts. (Gärdenfors 2000, 38)
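The symbolic mode of representation, manipulation of symbol strings according to prefixed rules, can be illustrated with a toy grammar. The grammar below is entirely made up; the point is that its predicates are fixed in advance, so nothing in the system itself models how their meanings could change with experience, which is the problem Gärdenfors raises in the quotation.

```python
# A toy symbolic system: strings are generated by rewriting symbols
# according to prefixed syntactic rules. The predicates ("tree", "tall")
# are fixed once and for all in the rule set.

rules = {
    "S":   [["NP", "VP"]],
    "NP":  [["that", "N"]],
    "VP":  [["is", "ADJ"]],
    "N":   [["tree"]],
    "ADJ": [["tall"]],
}

def expand(symbol):
    """Rewrite a symbol by its first rule until only terminals remain."""
    if symbol not in rules:
        return [symbol]  # terminal symbol
    return [t for part in rules[symbol][0] for t in expand(part)]

print(" ".join(expand("S")))  # that tree is tall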
Within the symbolic perspective, it is hard to explain how concepts are learned, since predicates are already fixed in the syntax. A correct model of concept acquisition should not only involve the inclusion of a predicate into a language, but it should also involve the variations of a concept’s meaning. It is very difficult to include this linguistic dynamism in the symbolic framework. A second kind of representation is constituted by subconceptual representations. The notion is fundamental to the framework of connectionism, as considered in the previous chapter. Subconceptual representations occur on the most fine-grained level of representation (Gärdenfors 2000, 33) and constitute a connectionist system, which is characterised by Gärdenfors in the following way: Connectionist systems, also called artificial neuron networks (ANNs), consist of large numbers of simple but highly interconnected units (“neurons”). The units process information in parallel (in contrast to most symbolic models where the processing is serial). There is no central control unit for the network, but all neurons “act” as individual processors. (Gärdenfors 2000, 41)
In contrast to the symbolic perspective of representations, subconceptual representations are not based on symbolic manipulations, but on the
patterns exhibited by the neuron network. However, like the framework of symbolic representations, connectionism is confronted with a fundamental epistemological problem because of the difficulties involved in giving an account of the dynamics of conceptual learning. Gärdenfors considers this problem as follows: A fundamental epistemological problem for ANNs is that even if we know that a network has learned to categorize the input in the right way, we may not be able to describe what the emerging network represents. This kind of level problem is ubiquitous in applications of ANNs for learning purposes. […] [T]he theory of artificial neuron networks must somehow bridge the gap of going from the subconceptual level to the conceptual and symbolic levels. We may account for the information provided at the subconceptual level as a dimensional space with some topological structure, but there is no general recipe for determining the conceptual meaning of the dimensions of the space. (Gärdenfors 2000, 43)
Thus, according to Gärdenfors, one of the major problems for connectionism is the difficulty of explaining how our symbolic representations and theories work on the grounds of neural patterns. This cannot be achieved simply by indicating the correlations between patterns of neural networks and different types of linguistic behaviour. A satisfactory theory of cognition should also show how symbolic representations and their meanings emerge. That is, it should describe how representations are grounded in structures that involve different levels of interaction. The theory of conceptual spaces offers such a description. Before turning to that point, let us focus on the third kind of representation considered by Gärdenfors, the conceptual representation. A clear way of characterising the notion of conceptual representation is to compare it with the other types of representations that were already mentioned. Gärdenfors makes the following comparison between conceptual representations, which are fundamental elements of his theory, and symbolic representations: The dimensions are the basic building blocks of representations on the conceptual level. Humans and other animals can represent the qualities of objects, for example when planning an action, without presuming an internal language or another symbolic system in which these qualities are expressed. As a consequence, I claim that the quality dimensions of conceptual spaces are independent of symbolic representations and more fundamental than these. (Gärdenfors 2000, 43)
As pointed out by Gärdenfors, conceptual representations are more fundamental than symbolic representations. For instance, the
representation involved when a person wants to grab a bottle standing in front of her need not necessarily be understood as formulated symbolically. It may simply be part of a phenomenal quality dimension. According to this, let us characterise the relation between both kinds of representation on the grounds of independence: Structural independence: The structure of a conceptual space does not depend on symbolic representations. Let us now turn again to connectionism. Comparing subconceptual representations and conceptual representations, Gärdenfors argues that a fundamental difference between both is the number of quality dimensions involved: When the behavior of an ANN is regular enough to be viewed from the conceptual perspective, the representational space on this level is usually of a low dimension. In going from the subconceptual level to the conceptual, a considerable reduction of the number of dimensions represented takes place. On the conceptual level, the irrelevant information has been sorted out, while the activation vectors describing the state of an ANN contain a lot of noise and other redundancies from the input. (Gärdenfors 2000, 240)
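The reduction of dimensionality described in this quotation can be illustrated numerically. Principal component analysis is used here purely as a stand-in for such a reduction, not as a claim about Gärdenfors's theory or about biological systems; the network sizes and noise level are invented.

```python
import numpy as np

# Sketch: activation vectors of a simulated 50-neuron network (one
# subconceptual dimension per neuron, plus noise) in fact vary along only
# 2 underlying dimensions. Projecting onto the principal components
# recovers a low-dimensional "conceptual" description.

rng = np.random.default_rng(0)

latent = rng.normal(size=(200, 2))        # the underlying degrees of freedom
mixing = rng.normal(size=(2, 50))         # how they spread over 50 neurons
activations = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

# PCA via singular value decomposition of the centred data.
centred = activations - activations.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centred, full_matrices=False)
variance = singular_values ** 2
explained = variance[:2].sum() / variance.sum()

print(explained > 0.99)  # two dimensions capture almost all the structure
```

The noise and redundancy live in the remaining 48 dimensions, which is one way of reading the remark that on the conceptual level "the irrelevant information has been sorted out".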
Artificial neuron networks can be represented as conceptual spaces based on a high number of dimensions. Each neuron corresponds to a dimension. Thus, by generalising a given set of information from a network, the number of dimensions should decrease. For instance, suppose that, at a certain point, the network is able to categorise certain sets of information. When this occurs, it may not be necessary to focus on the values of each neuron. We may be able to describe the network’s learning process in terms of more general concepts, by which the quality space changes to a smaller number of dimensions. Conceptual spaces can be conceived as emergent systems (Gärdenfors 2000, 244). They are not themselves dynamic systems involving complex interactions between their parts, but they may be described as systems that emerge from connectionist processes when the regularities are sufficiently clear and stable. On this basis, Gärdenfors argues that the notion of subconceptual representation is not incompatible with the notion of symbolic representation: I do not agree that the symbolic and the connectionist modes of representation are incompatible. The relation between the symbolic and conceptual levels on one side and the connectionist (subconceptual) level
on the other is rather, as described in the previous section, that connectionism deals with the fast behavior of a dynamic system, while the conceptual and symbolic structures may emerge as slow features of such a system. (Gärdenfors 2000, 249)
Symbolic descriptions and connectionist descriptions are not based on opposite ways of representation. The former can be understood as descriptions emerging from descriptions of the subconceptual type. The subconceptual level of representation involves a high number of variables and interactions, from which new behaviours and properties may emerge. A conceptual space can be built on the basis of these new properties, which are crucial in order to understand the notions of a predicate and an individual. The way in which the three levels of representation are related is characterised by Gärdenfors as follows: In brief, the conceptual level can be seen as a bridge between the two other levels. In biological systems, the dimensions of a conceptual space often emerge from self-organizing neural systems. This generally involves a reduction of dimensionality from that of the subconceptual level. When going from the conceptual to the symbolic level, the symbolic structures can be seen as emerging from the structure of the conceptual spaces. (Gärdenfors 2000, 257)
Thus, the fundamental relation between these levels is emergence. Connectionism does not represent a completely distinct, incompatible reality compared to symbolic representationalism. Cognition involves subconceptual as well as symbolic representations and these are connected by conceptual representations through emergent processes. Of course, one should not consider the emergence of conceptual spaces as an isolated process going on in a cognitive system that has no interaction with the environment. Consider the following two conditions for the emergence of cognitive structures: Emergence of a conceptual space: Given the development of a neuron network, a conceptual space may only emerge after a certain point of complexity, involving constant interaction with the environment, has been reached. Emergence of a symbolic structure: Given a set of conceptual structures, a symbolic structure may only emerge on the basis of the interactions between the agents that have developed these structures.
These two theses characterise the importance of interaction between a cognitive system and its environment. An appropriate understanding of knowledge should consider every aspect of a cognitive system and every form in which it is able to represent things. The notion of a conceptual space is crucial for such an understanding. Let us now focus on the notion of meaning involved in the conceptual spaces account. How is semantics understood in this theory?
Cognitive Semantics

In order to understand a general notion of meaning one needs to understand what the role of a meaning is with regard to a language. As Gärdenfors clearly characterises it, semantics is a discipline that studies the relations between the expressions of a language and the meanings of those expressions (Gärdenfors 2000, 151). He distinguishes two types of semantic theories, realist semantics and cognitive semantics. A first kind of realist semantics is extensional semantics: In the extensional type of semantics, the constituents of the language become mapped onto a “world.” Names are mapped onto objects, predicates are mapped onto sets of objects or relations between objects, and so forth. By compositions of these mappings, sentences are mapped onto truth values. The main objective of this kind of semantics is to formulate truth conditions for the sentences in the language. Such conditions are supposed to determine the meaning of the expressions in that they specify the way the world should be constituted if the sentences of the language are to be true. (Gärdenfors 2000, 151)
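The extensional mapping described in this quotation can be sketched as a tiny model. The world, the names and the extensions below are invented for illustration; only the structure, names onto objects, predicates onto sets, sentences onto truth values, comes from the quoted characterisation.

```python
# A minimal model of extensional semantics: a name denotes an object, a
# predicate denotes a set of objects, and a sentence of the form
# "<name> is <predicate>" is true just in case the denoted object belongs
# to the predicate's extension.

names = {"that tree": "oak"}
predicates = {
    "tall":  {"oak", "tower"},
    "short": {"bonsai"},
}

def true_in_world(name, predicate):
    """Truth condition for the sentence '<name> is <predicate>'."""
    return names[name] in predicates[predicate]

print(true_in_world("that tree", "tall"))   # "That tree is tall" is true
print(true_in_world("that tree", "short"))  # "That tree is short" is false
```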
Suppose that I want to say something about the tree that is in front of me. The expression “that tree” may refer to it. In this sense, that expression corresponds to a certain object in the world. Thus, according to extensional semantics, the meaning of an expression such as “that tree” is the object designated by it. I may even give that tree a name in order to make it clearer that I am referring to that specific tree and no other. Suppose that I am considering the tallness of that tree. The predicate “tall” corresponds to the set of tall things. In this sense, when I say “That tree is tall”, I mean that the tree I refer to is a member of the set of tall things. Accordingly, I have a truth condition for that sentence. It is true just in case that tree is a member of the set of tall objects. In a few words, according to extensional semantics, the meaning of an expression depends on how the world is. It should be clear that this type of semantics is based
on the notion of a single world. By contrast, intensional semantics is based on a plurality of worlds:

In this brand of semantics, the set of linguistic expressions is mapped onto a set of possible worlds instead of only a single world. [...] The goal of intensional semantics is still to provide truth conditions for the sentences. The meaning of a sentence is taken to be a proposition that is identified with a set of possible worlds: the set of worlds where the sentence is true. (Gärdenfors 2000, 153)
According to intensional semantics, the meaning of a proposition does not necessarily depend on the constitution of a single world. If I say “That tree is tall”, it may be that whether the proposition I am expressing is true depends on how that tree is in some of the worlds among the set of all possible worlds or in just one world. Now, consider the proposition “That tree may fall”. How should we interpret that proposition? What is its meaning? Surely, it does not describe how the world is actually constituted. Perhaps the fact that the tree may fall depends on a set of properties actually instantiated by the tree in the actual world, such as being old or being cracked. But although this connection to actual properties may justify my belief that the tree may fall, it is hard to see how it accounts for the expression “may”. According to intensional semantics, the proposition “That tree may fall” is true just in case that tree falls in at least one of the worlds that are most similar to the actual world. Thus, the word “may” expresses possibility, a relation between the actual world and what occurs in a set of possible worlds. The theory of conceptual spaces is not based on a realist semantics, but on a conceptualist semantics, also called cognitive semantics. Gärdenfors characterises this account as follows: A semantics is described as a mapping from the expressions to a conceptual structure. This mapping can be seen as a set of associations between words and meanings—associations that have been established when the individual learned the language. According to this view, language represents a conceptual structure, but it does not directly represent the world. (Gärdenfors 2000, 154)
According to conceptualist semantics, the meaning of an expression does not depend on a relation between that expression and what occurs in the world, but on a relation between that expression and a conceptual structure. What is, according to a conceptualist semantics, the meaning of the sentence “That tree is tall”? Since language is not simply constructed on
the basis of sets of properties and individuals, we cannot just consider whether a certain object that we refer to by the expression “that tree” is a member of the set of things that are tall. The account of conceptual spaces is focused on how we can form concepts from the invariances involved in perception (Gärdenfors 2000, 59). Thus, according to this theory, we do not start with a fixed distinction between individuals and properties, but from a conceptual space constructed on the basis of perception. Then we may focus on its features. Such a conceptual space may develop and become more complex. At a given point, we may determine, for instance, which of the concepts involved in a conceptual space are nouns and which are not, given their specific features. In order to understand the features of a noun in the theory of conceptual spaces, we need the notion of a domain, that is, a set of integral dimensions that are separable from all other dimensions. Normally, nouns correspond to many domains (Gärdenfors 2000, 101). For instance, the word “tree”, used by a particular speaker, may contain information about height, colour, texture, as well as theoretical information about biological or sociological categories. By contrast, adjectives correspond to a single domain. Tallness is a good example. The meaning of the expression “That tree is tall” depends on how the concept of that tree is involved, as a noun, in a set of domains and, particularly, in the dimension represented by the adjective “tall”. Since symbolical structures emerge from communication, that is, from the interaction between agents, the meaning of a symbolic expression depends fundamentally on the success that an agent may have on the basis of a particular conceptual structure related to that expression. 
For instance, if the dimension representing the concept of “tall” does not have, according to a single agent, the features that it has according to most of the agents in her community, she will probably fail to communicate successfully what she wants to convey with the expression “That tree is tall”. Of course, the conceptual spaces of an agent can be corrected and transformed according to experience.
Agreement and Indifference

I will discuss very briefly how the theory of conceptual spaces can be used to understand actions of negotiation. In a community of agents, the transformation of meaning depends on negotiations. Communication failure and indifference are closely connected. I would like to show in which sense indifference is involved when two agents fail to communicate correctly:
Communication failure as indifference: Whenever two agents R and S fail to communicate about some topic t, either R is in a state of indifference with regard to t, or S is in a state of indifference, or both are.

The relevant state of indifference of an agent concerns the value of reaching a consensus with the other agent. For instance, when two agents argue about whether a tree is tall or not, communication may break down because S thinks that agreeing on that issue is not very important. Whether both think that the tree is tall or both think that it is not would make no difference to S. Of course, the mere fact that two persons are arguing about something presupposes that the topic is important to both of them. That is, disagreement alone is not a sufficient condition for arguing. This means that if the state of indifference of one of them explains why their communication failed, there must have been a change regarding the relevance of some of the topics being discussed. At the beginning of the argument, the topic was relevant enough to argue about it, according to S. S was not indifferent. Then, at some point, the topic was not relevant enough to continue arguing. S became indifferent about it.

Warglien and Gärdenfors (2015) apply the theory of conceptual spaces to the notion of negotiation in order to explain how communication may influence variations in conceptual space:

[W]e will look at meaning negotiation as the process through which agents starting from different preferred conceptual representations of an object, an event or a more complex entity, converge to an agreement through some communication medium. […] The “solution” to the negotiation problem is the agreement reached (or the final disagreement). While this approach maintains a broad scope, it is important to stress that it assumes that agents “move” in a defined conceptual space and have potentially conflicting interests in the agreement to be reached. (Warglien & Gärdenfors 2015, 80)
When two agents disagree about something, they locate the points representing the topic in different regions of a shared conceptual space. The solution of a discussion, as Warglien and Gärdenfors assume, occurs when communication is successful. The ideal final state is the agreement about the concept in question. In such cases, the concepts represented by each of the agents involved meet in a given region of the conceptual space. Another possible final state is what they call final disagreement. Since two agents that end a conversation by agreeing that they disagree do not want to develop the discussion further, I consider final disagreement as a
type of communication failure. Clearly, it involves communicational success about the disagreement itself, but it is a failure with regard to the topic that initiated the conversation. It is important to notice that, according to the conceptualist view about disagreement proposed by Warglien and Gärdenfors, agents assume a defined conceptual space: This view of meaning negotiation crucially depends on the fact that some initial representation is established for each agent, and that agents can locate their meanings in such space. (Warglien and Gärdenfors 2015, 86)
Usually, when two agents disagree, they share the dimensional structure of the conceptual space, but they locate objects in that space differently. Going back to the example about the tree, the two agents may have the same dimension of “height”, but they may have a different threshold separating tall trees from non-tall trees. It is also possible that they share both the dimension structure and the notion of a “tall tree”, but somehow they perceive a particular tree differently, locating it at different sides of the threshold of “tall tree”. Thus, disagreement is based on the agreement on a particular conceptual space.

What occurs in a conceptual space when one of the agents moves to a state of indifference? It is possible that the region representing one of the concepts involved in the discussion expands. Suppose that R and S share the dimension of “height” and the threshold for “tall tree”. Both are observing a tree in the distance. R says that the tree they are seeing is tall while S says it is not. Then, S moves to a state of indifference and expands the region that represented the approximate height of the tree. When the discussion ends, S is not sure about the height of the tree. She maintains the initial dimensional structure. Thus, the tree may be tall or not. Consider a case in which R and S share the region of points representing the height of the tree and also the height dimension, but do not share the threshold for “tall tree”. If R moves to an indifference state, the threshold may expand. As a consequence, the number of physically possible height values according to which a tree is considered to be tall may decrease. Of course, it is always possible that the discussion between two persons ends because one of them thinks that it is no longer worthwhile to talk about a given concept in the way they have been considering it. Sometimes, whether a tree is tall is not important. 
In this case, the state of indifference is related to the dimensional structure itself. The most extreme situation in which a discussion between two persons
may end involves indifference about having a conversation with the interlocutor. In such a case, the conceptual space may be structured according to the value one may assign to having a conversation with a given person.
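The threshold and indifference dynamics described in this section can be illustrated with a toy model of a single quality dimension. This is only a sketch under my own assumptions: the class name, the numeric heights and the representation of indifference as a widening borderline band are illustrative choices, not part of Gärdenfors's formal apparatus.

```python
# A minimal sketch of a single "height" quality dimension with a
# threshold for "tall tree". Indifference is modelled as an expansion
# of the threshold into a band within which the agent no longer
# classifies determinately. All names and values are illustrative.

class HeightDimension:
    def __init__(self, tall_threshold):
        # Initially the threshold is sharp: low and high coincide.
        self.low = tall_threshold
        self.high = tall_threshold

    def classify(self, height_m):
        if height_m >= self.high:
            return "tall"
        if height_m < self.low:
            return "not tall"
        return "indifferent"   # inside the widened borderline band

    def become_indifferent(self, margin_m):
        # Expanding the threshold region: fewer height values now
        # determinately count as tall or as not tall.
        self.low -= margin_m
        self.high += margin_m

d = HeightDimension(tall_threshold=10.0)
print(d.classify(11.0))   # "tall"
d.become_indifferent(2.0)
print(d.classify(11.0))   # "indifferent": 11 m now falls inside the band
```

The model captures the point made above: after the agent becomes indifferent, the same tree that was determinately tall is no longer classified either way.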
CHAPTER FOUR

SIGNALS AND REDUNDANT INFORMATION
Nobody doubts that knowledge involves the acquisition, creation and processing of information. As has been shown, these must be understood as activities of agents that interact with their environments and not as simple isolated processes. Thus, information does not only consist of states and results, but also of change and transmission. Sometimes, the source of information is reliable according to a given epistemic system. Sometimes, it is not. Of course, the value of information depends on this issue. I would like to consider the idea that an excess of information about something may be of less value than information about the same topic delivered in the right quantity. This can be put in the following way:

Value of redundant information: For a receiver, redundant pieces of information about a given topic may be less valuable than a simple piece of information about the same topic.

It should be noticed that although this thesis establishes something about excessive information, not every piece of excessive information is an instance of it. In some cases, an excess of information can involve abundant details that might be useful. In order to develop this idea, I will focus on the notion of a signal, since it is appropriate if we want to characterise the value of information, its reliability and the cases in which information can be considered excessive.
Signalling Games and Information

A signal can be understood as a piece of information, a physical change or a behaviour that carries a meaning. A clear way to understand the notion of a signal is within the framework of sender-receiver games (Russell 1921, Lewis 1969, Skyrms 2010). Brian Skyrms describes the fundamental elements of a signalling game as follows:
There are two players, the sender and the receiver. Nature chooses a state at random and the sender observes the state chosen. The sender then sends a signal to the receiver, who cannot observe the state directly but does observe the signal. The receiver then chooses an act, the outcome of which affects them both, with the payoff depending on the state. Both have pure common interest—they get the same payoff—and there is exactly one “correct” act for each state. In the correct act-state combination they both get positive payoff; otherwise payoff is zero. (Skyrms 2010)
An important aspect of signalling games is the fact that both players have a common goal. This is the goal of communication, which may serve as a medium for many other goals, of course, such as survival or entertainment. Suppose that two persons are navigating a river on a boat. One of them, Robert, operates the wheel while the other, Susan, tells Robert which way to go when there are bifurcations. While Susan knows the river very well and has a map, Robert is navigating it for the first time and does not know how to read the map. Although there is a quiet and safe way to arrive at their destination, both know that the river may lead to dangerous rapids and waterfalls. Suddenly, they approach a bifurcation. Susan takes a look at the map and shouts “Left!” Robert turns left and they continue on the quiet side of the river. This example has the basic structure of a sender-receiver game. It should be noticed that the notion of observation is considered in a broad sense. It may be any kind of perception. In a certain way, Susan observes nature when she sees the bifurcation and then looks at the map. Her signal is her order. Robert receives the order and chooses to turn left. Both get a positive pay-off: They continue their journey safely. Of course, their success depends on the fact that Robert knows what the meaning of the signal is. But this may not be the case at the beginning of every signalling game. Although meaning is crucial, it is not an intrinsic characteristic of signals, as Skyrms explains:

Signals are not endowed with any intrinsic meaning. If they are to acquire meaning, the players must somehow find their way to information transmission. […] That is not to say that mental language is precluded. 
The state that the sender observes might be “What I want to communicate” and the receiver’s act might be “Oh, she intended to communicate that.” Accounts framed in terms of mental language, or ideas or intentions can fit perfectly well within sender-receiver games. But the framework also accommodates signaling where no plausible account of mental life is available. (Skyrms 2010, 7)
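The boat scenario has exactly the structure Skyrms describes, and it can be sketched as a single round of play. The state, signal and act names below are my own illustrative labels, not taken from the text.

```python
import random

# A minimal sketch of one round of a sender-receiver game, using the
# boat example. All names (states, signals, acts) are illustrative.

STATES = ["left_is_safe", "right_is_safe"]
SIGNALS = ["Left!", "Right!"]

# Susan's strategy: map the observed state to a signal.
sender_strategy = {"left_is_safe": "Left!", "right_is_safe": "Right!"}

# Robert's strategy: map the received signal to an act.
receiver_strategy = {"Left!": "turn_left", "Right!": "turn_right"}

def payoff(state, act):
    """Pure common interest: exactly one correct act per state."""
    correct = {"left_is_safe": "turn_left", "right_is_safe": "turn_right"}
    return 1 if correct[state] == act else 0

def play_round():
    state = random.choice(STATES)      # Nature chooses a state
    signal = sender_strategy[state]    # the sender observes it and signals
    act = receiver_strategy[signal]    # the receiver observes only the signal
    return state, signal, act, payoff(state, act)

state, signal, act, p = play_round()
print(state, signal, act, p)   # both players receive the same payoff p
```

Because the two strategies above already form a signalling system, every round yields the positive payoff; the interesting question, taken up below, is how such coordinated strategies arise in the first place.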
The meaning of a signal is not contained in the signal itself, but is the result of a signalling game. It depends on the perception of the receiver and on the pay-offs. Since the receiver’s pay-off is a consequence of her act, the meaning of a signal also depends on action. Thus, meaning emerges from perception and action; it is the result of interaction. Note that it cannot simply emerge from a single instance of the game in which a signal is sent, received and an act follows. The meaning of a signal should be considered as emerging from the whole game’s history. As Skyrms points out, the account of signalling games is not only applicable to cases in which sender and receiver have mental abilities, but also to cases in which these abilities are not involved. Consider phototropism. Light might be considered as consisting of signals and a plant’s cells as receivers. A plant’s particular movement towards light can be considered as an act, following a light signal. Since phototropic behaviour is crucial for a plant’s development, particular movements might be understood as actions associated with a high pay-off. Of course, some notions like the concept of action and the pay-off value should be taken in a broad, and perhaps even in a metaphoric, sense. Anyhow, this example shows in which sense we could apply the account of sender-receiver games to signalling processes that do not involve mental faculties. Instead of understanding information on the basis of a strict, representational notion of meaning, it is more appropriate, according to this theory, to understand meaning in terms of information. Thus, in order to better understand the notion of meaning implied by this account, let us consider the notion of informational quantity. 
Skyrms (2010, 35) defines a signal’s quantity of information in the following way:

I = p(A|s) / p(A)

That is, the information contained in a signal s with regard to a state A is, according to a receiver, equal to the probability of the occurrence of A, given the fact that the signal is received, divided by the unconditional probability of that state’s occurrence. Consider again the example of the boat. According to Robert, the informational content of Susan’s order with regard to a given state depends on the probability that he assigns to the state. The fact that Susan shouts “Left!” constitutes a signal, s, which Robert receives by hearing Susan’s voice. She sends that signal after observing a state of the river and of their journey. She thus becomes aware that the branch going to the right does not lead to a safe journey and that the branch going to the left is the safe way. Suppose that any bifurcation has only one safe alternative way and,
as already assumed, Robert does not know the river. That is, if we asked Robert, before he gets Susan’s signal, which way one should go, he would say that he does not know, that the safe way may be left or right. In other words, he has no preference regarding the two options. In a certain way, he has an attitude of indifference; he sees no difference between the two alternative branches of the river and if we asked him which way he preferred, he might say that it would be indifferent to him. Assuming that the state A is the fact that the left branch is the safe way, p(A) equals, for Robert, 0.5. As we assumed, Robert relies completely on what Susan knows about the river, such that we may also assume that he would believe to a high degree that the left branch is the safe way if she says so. Thus, let p(A|s) equal 1. According to this description of the situation, the informational quantity of Susan’s order is 2. Suppose now a similar scenario in which Robert still does not know anything about the river, but he also does not believe anything that Susan says. In this case, both p(A) and p(A|s) are equal to 0.5. Thus, the quantity of the information given by Susan is 1. As Skyrms indicates (2010, 36), in these cases we would like to say that the informational quantity is zero, because the signal has no value at all for the receiver; it does not move the probability of the state. In order to do this, he considers the following extension of the definition shown above:

I = log2 [p(A|s) / p(A)]

Consider again the two cases described above. When Robert relies on Susan, the informational quantity of her warning is equal to 1, but when Robert does not rely on her, it equals 0. 
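The two computations can be checked with a short sketch. The probability values are the ones assumed in the example; the function names are my own.

```python
import math

# A worked sketch of Skyrms's two definitions of informational
# quantity, applied to the boat example.

def info_ratio(p_A_given_s, p_A):
    """Raw ratio definition: I = p(A|s) / p(A)."""
    return p_A_given_s / p_A

def info_bits(p_A_given_s, p_A):
    """Logarithmic extension: I = log2[p(A|s) / p(A)]."""
    return math.log2(p_A_given_s / p_A)

# Robert relies completely on Susan: p(A) = 0.5, p(A|s) = 1.
print(info_ratio(1.0, 0.5))   # 2.0
print(info_bits(1.0, 0.5))    # 1.0 (one bit of information)

# Robert does not believe anything Susan says: p(A|s) = p(A) = 0.5.
print(info_ratio(0.5, 0.5))   # 1.0
print(info_bits(0.5, 0.5))    # 0.0 (the signal carries no information)
```

The logarithmic form delivers the desired result that an unreliable signal carries zero information, while the raw ratio assigns it the less natural value 1.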
With this notion of informational quantity we may consider information in bits and based on how signals move probabilities: If the signal does not move the probability off one-half, the information is 0; if it moves the probability a little, there is a little information; if it moves the probability all the way to one or zero, the information in the signal is one bit. In a signaling system equilibrium, one signal moves the probability to one and the other moves it to zero, so each of the two signals carries one bit of information. (Skyrms 2010, 37)
Thus, whether a signal or expression carries content or not depends on whether it is able to change the probabilities of the states associated with it. When Robert relies on Susan, Susan’s warnings carry a lot of information for him, since his beliefs about the river change a lot after she says
something about it. When she sends him a signal, he moves from his epistemic state of indifference to an epistemic state in which he can confidently decide what to do. This scenario also exemplifies what Skyrms refers to as a state of equilibrium. Suppose that Susan will only use two signals, that there are two possible states and that Robert is aware of that. As assumed, the state in consideration is the fact that the left branch is the safe way to go. Then, when she shouts “Left!”, the probability of the state moves to one, and when she shouts “Right!”, it moves to zero.

What would occur in a scenario in which Robert does not know the meanings of the expressions “right” and “left”? He would learn them eventually, after numerous experiences of seeing a bifurcation, hearing Susan’s warnings, acting as a consequence and observing the results of his acts. If, for instance, on most occasions when Susan shouts “Left!” and he takes the right branch they reach a dangerous rapid, he might learn that Susan’s signal does not mean that he must go to the right. A regularity may emerge from a signalling game if the sender chooses any expression and associates it with the same kind of state over a sufficiently long period of time. At some point, Robert will grasp the regularity and a convention will be established. In this sense, convention is not necessarily the product of a single explicit agreement, but something that emerges from the interactions between the players of a signalling game, between their actions, their observations and the environment. In order to arrive at a better understanding of the notion of meaning involved in the account of signalling systems, one should also consider the question of how concepts are formed. As Skyrms explains, “[s]tates that the sender maps onto the same signal belong to the same category according to the signaling system” (2010, 114). 
For instance, the signal “Left!” might be associated with different kinds of states, depending on the experiences of the players. It might be associated with the fact that there is a rapid in the right branch of the bifurcation, or that there is a cascade, or that there might be either a rapid or a cascade, or that there are both a rapid and a cascade, or just that the right branch is dangerous. A distinct category can be formed according to each one of these possibilities. The specificity of the category used also depends on the structure of the signalling game and on the interests of the agents. In this sense, a signal referring to a waterfall may not be necessary in the example that I am considering. Just the signals “Left!” and “Right!” are perhaps enough to continue the journey navigating safely. Concept formation does not only depend on the stability of the interaction between agents, but also on perceptual patterns. Besides, concepts may vary; they can be modified when they are involved in
communication. Thus, concept formation and transformation are based in this sense on the way in which categories are constituted according to sender-receiver games. Not only categories and concepts can emerge from signalling systems, but also rules on the basis of which agents can invent signals: General principles of invention emerge. We can suppose that there are acts that the sender can take which the receiver will notice. These could be tried out as signals—either on a short time scale or a very long one. The potential signals may be sounds, or movements, or secretions of some chemical. They may bear some resemblance to other signals, or to other features of the environment that receivers already tend to monitor. With some probability a new signal can be actualized—a sender will send it and a receiver will pay attention to it. (Skyrms 2010, 121)
Thus, as Skyrms points out, the environment may determine which signal is better to communicate a given state. The distance between sender and receiver is a good example of a factor that conditions the type of signal that a sender might choose. It is actually a very strong reason to invent new signals or instruments for signalling. Of course, the abilities of the sender and of the receiver are also strong factors that might have a great influence on the creation of signals.
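The gradual emergence of a convention sketched in this section, with Robert slowly learning what “Left!” means, can be illustrated by a simple reinforcement model. This is only a minimal sketch of Roth-Erev-style reinforcement, one of the learning dynamics Skyrms discusses; the number of rounds, the seed and the unit reinforcement are my own illustrative assumptions.

```python
import random

# A minimal sketch of how a signalling convention might emerge from
# repeated play under simple reinforcement. No meanings are built in:
# all propensity weights start equal.

random.seed(42)
states = signals = acts = [0, 1]

sender_w = {s: {sig: 1.0 for sig in signals} for s in states}
receiver_w = {sig: {a: 1.0 for a in acts} for sig in signals}

def weighted_choice(weights):
    # Choose a key with probability proportional to its weight.
    return random.choices(list(weights), weights=weights.values())[0]

for _ in range(5000):
    state = random.choice(states)             # Nature chooses a state
    signal = weighted_choice(sender_w[state])  # sender picks a signal
    act = weighted_choice(receiver_w[signal])  # receiver picks an act
    if act == state:                           # the correct act for the state
        sender_w[state][signal] += 1.0         # reinforce successful choices
        receiver_w[signal][act] += 1.0

# After many rounds the players typically settle into a signalling
# system: each state is mapped to one signal, each signal to one act.
for s in states:
    sig = max(sender_w[s], key=sender_w[s].get)
    a = max(receiver_w[sig], key=receiver_w[sig].get)
    print(f"state {s} -> signal {sig} -> act {a}")
```

The point of the sketch is the one made above: no single explicit agreement fixes the meanings; the convention emerges from the history of interactions between sender, receiver and environment.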
Redundant Information

We are now able to reconsider the value of redundant information. As stated above, redundant pieces of information may have less value than a simple piece of information about the same topic. Consider Fred Dretske’s characterisation of redundant information:

It is like the man who keeps going back to the door to make sure it is locked. Such a person does not keep getting new information (viz., that the door is still locked) since the door’s still being locked generates no new information, no information he did not already have (20 seconds ago) when he got the information that it was then locked. Given that it was locked 20 seconds ago, every revisit to the door is informationally redundant. Redundancy may be psychologically reassuring, and to this extent it may be epistemologically relevant (i.e., insofar as it affects one’s preparedness to believe), but it is otherwise of no epistemological significance. (Dretske 1981, 116)
I do not agree completely with Dretske about the significance of redundancy. Of course, redundant information is not epistemologically
significant in the sense that it does not change the beliefs of the receiver; it does not move the probability associated with a given state. But this is not the only relevant feature of redundancy. Redundant information can be less valuable than a single piece of information about the same state and, thus, can be informative. Considering Dretske’s example, the information that a person gets after checking for the first time that the door is locked is not redundant. That single piece of information is more valuable than the piece of information that one gets by checking the door 20 seconds later. We can make this comparison because we are considering the two pieces of information separately. Now, we can also consider the whole process containing both pieces of information. On this basis, redundant information is somehow information about what is already known. It allows one to answer the question “Did I already know that?”. Since anything that helps us to answer a question is epistemologically valuable, redundant information also is.

Let us consider another interesting feature of redundancy. Whether a piece of information is redundant or not depends on the interests of the receiver or on other factors involved in each scenario. Consider these two cases:

Cloudy sky: If Robert is busy working at his desk and Susan calls him every two minutes just to tell him that the sky is cloudy, he might consider the information redundant.

Paramedic: A paramedic checks every two minutes whether a severely injured person has a pulse or not. Suppose that every time the paramedic checked the pulse, the pulse of the injured person was within a normal range. Thus, every time she checked, she received redundant information with regard to the fact that the victim’s pulse was normal.

Let us focus first on the cloudy sky case. On the one hand, we may like to say that the information received by Robert is redundant because he is not interested in the weather when he is working. 
On the other hand, if Robert were a meteorologist, the information contained in Susan’s calls might not be redundant for him. It might be interesting for him to receive updates every two minutes about the weather of the region where Susan lives. The paramedic case also shows how the receiver’s interests may influence whether a piece of information should be considered redundant. It may be relevant for a paramedic to check the patient’s pulse with such a frequency. If the receiver needs an update every two
minutes, the information may not be regarded as redundant. But if the topic is not sufficiently relevant to the receiver, the information may seem redundant. Consider now the following case:

Physician: Robert visits the doctor for a routine examination. She checks Robert’s pulse, among other routine check-ups. His pulse is normal. The physician decides to check Robert’s pulse five more times every two minutes. Every time, Robert’s pulse is within the normal range.

It seems that, in the context of a routine examination, the fact that the doctor checks Robert’s pulse five more times after being sure that it was normal may involve redundant information. Also, this may be a reason for Robert to doubt the physician’s expertise or to think that something is wrong with the results of his other exams. These examples show that, in fact, redundant information may be less valuable than single pieces of information about the same topic. However, whether a piece of information is redundant or not depends on the context in which the signalling processes occur, especially on the receiver’s interests and beliefs. If an agent considers that he has received a signal containing redundant information, the quantity of information associated with it is zero for him, because it does not move the probabilities about the state it refers to. In this sense, whether a signal lacks informational content does not only depend on whether the sender is reliable, but also on whether the information is redundant. Both cases involve some sort of indifference with regard to information. On the one hand, when there is a problem of reliability, the receiver is indifferent with regard to what the sender has to communicate. On the other hand, when there is redundant information, the receiver is indifferent with regard to the novelty of its informational content.
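This connection between redundancy and zero informational quantity can be made numerically explicit with Skyrms's logarithmic measure, applied to Dretske's door case. The probability values below are my own illustrative assumptions.

```python
import math

# A sketch of Dretske's door case in terms of the informational-quantity
# measure I = log2[p(A|s) / p(A)]. Numbers are illustrative.

def info_bits(p_A_given_s, p_A):
    return math.log2(p_A_given_s / p_A)

# First check: before checking, the agent is unsure (p = 0.5);
# seeing the locked door moves the probability to 1.
first = info_bits(1.0, 0.5)
print(first)   # 1.0 bit: a genuinely informative signal

# Second check 20 seconds later: the probability is already 1,
# so the new observation cannot move it.
second = info_bits(1.0, 1.0)
print(second)  # 0.0 bits: informationally redundant
```

The asymmetry between the two checks is exactly the asymmetry argued for above: only the first piece of information moves the receiver's probabilities.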
Consciousness, Experience and Signals

Even if meaning emerges from several signalling interactions, how should we consider the fact that a given agent understands a meaning? What is shared by two agents that communicate successfully? One may correctly argue that they share a conceptual structure or a part of their conceptual spaces. But how can we describe an agent’s experience of a particular conceptual structure? Let us consider again the example of Robert and Susan travelling on a
river. In a certain way, if Robert and Susan arrived safely at their destination thanks to their coordination, we may conclude that they share some semantic structure associated with the signals “Left!” and “Right!”. We may thus assume that they have a similar conscious experience about those signals and about their environment. Consider Claus Emmeche’s characterisation of the notion of consciousness related to biological signalling systems: What is expected to be found in lower intensities are specific sign processes, that is, signs producing and mediating other signs. Consciousness is, in contrast, an emergent phenomenon, associated with particular forms of sign action in particular kinds of systems: self-moving autonomous organisms-animals. The concept of experience describes the continuous scale between very simple and very complex forms of sign activity. The concept of consciousness describes a jump in this continuum. (Emmeche 2004, 325)
Conscious experience is a special kind of experience that involves a high degree of signalling complexity. In a way that is compatible with our considerations involving sender-receiver games and meaning, Emmeche argues that consciousness emerges from, and is distinct from, the signalling processes that constitute the basis of the system in which an organism is involved. The meaning related to a signal is an emergent phenomenon of a given signalling system. Conscious states are also emergent in this sense, and consciously understanding a meaning can be considered a special kind of experience. Meaning and experience are not only two types of emergent phenomena. Since every experience is based on a signalling system and meaning is a fundamental emergent feature of such systems, the notion of meaning is also crucial for understanding what experience is.

According to Emmeche, a physical or biological system experiences something when it is somehow modified by its surroundings and can store that type of change in its structure. Thus, experience is possible on the basis of the signalling interactions that are constituted between biological systems and their environments, as well as on the capacity of systems to store the information emerging from those interactions. Emmeche explains this idea by considering a common feature of biological systems such as plants and animals:

In a very general sense of the word experience one can say that all these systems, even the purely physical, experience something, get ‘irritated’ or affected by their surroundings, and store this influence, even when such stimuli are quite evanescent or produced by chance. In that generalized sense, process and experience are interrelated in all situations where the process of interaction between one subsystem (corresponding to an agent) and another subsystem (corresponding to the environment of the agent) leaves traces in one of the subsystems. (Emmeche 2004, 327)
Thus, an organism’s experience involves the ability to store information or, in other words, to preserve particular structural changes produced by interaction with the environment. Of course, the ability to experience regularities and the meaning associated with certain signals is fundamental. It is interesting to notice that Emmeche characterises the relation between processes and experience; he focuses not only on the changes that affect an agent, but also on the changes that its actions may produce in the environment. Following Emmeche’s account, we may define the notion of experience as follows:

Experience: A sequence s of biological states, associated with a given system, is an experience if it involves processes that leave marks in the regularities according to which the system is organised.

Although Emmeche refers to an experience as a kind of sign or as a form of movement (2004, 326), I prefer to consider it as a kind of state attributed to a biological agent, which is itself a system embedded in a system. The generality of this definition also allows us to attribute experience to an individual organism as well as to a signalling system. On the one hand, an individual agent has the experience of understanding the meaning of a concept because a certain set of signalling processes marked the regularities on which its actions are based. On the other hand, the signalling system experiences, as a community, the emergence of a given meaning. Thus, the regularities according to which the individuals of the community coordinate were changed by the processes that led to the considered experience.

In order to clarify the concept of conscious experience, the notion of movement is crucial. Emmeche distinguishes this notion from the notion of physical change as follows:

Movement must be distinguished from merely physical change of position over time; rather, the course of movement in animals is always governed by semiotic codes based within the animal body [...]. Movement is an externally observable process that is also well internally sensed. (Emmeche 2004, 329)
While movement involves perception as enaction, that is, as guided action, physical change is simply considered as a change in space. Of course, any organism involved in a process of movement may be regarded, in a restricted sense, as a body changing its position over time. However, in a more detailed description, considerations about how the organism perceives itself and the environment are crucial. Thus, the notion of movement is important not only if we want to develop a description of the organism, but also if we want to make assumptions about its particular experience of moving. On the grounds of the notion of movement, Emmeche characterises consciousness in the following way:

Consciousness appears as the present moment’s qualitative feature of a moving animal which experiences a process of complex relations between sensing the movements of its own body and sensing the corresponding changes of the environment. (Emmeche 2004, 330)
First of all, it should be noticed that Emmeche understands consciousness as a particular attribute of animals, which is coherent with our common use of the word “consciousness”. Many would doubt that conscious experiences can be correctly attributed to non-human animals; interestingly, the concept of consciousness that Emmeche proposes is broader, and I agree with this extension of the concept. Note also that consciousness is not just any property of a given biological system, but a qualitative property. Additionally, this notion is perfectly coherent with enactivism. The notion of a signal should be based not only on physical interactions and changes, but also on the conditions according to which an agent is able to interact and change. There is no doubt that informational content can be understood in terms of the probabilistic changes of the players in a signalling game. But a good account of how meaning and knowledge emerge from signalling systems should also describe in which sense an agent experiences a change of belief. On these considerations, a notion of indifference with regard to experience can be characterised as follows:

Indifference regarding experience: A process p is indifferent to the experience of a given system s if p does not influence the experience of s, that is, if it does not change the regularities according to which s acts and is organised.
Following this notion, we may also propose a notion of indifference with regard to consciousness:

Indifference regarding consciousness: A process p is indifferent to the conscious experience of a given system s if p does not change the qualitative regularities according to which s acts and experiences its own body and the environment.

The concept of a redundant experience is hard to define, since every experience is associated with a change of regularity. If there is a process that does not affect an organism’s organisational structure in such a way that some information can be stored, then that process cannot be considered an experience for that organism. In this sense, there are no redundant experiences. At this point we should recall that a signal may lack informational content either because it involves redundant information or because the source is not reliable for the receiver. It follows that, regarding experience, whether a given system is in a state of indifference depends only on the subjective relevance of the processes with which the system may interact.
CHAPTER FIVE

EPISTEMIC DIFFERENCE-MAKING IN CONTEXT
In a state of indifference regarding two propositions, p and q, there are no factors that make a difference between them. On this basis, we may understand the notion of indifference by focusing on the conditions under which one factor can make a difference to another. Comesaña and Sartorio (2014, 368) propose the following initial definition of the difference-making relation:

DM1: R is a difference-making relation if and only if, whenever R holds between some facts A and B, either B wouldn’t have obtained if A hadn’t obtained or A wouldn’t have obtained if B hadn’t obtained.

As they make clear, difference-making is based on the counterfactual relation. Some fact B depends counterfactually on another fact A if one can say that B would not have occurred if A had not occurred. In this sense, we also say that A makes a difference to B. The difference-making relation is closely related to the notion of causation: causes make differences to their effects. Comesaña and Sartorio consider the counterfactual notion of the causal relation as follows:

Causal DM1: c caused e only if, had c not occurred, e would not have occurred either.

This is one way of applying the difference-making relation, namely to a definition of the concept of causation. Another way of applying it focuses on the knowledge relation. Comesaña and Sartorio (2014, 369) define the relation of epistemic difference-making as follows, where F is the set of facts on which the agent’s belief is based:

Epistemic DM1: S knows that P only if, had F not obtained, S would not have believed that P.

This should answer a main question of epistemology, namely which conditions should be met in order to establish that someone knows
something. According to epistemic difference-making, we might say that, for any agent and any proposition, there is a set of necessary conditions such that, if they do not obtain, the agent does not believe that the proposition is true. The kind of facts involved in the set F may vary, and so the definition of epistemic difference-making may vary too. If we take F to be the fact described by the proposition P, then we get the sensitivity condition (Comesaña and Sartorio 2014, 369):

Sensitivity: S knows that P only if, if P were false, then S wouldn’t believe that P.

It should be noted that the definition of the sensitivity condition is similar to the definition in Epistemic DM1. This condition states, in other words, that agents only know true propositions. This seems to be right; nobody would say that falsehoods can be known. Comesaña and Sartorio consider the following case in order to show that the sensitivity condition for knowledge is not correct:

[S]uppose that at noon one day in July in Tucson, I leave a glass with ice outside. Two hours later, I wonder about that glass and come to think (based on my background knowledge of the weather in Tucson in July) that the ice has thawed. Consider now a situation where it is not true that the ice has thawed. Why would that be? One possible explanation is that the temperature has suddenly sunk to below the freezing point. Or maybe someone keeps surreptitiously adding some powerful anti-thawing agent in the glass. Whatever the possible cause, it is a highly unlikely one. And let us suppose that if one of those highly unlikely situations were to arise I would not become aware of it (I am already far from Tucson, say). Thus, if it were not true that the ice has thawed, I would still believe that it has, and based on the same reasons for which I actually believe it. That is to say, my belief that the ice has thawed is not sensitive. But given, in part, the highly unlikely nature of the possible interferences, we do want to grant that I know that the ice has thawed (if we are happy with the existence of inductive knowledge at all). Therefore, the sensitivity condition fails. (Comesaña and Sartorio 2014, 369)
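The counterfactual test that underlies DM1 and the sensitivity condition can be pictured with a toy possible-worlds model. The following sketch is entirely my own illustration, not Comesaña and Sartorio's formalism: worlds are assignments of truth values to facts, similarity is crudely measured by counting facts that differ from actuality, and only worlds obeying the example's regularity are included.

```python
# Toy possible-worlds reading of DM1 (an illustration, not the text's
# own formalism). A world assigns truth values to facts; the
# counterfactual "had A not obtained, B would not have obtained" is
# evaluated at the closest world where A fails.

def closest_world(worlds, actual, condition):
    """Closest world satisfying `condition`, where similarity is
    crudely measured by counting facts differing from actuality."""
    candidates = [w for w in worlds if condition(w)]
    return min(candidates,
               key=lambda w: sum(w[f] != actual[f] for f in actual))

def makes_a_difference(worlds, actual, antecedent_fails, consequent):
    """True iff the consequent fails at the closest world where the
    antecedent fails, i.e. the antecedent made a difference."""
    w = closest_world(worlds, actual, antecedent_fails)
    return not consequent(w)

# Example: c = "match struck", e = "match lit". Only worlds obeying
# the regularity "lit iff struck and oxygen" are included.
worlds = [
    {"struck": True,  "oxygen": True,  "lit": True},   # the actual world
    {"struck": False, "oxygen": True,  "lit": False},
    {"struck": True,  "oxygen": False, "lit": False},
    {"struck": False, "oxygen": False, "lit": False},
]
actual = worlds[0]

# Causal DM1: had c not occurred, e would not have occurred either.
print(makes_a_difference(worlds, actual,
                         lambda w: not w["struck"],
                         lambda w: w["lit"]))  # True: striking made a difference
```

The same schema evaluates Epistemic DM1 by letting the consequent be "S believes that P" rather than the occurrence of an event; the thawed ice case is then one where the consequent still holds at the closest antecedent-failing world.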
We may present the basic structure of the example in this way: S knows that P, because P is much more likely than ¬P. Now suppose that P is false; according to the sensitivity condition, S should then not believe that P. But S would still believe that P, on the same inductive grounds. Therefore, sensitivity is not an appropriate condition for defining the knowledge relation. This can also be regarded as an example of how induction does not guarantee knowledge. Although one may have good reasons, based on
solid evidence, to believe that the ice has thawed, this does not imply that one knows that the ice has thawed. In order to clarify the problem raised by this example, I propose the following reformulation of the principle of epistemic difference-making, where C is a context, i.e. a set of possibilities considered by an agent S:

Contextualist Epistemic DM: S knows that P according to C if and only if a) S believes that P according to C, b) C includes some proposition F, and c) had F not obtained in a relevant context C', S would not have believed that P within C'.

Here C' is an alternative context, which should be sufficiently relevant from the perspective of C, and F is a set of conditions supporting P. On this definition of the epistemic difference-making relation, we do not consider knowledge as an absolute state of the epistemic agent. In this sense, as well as according to epistemic DM1, knowledge is not a binary relation, but a ternary relation involving the agent, a proposition and the set of possibilities considered by the agent at a particular time. Whether an agent knows something depends not only on a set of epistemic conditions, but also on a wider set of possibilities among which those epistemic conditions figure. Furthermore, knowledge depends on what would be believed with regard to the alternative context, C'. If in that alternative context the conditions characterised by F are not met, i.e. are not believed, the agent would no longer believe the proposition in question. Following a similar contextualist assumption, we might reformulate the sensitivity condition for knowledge as follows:

Contextualist sensitivity: S knows that P according to C only if, if P were false according to some relevant context C', S wouldn’t believe that P in C'.

Let us see how we can handle the thawed ice example using Contextualist Epistemic DM. One day in July in Tucson, someone left ice outside. From that agent’s context, involving her knowledge about the weather in Tucson, we would like to say that she knows that the ice has thawed, even if she cannot go and see it herself. Even if it were not actually true that the ice had thawed, she would still believe that it had. Nevertheless, the counterexample does not affect the epistemic notion of the sensitivity
condition. We might consider the structure of the case in the following way: S knows that P in C; P is false in C'; and S would still believe that P, which is correct. The important fact is that, according to the example, S never evaluates P with regard to C'. This is because C' is not relevant from the perspective of C. As Comesaña and Sartorio put it, we are assuming that the ice has not thawed but also that the agent never becomes aware of it. If someone is not aware of some piece of information, we cannot say she is able to change her belief state according to that information. Thus, the context C' includes the proposition that the ice did not thaw, but the agent’s beliefs are not based on that context. In any case, if S assumed C', then S would not believe that P, which does not conflict at all with the idea that S knows that P in C. This case conflicts somewhat with the idea that every context is relevant to an agent, but not with contextualist sensitivity.

Let us go back to Epistemic DM1. There is another option considered by Comesaña and Sartorio regarding the facts symbolised by F in that definition. According to this option, F is whatever fact in the world on which the agent bases her belief that P. On these grounds, Comesaña and Sartorio reconsider the definition of the knowledge relation as follows: “[I]f the subject believes that P based on Q, then the subject knows that P only if, had Q not obtained, the subject wouldn’t have believed that P based on Q” (2014, 370). They argue that this definition is either trivial or refuted by a counterexample similar to the one that refutes the simple definition of the sensitivity condition. It is trivially true if we assume that the agent cannot base her belief on Q when Q is false: of course, if Q had not obtained, the subject wouldn’t have believed that P based on Q. I agree with this result, but I disagree regarding the refutation.
Consider the following version of the thawed ice counterexample:

In the actual case, the subject believes that the ice has thawed based on the fact that he left it outside in Tucson in July. In the counterfactual scenario, the subject didn’t leave the ice outside in Tucson in July, but he nevertheless believes that the ice has thawed based on his having left the ice outside in Tucson in July. Why? Because, unbeknownst to the subject, an evil neuroscientist has implanted a device in his brain that guarantees that he will form the belief that he has left the ice outside even if he actually didn’t [...]. In the actual case, however, the implant never intervenes, and the subject is a perfect epistemic correlate of someone without an implant of that kind. So he knows that the ice has thawed, in spite of the fact that, had he not left the ice outside, he would still have believed that he did (and on the same basis). So Epistemic DM1 fails, even on this revised interpretation. (Comesaña and Sartorio 2014, 370)
According to Comesaña and Sartorio, this case shows that the definition of knowledge based on difference-making is not correct, because the agent would still have maintained his belief in the considered proposition even if the fact on which that proposition is based had not occurred. Regarding that fact, there is a situation of indifference rather than a situation of difference-making. This is the argument’s structure: S believes that P based on Q, and we would like to say that S knows that P. However, if Q had not obtained, S would still have believed that P (given the action of the implant). Therefore, S does not know that P.

But what is the relevant fact on which the belief is based? According to the description of the example, the belief that the ice thawed is supposed to be based on the fact that the agent left it outside. However, it seems that the possible presence of the evil neuroscientist willing to control the agent’s beliefs is at least as relevant as that fact. Under this further assumption, which is not made explicit by Comesaña and Sartorio, we might take the set F as consisting of both possibilities: the fact that the agent left the ice outside and the presence of the evil neuroscientist. Thus, if F had not obtained, S would not have believed that P, and epistemic DM1 does not fail.

Let us see how Contextualist Epistemic DM handles the modified version of the thawed ice scenario. According to this account, we might also include the possibility of the implant in the set F. In that case, we might correctly say that S knows that P according to C and that, had F not obtained in a relevant context C', S would not have believed that P within C'. But suppose that we do not include the possibility of the implant in F, that is, we assume that the agent’s belief that P is based only on the fact that he left the ice outside. Under this supposition, the case can also be handled on the grounds of contextualist difference-making.
Regarding knowledge attributions, relevance is fundamental. The considered counterexample can be summarised in the following way: S knows that P according to C; however, if F had not obtained in C', S would still have believed that P within C'. Here C' is the context in which the evil neuroscientist ensures that the agent will believe that P. Could such a context be considered relevant with regard to the initial context, C? I think it should not. Indeed, the context C' is all the more irrelevant to the agent given that, according to the example’s description, the agent does not know that an evil neuroscientist has implanted a device in his brain. Thus, the example does not meet an important requirement of contextualist epistemic difference-making, namely, that the context within which the difference is made should be relevant. Accordingly, we might argue that the scenario is not relevant enough to determine whether the agent knows that the ice thawed.
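How relevance constrains the verdict can be made vivid with a small sketch. Everything here is my own illustrative formalisation, not Comesaña and Sartorio's apparatus: contexts are modelled as sets of propositions, belief as membership in a context, and relevance as a predicate supplied from outside.

```python
# Illustrative sketch of Contextualist Epistemic DM: a context is a
# set of propositions an agent considers, and knowledge is evaluated
# only against *relevant* alternative contexts.

def knows(P, F, C, alternatives, relevant, believes):
    """S knows that P in context C iff S believes P in C, C contains
    the supporting condition F, and in every relevant alternative
    context lacking F, S would not believe P."""
    if not (believes(P, C) and F in C):
        return False
    return all(not believes(P, Cp)
               for Cp in alternatives
               if relevant(C, Cp) and F not in Cp)

# Thawed-ice example: in the implant context, F is absent but the
# belief in P persists (the device guarantees it).
P, F = "ice thawed", "left ice outside"
C = frozenset({P, F})
C_implant = frozenset({P, "evil neuroscientist implant"})

believes = lambda p, ctx: p in ctx

# If the implant context is not relevant from the agent's perspective,
# it cannot undermine the knowledge attribution:
relevant = lambda c, cp: "evil neuroscientist implant" not in cp
print(knows(P, F, C, [C_implant], relevant, believes))  # True

# If every context counts as relevant, the implant context undermines
# the attribution:
relevant_all = lambda c, cp: True
print(knows(P, F, C, [C_implant], relevant_all, believes))  # False
```

The two runs mirror the discussion above: the verdict flips with the choice of which alternative contexts are taken to be relevant, which is exactly the freedom that contextualist epistemic difference-making exploits.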
Let us now suppose that Comesaña and Sartorio are right in showing that DM1 fails, and let us follow their argument by considering the alternative notion of difference-making that they propose (2014, 370):

DM2: R is a difference-making relation if and only if, whenever R holds between some facts F and G, R wouldn’t have related F’s absence to G if F had been absent.

On this basis, they propose the following epistemic principle (2014, 374):

Epistemic DM2: E evidentially supports P (given a background of evidence B) only if the absence of E does not evidentially support P (given the same background B).

This notion of epistemic difference-making depends on a distinction between evidential support and propositional justification. As Comesaña and Sartorio understand these notions, the difference lies in whether the evidence is believed by a subject or not (2014, 373). While evidential support is a mind-independent relation between propositions, propositional justification is not. An epistemic subject is justified in believing that P if she has evidential support for that proposition with regard to a broader background of evidence. One should note that a subject’s having propositional justification with regard to some proposition does not guarantee that she is justified in believing that proposition. This is related to another notion considered by Comesaña and Sartorio (2014, 374), the notion of doxastic justification:

Doxastic justification: A subject has doxastic justification for believing that P if a) she has propositional justification for believing that P and b) she believes that P on that basis.

Thus, a subject may fail to be doxastically justified in believing that P if she has propositional justification for believing that P but does not believe that P on that basis. But clearly, there is another type of situation in which a subject fails to have doxastic justification in believing a given proposition. This occurs when the subject believes that P for the wrong reasons. In this case, a subject might believe that P on the basis of some evidence that does not constitute mind-independent evidential support for that proposition.
As Comesaña and Sartorio acknowledge, their proposal of epistemic DM2 is not a definition of the notion of knowledge, because it does not establish necessary and sufficient conditions for attributing knowledge of a certain fact to an epistemic agent. Furthermore, epistemic DM2 is not directly a principle about the knowledge attitude of an agent regarding a proposition, but a characterisation of evidential support. Thus, it is prima facie a principle that provides a necessary condition for evidential support (2014, 375). However, it should also be accepted as a principle that provides a necessary condition for knowledge, on the uncontroversial assumptions that evidential support is necessary for justification and that justification is necessary for knowledge.

Consider again the thawed ice case involving the evil neuroscientist. In that scenario, the agent believes that the ice thawed because he left it outside. But if he had not left the ice outside, he would also have believed that the ice thawed, as an effect of the neuroscientist’s action. According to epistemic DM1, it seems that the agent does not know that the ice thawed. As Comesaña and Sartorio show, epistemic DM2 handles this case correctly, for it does not deny that the agent knows that the ice thawed. The fact that the agent left the ice outside does not evidentially support the proposition that the ice thawed; although the absence of such a fact does not support that proposition either, the agent would still have believed it. The possibility of the neuroscientist’s action prevents the fact that the agent left the ice outside from being evidence for the agent’s belief that the ice thawed (Comesaña and Sartorio 2014, 375).

It is clear that epistemic DM2 does not contribute much to our understanding of what it means for an agent to know something. Of course, it contributes in the sense that it establishes which conditions are met if someone knows something. Thus, it somehow explains why epistemic DM1 is not appropriate if we want to describe the thawed ice case as a case involving knowledge. Nevertheless, epistemic DM2 does not sufficiently help to answer whether there is knowledge involved in such a case or not. After all, it does not offer a reductive definition of the notion of knowledge.

As I have shown, contextualist epistemic DM can also provide an indeterminate answer in the thawed ice case. In order to arrive at such a result, we admit that the considered context according to which the agent would not have believed that the ice thawed, i.e. the counterfactual context, is not relevant with regard to the context in which the agent actually believes that the ice thawed. But we can also determine whether in such a case the agent knows something or not. Under the framework of contextualist epistemic DM, we could fix a relevant context from the
initial agent’s perspective. If the agent does not consider the intervention of an evil neuroscientist as a serious possibility, then such a relevant context should not include information about it. Given a normal context, in which a neuroscientist’s intervention is a far-fetched possibility, the agent knows that the ice thawed. Of course, the agent’s context is not the only one that should determine whether a counterfactual context is relevant or not. Someone could consider that a context involving the possible intervention of the neuroscientist is relevant. In that case, however, we cannot determine whether the agent knows something or not. What would be the result if the thawed ice case were evaluated in a society where, whenever someone believes something, interventions of evil neuroscientists are to be expected as serious possibilities? In that scenario, we should firmly say that the agent does not know that the ice thawed. If she is aware of the fact that a neuroscientist might intervene in her brain, she should think: “I believe I left the ice outside, but I haven’t checked yet whether an evil neuroscientist is making me believe that. So, I don’t know whether the ice thawed.”

This freedom in the choice of a context is not available to the proponents of epistemic DM2, at least not as it is characterised by Comesaña and Sartorio. According to their account of evidential support, there is good evidence and bad evidence. This means that among a given set of contexts there is at least one that is better than the others for evaluating whether someone knows something. This is not surprising under the assumption, accepted by Comesaña and Sartorio, that there might be a strong notion of mind-independent evidential support. As they put it, evidential support is distinguished from propositional justification mainly on the grounds of the kind of relation holding between evidence and a given proposition.
That is, regarding propositional justification, “there is a subject who has that evidence” (2014, 373). By contrast, evidential support is understood as a relation between propositions. It is not clear, however, how we should understand a relation between propositions if it does not occur inside a belief system. Now, if this sort of relation must be mind-independent, evidential support should be understood as a relation between propositions that holds in a belief system that is not constituted simply by an individual. This reasoning leads us to the idea that evidential support should hold within an inter-subjective belief system.

There are at least two problems with such an idea. First, Comesaña and Sartorio seem to assume that we can easily determine what is “good” evidence (2014, 374). But it is not clear which system should be considered a “good” belief system. We could evaluate belief systems and theories according to certain
scientific and methodological standards. However, even by doing that, it is sometimes difficult to compare theories in such a way that one can establish which one is better than the others. Under this assumption, it would be even harder to determine which belief system is the best one. We can still make sense of the idea that propositional justification is a special case of evidential support. While evidential support is a relation that holds between propositions that may be part of a collective belief system, propositional justification can be considered a relation that only holds in an individual belief system. We can think of an inter-subjective belief system as mind-independent in the sense that it is not constituted by processes occurring in a single mind. But, of course, if we focus on the constituents of an inter-subjective belief system, which are individual minds, we cannot say that such a system is absolutely mind-independent. However, we can say that inter-subjective belief systems can be constituted independently of any individual belief system.

The main difference between epistemic DM2 and contextualist epistemic DM lies in their generality. While the former only establishes a necessary condition for knowledge attributions, the latter intends to provide a definition of knowledge. Furthermore, it can be shown that epistemic DM2 is a special case of contextualist epistemic DM. As Comesaña and Sartorio explain, whenever an agent knows that P on the grounds of a given piece of evidence E, E evidentially supports P. That is, the absence of E would not support P according to the agent’s background. But, of course, if the absence of E does not support P according to the agent’s context, we cannot conclude that the agent knows that P in the same context. That is: if E does not support P according to an agent’s context C, it cannot be determined whether that agent knows that P according to C.
This can be described with the help of contextualist epistemic DM as follows:

Assumptions. a) P and E are part of C. b) If E were not part of C, it could not be determined whether P would be part of C.

Conclusion. c) Therefore, it cannot be determined whether S knows that P according to C.
Clearly, if E is not part of C, E cannot support P in C. Thus, under the framework of contextualist epistemic DM we can describe the indeterminacy of knowledge that follows from epistemic DM2. However, since contextualist epistemic DM provides a more general account of knowledge, we should also be able to show in which sense the lack of evidential support entails lack of knowledge: Assumptions. a) P and E are part of C and of C'. b) C' is relevant from the perspective of C. c) If E were not part of C', P would still be part of C'. Conclusion. d) S does not know that P in C, based on E. Obviously, from (c) follows that E does not evidentially support P according to the context C'. A proposition cannot support another if it is absent. Furthermore, under the assumption of epistemic DM2, (d) also implies that E does not support P in C. The two arguments that I just presented are based on a supposition that, apparently, Comesaña and Sartorio do not accept, namely, the supposition that a proposition that is absent with regard to a given context cannot evidentially support another proposition within the same context. As is clear, according to epistemic DM2, if a proposition E evidentially supports another proposition P, the absence of E would not support P. How could we make sense of the idea that the absence of a proposition can support another proposition? It should be clear that the absence of a proposition with regard to a belief system does not imply that the negation of that proposition is believed, but that the proposition is just not considered, it is not relevant. In the thawed ice case involving the evil neuroscientist, the proposition that the scientist will intervene in a given circumstance is absent with regard to the agent’s context. This does not mean that the neuroscientist will not actually intervene. The agent is simply not considering that possibility. 
If the absence of a proposition according to a given context were simply taken to be equivalent to the inclusion of the negation of that proposition in that context, the characterisation made by epistemic DM2 would turn out to be trivial. Let us specify the definition of epistemic DM2 by defining evidential support in the following way, using a probability function p.
Epistemic Difference-Making in Context
Evidential support: E evidentially supports P in C if and only if p(P|E&C) > p(P|¬E&C).

Now, we may define evidential support by absence as follows:

Evidential support by absence: The absence of E evidentially supports P in C if and only if p(P|¬E&C) > p(P|E&C).

Following this, epistemic DM2 would merely state something that follows from the meaning of the inequality, namely, that for any quantities a and b, if a is greater than b, then b is not greater than a. In order to avoid this result, we may define evidential support as follows:

Contextual evidential support: E evidentially supports P in C if and only if a) E and P are in C and b) if E had not obtained in C, P would not have obtained in C.

Similarly, we may propose the following definition for contextual evidential support by absence:

Contextual evidential support by absence: The absence of E evidentially supports P in C if and only if a) P is in C, b) E is undetermined in C (neither E nor its negation is in C) and c) there is a further proposition G in C that evidentially supports P.

According to the last two definitions, we may reformulate epistemic DM2:

Epistemic DM3: E evidentially supports P according to a context C only if there is no proposition G in C such that, if E were undetermined in C' and G had not obtained in C', P would not have obtained in C'.

To understand the fundamental idea of this definition, one has to assume that C' is a relevant context from the perspective of the initial context C. It should be noticed that, given the formulation of epistemic DM2, the fact that the absence of E does not support P does not imply that the agent knows that P based on E. By contrast, on a contextualist formulation, we can provide such an implication.
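The probabilistic definition of evidential support can be illustrated with a toy model. The following sketch is an illustration only, not part of the text: the context C is held fixed and represented by a hypothetical joint distribution over the truth values of P and E, with weights chosen (as an assumption) so that E raises the probability of P.

```python
# Hypothetical joint distribution over the truth values of (P, E),
# relative to a fixed context C. The weights are assumptions.
weights = {
    (True, True): 0.4,    # P and E
    (False, True): 0.1,   # not-P and E
    (True, False): 0.1,   # P and not-E
    (False, False): 0.4,  # not-P and not-E
}

def prob(event):
    """Probability of the set of worlds satisfying `event`."""
    return sum(w for world, w in weights.items() if event(world))

def cond(event, given):
    """Conditional probability p(event | given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

P = lambda w: w[0]
E = lambda w: w[1]
notE = lambda w: not w[1]

# Evidential support: p(P | E & C) > p(P | not-E & C)
supports = cond(P, E) > cond(P, notE)
# Support by absence is the reversed inequality, so at most one of the
# two can hold; this is exactly the triviality worry noted in the text.
supports_by_absence = cond(P, notE) > cond(P, E)
print(supports, supports_by_absence)  # True False
```

Running the sketch with these weights, E supports P (0.8 against 0.2), and the "support by absence" inequality fails automatically, which is the point of the triviality observation above.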
Assumptions.
a) E and P are in C and
b) there is no proposition G in C such that, if E were undetermined in C' and G had not obtained in C', P would not have obtained in C'.
c) The context C' is relevant from the perspective of C.
d) If E were not in C', P would not obtain in C'.

Conclusion.
e) Given background C, the agent knows that P based on E.

The conclusion (e) follows from the assumptions, together with epistemic DM.
CHAPTER SIX

EPISTEMIC CONTRASTIVISM AND CONCEPT DISTINCTION
We have seen how knowledge regarding a fact is related to the manner in which beliefs can make a difference to other beliefs, and how context plays a crucial role. Epistemic contrastivism is an account that focuses on what would change if a difference in a set of beliefs occurred. It states the following (cf. Schaffer 2005):

Epistemic contrastivism: Knowledge is a ternary relation, and knowledge attributions are described as propositions of the form "S knows that P rather than Q", involving a subject S, a proposition P and a contrast proposition Q.

Note that the thesis of epistemic contrastivism involves an idea of what a subject may have known and does not establish how her belief may vary from the actual state to the alternative state. Of course, there are conditions under which a subject who actually knows that P rather than Q would have believed that Q rather than P, but these should not be stated as a general principle of epistemic contrastivism. In order to include the dependence of one belief on another in epistemic contrastivism, we might consider the following characterisation of epistemic difference-making. Let C be an epistemic context, a set of possibilities considered by the agent S, and C' a relevant context with regard to C.

Contrastive epistemic difference-making: S knows that P rather than Q according to C if and only if:
a) S believes that P according to C,
b) C includes some proposition F,
c) and had F not obtained in a relevant context C', S would have believed that Q within C'.
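The three conditions of contrastive epistemic difference-making can be mimicked in a small sketch. Everything here is an assumption of the illustration, not the chapter's own formalism: contexts are modelled as sets of proposition labels, and the doxastic rule is a hypothetical one on which the agent concludes "P" when the background condition "F" is present and falls back to the contrast "Q" otherwise.

```python
# Toy doxastic rule (an assumption of this sketch): with the background
# condition "F" present the agent concludes "P"; without it she falls
# back to the contrast proposition "Q".
def belief_given(context):
    return "P" if "F" in context else "Q"

def knows_rather_than(P, Q, C, C_relevant, F):
    a = belief_given(C) == P                 # (a) S believes that P in C
    b = F in C                               # (b) C includes F
    c = belief_given(C_relevant - {F}) == Q  # (c) had F not obtained in C',
                                             #     S would have believed Q
    return a and b and c

C = {"F", "background"}   # the agent's epistemic context
C_rel = {"F"}             # a relevant context from C's perspective
print(knows_rather_than("P", "Q", C, C_rel, "F"))  # True
```

Dropping F from the agent's context makes condition (a) fail, so the contrastive knowledge claim is no longer licensed, which matches the role the definition assigns to the difference-maker F.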
For now, let us leave the notion of difference-making aside and focus just on epistemic contrastivism. Peter Baumann (2008) argues that this thesis does not apply to every case of knowledge ascription; that is, not all knowledge is contrastive. He considers a condition that must be met if we want to claim that all kinds of knowledge are fundamentally contrastive. It establishes the following:

(Specificity) A contrastivist analysis of knowledge of some type is plausible only if there are for a given subject S a lot of triples of propositions p, q and r such that S knows that p rather than q, but S does not know p rather than r (where ‘‘knows’’ refers to knowledge of that type). (Baumann 2008, 191)
Thus, in order to accept a contrastivist account of knowledge of some type, we should be able to find, with regard to a subject, many instances of knowledge ascriptions involving a known proposition, a contrast proposition and a possible further contrast proposition. A fundamental aspect of the specificity condition is that one cannot form a contrastive proposition involving just S, the known proposition and one of the contrast propositions. Perceptual knowledge is a good example of a kind of knowledge for which an acceptable account can be developed while respecting the specificity condition. Suppose that Robert has a bill in his hand and observes it. He knows that he has a Euro bill rather than a dollar bill. However, he does not know whether he has a Euro bill rather than a fake Euro bill. There are many triples such as this one regarding perceptual knowledge, which means that a contrastivist analysis of perceptual knowledge is plausible. But not every kind of knowledge admits acceptable contrastive expressions. Baumann considers the following example involving a mathematical truth:

Take knowledge of obvious mathematical truths, like the simple one that 2 + 2 = 4. Does anyone who knows that know it in contrast to something else? In contrast to what, then? To 2 + 2 = 5 (Or 2 + 2 = -.7? Or 3 + 3 = 4? Or 12 x 12 = 1212?)? There simply does not seem to be a plausible contrast proposition around. The problem is that (Specificity) does not seem to hold for this kind of knowledge. It is hard to imagine, for instance, how there could be two numbers x and y (not equal to 4) such that S knows that 2 + 2 = 4 rather than 2 + 2 = x but that S does not know that 2 + 2 = 4 rather than 2 + 2 = y. It would be very interesting to find someone who, e.g., knows that 2 + 2 = 4 rather than 2 + 2 = 40 but who does not know that 2 + 2 = 4 rather than 2 + 2 = 5 (‘‘too close!’’). It would be very hard to make sense of such a subject or even to attribute knowledge to her in the first place.
It seems that whoever knows that 2 + 2 = 4 knows this rather than 2 + 2 = z, for any z not equal to 4. (Baumann 2008, 191)
This should show that a contrastivist account of mathematical knowledge would not be plausible, because it seems difficult to find many triples involving the known proposition stating that 2 + 2 equals 4 and two contrast propositions. According to Baumann, the case also shows that it is difficult to find any contrast proposition related to the considered mathematical truth. This may be true of the example presented by Baumann. However, the fact that it is difficult to find an acceptable contrast proposition does not mean that there are none. Let us consider Baumann's doubt about whether someone could know that 2 + 2 = 4 in contrast to something else. Contrary to his doubts, there are in fact many contrast propositions related to "2 + 2 = 4". Suppose that Robert is learning to add numbers and is faced with the equation "2 + 2 = x", where x is a variable denoting a single natural number. After solving the equation in his mind, Robert knows that 2 + 2 equals 4 rather than any other natural number. Thus, there is a contrast proposition for this case. I would even say that contrast propositions are abundant in simple operations like the one in the example above. This dissolves Baumann's doubt about whether one can find, for a given subject, a contrast proposition related to the simple mathematical truth "2 + 2 = 4". Suppose further that Robert does not know what a real number is. It follows that the following claims are true: "Robert knows that 2 + 2 equals 4 rather than any other natural number" and "Robert does not know that 2 + 2 equals 4 rather than any other real number". It would not be hard to find many cases like this one. Thus, there are many mathematical truths that can be involved in contrastive knowledge ascriptions. We may conclude that a contrastive account of mathematical knowledge would be acceptable and in accordance with the specificity principle.
The example just discussed can be understood more clearly if we turn again to the notion of epistemic difference-making based on a given epistemic context. The contrast class in relation to which one ascribes knowledge to an agent regarding a given proposition is determined by the epistemic context in which the agent evaluates that proposition. According to the definition of epistemic difference-making given above, we may claim that Robert knows that 2 + 2 equals 4 rather than any other natural number, because he believes it and there is a set of conditions F considered by him such that, if F were not true in a relevant context C', he would believe that 2 + 2 might equal any other natural number. That set of conditions might involve, for example, the fact that Robert understands what addition is and does not confuse it with division. In a certain sense I agree with Baumann when he claims that, regarding a simple mathematical truth, there does not seem to be a plausible contrast
proposition. Of course, there is no single and absolute contrast proposition, because it depends on the context according to which the knowledge is being ascribed. Consider, as another example, the epistemic context of a child who is currently learning numbers and has been confusing 4 with 5. When asked what 2 + 2 equals, he sometimes responded that 2 + 2 equals 4 and, at other times, that 2 + 2 equals 5. After learning the difference correctly, the child knows that 2 + 2 equals 4 rather than 5. According to the child's context, it seems appropriate to express his knowledge about the considered mathematical proposition in a contrastive manner. Perhaps, according to the context of an average person with a primary education, the contrast proposition would be harder to find, because the epistemic conditions without which she would believe that contrast proposition are not sufficiently specified or not considered at all. Baumann also proposes that adding a context parameter to the contrastivist account of knowledge would be of great help in handling difficult cases:

[K]nowledge is a ternary relation between a subject, a known proposition and, third, some out of a bunch of entities: contrast propositions, context-specific epistemic standards, practical interests (how important is it to get it right?), etc. This is, I think, still compatible with the contrastivist spirit even if not with its letter as it has been developed so far. It would be compatible with the spirit of contrastivism if one were to generalize the account and open the third argument slot up for many different kinds of things, or, at least, for more things than just contrast propositions. At the same time, contrastivism would lose some of its simplicity and elegance. (Baumann 2008, 193)
According to Baumann, one may distinguish pure contrastivism from hybrid contextualist contrastivism. The point of view that I favour, which may be based on the notion of epistemic difference-making characterised above, can be called hybrid contextualist contrastivism, following Baumann's distinction. I do not think that the loss of simplicity and elegance should be considered a relevant issue. In any case, even if we focused on the fact that the account loses its simplicity by introducing a context parameter into the evaluation of knowledge ascriptions, that simplicity is traded for more accuracy and strength. Thus, we may end up with a less simple but stronger theory. Baumann also claims that the loss of simplicity is a price that the theory has to pay (2008, 198), but this should not be a problem if we compare that loss with the implied gain and consider the result of the complete trade-off. Let us focus now on the number of factors that should be considered in the definition of the knowledge relation. According to the strategy
mentioned by Baumann, we may still take knowledge to be a ternary relation, but instead of considering just a contrast proposition besides the subject and the known proposition, we may also consider a context. One should note that the description of a context may include a contrast proposition as well as other parameters, such as epistemic standards and interests. The notion of a context is included in the definition of contrastive epistemic difference-making stated at the beginning of this chapter. That definition not only presents knowledge as a ternary relation, but further characterises it by including the set of factors that would make a difference with regard to the agent's belief about a given, supposedly known proposition. Such a set is symbolised by F in the definition of contrastive epistemic difference-making. Recall that, according to epistemic difference-making, when a proposition P is known by a subject S, there might be a proposition F that makes a difference to the subject's belief about P; that is, if S did not believe that F, she would not believe that P. Difference-makers are of great relevance for the evaluation of contrastive knowledge propositions. Following an example discussed by Baumann, suppose that Sue knows how to distinguish a terrier from a dachshund and that, at a given moment, she knows that there is a terrier in front of her rather than a dachshund. We might also say that, at that moment, she knows that there is a terrier in front of her rather than a cat. The proposition or set of propositions on the basis of which Sue can distinguish between terriers and dachshunds plays the role of a difference-maker: if she did not know how to distinguish between a terrier and a dachshund, she would not really know that there is a terrier in front of her rather than a dachshund.
According to a simple account of epistemic contrastivism, this case might be evaluated differently, as Baumann argues: According to the standard contrastivist view, Sue might well know on one occasion that there is a dachshund in front of her rather than a cat and she might also know on another occasion that there is a terrier in front of her rather than a cat—even though she cannot tell dachshunds from terriers and thus cannot know whether there is a dachshund in front of her rather than a terrier. This, however, sounds abominable and incorrect. It seems false to say that she knows out of the contrast class [dachshund; cat] that there is a dachshund. (Baumann 2008, 194)
I think that the use of the word “abominable” is too dramatic here. However, I agree that it would be incorrect to say that Sue knows that there is a terrier in front of her rather than a cat, considering that she cannot distinguish a terrier from a dachshund. But why does a simple
account of epistemic contrastivism evaluate this scenario in that way? The answer may be connected to how an agent understands a concept and how she associates it with a given class of objects. On this basis, Baumann (2008, 194) introduces the notion of a defeating proposition, which can be defined as follows:

Defeating proposition: Given two propositions p and r, proposition r is a defeating proposition of p just in case:
a) p contains a concept F,
b) r contains a concept G,
c) if p contained the concept G instead of the concept F, p and r would be equivalent,
d) the agent does not know how to distinguish F from G, and
e) only because of condition (d), the agent does not know that p rather than r.

The role of the notion of a defeating proposition is the following: if an agent knows that p rather than q, then there is no defeating proposition r for p. In order to better understand the problem mentioned above, we should focus on the importance of the set of difference-making propositions. According to the definition of a defeating proposition, the set of difference-making propositions is the set according to which condition (d) is satisfied. In this sense, there must be a set Z of facts that is not instantiated, such that, if Z were instantiated, the agent could know how to distinguish F from G. Let us now consider again the example of Sue. Suppose that there is a dachshund in front of her. Certainly, Sue knows that there is not a cat in front of her. She may also know that there is a dog in front of her. However, she does not know that there is a dachshund rather than a terrier in front of her, because she does not know how to distinguish dachshunds from terriers. The proposition "There is a terrier in front of me" is, from Sue's perspective, a defeating proposition regarding her knowledge that there is a dachshund in front of her rather than a cat.
Let us symbolise the known proposition, the contrast proposition and the defeating proposition as follows:

(p) There is a dachshund in front of me. (Known proposition)
(r) There is a terrier in front of me. (Defeating proposition)
(q) There is a cat in front of me. (Contrast proposition)
Thus, if the concept "dachshund" were replaced in p by the concept "terrier", p and r would be equivalent. This is stated by condition (c) in the definition of a defeating proposition. Sue does not know that p rather than r because, according to condition (d), she cannot distinguish between a terrier and a dachshund. If she could make such a distinction, she would know that p rather than r, that is, that there is a dachshund rather than a terrier in front of her. It is important to notice that one could determine a given set of propositions on which this depends, a set of difference-making propositions. Baumann also introduces the notion of an undermining proposition, which can be characterised in the following way (2008, 195):

Undermining proposition: Given two contrast propositions q and s, proposition s is an undermining proposition with regard to q just in case:
a) q contains a concept H,
b) s contains a concept K,
c) if q contained the concept K instead of the concept H, q and s would be equivalent,
d) the agent does not know how to distinguish H from K, and
e) only because of condition (d), the agent does not know that q rather than s.

One should note that undermining propositions are contrast propositions. Their role in relation to contrastivism is the following: if an agent knows that p rather than q, there is no undermining proposition s for q. Let us go back to the example and consider, additionally, the proposition that there is a mountain lion in front of Sue, symbolised as s. If she knew how to distinguish a cat from a mountain lion, she would know that there is a cat rather than a mountain lion in front of her. Now, if Sue knows that there is a dachshund in front of her rather than a cat, she must be able to distinguish a cat from a mountain lion. Otherwise, s would be an undermining proposition regarding the fact that Sue knows that there is a dachshund in front of her rather than a cat.
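The shared core of the two definitions, concept substitution under a failure of distinguishability, can be sketched as follows. The representation is an assumption of this illustration: propositions are (template, concept) pairs, and distinguishability is a symmetric relation over concepts in which the pair {"dachshund", "terrier"} is deliberately absent, matching Sue's case.

```python
# Hypothetical distinguishability relation for Sue's case; the pair
# {"dachshund", "terrier"} is deliberately missing.
DISTINGUISHABLE = {
    frozenset({"dachshund", "cat"}),
    frozenset({"terrier", "cat"}),
    frozenset({"cat", "mountain lion"}),
}

def can_distinguish(f, g):
    return frozenset({f, g}) in DISTINGUISHABLE

def defeats(r, p):
    """r defeats p: substituting r's concept for p's turns p into r
    (condition c), and the agent cannot distinguish the two concepts
    (condition d). Applied to contrast propositions, the same test
    captures undermining."""
    same_template = p[0] == r[0]
    return same_template and not can_distinguish(p[1], r[1])

p = ("There is a {} in front of me", "dachshund")  # known proposition
r = ("There is a {} in front of me", "terrier")    # candidate defeater
q = ("There is a {} in front of me", "cat")        # contrast proposition

print(defeats(r, p))  # True: r blocks "p rather than q"
print(defeats(q, p))  # False: Sue can tell dachshunds from cats
```

The sketch encodes only conditions (c) and (d); condition (e), that the distinguishability failure is the sole reason for the lack of knowledge, is left to the reader's interpretation of the scenario.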
After introducing the notions of a defeating proposition and of an undermining proposition, Baumann (2008, 195) proposes the following principle, establishing a restrictive condition for contrastive knowledge:

Distinguish: If S knows that p rather than q, then the classes of potentially defeating or undermining propositions are restricted in such a way that there are no defeating propositions r for p and no undermining propositions s for q.
Additionally, the following principles describe the form that a contrastive knowledge ascription should take when there is either a defeating proposition or an undermining proposition (Baumann 2008, 196):

Knowledge involving a defeating proposition: If there were a defeating proposition r of p, it would be incorrect to say that S knows that p rather than q, but correct to say that S knows that (p or r) rather than q.

Knowledge involving an undermining proposition: If there were an undermining proposition s of q, it would be incorrect to say that S knows that p rather than q, but correct to say that S knows that p rather than (q or s).

Thus, if Sue does not know how to distinguish between a terrier and a dachshund, but can distinguish between a cat and a mountain lion, it may be appropriate to say that she knows that there is a terrier or a dachshund in front of her rather than a cat. If, instead, she knew how to distinguish a terrier from a dachshund but did not know how to distinguish a cat from a mountain lion, we may claim that she knows that there is a dachshund in front of her rather than a cat or a mountain lion. One should note that, according to the "Distinguish" principle shown above, the classes of defeating and undermining propositions should be restricted. This ensures that one leaves out sceptical propositions, or any proposition that an agent cannot discard. For instance, the proposition "There is a Cartesian demon deceiving me into thinking that there is a dog in front of me" may be considered, if the agent is sceptical enough, as a defeating proposition of the proposition "There is a dog in front of me". Thus, as Baumann argues, knowledge ascriptions should also be restricted regarding the classes of potential defeating and undermining propositions. I think that the best account of how these classes are determined is given by a contextualist account of knowledge.
Baumann explains that there is no incompatibility between the “Specificity” principle and the “Distinguish” principle, where the letter R stands for a class of potentially defeating propositions: Finally, a worry: Doesn’t (Distinguish) clash with (Specificity) (sec. 1)? Doesn’t the latter allow for something the former excludes, namely that S knows that p rather than q while not knowing that p rather than r? No. When we say that S knows that p rather than q, we are excluding any proposition like r from R. However, we are not thereby denying that there are such propositions (like r)—whose existence is demanded by
(Specificity). The appearance of a clash between (Distinguish) and (Specificity) might well be due to not explicitly relativizing to the restricted class R. (Baumann 2008, 197)
The way in which an agent excludes certain propositions from R depends on the epistemic context in which she is involved, that is, the set of possibilities considered by her. Let us consider the following characterisation of contrastive knowledge, based on the restrictions suggested by Baumann:

Contrastive knowledge: S knows that P rather than Q according to an epistemic context CS if and only if:
a) S believes that P according to CS,
b) CS includes some proposition F,
c) had F not obtained in a relevant context C', S would have believed that Q within C', and
d) according to CS, there are neither defeating propositions for P nor undermining propositions regarding Q.

Baumann claims that a contextualist account of knowledge should explain why some choices of a class of potentially defeating propositions seem inadmissible. Suppose, following a case discussed by Baumann, that Sue is in front of a dachshund and knows how to distinguish cats from other animals. What if there is no defeating proposition and the only kind of dog she has experience with is the dachshund breed? It would be correct to claim that Sue knows that there is a dachshund in front of her rather than a cat, but in most normal contexts this seems inadmissible. I would clarify the case, according to the definition of contrastive epistemic difference-making proposed at the beginning of this chapter and to Baumann's considerations, with the following set of propositions:

(1) According to Sue's context CS, a dachshund cannot be distinguished from any other kind of dog.
(2) According to a normal context CN, dachshunds can be distinguished from other kinds of dogs.
(3) According to CS, there is no defeating proposition of the proposition "There is a dachshund".
(4) Thus, it is correct to claim that, according to CS, Sue knows that there is a dachshund in front of her rather than a cat.
(5) According to CN, there are many defeating propositions of the proposition "There is a dachshund in front of Sue".
(6) Thus, it is incorrect to say, according to CN, that Sue knows that there is a dachshund in front of her rather than a cat.

Thus, following the definition of contrastive knowledge proposed above, we may say that, according to Sue's context, she knows that there is a dachshund in front of her rather than a cat. According to a normal context, or to a context in which the most common dog breeds can be distinguished, we would not say that she knows that. I agree with Jonathan Schaffer when he argues the following about Baumann's example of Sue:

I do not wish to quibble over intuitions, but I must say that nothing here sounds “abominable and incorrect” to me. Consider Sue on the first occasion when there is a dachshund in front of her, and consider the question: “Is the beast a dachshund or a cat?” Clearly Sue can get the right answer, and clearly she can do so in an epistemically proper way (on the basis of her evidence, without any guessing). So I think it is plausible to say that she does know the answer to the question—she knows whether the beast is a dachshund or a cat. And that is just to say that she knows that the beast is a dachshund rather than a cat. Now consider Sue on the second occasion when there is a terrier in front of her, and consider the question “Is the beast a terrier or a cat?” Again Sue can get the right answer, in a proper way. (Schaffer 2012, 421)
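The context-relativity of steps (1) to (6) can be mimicked in a small sketch. The modelling choices are assumptions of this illustration: an epistemic context is a set of considered concepts together with the concept pairs treated as distinguishable in it, and a contrastive claim is admissible only when the contrast is distinguishable and no considered alternative is indistinguishable from the known concept.

```python
# Sue's own context: "terrier" is not even considered in it.
C_S = {"concepts": {"dachshund", "cat"},
       "distinguishable": {frozenset({"dachshund", "cat"})}}

# A "normal" context: terriers are considered but Sue cannot tell
# them apart from dachshunds.
C_N = {"concepts": {"dachshund", "terrier", "cat"},
       "distinguishable": {frozenset({"dachshund", "cat"}),
                           frozenset({"terrier", "cat"})}}

def defeaters(known, context):
    """Considered concepts indistinguishable from the known one."""
    return [c for c in context["concepts"]
            if c != known
            and frozenset({known, c}) not in context["distinguishable"]]

def admissible(known, contrast, context):
    return (frozenset({known, contrast}) in context["distinguishable"]
            and not defeaters(known, context))

print(admissible("dachshund", "cat", C_S))  # True
print(admissible("dachshund", "cat", C_N))  # False: "terrier" defeats
```

Relative to C_S the claim goes through, just as step (4) says, while relative to C_N the considered but indistinguishable "terrier" blocks it, matching steps (5) and (6).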
There are two main points in Schaffer's consideration of these examples. The first is that he deals with contrastive knowledge propositions in terms of disjunctive propositions. One may think that this way of considering them somehow leaves out the idea that if a given agent S knows that P rather than Q, then he knows that P. But this need not be so. Suppose that Sue knows that dachshunds and terriers are dogs, but cannot distinguish between a dachshund and a terrier. Additionally, she can distinguish dogs from cats. If a friend of hers asked her, pointing to a small animal in front of her, "Is that a terrier or a cat?", what should she answer? If there is actually a terrier and she claims that it is a terrier rather than a cat, she answers correctly. She may also reason that the small animal is surely not a cat, so it must be a terrier. But would her answer be correct? If both Sue and her interlocutor are sure that the only two possibilities are that the animal in front of them is a terrier or a cat, her answer can be considered correct. Thus, she would know that there is a terrier. But, again, it depends on the context
according to which the knowledge ascription is made: if her friend asked further, "But do you know that it is a terrier?", perhaps it would be more appropriate for her to answer that she does not know, considering that she cannot distinguish a terrier from a dachshund. A second important point in Schaffer's consideration of the examples is that he makes explicit the relevance of Sue's evidence. On the basis of her evidence, Sue can get the right answer. This means that, according to her epistemic context, she knows that there is a terrier rather than a cat. Baumann responds to Schaffer's considerations about Sue's case as follows:

I argued that it is abominable and incorrect to say that Sue “knows that the beast is a dachshund rather than a cat”. Schaffer brings in the question “Is the beast a dachshund or a cat?” and continues: “(…) it is plausible to say that she does know the answer to the question—she knows whether the beast is a dachshund or a cat.” I don’t agree. Sure, if we assume that Sue has been explicitly asked “Is this a dachshund or a cat?”, then it would be reasonable for her to assume that by asking this question the questioner conveys (by implicature, for instance) the information that it is either a dachshund or a cat. So, the asking of the question gives Sue more information (given that she can trust the questioner as a source of information). Under these circumstances, she does know that the beast is a dachshund and not (rather than) a cat. The point, however, is that it is still as odd as before to say this if we don’t include the asking of such an explicit question in the scenario. (Baumann 2012, 429)
Thus, Baumann agrees that we may correctly say, under specific circumstances, that Sue knows that the small animal in front of her is a dachshund rather than a cat. Baumann adds that it would still be odd to say this if the asking of the question were not included in the context according to which the knowledge ascription is evaluated. I agree. In such a case, Sue may perhaps consider saying something about a dog rather than something about a dachshund, applying the principle regarding defeating propositions. That would depend on whether she knows that she cannot distinguish a dachshund from a terrier. Considering a context according to which she knows that she cannot distinguish them, it would be odd to say that she knows that there is a dachshund rather than a cat. However, considering a context according to which she is unaware that she cannot distinguish a dachshund from a terrier (say, a context according to which she does not know what a terrier is), it may be correct to say that she knows that there is a dachshund rather than a cat.
CHAPTER SEVEN

THE PRINCIPLE OF INDIFFERENCE
Given a set of mutually exclusive possibilities, if one has no reason to think that any one of them will occur rather than the others, one should assign the same probability to the occurrence of each of them. This is, roughly, the key idea of the principle of indifference.
Formulation of the Principle and Multiple Partitions

The classic version of the principle of indifference can be formulated in the following way, according to Rodolfo de Cristofaro, where e symbolises the evidence considered by a particular agent:

Classic Principle of Indifference: “[I]f there are m mutually exclusive hypotheses, and e gives no more reason to believe any one of these more likely to be true than any other, then we should assign the same probability to every hypothesis”. (de Cristofaro 2008, 332)
In other words, if the evidence available to an epistemic agent does not support belief in the occurrence of any one of the hypotheses that he is considering over the others, then all the hypotheses should be equiprobable. It should be noted that the principle of indifference establishes a condition on a probability distribution on the basis of a state of belief. For instance, if Robert believes that the die he is throwing is fair, that it has six sides and that, if he throws it, only one side can be uppermost after it stops rolling on a flat surface, then, for every side, he should assign a probability of one sixth to the die's coming to rest with that side uppermost. Similarly, if the scientific community is discussing two plausible hypotheses that may explain some data in a simple way, but no evidence supports the truth of one more than the other, then the community should assign the same probability to both hypotheses until new evidence shows that one of them is more likely to be true than the other.
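The classic principle amounts to a uniform assignment over the m hypotheses, which the following minimal sketch makes explicit using Robert's die as the example (the function name is an illustrative choice):

```python
from fractions import Fraction

def uniform_prior(hypotheses):
    """Classic principle of indifference: with m mutually exclusive
    hypotheses and no evidence favouring any of them, each receives
    probability 1/m."""
    m = len(hypotheses)
    return {h: Fraction(1, m) for h in hypotheses}

# Robert's fair die: six mutually exclusive outcomes.
prior = uniform_prior([1, 2, 3, 4, 5, 6])
print(prior[1])             # 1/6
print(sum(prior.values()))  # 1
```

Exact fractions are used so that the probabilities sum to exactly one, as the normalisation of a probability distribution requires.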
The information available is not the only important factor according to which an agent assigns the same probability to each possibility that she is considering. The experimental design, that is, the model according to which the information is going to be gathered, is also of great importance. De Cristofaro (2008) calls this factor the projective design. Turning back to the example of Robert, this is somehow implicit when he supposes that the dice is a fair dice. However, the projective design may also include some rules according to which the dice should be thrown, as well as restrictions on the surface’s conditions. De Cristofaro argues for the importance of the design factor in the following way, using the symbol e* to represent the evidence that an agent has before she carries out an experiment or project and the letter e to represent the complete set of evidence, including e* and the design factor d:

In plain language, the judgment that e is indifferent or neutral between a set of exclusive alternatives should consider, not only e*, but also the evidence about the projected design d. e* may be ‘neutral’, but if d is the determining factor in a discriminating treatment of any hypothesis, then the prior cannot be uniform. Thus, we can assign the same probability to every hypothesis if the design used in the inquiry is fair or impartial, in the sense of ensuring an equal support to all hypotheses. (de Cristofaro 2008, 332)
A state of indifference will be acceptable if the design according to which the epistemic agent wants to evaluate the hypotheses is impartial. This means that the design should not influence the outcomes of the experiments that the agent is going to carry out. On the basis of the importance of the projective design, de Cristofaro (2008, 332) proposes the following definition of the principle of indifference, for a set of admissible hypotheses H, any member h of that set and a projective design d:

Principle of indifference based on impartiality: An agent can assign the same probability to every h, if: i) prior information regarding any h is irrelevant and ii) there is no discriminating treatment of any h caused by d.

Thus, this version of the principle of indifference involves, in a certain sense, a restriction with regard to the reasons on the basis of which an agent should assign the same probability to the hypotheses that she is considering. This restriction establishes that the design according to which the agent will evaluate the hypotheses should not causally influence the results of such an evaluation.
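De Cristofaro’s two conditions can be sketched as a guard on the uniform assignment. This is my own illustrative rendering, with hypothetical names:

```python
def impartial_prior(hypotheses, prior_irrelevant, design_impartial):
    """Assign a uniform prior only if (i) prior information about the
    hypotheses is irrelevant and (ii) the projective design d does not
    discriminate among them; otherwise indifference is not licensed."""
    if prior_irrelevant and design_impartial:
        return {h: 1.0 / len(hypotheses) for h in hypotheses}
    return None  # the principle does not apply

# Fair design: indifference is licensed.
assert impartial_prior(["h1", "h2"], True, True) == {"h1": 0.5, "h2": 0.5}

# Biased design: no uniform prior, whatever the prior evidence says.
assert impartial_prior(["h1", "h2"], True, False) is None
```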
An apparent weakness of the principle of indifference emerges when it is confronted with the problem of multiple partitions. This problem is described by Martin Smith as follows. Here is the first part of the argument:

Suppose I find myself in a factory that manufactures square plates. I know that these squares must have a side length of less than 2 feet—but this is all the information that I have about them. Suppose a new square is just about to roll off the production line. What should my credence be in the proposition that the square has a side length of less than 1 foot? There are two possibilities here—either the square has a side length of less than 1 foot or the square has a side length between 1 and 2 feet—and my evidence no more supports one than the other. According to POI [the principle of indifference], if I am rational then my credence in the proposition that the square has a side length of less than 1 foot must be 1/2. (Smith 2015, 606)
In this part of the case, the basic assumptions are given, as well as two possibilities considered by an epistemic agent: The square has either a side length of less than one foot or a side length between one and two feet. Since the agent does not have any reason to believe that one possibility is going to be instantiated rather than the other, she should consider them as equiprobable possibilities. Let us now consider the second part of the scenario described by Smith: If the squares produced by the factory all have side lengths of less than 2 feet, it follows, of course, that they all have areas of less than 4 square feet. What should my credence be in the proposition that the next square to roll off the production line has an area of less than 1 square foot? Here there are four possibilities—either the square has an area of less than 1 square foot or it has an area between 1 and 2 square feet or it has an area between 2 and 3 square feet or it has an area between 3 and 4 square feet—and my evidence no more supports any one of these than it does any other. According to POI, if I am rational then my credence in the proposition that the square has an area of less than 1 square foot must be 1/4. But the proposition that the square has an area of less than 1 square foot is, of course, equivalent to the proposition that the square has a side length of less than 1 foot. These are just two ways of describing the very same condition. POI has generated two conflicting pieces of advice. (Smith 2015, 606)
Thus, the agent has considered many possibilities. Two of them are equivalent. The problem is the fact that, according to the principle of indifference, the agent should assign a probability of 1/4 to one of them and a probability of 1/2 to the other. This means that, according to this
argument and by assuming the principle of indifference, we arrive at two incompatible consequences. Following Smith’s presentation, the complete argument can be schematised as follows, introducing the relation of evidential symmetry, which relates two propositions in such a way that a given agent has no reason to believe one rather than the other:

Multiple partitions argument:
(1) For a given agent, the fact that the square plate will have a side length of less than one foot is evidentially symmetric with the fact that it will have a side length between one and two feet. [Premise]
(2) Let a be the area of the plate. The following possibilities are pairwise evidentially symmetric: a < 1 ft², 1 ft² ≤ a < 2 ft², 2 ft² ≤ a < 3 ft², 3 ft² ≤ a < 4 ft². [Premise]
(3) For the agent, the probability of the fact that the plate will have a side length of less than one foot is 1/2. [(1) and POI]
(4) The probability of the fact that a < 1 ft² is 1/4. [(2) and POI]
(5) The plate will have a side length of less than one foot if and only if a < 1 ft². Thus, these two facts are equiprobable. [Premise]
(6) 1/2 = 1/4. [(3), (4), (5)]

The conclusion of the argument is clearly false, which leads one to put either one of the premises or the principle of indifference in doubt. As a first kind of solution to this issue, we can consider the fact that each set of possibilities, the one regarding the side length and the one regarding the area, corresponds to a different epistemic context. According to this idea, a context is not only understood as a set of possibilities, but also as a way of distributing possibilities, a manner in which this set is partitioned. We should also consider the possibility of restricting a principle involved in the fifth proposition of the multiple partitions argument. According to this principle, an agent should assign the same probability to two propositions that are logically equivalent.
We may formulate this as follows (Jeffrey 2004, 11):

Equivalence: If two hypotheses H and G are logically equivalent, then P(H) = P(G).

If we want to preserve the principle of indifference and, at the same time, avoid the embarrassing consequence of the multiple partitions argument, we might restrict the principle of equivalence as follows:
Equivalence in a partition: If two hypotheses H and G that are included in a set of mutually exclusive and collectively exhaustive hypotheses are logically equivalent, then P(H) = P(G) according to that set.

Let us consider again the problem of the multiple partitions argument. The proposition “The plate has a side length of less than one foot” is logically equivalent to the proposition “The plate has an area of less than one square foot”. However, these propositions do not correspond to possibilities described within the same partition; they are not part of one set of mutually exclusive and collectively exhaustive hypotheses. Therefore, according to the modified principle of equivalence, these two propositions are not equiprobable and proposition (5) of the argument cannot be accepted in this sense. The modified version of the equivalence principle is based on a contextualist notion of probability. A context might be understood here as the partition according to which the probabilities are assigned.
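The contextualist reading can be sketched as follows (my own illustration, with hypothetical labels for the possibilities): probabilities are indexed by a partition, so the principle of indifference is applied within each partition, and the restricted equivalence principle imposes no equality across partitions:

```python
def poi_within(partition):
    """Apply the principle of indifference inside a single partition."""
    return {h: 1.0 / len(partition) for h in partition}

# Smith's two partitions of the same outcome space:
side = poi_within(["s < 1 ft", "1 ft <= s < 2 ft"])
area = poi_within(["a < 1 ft2", "1 ft2 <= a < 2 ft2",
                   "2 ft2 <= a < 3 ft2", "3 ft2 <= a < 4 ft2"])

# "s < 1 ft" and "a < 1 ft2" are logically equivalent, yet they belong
# to different partitions (contexts), so the restricted equivalence
# principle does not force their probabilities to coincide:
assert side["s < 1 ft"] == 0.5
assert area["a < 1 ft2"] == 0.25
```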
The Principal Principle

A principle formulated by David Lewis (1980), called the principal principle, is closely connected to the principle of indifference. In a recent publication, Hawthorne, Landes, Wallmann and Williamson (2015) argue that the principle of indifference is a consequence of the principal principle. The latter can be formulated in the following way, in which the notion of chance refers to an objective probability or propensity:

Principal principle: P(A|XE) = x, where X says that the chance at time t of proposition A is x and E is any proposition that is compatible with X and admissible at time t.

Consider the following example in order to understand this principle. Robert has a coin in his hand and is going to flip it. He has been well informed about the chance that the coin will land heads up. Proposition X states that “The chance that the coin will land heads up is x”. Proposition A states that “The coin will land heads up”. Suppose that Robert’s evidence related to A is compatible with X. Given this information, Robert should assign a probability of x to the fact that the coin will land heads up. The motivation associated with this principle lies in the importance of characterising the notion of objective chance and is presented by Lewis as follows:
We subjectivists conceive of probability as the measure of reasonable partial belief. But we need not make war against other conceptions of probability, declaring that where subjective credence leaves off, there nonsense begins. Along with subjective credence we should believe also in objective chance. The practice and the analysis of science require both concepts. Neither can replace the other. Among the propositions that deserve our credence we find, for instance, the proposition that (as a matter of contingent fact about our world) any tritium atom that now exists has a certain chance of decaying within a year. Why should we subjectivists be less able than other folk to make sense of that? (Lewis 1980, 263)
Thus, the principal principle establishes a way of relating subjective degrees of belief with an objective notion of probability. It not only guides an epistemic agent regarding the degree of belief that he should assign to a given possibility, given his information about the chance associated with that possibility. It is also, as Lewis argues, a statement on the basis of which an agent can determine the chance of occurrence of some event, given a certain credence or degree of belief:

[T]he chance distribution at a time and a world comes from any reasonable initial credence function by conditionalizing on the complete history of the world up to the time, together with the complete theory of chance for the world. (Lewis 1980, 98)
According to this idea, one might determine the objective chance of an event on the grounds of a description of the world’s history and a theory of chance that depends on such a description. Since the chance is not determined by any sort of credence assigned by any subject, this account does not imply a radical subjectivism about chance. Let us consider now the notion of a defeater, which is implied by the formulation of the principal principle (Hawthorne et al. 2015):

Defeater: Proposition E is a defeater regarding P(A|X) just in case P(A|EX) ≠ x = P(A|X).

Consider again the example described. If Robert’s evidence supports the fact that the chance that the coin will land heads up is not x, then his evidence is a defeater. Let F be a proposition that is not relevant to A, that is, believing in F should not change an agent’s degree of belief about A. Under the assumption that E is not a defeater regarding F, we may claim that P(F|XE) = 0.5.
This is a form of the principle of indifference. There is no evidence according to which one should believe F instead of ¬F. Therefore, both possibilities are equiprobable. Consider the following case. Proposition X states that “The chance that the coin will land heads up is x”. Proposition A states that “The coin will land heads up”. Suppose that, on the basis of Robert’s evidence, the coin will land heads up just in case the fact F is instantiated. Whether or not F occurs affects neither what Robert believes about the outcome of the coin toss nor the probability that it will land heads up, given X. Then, Robert does not have any evidence that supports his belief in F more than his belief in ¬F. It is important to notice that this result does not depend on the information contained in X. Suppose that XE does not involve relevant information about F and that, according to X, the coin will land heads up with a chance of 0.8. In such a case, P(F|XE) would still equal 0.5. Of course, the mere fact that Robert’s previous knowledge, represented by XE, does not support F more than ¬F implies that P(F|XE) = 0.5. However, one does not have to assume such a fact. Hawthorne et al. (2015) show that one can arrive at the same result through the principal principle and a particular set of assumptions alone. Suppose that, besides F, Robert learns A ↔ F, which is also a non-defeater regarding his belief in A. Thus,

P(A|FXE) = P(A|(A ↔ F)XE).

It may be shown (Hawthorne et al. 2015), applying Bayes’s theorem, that P(F|XE) = P(A ↔ F|XE). This can be expressed in the following way:

P(F|XE) = P(FA|XE) + P(¬F¬A|XE).

Appealing to the general conjunction rule, which states that for any two events A and B, P(AB) = P(B|A)P(A), we can express the right side of the equation as

P(A|FXE)P(F|XE) + P(¬F|¬AXE)P(¬A|XE),

which is the same as
P(A|FXE)P(F|XE) + (1 – P(F|¬AXE))P(¬A|XE).

Assuming that x = P(A|FXE), we may express the equation as follows:

P(F|XE) = x P(F|XE) + (1 – P(F|¬AXE))(1 – x).

Applying Bayes’s theorem again,

P(F|XE) = x P(F|XE) + (1 – P(¬A|FXE)P(F|XE) / P(¬A|XE))(1 – x).

This can be expressed in the following form, considering that P(¬A|FXE) = P(¬A|XE) = 1 – x:

P(F|XE) = x P(F|XE) + (1 – x) – P(F|XE)(1 – x).

This is equivalent to the following equation:

2(1 – x) P(F|XE) = 1 – x.

Thus, P(F|XE) = 0.5. With this argument, Hawthorne, Landes, Wallmann and Williamson (2015) show that, under certain conditions, we arrive at an indifference result from the principal principle. Interestingly, one does not have to assume that the principle of indifference is valid in order to arrive at this result. In other words, the principle of indifference is implied by the principal principle. The main problem with this result is that the principle of indifference has been strongly criticised before. Arguments against this principle usually involve cases similar to the multiple partitions argument, which we have already discussed. That is, the fact that the principle of indifference can be derived from the principal principle suggests that the latter is not a stable point of view with regard to the interpretation of probability (Hawthorne et al. 2015). I think that this issue could be tackled from the contextualist perspective on the basis of which I have discussed the multiple partitions argument.
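The closing algebra can be checked numerically. The sketch below (mine, not from Hawthorne et al.) solves 2(1 – x)P(F|XE) = 1 – x for several values of the chance x:

```python
def p_f_given_xe(x):
    """Solve 2*(1 - x)*p = 1 - x for p; defined for any chance x != 1."""
    return (1 - x) / (2 * (1 - x))

# Whatever the chance that the coin lands heads up, the credence in the
# irrelevant proposition F comes out as 0.5:
for x in (0.1, 0.5, 0.8):
    assert abs(p_f_given_xe(x) - 0.5) < 1e-12
```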
Ignorance

As shown, the principle of indifference states something about cases in which an epistemic agent does not have enough reasons to believe that A will occur rather than B, where A and B are two mutually exclusive possibilities. This lack of reasons may be considered as lack of knowledge, that is, as ignorance about the conditions according to which A and B may occur. Let us consider the principle of invariance of ignorance, as formulated by John Norton (2008, 48):

Principle of invariance of ignorance: An epistemic state of ignorance is invariant under a transformation that relates symmetric descriptions.

In order to understand the fundamental idea of this principle, we need to focus on the notion of a description. According to Norton’s characterisation, a description is a certain set of sentences:

[A] description is a set of sentences in some language. The sentences have terms and the transformations just mentioned are functions on these terms that otherwise leave the sentences unchanged. (Norton 2008, 48)
According to this notion, a description is not just a sentence or a set of sentences. Descriptions depend on the language in which they occur. At the same time, the sentences that constitute a description depend on the terms involved in them. Another important notion regarding the principle of invariance of ignorance is the relation of symmetry between descriptions. Two descriptions are symmetric just in case they describe exactly the same physical possibilities (Norton 2008, 49). Suppose, for instance, that Robert throws a dice. He may assign a certain degree of belief to the sentence “The dice will show a six at its uppermost side when it stops rolling”. Let F represent the description involving this sentence. If Robert does not have much information about the conditions that will determine which side of the dice is going to be the uppermost, he might assign a probability of one sixth to the fact that the dice will show a six. He might also consider the sentence “The face with only one dot will be at the bottom side of the dice when it stops rolling”. Let G represent the description constituted by this sentence. Description F and description G describe the same physical possibility, that is, the physical conditions that are met when the dice shows the six at its uppermost side are essentially the same as when the face with only one dot is at the bottom side of the dice. Hence, we may say that F and G are symmetric descriptions.
The symmetry relation between descriptions may be crucial to allow scenarios that defy the validity of the principle of indifference, such as the multiple partitions argument. Consider the following version of it (Keynes 1921; Norton 2008). Suppose that somebody asks Robert whether Susan lives in France, in Ireland or in Great Britain. Suppose that he does not have reasons to believe that any of those possibilities is more likely to be true than any other. According to the principle of indifference, Robert should assign a probability of one third to each possibility. Let F represent the possibility that Susan lives in France, IR the possibility that Susan lives in Ireland and G the possibility that Susan lives in Great Britain. Then, P(F) = P(IR) = P(G) = 1/3. Now, let us re-describe the possibilities, replacing (IR ∨ G) by the sentence “Susan lives in the British Isles”, symbolised by B. Assuming that F and B are mutually exclusive possibilities, Robert should assign the same probability to each one of them, according to the principle of indifference. That is, P(F) = P(B) = 1/2. As a conclusion of this argument, we arrive at a clear contradiction: P(F) = 1/3 = 1/2. It seems that, as in the multiple partitions argument and the case just described, all paradoxes related to the notion of indifference share a structure:

All the paradoxes have the same structure. We are given some outcomes over which we are indifferent and thus to which we assign equal probability. The outcomes are redescribed. Typically the redescription is a disjunctive coarsening, in which two outcomes are replaced by their disjunction; or it is a disjunctive refinement, in which one outcome is replaced by two of its disjunctive parts. Indifference is invoked again and the new assignment of probability contradicts the old one. (Norton 2008, 52)
Consider again the scenario in which Robert is asked in which country Susan lives. According to a first set of descriptions, P(F) = 1/3. Then, the possibilities are re-described. According to the way in which we considered the case, the re-description is a disjunctive coarsening. A single description involving the concept of British Isles is coarser than the set of descriptions involving the concept of Ireland and the concept of Great Britain. Of course, the argument might have been developed on the basis of a disjunctive refinement as well. Since (IR ∨ G) and B refer to the same physical possibility, they should be considered as two symmetric descriptions, according to the principle of invariance of ignorance. This is one of the grounds of the contradiction that might follow from the principle of indifference.
However, both principles seem to be plausible. Norton relates them in the following way: These two principles express platitudes of evidence whose acceptance seems irresistible. They follow directly from the simple idea that we must have reasons for our beliefs. So if no reasons distinguish among outcomes, we must assign equal belief to them; or if two descriptions of the outcomes are exactly the same in every noncosmetic aspect, then we must distribute beliefs alike in each. (Norton 2008, 52)
Thus, the principle of indifference as well as the principle of invariance of ignorance are based on the fact that beliefs are grounded on reasons. Norton develops an account that is based on ignorance degrees of belief rather than usual degrees of belief. According to it, we may assign equal degrees of belief to each member of a determined partition and also maintain those degrees after refining or coarse-graining the partition. Consider again the case in which someone asks Robert in which country Susan lives. After all, the crucial point about its contradictory result is the fact that the degrees of belief on the possibilities considered are not preserved after the coarsening of the partition. Thus, the solution to the problem regarding the principle of indifference and the principle of invariance of ignorance might lie in such a point. Norton presents his solution as follows:

Let us develop the idea of invariance of ignorance under disjunctive coarsenings and refinements. If we have an outcome space Ω partitioned into mutually contradictory propositions Ω = A1 ∨ A2 ∨ ... ∨ An, an example of a disjunctive coarsening is the formation of the new partition of mutually contradictory propositions Ω = B1 ∨ B2 ∨ ... ∨ Bn−1, where B1 = A1, B2 = A2, . . . , Bn−1 = An−1 ∨ An.
(Norton 2008, 60)
This strategy is based on the assumption that there is a singular outcome space that can be partitioned in different ways. This makes it possible to determine the equivalence between members of each partition, even if one of them is more coarse-grained than the other. The coarsening considered by Norton involves, as usual, the formation of a set that has fewer elements than the initial, more specific partition. It should be mentioned that a refinement is defined as the inverse process of a coarsening. Let the symbol [A|B] represent the degree to which proposition B confirms proposition A and let I represent a single ignorance degree of belief. Norton considers the following two propositions:
[A1|Ω] = [A2|Ω] = . . . = [An|Ω] = I.

[B1|Ω] = [B2|Ω] = . . . = [Bn−1|Ω] = I′.

According to Norton (2008, 60), one can show that the ignorance degree of belief I and the ignorance degree of belief I′ are equivalent. Actually, the degree of belief I is unique for any coarsening process applied to the initial partition. As mentioned, Norton’s strategy is based on the fact that we may start with a single outcome space that may be partitioned in many ways. Interestingly, there is a single initial partition of the outcome space, according to which other sets are formed as a result of coarsenings or refinements. Let us briefly apply Norton’s account to the case involving the countries. Let “France” represent the possibility that Susan lives in France, “Ireland” the possibility that she lives in Ireland and “Great Britain” the possibility that she lives in Great Britain. Consider the outcome space partitioned initially as follows:

Ω = France ∨ Ireland ∨ Great Britain.

Now let us generate the following partition by a coarsening process:

Ω = France ∨ British Isles.

Of course, France = France and (Ireland ∨ Great Britain) = British Isles. This corresponds appropriately to Norton’s characterisation of coarsening. Now, given that an ignorance distribution involves only three values (Norton 2008, 61), which are certainty, [Ω|Ω], ignorance, I, and complete disbelief, [∅|Ω], we may arrive at the following results:

(1) According to the initial partition, [France|Ω] = [Ireland|Ω] = [Great Britain|Ω] = I.
(2) According to the coarse partition, [France|Ω] = [British Isles|Ω] = I′.
(3) I = I′ and [Ireland|Ω] = [Great Britain|Ω] = [British Isles|Ω].

In other words, if Robert is a rational agent, then he should answer that he simply does not know, in that situation, in which country Susan lives, regardless of the partition on the basis of which he is evaluating the question.
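Norton’s three-valued ignorance distribution can be sketched in a few lines (my own rendering; the names are hypothetical). Every proper, non-empty disjunction of outcomes receives the single ignorance value I, so the value is invariant under the coarsening from Ireland ∨ Great Britain to British Isles:

```python
CERTAINTY, IGNORANCE, DISBELIEF = "certainty", "I", "disbelief"

def ignorance_degree(proposition, omega):
    """Degree [proposition|omega] in a three-valued ignorance
    distribution: certainty for omega itself, disbelief for the empty
    proposition, and the single value I for everything in between."""
    if proposition == omega:
        return CERTAINTY
    if not proposition:
        return DISBELIEF
    return IGNORANCE

omega = frozenset({"France", "Ireland", "Great Britain"})
fine = [frozenset({c}) for c in omega]                 # initial partition
coarse = [frozenset({"France"}),
          frozenset({"Ireland", "Great Britain"})]     # British Isles

# The same value I is assigned before and after the coarsening:
assert all(ignorance_degree(p, omega) == IGNORANCE for p in fine)
assert all(ignorance_degree(p, omega) == IGNORANCE for p in coarse)
```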
CHAPTER EIGHT

CAUSAL REDUNDANCY
In this chapter, some general approaches to the causal relation, as well as the general aspects of causal redundancy, will be introduced. I will explain the particular problems that causal redundancy may generate for these approaches, in order to show not only how such problems can threaten the plausibility of some theories of causation, but also how the theories can be better understood by considering them.
The Causal Relation

Talking about causes and effects is very common in everyday language. We say, for example, that something broke and that it was an effect of somebody’s action or of something else. We look for the causes of the fact that such an object, maybe a bottle, is broken. We may find out that some person, let’s say Suzy, threw a stone at the bottle, breaking it. The throw caused the shattering of the bottle. Such a sentence is more than easy to understand. Nevertheless, in the history of thought there are few notions so often discussed and at the same time so frequently misunderstood as the causal relation. A very usual way to say that some event was the cause of some other event is by saying that if the former had not occurred, the latter would not have occurred either. We might assume this while thinking, for example, “If I had not been so distracted, I would have got there earlier”. The distraction is in this way regarded as the cause of the late arrival. We might also say “If Suzy had not thrown that rock, the bottle would not have shattered”. According to such an assertion, it would be right to say that Suzy’s throw caused the bottle’s shattering. This is the basis of the counterfactual account of causation, studied informally by David Hume (1748) and developed as a philosophical analysis by David Lewis (1973). According to a rough characterisation of Lewis’s proposal, one event causes another event just in case both events occurred and the description of the latter event depends counterfactually on the description of the former, which means that if the former had not occurred, the latter would not have occurred either. Another conception of the causal relation
involves the fact that effects regularly follow from their causes. This aspect of causation was also carefully studied by Hume (1748). As he argued, one can say that one event was the cause of another when events like the former are regularly followed by events similar to the latter. If a certain event occurs, then some other event of a certain kind will follow. Of course, this is not enough for a definition of causation. The event of Suzy throwing a rock at the bottle alone is not enough to say that the shattering of the bottle will undoubtedly follow. Some background conditions and laws might be needed as well. Thus, we can say that some event caused another if both occurred and the proposition describing the latter follows from the proposition describing the former, in conjunction with a proposition describing a set of conditions and a set of general laws. The theory based on this definition of the causal relation is the regularity account of causation, developed in a more systematic form by John Mackie (1974). According to his version of the regularity account, a cause of some event is an insufficient but necessary part of a condition that is itself unnecessary but sufficient for that event. Following the example of Suzy’s throw, it might be relevant, for instance, to include among the background conditions some facts about the atmosphere and the velocity of the rock, together with some mechanical laws. Both the counterfactual account and the regularity account of causation come close to a very intuitive and at the same time strong notion of what causes are. However, as I will show in the following section, causal redundancy has raised some problems for these theories, which have constituted a field of further development and discussion. The features of causal redundancy will now be described in order to consider some weaknesses of these approaches to causation.
The General Notion of Causal Redundancy

One important aspect of the counterfactual account of causation lies in the fact that causes are somehow necessary to bring about their effects. If the cause had not occurred, its effect would not have occurred either. Thus, situations in which some event causes another without being necessary for it to occur will be problematic for the counterfactual theory. Consider the following case of causal redundancy, a preemption scenario (cf. Lewis 1973a, Hall 2004):

Preemption: Suppose that Suzy threw her rock and that the bottle, after being hit by it, shattered. Suppose further that Billy, a friend of Suzy,
was also participating in that stone-throwing game and that he threw a rock that would have shattered the bottle an instant later, if Suzy’s rock had not. Nobody will hesitate to say that Suzy’s throw was a cause of the bottle’s breaking. Nevertheless, the counterfactual account of causation, in its rough and general form, does not describe Suzy’s throw as a cause. For it is not true that, if Suzy had not thrown her stone, the bottle would not have shattered. The bottle would have been shattered anyway by Billy’s rock. I will not focus now on the particular features according to which preemption scenarios should be distinguished from other cases of redundant causation. For now, I will just consider the described case as a general example of causal redundancy. Cases of causal redundancy are characterised by the fact that there are two or more events that can count as a cause of another event, considered as the effect. Other particularities of the situation just described are going to be explained below. What is important now is the fact that causal redundancy generates problems regarding exactly the aspect of the counterfactual account of causation according to which causes are necessary for their effects. In cases of causal redundancy this does not seem to be true, because we can find events that, despite being intuitively considered as causes, were not fully necessary for their effects to occur. The regularity account of causation is somehow also vulnerable to causal redundancy. Imagine the same situation as the one described above. Suzy and Billy are throwing rocks, Suzy’s rock hits the bottle, the bottle breaks and Billy’s rock would have hit it later otherwise. Would it be possible to find a set of background conditions (including Billy’s throw) in conjunction with the proposition about Suzy’s throw and a set of general laws related to the bottle’s glass structure, from which the proposition describing the bottle’s shattering might follow?
Since it would, Suzy's throw is regarded as a cause and the regularity account is not threatened in this sense by causal redundancy. Now, it would clearly be wrong to consider Billy's throw as a cause of the bottle's breaking as well. Nevertheless, the proposition describing the fact that the bottle shattered might follow from the same set of background conditions and laws considered, together with the description of Billy's throw. According to the regularity account of causation, then, Billy's throw is wrongly described as a cause of the bottle's shattering.
Chapter Eight
Before analysing these cases in more detail and studying the ways in which an account of causation should handle them, one must keep in mind a basic assumption of causal redundancy, which is directly related to its treatment in the light of the regularity account. This assumption is that every redundant cause is considered sufficient to bring about the effect.
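To make the difficulty vivid, the naive counterfactual test can be sketched in a toy model. This is only an illustrative sketch: the function and variable names are stipulated here and are not part of any standard formalism.

```python
def bottle_shatters(suzy_throws: bool, billy_throws: bool) -> bool:
    # Toy preemption world: had Suzy not thrown, Billy's rock would
    # have shattered the bottle an instant later, so either throw suffices.
    return suzy_throws or billy_throws

def naive_counterfactual_cause(world: dict, candidate: str) -> bool:
    # Naive test: c caused e iff e actually occurred and would not
    # have occurred had c been absent.
    actual = bottle_shatters(**world)
    without = dict(world, **{candidate: False})
    return actual and not bottle_shatters(**without)

actual_world = {"suzy_throws": True, "billy_throws": True}
suzy_is_cause = naive_counterfactual_cause(actual_world, "suzy_throws")
billy_is_cause = naive_counterfactual_cause(actual_world, "billy_throws")
# Both come out False: the naive test misses Suzy's throw, the intuitive
# cause, precisely because Billy's throw is a sufficient back-up.
```

The sketch makes the structural point explicit: as soon as each redundant cause is modelled as individually sufficient, the simple necessity test fails for every one of them.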
Symmetric Overdetermination

Cases of causal redundancy in which the redundant causes involved act without any relevant difference are cases of symmetric overdetermination. Suppose that Suzy and Billy throw their stones at the bottle and that both stones hit the bottle simultaneously. The bottle breaks. What caused the bottle to break? Can Suzy's throw be considered a cause of the shattering of the bottle? Let us see the answers of the theories presented so far.

According to the counterfactual theory of causation, neither Suzy's nor Billy's throw seems to be a cause of the bottle's shattering. The shattering does not depend counterfactually on either of them. If Suzy had not thrown her stone, the bottle would still have been shattered. The same would be the case for Billy's throw. At first glance, the fact that the bottle is broken does not have any causes (at least among the events considered). A more plausible answer is the following: The bottle's shattering does not depend on either of the throws taken separately, but it depends on both taken together. If either of the throws had not occurred, the bottle would not have shattered as it actually did. Thus, this case of symmetric overdetermination can be reduced to a case of collaborative causation.

On the side of the regularity account of causation, problems do not arise. As in the case of preemption considered above, the descriptions of both throws seem to be among a set of conditions from which, together with some laws, the proposition describing the occurrence of the bottle's breaking follows. In cases of symmetric overdetermination, that is exactly what one wants to say, namely, that both Suzy's and Billy's throws were causes of the bottle's shattering.
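The reduction to collaborative causation can be sketched in the same illustrative toy style: neither throw is individually necessary, but the pair taken together is.

```python
def shatters(suzy_throws: bool, billy_throws: bool) -> bool:
    # Symmetric overdetermination: both stones strike simultaneously,
    # and either impact alone would have broken the bottle.
    return suzy_throws or billy_throws

no_suzy = shatters(suzy_throws=False, billy_throws=True)    # still shatters
no_billy = shatters(suzy_throws=True, billy_throws=False)   # still shatters
neither = shatters(suzy_throws=False, billy_throws=False)   # no shattering
# Neither throw is individually necessary, but removing both removes
# the effect: the shattering depends on the throws taken together.
```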
Asymmetric Overdetermination

Situations involving a relevant difference in the way in which the potential causes could bring about the considered effect are cases of asymmetric overdetermination. The relevant difference is an asymmetry between an actual cause and the redundant causes, which is fundamentally the fact that
the former causes the putative effect, while the latter do not actually cause it. Thus, in the case of preemption described above, only Suzy's throw is supposed to cause the breaking. Billy's throw is not supposed to be a cause. Despite the assumption that they do not cause the putative effect in cases of asymmetric overdetermination, redundant causes may affect the dependence between the effect and the actual cause and may also affect our assumptions about the conditions from which the description of the effect follows. As has been explained, the first of these problems arises for the counterfactual analysis of causation, while the other one arises for the regularity account.

It might not be clear what the asymmetry between the actual and the redundant causes is, besides the trivial fact that one is a cause and the other is not. Also, it might not be clear what the symmetry between the redundant causes in cases of symmetric overdetermination refers to, besides the fact that both are assumed causes. A first interesting kind of symmetry that comes to mind is a temporal symmetry. While in situations of symmetric overdetermination both rocks hit the bottle simultaneously, in situations of asymmetric overdetermination one rock gets to the point where the bottle stands first and the redundant cause gets there later. Cases of asymmetric overdetermination in which the redundant cause is interrupted early can even be distinguished from cases in which it is interrupted late, as will be shown next. But such temporal asymmetry just leads to another, more general and perhaps trivial asymmetry, which is precisely the fact that some of the events are causes and others are not. On this basis, the asymmetry is nothing but a causal asymmetry. This is why in cases of symmetric overdetermination it seems appropriate to say that each one of the redundant causes is actually a cause of the occurrence of the effect.
Early Preemption

There are scenarios of asymmetric overdetermination in which the redundant causal process is interrupted at some early stage. These are cases of early preemption (Lewis 1986). The vagueness of what counts as an early interruption should not be of great concern at first. We can simply say that in cases of early preemption one can always find some point in the process that actually caused the considered effect, without which the effect would not have occurred. That is, there is at least one event in the process of the actual cause on which the effect counterfactually depends, while in the back-up process there is no such event.
Here is an example. Suzy and Billy are prepared to start throwing stones at the bottle. Before starting, they come to an agreement: whenever Suzy decides not to throw her stone, she will give Billy a sign, allowing him to throw his. Suzy throws her rock, it hits the bottle and breaks it. Her throw preempted Billy's throw at an early stage of his throwing process. Suzy's throw is assumed to be the actual cause of the fact that the bottle shattered, although the latter event does not depend on the former counterfactually. If she had not thrown her stone, a stone thrown by Billy would have shattered the bottle. Regarding early preemption, there might also be problems for the regularity account of causation: The proposition describing the bottle's shattering might follow from Billy's throwing process, including the preparation stages and other background conditions.

Answers to the problems raised by cases of early preemption are simple. Considering the counterfactual analysis of causation, one may appeal to the complete causal chain to define causation (Lewis 1986). Thus, an event e is taken to be caused by an event c not just if the description of e counterfactually depends on the description of c, but also if there is a chain of events between both, such that the description of every event in that chain, including the effect, depends on the description of its predecessor. On this basis, the description of the bottle's shattering depends counterfactually on the description of some state of Suzy's stone directly before the shattering, the description of that state depends counterfactually on the description of other previous states involved in Suzy's throwing process, and so on, until the description of some of those intermediate events depends counterfactually on the description of Suzy's throw.
According to this, Suzy's throw is deemed an actual cause and, since there is no such causal chain between Billy's preparation to throw and the bottle's breaking, his throwing process is not an actual cause. A solution for the regularity account of causation seems much simpler. The proposition describing the bottle's shattering follows from the description of Suzy's throw together with some background conditions. However, that is not the case for Billy's throw, which was preempted at an early stage. Billy never threw his rock. We would not say that the proposition describing the bottle's shattering could follow from a set of conditions involving a description of the fact that Billy did not throw his rock.
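The chain strategy can be sketched in a toy model of the sign agreement. The intermediate event chosen here, the stone's being in flight, is an illustrative simplification rather than part of Lewis's own formulation.

```python
def shattering_occurs(suzy_throws: bool = True,
                      flight_interrupted: bool = False) -> bool:
    # By agreement, Billy throws only if Suzy signals that she has
    # decided not to throw her stone.
    sign_given = not suzy_throws
    billy_throws = sign_given
    # Illustrative intermediate event in Suzy's throwing process:
    stone_in_flight = suzy_throws and not flight_interrupted
    return stone_in_flight or billy_throws

# No direct dependence: had Suzy not thrown, the sign would have been
# given and Billy's back-up throw would have shattered the bottle.
direct_test = shattering_occurs(suzy_throws=False)
# But once Suzy has thrown, no sign was given and Billy stays put, so the
# shattering depends on the intermediate event of her stone's being in flight,
chain_test = shattering_occurs(suzy_throws=True, flight_interrupted=True)
# and that intermediate event in turn depends on Suzy's throw itself.
```

In the sketch, `direct_test` is true (no direct counterfactual dependence on Suzy's throw) while `chain_test` is false (the effect depends on the intermediate event), which is exactly the stepwise dependence the chain definition exploits.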
Late Preemption

Cases of late preemption are the ones in which the back-up causal process is interrupted at some late stage (Lewis 1986). Given the vagueness
involved in determining what counts as a late interruption, one can say that in cases of late preemption there is supposed to be a causal chain going from the actual cause as well as from the back-up cause to the considered effect. Thus, it is assumed that there is no event in the causal chain of the actual cause that interrupts the effect (Paul & Hall 2013, 99). Suppose that Suzy's throw is in this case followed by Billy's throw a fraction of a second later. Suzy's stone shatters the bottle and after that Billy's stone flies between glass pieces just over the spot on which the bottle had stood. In principle, both Suzy's and Billy's throwing processes go to completion and, therefore, there does not seem to be any event in the process of Suzy's throw whose absence would have implied the absence of the bottle's shattering. Billy's throw involves several back-up causes, one for each event in Suzy's throw. This is what the proponents of this kind of counterexample argue. However, it is not completely clear whether this is plausible. There is no satisfactory reason to think that cases of late preemption cannot be solved in the same way in which we can tackle cases of early preemption.

Consider another condition for the causal relation, only for purposes of argument: Cause and effect must be physically connected by some conserved quantity, such as energy or momentum (cf. Dowe 2000). Assume further that such a connection may be described with regard to a more specific set of events. If there must be a physical connection between a cause and its effect and if, in preemption cases, it is supposed that the back-up process is not actually a cause, then the back-up process must also be physically disconnected from the effect. In the case of late preemption, Billy's throw is not physically connected to the physical process of the bottle's shattering, at least not in a relevant way; his rock might just hit some of the flying glass pieces, but that is not supposed to be important for the effect's description.
One might argue against this and say that the interactions between his rock and the glass pieces are, after all, relevant. Those small interactions would change the manner in which the bottle shatters. But one could also say that the bottle is already shattered from the instant in which the links in the glass composition are broken by the force of the rock hitting them. If this does not seem clear enough, one can always arbitrarily define a glass's breaking in such a manner. In this sense, Suzy's rock really does interrupt something. Thus, cases of late preemption can be reduced to cases of early preemption: The actual cause interrupts the back-up process. While the latter is not physically connected to the effect, the actual cause and the effect are physically connected.
Symmetric and asymmetric overdetermination are not only two different situations. They also present different problems, for different reasons. On the one hand, preemption cases are physically possible and we know what we mean by describing them. On the other hand, it is not so clear whether symmetric overdetermination cases are physically possible. Sometimes, we do not know what they physically represent. However, their possibility is a problem in itself in other theoretical contexts, for instance, in contexts related to mental causation. Thus, while the problem originated by preemption may directly demand a clear answer from a certain analysis of causation, the difficulties of symmetric overdetermination are just related to the description of the problem itself. Solving cases of preemption always involves a descriptive theoretical effort and many of them can also be regarded as physically possible situations after they have been re-described. By contrast, when cases of symmetric overdetermination are re-described, they should be reduced to something else, perhaps to cases of collaborative causation or to cases of preemption.
CHAPTER NINE

FURTHER EXAMPLES OF OVERDETERMINATION
In the present chapter, some examples of causal redundancy are considered, such as the redundancy involved in multi-level causation, mathematical overdetermination and constitutive causation. Examples of causal redundancy that can be found in physics and biology are also shown. In the last section, the so-called problem of many hands is briefly discussed, as well as its relation to causal redundancy.
Multi-level Causation

Cases of multi-level causation, that is, situations in which events of a certain level or domain of description cause events of a different level, may involve causal redundancy under certain conditions. Suppose that the heat under a stew, represented by variable H, caused the water to evaporate, an event represented by E. Reformulating that causal fact, one might also say that the transference of a certain amount of energy from outside the stew to the system inside the stew caused the evaporation. According to this second formulation, the thermodynamical state represented by variable T was a cause of the thermodynamical state E. Thus, if both descriptions are appropriate, there are two variables, H and T, that overdetermine E.

Other instances of multi-level causation are cases of mental causation. Suppose that Robert's mental state at a given time, represented by M, caused the physiological state of his hand moving towards a matchbox, represented by P. The mental state M has a neural basis, N, which could be described as a physical state and considered sufficient to cause the occurrence of P. Thus, M and N overdetermine P. This is in conflict with the exclusion principle, which establishes that no event has two or more simultaneous sufficient causes, unless it is genuinely overdetermined (cf. Kim 1998). In order to understand the conflict, one must consider whether cases of genuine overdetermination
are physically possible. Assuming that they are not, Robert's hand movement P cannot be an effect of both M and N. The problem generated in cases such as the one just described, which is highly relevant for the philosophy of mind, is to decide which one of the two considered states is the real cause of the physiological state P. If the causal closure of the physical domain is also assumed, namely, that every physical event has a sufficient physical cause, then only the neural state N can be considered the genuine sufficient physical cause of the physiological state of the hand movement. Of course, if the redundant causes did not correspond to different ontological domains, the problem would not be solved simply by assuming the physical closure principle and excluding the assumed mental cause. This is where the discussion as to whether cases of genuine overdetermination are possible arises.

In order to clarify the overdetermination problem in multi-level causation one may assume a distinction between causal and explanatory exclusion (Fuhrmann 2002). According to a principle of explanatory exclusion of the physical domain, for every physical explanandum there should be a physical explanans in some context. Since the validity of explanations is context-dependent, explanatory overdetermination is permitted if both redundant explanantia correspond to different contexts and one of them is described in physical terms. Given this assumption, one could say, for instance, that the physiological state P is appropriately explained by an explanans involving a description of M in a psychological context, as well as by an explanans involving a description of state N in a neurophysiological context. Explanatory overdetermination is possible in this sense, given a weaker principle of explanatory exclusion.
Mathematical Overdetermination

A system of equations is considered to be overdetermined if it has more equations than unknown variables. When a considered system of equations has the form of a physical theory, the theory is overdetermined if its mathematical parts exceed its physical parts, namely, the terms that have an interpretation in physical reality (Scheibe 1994, Lyre 2011). An example of this feature is given by gauge theories, which are characterised by having various degrees of freedom in relation to the number of possible solutions for each equation, given a specific description of the initial conditions. In general, a system is mathematically overdetermined if the mathematical theory, according to which an empirical theory of the system is formulated, involves elements that do not correspond to any element of the empirical theory. Consider Erhard Scheibe's characterisation:
Just as an empirical theory often exhibits an unnecessarily rich structure when compared with the observational data to be explained by it so the mathematics introduced to formulate a physical theory frequently brings a wealth of structure into play that cannot be matched by the physical elements of that theory. In both cases we find ourselves deluded in our expectation that in order to reformulate a certain corpus of statements by submitting it to logical analysis there be only two things to be taken into consideration: 1) the concepts characteristic for the corpus in question, and 2) the logical expressions binding together those concepts. I say we are deluded in expecting this because in both cases of overdetermination the truth seems to be that a third component has to be considered. (Scheibe 1994, 186)
The third component that may connect two descriptions of a given piece of data is constituted by mathematical concepts. Since any two physical quantities associated with a particular system can be expressed in real numbers, there is in principle a way of describing the physical system with a single expression that involves both quantities. Analogously, this is what occurs in a causal model (Halpern & Pearl 2005) when one tries to describe overdetermination. Suppose that two forest fires reach a house at the same time, one approaching from the west and the other from the east, and succeed in burning it down. If the west fire had not occurred, the house would have burnt down anyway. The same would have happened had the east fire not occurred: The west fire would have burnt down the house. Suppose a causal model in which the variables W and E, representing the west and the east fires respectively, symmetrically overdetermine the effect variable H, which represents the house's burning. The mathematical description of the model based on a certain set of conditions establishes that H depends on both W and E. A complete physical interpretation of the model would not imply that the burning down of the building had two completely independent causes and that the effect would have occurred in the same way had only one of them occurred. In a complete physical interpretation, only one specific condition would be necessary. In a causal model of overdetermination, the theoretical parts of the model exceed its physical parts. A correct physical description of the example involving only one condition may introduce a more specific description of the house's burning, HS, which, for instance, has three possible values describing different sorts of combustion processes of the house's materials. In this way, HS may take value 2 only if both W and E take value 1.
An intervention on either W or E would be followed by a change in the value of HS, which permits considering both the west and the east fires as contributory causes of the building's burning down.
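The re-description can be sketched as a small structural model in the spirit of Halpern and Pearl. This is a sketch only: the particular equations chosen for H and HS are stipulated for this example, not taken from their paper.

```python
def H(W: int, E: int) -> int:
    # Coarse effect: the house burns down iff at least one fire reaches it.
    return 1 if W == 1 or E == 1 else 0

def HS(W: int, E: int) -> int:
    # Refined effect: 0 = no burning, 1 = combustion from a single fire,
    # 2 = the distinct combustion process produced by both fires at once.
    return W + E

h_without_west = H(W=0, E=1)     # 1: under H, the west fire is not necessary
hs_actual = HS(W=1, E=1)         # 2: the actual, doubly-fed combustion
hs_without_west = HS(W=0, E=1)   # 1: intervening on W changes the effect
# With the refined variable, an intervention on either W or E changes the
# value of HS, so each fire counts as a contributory cause of the burning.
```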
Constitutive Overdetermination

There is a way in which the characteristics of some event or object can be explained in terms of the parts of which it is composed or constituted. One might say, for instance, that the occurrence of each one of several battles may explain some properties of the entire war that they constitute. Note that causal explanations are different from constitutive explanations. In general, causal explanations describe how the occurrence of a certain event, the effect, can be inferred from the occurrence of a set of facts that include a description of the cause. According to Ylikoski (2013), the main object of constitutive descriptions is to explain the causal capacities of a system. He characterises the notion of causal capacity in the following manner:

I use the term causal capacity to refer to a wide variety of dispositional notions: disposition, ability, power, affordance, liability, propensity, tendency, etc. There are subtle differences between these notions, but for the purposes of this paper they do not matter. What is crucial is that they all give an account of what would happen if the entity to which the dispositional predicate is applied were to end up in a certain kind of causal setting. The dispositional predicates provide modal information; they do not state what is happening or has happened, but rather what could, or will, happen in the right kind of causal configuration. (Ylikoski 2013, 279)
Following this focus on causal capacities, when we claim that a war is constituted by several battles, we may, for example, explain the destructive capacities of that war given its constitutive elements. Thus, by considering the information about each one of the battles that constituted the war, we are explaining the war's causal capacities. As Ylikoski remarks, the most important feature of a causal capacity is the fact that it contains information about what would happen under a given set of circumstances. For instance, we may ask why the war was so destructive. Considering that the property of being destructive is a causal capacity and not a categorical property, we may answer that question by describing the battles that constituted the war. Ylikoski (2013, 279) distinguishes constitutive questions, such as "Why was the war so destructive?", from causal questions, such as "How did the war become so destructive?" or "In which way has the war destroyed the nation?". Causal and constitutive explanations can be distinguished, according to Ylikoski, as follows. While causal explanations involve information about why a given event occurred due to a set of events and circumstances, a constitutive explanation involves information about how a given system has a causal
capacity due to its constitutive elements. We may characterise the notion of a constitutive explanation in the following way (Ylikoski 2013, 281):

Constitutive explanation: A system S has a causal capacity k in circumstances T, due to S's components s1, …, sn and their organisation O.

It should be noted that the manner in which the constitutive elements of a system are organised also involves a given set of causal capacities and we may describe such organisation on the basis of causal explanations. However, Ylikoski's characterisation of constitutive explanations shows clearly in which sense these are distinct from causal explanations. The difference between constitution and causation seems clear if we focus on the fact that many theories include the constraint that the causal relata involved in a causal explanation must be distinct from each other in order to avoid problems related to compositional causation. An example of such problems may be a description of the whole universe as a cause of itself and of any other event as well. This is problematic because, for any event e, it is true that if the universe had not existed, e would not have occurred. Now, how should the antecedent of that counterfactual be understood? Would such an explanation be appropriate in most contexts? It would be controversial, for instance, to consider the state of the entire universe at time t as the cause of a particular forest fire at time t. Thus, it seems appropriate to assume that, whenever an event is considered to be a cause of another, both are distinct events. By contrast, the causal capacities of a system that are explained in a constitutive explanation are not distinct from the constitutive parts of the system.

Consider now the following case. Suppose a war that consisted of ten battles. This means that one might elaborate a constitutive explanation of the war's capacities mentioning each one of those ten battles.
Let us assume that five battles would have been enough to constitute the war. Considering that the war was actually constituted by ten battles, suppose further that it could be divided into two groups of five battles each. Thus, if any of these groups had not occurred, the war would have occurred anyway. This may imply a problem regarding the question of the essential constituents of a system and is similar to the usual problem that a counterfactual account of causation faces in overdetermination cases. We may call this scenario a case of constitutive overdetermination. In order to clarify this problem, we may claim that, although the war would have occurred if only five battles had occurred, it would not have had the same destructive capacity that it actually had. Only a description
involving the two groups of battles and information on how they interacted can explain the destructive capacity of the war, under the determined circumstances in which it developed. This helps us to understand the importance of the fact that constitutive explanations do not simply explain the occurrence of a certain process; they explain the particular causal capacity of that process. It should be clear that cases of constitutive overdetermination are not cases of causal redundancy. However, they involve explanatory redundancy, which is a reason to study them carefully and to attend to the problems that they might produce.

The connection between cases of constitutive overdetermination and multi-level causation is an important feature that must be considered (cf. Paul 2007). Particular attention to the question of how different domains of description are involved in cases of constitutive overdetermination might be relevant. Somehow, the domain of description under which battles are characterised is different from the domain under which wars are characterised. In the same manner, organs and cells are characterised under different domains of description. The different ways in which a system can be described can be determined clearly on the grounds of Ylikoski's notion of constitutive explanation. Consider the description of a war as a system. A correct description should not only involve information about the process, but also about its causal capacities, such as its destructive power, and about the circumstances under which it developed. A correct description of the battles that constitute the war can also be guided by the notion of constitutive explanation considered here: It should not only involve information about the battles taken separately, but also about the ways in which they were organised during the war's development.
Let us now focus on some important properties of the constitution relation considered by Ylikoski:

First, this relation is asymmetric: the system's causal capacities do not constitute its parts and their organization. This asymmetry reflects the direction of explanation in constitutive explanation. [...] Second, constitution is an irreflexive relation, nothing constitutes itself. Third, the relation of constitution is synchronous: constitution does not take time. Similarly, if changes in the basis give rise to changes in the causal capacities of the system, these changes take place in the same instant. Thus it does not make sense to talk about processes in the case of constitution. While causal processes take time, constitution does not. Underlying this feature is the third important property of constitutive relation: the relata are not independent existences. In the case of causation it is possible to require that cause and effect are distinct existences, but the same idea does not work in the case of constitution. To have certain causal capacities is to
have a certain constitution (i.e. certain parts organized in a certain way). (Ylikoski 2013, 282)
Both the constitution relation and the causal relation are asymmetric. We would not say that the war constituted the battles that were part of it. Perhaps some events during the development of the war causally influenced the origin of some of the battles that constituted the complete war. However, that would be a case of causal influence and not of constitution. Additionally, although some events during the development of the war may have caused other later events that were also components of the war, we would not claim that the war constituted the causal capacities of its components.

The second feature is also very important: Constitution is irreflexive. Consider, for instance, the universe and the set of all events that constitute it. If that set of events constitutes and is identical to the entire universe, wouldn't we say that the universe constitutes itself? In order to answer this question we should focus, again, on the fact that constitutive explanations should contain information about the causal capacities of the considered system. In this sense, the explanandum cannot be identified with the explanans. An explanation of this case should have the following form: The universe has a certain causal capacity at a given time and under a given set of circumstances due to the set of events E and to their organisation. This explanation may make sense, on the one hand, if we consider the universe at a given time and assume that it has the capacity of having an influence on a later set of events. On the other hand, if we refer to the entire universe as an object, involving its complete history until its end, and if we assume that there is nothing outside it (and that it is not part of a multiverse), then there would be no causal capacity on the basis of which we could construct the considered explanation.

Another feature of the constitution relation is synchrony. The basic parts of a system constitute it while it develops.
For example, the causal capacities of a war are constituted at the same time at which each of the battles takes place. We would not say that first the battles occur and only thereafter the war acquires its destructive disposition. By contrast, a cause occurs before its effect. We may say, for instance, that Robert's throwing a rock at a window caused the window to break. In such a case, as in any case in which causation is considered a relation between events, the cause precedes its effect.

The synchrony between the occurrence of the constitutive elements of a system and its capacities is connected to the property of dependency. As already mentioned, the causal capacities of a system are neither distinct from nor independent of its basic elements. If we say, for
instance, that a war has certain destructive capacities in given circumstances due to the battles that constitute it and to the ways in which the battles are interrelated, we also claim that the battles and how they are organised are not distinct from the war and its destructive capacities.
Overdetermination in Physics and Biology

The question of whether genuine cases of overdetermination are possible can be studied from the perspective of the natural sciences. Cases of overdetermination are easily found in biology. For example, liver cancer can be symmetrically overdetermined by two distinct sets of conditions (Gatto & Campbell 2010). The first set involves the presence of aflatoxin in the organism, a highly carcinogenic substance, and the absence of the gene GSTm1, which encodes enzymes responsible for reducing the susceptibility to cancer generation. A second set of conditions involves the presence of hepatitis C and a certain level of alcohol consumption. Either of these sets is, together with the relevant background conditions, sufficient to cause cancer. If both sets of conditions were present simultaneously, we would have a case of overdetermination. An intervention in the set of conditions involving aflatoxin would not result in a change in the effect variable; the cancer would still be generated. Thus, the generation of cancer does not depend counterfactually on the set of conditions involving the presence of aflatoxin. But this does not mean that aflatoxin cannot cause cancer. Of course, we may clarify the situation by changing the original causal model and fixing the set involving hepatitis C, eliminating alcohol consumption. However, that would change the assumption that both sets of conditions symmetrically overdetermine the cancer.

Another example of symmetric overdetermination is found in electrical circuits, when the independent activations of two switches are sufficient to cause a resistor to produce heat. If either of the switches had been turned off, the resistor would still have produced heat. Nevertheless, it would seem plausible to re-describe the case considering the amount of heat produced. Thus, if two switches had been activated, the amount of heat produced would have been higher than if only one of them had been.
The case could be reduced to a scenario of collaborative causation. A case of preemption in physics can be found in closed quantum systems, in which defect formation—i.e. an induced excitation of the system—preempts a dynamic symmetry breaking of the system (Ortix, Rijnbeek, & van den Brink 2011). Examples of preemption can also be found in biology. Consider the following: Red blood cells are eliminated by macrophages residing in the
spleen and in the liver (de Back et al. 2014). When the spleen ceases to function correctly, its efficiency in eliminating old red blood cells may decrease. When this occurs, that function can be assumed by the liver together with the bone marrow. This means that a correctly functioning spleen is constantly preempting processes that the liver and the bone marrow could also perform. If the spleen had been rendered inactive, the old red blood cells would have been eliminated anyway. On this basis, the elimination of red blood cells does not depend counterfactually on the activity of the spleen. But this does not mean that the spleen does not actually contribute to the elimination of those cells. These cases of symmetric overdetermination and preemption in physics and biology show that there are good empirical reasons to assume that genuine causal redundancy deserves serious attention, at least in some contexts.
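The counterfactual tests at work in these examples can be sketched in a few lines. The or-structure and the variable names below are my simplification for illustration, not a model drawn from Gatto and Campbell or de Back et al.:

```python
# Symmetric overdetermination (sketch): two condition sets, each
# sufficient for the effect. Intervening on one set alone does not
# change the effect, so the effect does not counterfactually depend
# on that set, even though the set can cause it.

def effect(set_one: bool, set_two: bool) -> bool:
    """The effect occurs if either sufficient condition set is present."""
    return set_one or set_two

actual = effect(True, True)             # both sets actually present
after_intervention = effect(False, True)  # intervene on the first set

print(actual, after_intervention)        # True True
print(actual != after_intervention)      # False: no counterfactual dependence
```

The same test applies to the spleen case: with the backup (liver and bone marrow) in place, setting the preempting cause to "off" leaves the effect unchanged.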
The Problem of Many Hands

The notion of overdetermination is closely connected to the notion of responsibility. A problem based on the difficulty of identifying the actions responsible for a given effect is the so-called problem of many hands. It can be characterised as follows (van de Poel et al. 2011, 50):

Problem of many hands: In a given situation, it may be impossible or very difficult to identify the persons who were responsible for a certain event, due to the complexity of the situation and the number of actors involved.

One of the main concepts that we should consider before addressing the problem of many hands is the concept of responsibility. Ibo van de Poel et al. claim that, while one can distinguish different kinds of responsibility, the notion of responsibility-as-blameworthiness is one of the most relevant notions on the basis of which one can discuss the problem of many hands. This notion can be defined as follows (van de Poel et al. 2011, 53):

Responsibility as blameworthiness: An agent S is responsible for an occurred event e, if and only if: (a) S was capable of producing e (capacity), (b) S caused e by performing a set of actions A (causality),
(c) S performed A intentionally and knowing that A may have caused e (knowledge), (d) S was not obligated to perform A (freedom), (e) the set of actions A is considered to be wrong (wrong-doing).

Suppose that we blame a company's gas emissions for contributing to climate change. We may assume that the company was capable of contributing to that effect, that the company actually contributed to it on the basis of a particular set of actions and that the members of the company's headquarters were aware of the damage that their production may have caused. Thus, the first three conditions for responsibility-as-blameworthiness are fulfilled. Suppose further that the actions that led to the company's contribution to climate change are considered to be wrong actions. This may be questioned if the company also contributed to the well-being of many people to an extent much greater than the harm it caused. This questioning of the company's responsibility for contributing to climate change depends, of course, on the ethical principles on which we base our judgement. We may claim, against such a questioning, that the company's good deeds are fully irrelevant and that the wrong actions that had serious effects on climate change are still wrong as such. Assume further that the members of the company's headquarters acted with free will. Is the company responsible for contributing to climate change? At first glance, we would like to say yes. Consider the following argument that the company's representatives might offer in order to defend themselves from the accusations. The company's production over the last few years caused gas emissions that contributed considerably to climate change. However, suppose that if the company's economic production had been smaller than it actually was, the competition's production would have increased.
In such a case, the competition's production would have increased in such a way that the whole industry would have contributed to climate change just as it actually did on the basis of the company's production. This would be an argument based on a preemption structure, and it is one way in which the problem of many hands arises. We may conclude that the five conditions shown above, which should be fulfilled in order to attribute responsibility-as-blameworthiness to an agent, are not enough. In order to avoid the preemption argument, we may consider the following further condition for the responsibility attribution:

Contextualist responsibility condition: According to a context of attribution C, if an agent S is responsible for an occurred event e, then,
if the actions performed by S that caused e had not been performed, e would not have occurred.

If, according to a given context C, there is no evidence supporting the claim that the competition would have compensated for the company's contribution to climate change, we may consider that the company was indeed responsible for contributing to climate change. The importance of the description of the effect regarding responsibility attributions should also be considered. The company's representatives may deny that the company's gas emissions actually contributed to climate change in a causal sense, because, if the company had not emitted the quantity of gas it emitted, climate change would have been the same, given all other actual sources of greenhouse gas. That is, the company's actions made no difference regarding the effects on climate change. We would like to say that the company's gas emissions actually contributed. The contextualist condition helps to attribute responsibility to the agents involved in this type of case. If climate change could be described, according to a context of evaluation, with enough specificity that the gas emissions appear to make a difference to it, we may conclude that the company is actually responsible for contributing to climate change.
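The five conditions together with the contextualist counterfactual condition can be sketched as a simple check. All field names below are mine, and the context-relative counterfactual is collapsed into a single boolean for illustration:

```python
# Schematic check of responsibility-as-blameworthiness (van de Poel et
# al.'s five conditions) plus the contextualist counterfactual
# condition. The encoding is a toy sketch, not the authors' formalism.

from dataclasses import dataclass

@dataclass
class Case:
    capacity: bool      # (a) S was capable of producing e
    causality: bool     # (b) S caused e by performing A
    knowledge: bool     # (c) S performed A knowingly and intentionally
    freedom: bool       # (d) S was not obligated to perform A
    wrong_doing: bool   # (e) A is considered wrong
    e_without_a: bool   # context-relative: would e have occurred without A?

def responsible(case: Case) -> bool:
    blameworthy = (case.capacity and case.causality and case.knowledge
                   and case.freedom and case.wrong_doing)
    # Contextualist condition: e must counterfactually depend on A.
    return blameworthy and not case.e_without_a

# Preemption-style defence: in this context the competition would have
# emitted anyway, so the counterfactual condition fails.
print(responsible(Case(True, True, True, True, True, e_without_a=True)))   # False

# Context with no evidence that the competition would have compensated.
print(responsible(Case(True, True, True, True, True, e_without_a=False)))  # True
```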
CHAPTER TEN

CAUSATION, VARIABLE CHOICE AND PROPORTIONALITY
It should be clear now that the problematic consequences of causal overdetermination scenarios might be avoided by modifying the descriptions of the events involved in each case. A new problem then arises: how should we describe the events in a given causal scenario?
Principles for Variable Choice

Let us consider again the notion of a causal model, which can be characterised, roughly, as a set of variables and a specification of how the values of each variable depend on the values assigned to other variables. Consider, for instance, a case in which Robert throws a rock at a bottle, causing it to shatter. We might represent the shattering of the bottle by B, a binary variable that takes the value 1 when the bottle shatters and the value 0 when it does not. We can also introduce the binary variable R to represent whether Robert throws a rock: R takes the value 1 if he throws it and the value 0 if he does not. We may also include a set U of exogenous variables. Let us assume that these variables determine the values of R and that the value of B depends on the values that R takes. This is a causal model that describes how Robert's throw caused the bottle to shatter. Since variables in a causal model may represent events, the problem of how we should describe the events involved in a causal scenario can be considered as a problem about how to choose the variables of a causal model. James Woodward presents this problem in the following way:

Suppose we are in a situation in which we can construct or define new previously unconsidered variables either de novo or by transforming or combining or aggregating old variables, and where our goal is to find variables that are best or most perspicuous from the point of view of causal analysis/explanation. Is there anything useful to be said about the
considerations that ought to guide such choices? (Woodward 2016, 1048)
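The rock-throwing model described above (U determines R, and B depends on R) can be written as two structural equations. This is a minimal sketch in plain Python, not the API of any particular causal-modelling library:

```python
# Sketch of the causal model for Robert's throw: an exogenous setting u
# determines R (the throw), and B (the shattering) depends on R.

def model(u: int) -> dict:
    r = u   # structural equation: R := U (Robert throws just in case u = 1)
    b = r   # structural equation: B := R (the bottle shatters just in case R = 1)
    return {"R": r, "B": b}

print(model(1))  # {'R': 1, 'B': 1}: Robert throws and the bottle shatters
print(model(0))  # {'R': 0, 'B': 0}: no throw, no shattering
```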
Woodward distinguishes this version of the problem from a version according to which one considers a certain set of variables from which some can be selected. As Woodward claims, the interesting problem concerns the criteria on the basis of which one should design new variables. Consider again the causal scenario in which Robert throws a rock at a bottle. Can we describe the case by adding a new variable? What should motivate this? What would be the benefits of adding a new variable? For instance, we may want to add a variable L in order to describe whether the sound of the bottle crashing was loud or not. If it was loud, the value of L is 1, and if it was not, the value of L is 0. Of course, our expectations and perhaps our observations about the usual physical processes involved when a bottle shatters may be perfectly compatible with the addition of L to the causal model. However, we can still ask whether there are better ways of adding this kind of information or whether the addition is necessary at all. It should be noted that the problem of variable choice does not only affect theories of causation that explicitly consider the causal relation as a relation between variables, such as theories based on causal models, on graphs or on structural equations. Following the account developed by Woodward (2003), as well as the theory proposed by Halpern and Pearl (2005), we might define the notion of actual causation in terms of variables and on the basis of the concept of intervention. Let “X = x” symbolise the fact that the value of a variable X equals x, that is, an event.
The notion of an actual cause is defined as follows:

Actual causation: In a causal model M, X = x is the actual cause of Y = y just in case a) the actual value of X is x and the actual value of Y is y, b) there is a possible intervention on X such that, if it were carried out, the value of Y would change, and c) if X were held fixed and other variables in M, other than X and Y, changed their values, the value of Y would not change.

In order to grasp the notion of actual causation correctly, we need to characterise the concept of an intervention:

Intervention: Given a causal model M and a variable X that is included in M, the change of the actual value of X is an intervention if it does not imply changes in other variables of M on which the value of X might normally depend.
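Clauses (a) and (b) of this definition can be operationalised over the one-equation model of Robert's throw described in the text. The helper names below are mine, and the sketch only covers this minimal model (clause (c) is trivial here, since there are no further variables):

```python
# Sketch: testing counterfactual dependence (clause (b) of actual
# causation) by intervening on R in the model R := U, B := R.

def solve(u: int, do_r=None) -> dict:
    r = u if do_r is None else do_r  # an intervention overrides the equation for R
    b = r
    return {"R": r, "B": b}

actual = solve(u=1)              # actually, R = 1 and B = 1 (clause (a))
intervened = solve(u=1, do_r=0)  # intervention: set R to 0

# B changes under the intervention, so R = 1 counts as an actual cause
# of B = 1 by clauses (a) and (b).
print(actual["B"], intervened["B"])  # 1 0
```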
As already mentioned, we may think of the problem of choosing the variables that should be involved in the representation of a causal scenario as the problem of describing the events that are supposed to be causally related in that scenario. To think that the problem of variable choice affects only theories that explicitly involve the notion of a variable is a misunderstanding that Woodward considers in the following way:

There is a tendency among some philosophers of science to suppose that problems of variable choice are particularly or distinctively a problem for approaches to causal inference and reasoning that make use of structural equations and directed graphs. […] In my view this assessment is wrongheaded; variable choice is equally an issue for any theory of causation or causal inference or apparatus of causal representation. For example, if X and Y are counterfactually independent, V = X + Y and U = X − Y will be counterfactually dependent. (Woodward 2016, 1050)
I agree with Woodward’s considerations on this issue. The problem of selecting variables is important for any account of causation that is based on a certain way of describing events, since to choose a variable is, in a certain sense, to choose a description. Let us focus again on the possibility of adding a variable that represents the loudness of the bottle’s shattering to a causal model. On the basis of which criteria should we choose whether to include that variable in the analysis or not? These criteria are connected to further questions, such as the following. What are, for a given causal inquiry, the benefits of including variable L in a causal model? Is the inclusion of L necessary? Clearly, these are questions related to the aims of the investigation. According to Woodward, in order to give an appropriate account of the problem of variable choice, one must focus on the notion of the goal of inquiry:

My view, which will be motivated in more detail below, is that the problem of variable choice should be approached within a means/ends framework: cognitive inquiries can have various goals or ends and one can justify or rationalize candidate criteria for variable choice by showing that they are effective means to these ends. Put differently, one should be able to specify what goes wrong in terms of failure to achieve one’s ends if one chooses the “wrong” variables. (Woodward 2016, 1051)
Thus, in order to find the appropriate variables for a causal model, one should consider the ends that one wants to achieve through the particular causal investigation. Once one has determined a set of goals, it should be clearer which set contains the best means—in this case, the most suited
variables—through which one might achieve those goals. For instance, if we want to know which kind of action Robert performed that caused the bottle to shatter, we may design a causal model including a variable representing Robert’s throwing a rock, as well as other variables representing other actions that he performed near the bottle that day. For instance, we may include a variable representing the fact that Robert stomped on the floor with his foot near the bottle. For the given goal of causal inquiry just mentioned, including this variable seems reasonable. Suppose now that we know that Robert shattered the bottle with his throw, but we want to know how relevant the force of his throw was with regard to the bottle’s shattering. In this case, it would not be necessary to include a variable representing Robert’s stomping on the ground. It might be appropriate, however, to consider many values for the variable that represents Robert’s throw, such that each value could represent a given degree of intensity. According to such a model, we might find out that Robert did not need to throw his rock with the strength with which he actually threw it: the bottle would have shattered even if he had thrown it a little more slowly. These brief examples show in which sense the goal of causal inquiry determines or guides the choice of the set of variables that might be appropriate and that one should consider including in a causal model. Woodward (2016) proposes a set of criteria on which one might base the choice of variables according to a given causal inquiry. It should be noted that, in general, these criteria do not impose restrictions on causal analysis, but rather offer suggestions by following which information about causal relations is more easily acquired. These criteria might be presented as follows.
Variables should be clear targets of interventions: One should choose variables that describe qualitative or quantitative properties, such that it is clear what it means for the value of that variable to change through an intervention. A variable representing age is an example of a variable that might not meet this criterion. For instance, it may not be clear according to which properties we should introduce a variable that represents Robert’s age. Furthermore, it may be very hard to determine the possible values that a variable representing age should involve.

Variables should have unambiguous effects: One should select variables in such a way that it is clear what would occur if their values changed.
A variable representing age is also a good example of a variable that does not meet this criterion. For instance, it may be hard to determine what would have occurred if Robert had been older. Perhaps we could have a general idea of the changes that such a supposition could imply. Maybe he would not have been able to throw the rock with the same strength with which he actually threw it. But maybe he would have been able to throw it with the same strength. It is not clear or, at least, it would be too hard to determine what would have occurred.

It should be possible for variables to change independently of other variables’ values: We should choose variables whose values can be changed by interventions in such a way that the change does not conflict with the values of other variables that are part of the model. An example of a variable that does not meet this criterion would be a variable that logically conflicts with another variable of the causal model. Suppose that we design a causal model according to which Robert throws a rock at a bottle, shattering it. Suppose that we are considering the inclusion of a variable that represents Robert’s body movements. The actual value of this variable represents the fact that he has moved his body in some way. It may be hard to make sense of an intervention applied to that variable. Suppose that, after an intervention, the value of that variable represents the fact that Robert did not move his body at all. This could hardly make sense, considering that he threw a rock. According to a usual definition of throwing a rock, the fact that someone throws a rock implies that he moves his body in some way. One could construct a causal model involving this kind of conflict and apply the needed modifications of the values according to the evaluation of the causal relations considered.
However, it would be easier to learn something about the causal relations of a model by avoiding the introduction of variables that may be incompatible in this way.

Variables should influence only a few variables: One should choose variables that affect a small number of other variables in a model. Given the goals of causal inquiry and a given variable X, the set of variables that depend on X should be as small as possible. Suppose, for instance, that we want to include information about the number of glass pieces into which the bottle shattered. We might consider the set of all physically possible glass pieces that could be formed by the impact between a bottle and a rock and introduce a variable into the causal
model corresponding to the possible formations of each one of the glass pieces. Let Y be such a set. If a glass bottle can be broken into two big pieces after being hit by a rock, then we would have a variable for each one of those pieces. If it is physically possible that a bottle shatters into fifty glass pieces after being hit by a rock, then our causal model will involve a variable for each one of those pieces. Let Y1 be one of those variables. It symbolises the fact that one of these pieces of glass is formed after the impact. Let Y1, like all variables that are members of Y, be a binary variable. The value of Y1 is 1 just in case the corresponding glass piece is formed after the impact and 0 if that piece is not formed. Suppose further that Robert’s throw is represented by X, a three-valued variable. The value of X will be 0 just in case Robert does not throw a rock at the bottle, 1 just in case he throws his rock slowly and 2 just in case he throws it fast. Assume that, actually, Robert threw a rock at the bottle slowly and that the number of variables in Y whose value equals 1 is n. The variables in Y causally depend on X, such that, if Robert had thrown his rock fast, the number of variables in Y taking the value 1 would be m, where m > n. According to the criterion of variable choice that I am now considering, the way of choosing variables just described is not the most appropriate and should be avoided. Again, according to this criterion, the number of variables that are affected by other variables should be as small as possible. We can think of a simpler causal model that describes the same scenario with the same level of specificity. Let X be the same three-valued variable considered above and Y be a variable representing the shattering of the bottle. The number of possible values of Y is q, and each value represents one of the possible ways in which glass pieces can be formed after a bottle is hit by a rock.
As things actually occurred, Robert threw his rock slowly and Y took the value n. Additionally, if Robert had thrown his rock fast, the value of Y would have been m, where m > n. Let us now consider the following criterion suggested by Woodward (2016):

Variables should permit deterministic relations between them: According to this criterion, the variables in a causal model should be chosen in such a way that one could find deterministic causal relations between them. For instance, suppose that Robert tells us that the force with which he throws rocks varies randomly and that, in our causal model, variable X symbolises the fact that he throws a rock at the bottle. The value of Y, a
variable representing the bottle’s shattering, depends, let us assume, on X and also on the force of the throw, symbolised by F, a random variable. We should prefer a causal model that involves the reasons on the basis of which Robert chooses the strength of his throws. If we know that, as a matter of fact, he does not choose the force of his throws randomly, as he thinks he does, but that it is affected by the shape of the rocks, we might consider a slightly different causal model. Let X, Y and F be the same variables considered in the model just described. However, now the values of F are not randomly determined; they depend on S, a variable that symbolises the shape of the rock that Robert has in his hand before throwing. As Woodward clearly points out, variables that are related in a deterministic way can be controlled more easily than variables related according to random factors. For instance, in the example just considered, we could influence Y by choosing the shape of the rock that we put in Robert’s hands. By contrast, we could not control the value of Y with the same accuracy if we just assumed that the force of his throw varies randomly.

Relations between variables should be stable: According to this criterion, one should prefer variables that can be related in a stable way. That is, the causal relations between variables should hold even if one changes the background conditions of the causal model or the set of variables. For instance, suppose that Y represents the fact that a glass bottle shatters after being hit by Robert’s rock and N represents the fact that an annoying noise is produced. Variable N depends causally on variable Y. Now assume that, according to another causal model, N is not present and Y causally influences S, which symbolises the fact that a glass-shattering sound is produced at a certain frequency.
The causal relation that might hold between Y and N seems to be less stable than the causal relation that might hold between Y and S. The reason is the following: the dependence between Y and N only holds in scenarios that include persons who feel annoyed by the sound of a breaking bottle. In such scenarios, the causal relation between Y and S would also hold. However, it would also hold in scenarios that do not include persons who feel annoyed by the sound. Woodward explains the importance of stability as a criterion for variable choice as follows:

Stability thus has to do with the extent to which a causal relationship is exportable from one set of circumstances to another. It does not seem
controversial that, other things being equal, it is preferable to identify more rather than less stable causal relationships—more stable relationships are more generalizable and provide more information relevant to manipulation and control. Stability considerations are relevant to decisions about the choice of variables because some sets of variables may enable the description of more stable causal relationships than others, given what we are able to know and calculate. (Woodward 2016, 1068)
According to Woodward, stable causal relations are easier to control than unstable relations. Considering again the example that I presented above, a variable that symbolises how a particular sound could be annoying to someone is harder to control than a variable that only symbolises the frequency of that sound. Stable causal relations are also easier to generalise. Considering the example, a general description of the situations in which a particular sound might be annoying is harder to determine than a general description of the situations in which a particular sound would correspond to a given frequency range.

Dependency relations should be represented accurately: According to this criterion, the variables of a causal model should be chosen in such a way that the causal graph obtained from the model’s description represents the relations between the variables clearly and accurately. For example, one may design a causal model that describes the probability with which the bottle might be shattered given the fact that Robert throws a rock at it. In a causal graph, the causal dependence between Robert’s throw and the bottle’s shattering might not be clearly represented. If we included in the model the factors on the basis of which Robert could miss his throw, we would perhaps be able to describe more accurately on which variables the shattering of the bottle depends. This dependence could be represented more clearly in a causal graph. According to Woodward, correlations between variables should also be clearly represented. Suppose, for instance, that every time Robert picks up a rock from the ground, Susan also picks one up. If we want to describe this in a model, it would be appropriate to explain why those events are correlated.
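The contrast between the two glass-piece models discussed earlier, one binary variable per possible piece versus a single multi-valued shattering variable, can be sketched as follows. The piece counts n and m and the mapping from throw speed to pieces are invented for illustration:

```python
# Sketch of the 'few effects' criterion: a model in which X influences
# fifty binary piece-variables versus one in which X influences a
# single multi-valued variable Y. The numbers are hypothetical.

N_PIECES, n, m = 50, 10, 30   # hypothetical piece counts, with m > n

def pieces_model(x: int) -> list:
    """X (0: no throw, 1: slow, 2: fast) influences 50 binary variables."""
    formed = {0: 0, 1: n, 2: m}[x]
    return [1 if i < formed else 0 for i in range(N_PIECES)]

def compact_model(x: int) -> int:
    """X influences a single multi-valued shattering variable Y."""
    return {0: 0, 1: n, 2: m}[x]

# Both models encode the same dependence at the same specificity, but X
# affects fifty variables in the first and only one in the second.
print(sum(pieces_model(1)), compact_model(1))  # 10 10
print(sum(pieces_model(2)), compact_model(2))  # 30 30
```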
Proportionality and Context

An important feature that we might consider in order to determine causal relations is proportionality. Shapiro and Sober (2012) present the
following propositions:

Assumption: Bulls cannot distinguish different shades of red.
(1) The cape’s being red caused the bull to charge.
(2) The cape’s being crimson caused the bull to charge.

We can assume, just for the sake of the argument, that the colour of a cape makes a difference to a bull’s irritation and behaviour. That is, if the cape had not been red, the bull would not have charged. As Shapiro and Sober point out, proposition (2) seems to be incorrect because the cause involved in it is not proportional with regard to the effect. They consider two principles of proportionality according to which we may establish that proposition (2) is incorrect:

Semantic principle of proportionality: If the description of the cause involved in a causal claim P says more than what is necessary to bring about the effect, P is false.

According to this principle, proposition (2) should be considered to be false. Since the fact that the cape is crimson involves more information than is needed in order to establish the conditions on which the considered effect depends, the whole causal claim is false. Let us now focus on the other principle considered by Shapiro and Sober:

Pragmatic principle of proportionality: If the description of the cause involved in a causal claim P says more than what is necessary to bring about the effect, it is conversationally inappropriate to assert P.

According to this principle, proposition (2) is not false, but it would be inappropriate to assert it in some conversational contexts. The description of the cause involved in it contains more information than is needed, as we assumed, in order to establish that the description of the effect depends on it. For instance, in contexts in which people discuss bullfighting seriously, it may sound irrelevant to say that the fact that the cape was crimson caused the bull to charge.
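Under the stated assumption that bulls cannot distinguish shades of red, the counterfactual test behind the semantic principle can be sketched with a toy colour model. The set of shades is illustrative:

```python
# Sketch of the proportionality test: the bull charges on any shade of
# red, so the charge counterfactually depends on the cape's being red,
# but not on its being crimson specifically.

SHADES_OF_RED = {"crimson", "scarlet"}   # illustrative, not exhaustive

def charges(cape_colour: str) -> bool:
    return cape_colour in SHADES_OF_RED

# Dependence on the cape's being red: a non-red cape, no charge.
print(charges("crimson"), charges("white"))    # True False

# No dependence on its being crimson: a scarlet cape also provokes
# the charge, so the description 'crimson' says more than necessary.
print(charges("crimson"), charges("scarlet"))  # True True
```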
We may support the semantic principle of proportionality on the grounds of a counterfactual notion of causation. Proposition (2) is false, because it is not true that, if the cape had not been crimson, the bull would not have charged. The cape might have been scarlet and the bull would have charged anyway. Under the assumptions considered, (1) can be regarded as true. If the
cape had not been red, the bull would not have charged. Suppose now that, if the cape had been of any other colour except violet, the bull would not have charged; that is, suppose that the bull would also have charged if the cape had been violet. In this scenario, which could be regarded as a case of overdetermination, proposition (1) cannot be considered to be true. Contrastive accounts of causation seem to offer a clear way of facing problems of overdetermination. We might characterise a general contrastive theory of causation as follows (Hitchcock 1996, Schaffer 2005, Northcott 2008):

Contrastive causation: An event C causes an event E just in case a) C and E occurred, b) there are two alternative events, C* and E*, that did not occur, and c) if C* had occurred, E* would have occurred instead of E.

Shapiro and Sober (2012, 91) compare two propositions in order to show the difference between a false causal claim and a conversationally inappropriate one. Consider the following true proposition (I will stick to the numbering used by Shapiro and Sober):

(9) It is the cape’s being crimson rather than white that caused the bull to charge rather than stand still.

According to the contrastive account of causation, this proposition is true, because (let us assume) if the cape had been white, the bull would not have charged. Let us now focus on the other proposition considered by Shapiro and Sober:

(10) It is the cape’s being crimson rather than scarlet that caused the bull to charge.

This proposition can be considered to be inappropriate in some contexts of conversation. However, it could be appropriate in other contexts. For instance, one may falsely think that if the cape had been scarlet, the bull would not have charged. One should notice that, if that were so, the proposition would also be true. Shapiro and Sober argue that proposition (2) might be considered to be false if one decided to evaluate it on the basis of proposition (10).
Proposition (10) is false, because it is not true that, if the cape had been
scarlet, the bull would not have charged. Now, considering that proposition (9) is true, we might say that proposition (2) could be regarded as a true proposition if it were evaluated on the basis of proposition (9). Thus, proposition (2) can be true or false, depending on the contrastive proposition in relation to which we interpret it. However, according to Shapiro and Sober, this is not the relevant problem:

Suppose you are in a context in which (2) is false because the contextually indicated contrast with crimson is scarlet. This means that you evaluate (2) by attending to (10), and (10) is false. This, of course, does not affect the fact that (9) is true. Context determines what the contrast is in (2), but context plays no such role in (9), because there the contrast is explicit. When philosophers of bulls consider cape causation, should they focus on the (contextually) false Statement (2) or on the true Statement (9)? We suggest that this question poses a false dichotomy. The point that needs to be recognized is that (10) is false and (9) is true. Both facts are relevant to understanding cape causation. There is no further question about whether the cape’s being crimson really caused the bull to charge. What is more, there is no conflict between the cape’s being red (rather than not red) and the cape’s being crimson (rather than white)—both are true descriptions of what made the bull charge. (Shapiro and Sober 2012, 92)
I disagree with the claim that context plays no role with regard to proposition (9). Of course, the contrast is explicitly involved in the proposition. We might say that the truth value of proposition (9) does not depend on how the context determines the relevant contrast class. However, its truth value depends on another assumption, which can also be considered as part of a relevant context. We should recall that proposition (9) is regarded as a true proposition, under the assumption that, if the cape’s colour had not been a shade of red, the bull would not have charged. In a context in which such an assumption was not present, proposition (9) could be considered to be false. I agree with Shapiro and Sober about the importance of considering the truth values of (9) and (10) in order to evaluate in which way the colour of the cape is causally relevant to the fact that the bull charged. However, their truth values are important in that sense on the basis of the assumptions according to which they are determined. It is also important to focus on the fact that true contrastive causal claims can be considered to be inappropriate. Consider the following proposition: The fact that the bull saw a crimson cape rather than a pair of blue shoes caused him to charge.
We might say that this proposition is true, according to the contrastivist account of causation. If the bull had not seen a crimson cape and, instead, had seen a pair of blue shoes, he would not have charged. Anyhow, depending on the conversational context, it may be inappropriate to assert such a proposition.
CHAPTER ELEVEN

DECISION THEORY AND INDIFFERENCE
A theory of rational decision should offer a way to determine the best option for a rational agent involved in a particular decision situation, given her beliefs and desires. Thus, theories of decision are usually based on a definition of expected utility and define a rational decision as the one that maximises the expected utility of its corresponding option.
Evidential Decision Theory

Let A be an option, O be a set of possible outcomes, P be a probability function and V a value function. The notion of expected utility on which evidential decision theory is based can be defined as follows (Jeffrey 1965):

EU(A) = Σj P(Oj|A) V(AOj)

For instance, suppose that Susan wants to shatter the bottle that stands a few metres in front of her. Her options are the following: She can either throw a rock against it or a tennis ball. Let R symbolise the first option and T, the second. Additionally, let S symbolise the fact that the bottle shatters. Assume that the value that Susan assigns to the possibility that the bottle shatters is two and the value that she assigns to the possibility that it remains unbroken after she performs one of the considered actions is one. Suppose further that she assigns a probability of 0.9 to the fact that the bottle will shatter, given that she throws a rock at it, and a probability of 0.5 to the fact that it will shatter, given that she throws the ball. Thus, on the one hand, the expected utility of throwing a rock against the bottle is the following:

EU(R) = P(S|R)·2 + P(¬S|R)·1
EU(R) = 1.8 + 0.1 = 1.9
On the other hand, the expected utility of throwing a tennis ball against the bottle is determined as follows:

EU(T) = P(S|T)·2 + P(¬S|T)·1
EU(T) = 1 + 0.5 = 1.5

Since the expected utility associated with the option of throwing a rock at the bottle is higher than the expected utility associated with the option of throwing a tennis ball at the bottle, the first option is the most rational one, given Susan’s interests and beliefs. Susan should throw a rock against the bottle instead of throwing a tennis ball against it. In general, nobody should doubt that a rational agent should choose to perform actions that cause her benefit or, at least, cause her no harm. However, evidential decision theory is not explicitly based on any notion of causation. For instance, if Susan decides to throw a rock instead of a tennis ball, it will be more likely that the state that she prefers occurs, namely, the fact that the bottle shatters. But, in order to determine that the best option is to throw a rock, we do not need to consider with what degree of probability Susan’s throw will cause the bottle to shatter. We should only consider how likely the fact is that the bottle shatters, given the fact that Susan throws a rock against it.
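The two computations above can be reproduced with a short script. This is a minimal sketch; the probabilities and values are the ones assumed for Susan in the text.

```python
# Evidential expected utility, EU(A) = sum_j P(Oj|A) * V(A&Oj) (Jeffrey 1965).
# Probabilities and values are those assumed in Susan's bottle example.

def evidential_eu(probs, values):
    """Sum over outcomes of P(outcome | act) * V(act & outcome)."""
    return sum(p * v for p, v in zip(probs, values))

# Outcomes ordered as (shatters, does not shatter); V(shatters) = 2, V(intact) = 1.
eu_rock = evidential_eu([0.9, 0.1], [2, 1])  # throwing a rock: P(S|R) = 0.9
eu_ball = evidential_eu([0.5, 0.5], [2, 1])  # throwing a ball: P(S|T) = 0.5

print(round(eu_rock, 3), round(eu_ball, 3))  # 1.9 1.5
```

As in the text, EU(R) = 1.9 exceeds EU(T) = 1.5, so throwing the rock maximises evidential expected utility.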
Causal Decision Theory

In some scenarios, called Newcomb problems (Nozick 1969), the action to which a rational agent assigns, according to evidential decision theory, the highest expected utility is an action that cannot, in any way, cause the preferred outcome. Let us consider Skyrms’s description of a case of this kind:

Suppose that the connection between hardening of the arteries and cholesterol intake turned out to be like this: hardening of the arteries is not caused by cholesterol intake like the clogging of a water pipe; rather it is caused by a lesion in the artery wall. In an advanced state these lesions will catch cholesterol from the blood, a fact which has deceived previous researchers about the causal picture. Moreover, imagine that once someone develops the lesion he tends to increase his cholesterol intake. We do not know what mechanism accounts for this effect of the lesion. Cholesterol intake among those who do not have the lesion appears to have no effect on vascular health. Given this (partly) fanciful account of the etiology of atherosclerosis, what would a rational man who believed the account do when made an offer of Eggs Benedict for breakfast? I say that he would accept. He would be a fool to try to “make it the case that he had not
developed the lesion” by curtailing his cholesterol intake. (Skyrms 1980, 128)
It seems clear why a rational agent should, according to Skyrms, accept the breakfast: He knows that cholesterol intake will not cause hardening of the arteries. Thus, there is no good reason, apparently, to reject the offer of the Eggs Benedict. Nevertheless, an agent that based his decision on the notion of expected utility presented above would reject it. Suppose that Robert is in the situation described by Skyrms. Let also B symbolise the action of accepting the breakfast and A the possibility that Robert suffers from atherosclerosis. Assume that Robert does not know whether he has atherosclerosis or not and, as should be expected, that the value of having atherosclerosis is considerably less than the value of not having it. Let the former be -6 and the latter be 1. Let also the value of having the breakfast equal 1; assuming that values combine additively, V(A&B) = -5 and V(¬A&B) = 2. Let us assume further that Robert assigns a probability of 0.7 to the fact that he might have atherosclerosis given that he accepts the breakfast. This probability is high considering the assumption pointed out by Skyrms, according to which persons that have developed the lesion in the artery wall tend to consume more cholesterol. Thus, the expected utility that he assigns to the option of accepting the breakfast is the following:

EU(B) = P(A|B)V(A&B) + P(¬A|B)V(¬A&B)
EU(B) = (0.7)(-5) + (0.3)(2) = -2.9

Assume also that Robert assigns a probability of 0.7 to the fact that he might not suffer from atherosclerosis, given that he does not accept the breakfast. The expected utility of rejecting the breakfast is then the following:

EU(¬B) = P(A|¬B)V(A&¬B) + P(¬A|¬B)V(¬A&¬B)
EU(¬B) = (0.3)(-6) + (0.7)(1) = -1.1

Since the expected utility of rejecting the breakfast (¬B) is higher than the expected utility of accepting it (B), Robert should reject the breakfast, on the basis of evidential decision theory.
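The figures above can be checked with a short sketch; the additive values and conditional probabilities are the ones assumed for Robert in the text.

```python
# Evidential expected utilities for Robert's breakfast decision.
# Values assumed in the text: atherosclerosis = -6, no atherosclerosis = 1,
# and having the breakfast adds 1 (values combine additively).
V_AB, V_nAB = -6 + 1, 1 + 1      # accept breakfast: V(A&B) = -5, V(¬A&B) = 2
V_AnB, V_nAnB = -6, 1            # reject breakfast: V(A&¬B), V(¬A&¬B)

# Conditional probabilities assumed in the text.
p_A_given_B, p_A_given_nB = 0.7, 0.3

eu_accept = p_A_given_B * V_AB + (1 - p_A_given_B) * V_nAB      # ≈ -2.9
eu_reject = p_A_given_nB * V_AnB + (1 - p_A_given_nB) * V_nAnB  # ≈ -1.1

print(round(eu_accept, 3), round(eu_reject, 3))
```

Since -1.1 exceeds -2.9, evidential decision theory recommends rejecting the breakfast, as the text notes.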
This contradicts Skyrms’s argument, according to which it would be irrational to expect that one can lower the probability of a state that cannot be causally influenced by one’s action. This is, however, the recommendation of evidential decision theory. Newcomb’s problem is described by David Lewis in a more general manner as follows:
Suppose you are offered some small good, take it or leave it. Also you may suffer some great evil, but you are convinced that whether you suffer it or not is entirely outside your control. In no way does it depend causally on what you do now. No other significant payoffs are at stake. Is it rational to take the small good? Of course, say I. (Lewis 1981, 8)
In the case described by Skyrms, the Eggs Benedict are the small good, while the possibility of having atherosclerosis corresponds to the great evil. The lesion in the artery wall is an initial condition that increases the probability of both the small good and the great evil. Actually, having the lesion causes the great evil. Thus, an agent cannot influence that condition by rejecting the small good. Nevertheless, according to evidential decision theory one should reject it. I think, with Skyrms and Lewis, that this is irrational. In contrast to evidential decision theory, causal decision theory is based on the idea that a rational agent should consider what her actions can cause instead of just considering what her actions are evidence for. For an option A and a set of outcomes O, the notion of expected utility that grounds causal decision theory can be defined as follows (Gibbard & Harper 1978):

EU(A) = Σj P(A → Oj) V(AOj)

This notion can be called causal expected utility and is based on the probability assigned to a counterfactual conditional. A counterfactual conditional “A → B” symbolises that if A occurred, B would occur, and characterises causal dependence. It is non-trivially true at a possible world w just in case, in all possible worlds that are closest to w at which A holds, B also holds (Lewis 1973b). Let us reconsider the example of the artery lesion, now in the light of causal decision theory. Assume the same values that we assumed above. On the basis of this theory, the probability that Robert should assign to the fact that he would suffer from atherosclerosis if he accepted the breakfast cannot be high. The reason for this is the following: Among the possible worlds that are closest to the actual situation, there are worlds in which Robert has his breakfast and nevertheless does not suffer from atherosclerosis. Let again B symbolise the action of accepting the Eggs Benedict and A symbolise the possibility that Robert has atherosclerosis.
The counterfactual B → A is not true. Actually, at the moment of his deliberation, Robert does not know whether he has atherosclerosis or not. Thus we might assume that the counterfactual conditionals B → A and B → ¬A are equiprobable.
For similar reasons, we might assume that the counterfactual conditionals ¬B → A and ¬B → ¬A are also equiprobable. It is not true that Robert would not suffer from atherosclerosis if he did not accept the breakfast. Nor is it true that he would suffer from atherosclerosis if he did not accept it. Thus, it seems reasonable to consider both conditionals as equiprobable. Following these considerations, the expected utility of accepting the breakfast should be the following:

EU(B) = P(B → A)V(B&A) + P(B → ¬A)V(B&¬A)
EU(B) = (0.5)(-5) + (0.5)(2) = -1.5

The expected utility that Robert should assign to the option of rejecting the breakfast, on the basis of causal decision theory, is the following:

EU(¬B) = P(¬B → A)V(¬B&A) + P(¬B → ¬A)V(¬B&¬A)
EU(¬B) = (0.5)(-6) + (0.5)(1) = -2.5

The expected utility of having the breakfast is greater than the expected utility of rejecting it. Robert should accept the Eggs Benedict. As Skyrms argues, an agent would be a fool if he tried to influence the fact that he might have the lesion. On the basis of causal decision theory, Robert is no fool. He acknowledges that he cannot influence the possibility of having atherosclerosis by accepting or rejecting a single breakfast. In this sense, causal decision theory is a more adequate account of rational decisions than evidential decision theory.
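The causal computation can be sketched in the same style as the evidential one; the values are those assumed above, and the equiprobable counterfactual probabilities of 0.5 follow the argument in the text.

```python
# Causal expected utility, EU(A) = sum_j P(A → Oj) * V(A&Oj)
# (Gibbard & Harper 1978), with all four counterfactuals taken
# to be equiprobable, as argued in the text.
V_BA, V_BnA = -5, 2    # accept: V(B&A), V(B&¬A)
V_nBA, V_nBnA = -6, 1  # reject: V(¬B&A), V(¬B&¬A)

p_cf = 0.5  # P(B → A) = P(B → ¬A) = P(¬B → A) = P(¬B → ¬A) = 0.5

ceu_accept = p_cf * V_BA + p_cf * V_BnA      # -1.5
ceu_reject = p_cf * V_nBA + p_cf * V_nBnA    # -2.5

print(ceu_accept, ceu_reject)  # -1.5 -2.5
```

Causal decision theory thus recommends accepting the Eggs Benedict, reversing the evidential verdict.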
Decisional Overdetermination

As already explained (Chapter 8), an account of causation based on counterfactuals might face problems in scenarios involving overdetermination. Is a theory of decision that is based on counterfactuals, such as causal decision theory, able to face such problems? Let us consider the following case presented by Christopher Hitchcock:

Suppose, for example, that Suzy desires that the bottle be shattered, and that she has no preference for throwing or not-throwing considered intrinsically. I.e. if T says that Suzy throws her rock, and S that the bottle shatters, her desirabilities satisfy D(T&S) = D(~T&S) > D(T& ~S) = D(~T& ~S). Suppose also that if Suzy throws, her rock will shatter the bottle, and that if she doesn’t throw, the bottle will still be shattered by
Billy’s rock. Finally, suppose that Suzy knows all of this (or at least believes it with degree of belief 1). Then CDT [causal decision theory] tells her that she should be indifferent between throwing her rock and not throwing. But this clearly is the right recommendation. If all she cares about is whether the bottle will shatter, and it will shatter no matter which action she chooses, then she should be indifferent. The fact that Suzy’s throw would count as a cause of the bottle’s shattering, despite the lack of counterfactual dependence, is irrelevant to her deliberations. (Hitchcock 2013, 138)
The desirability function D and the value function V can be considered to be the same function. We can characterise this case as a scenario of decisional overdetermination on the basis of the following definition:

Decisional overdetermination: A decisional scenario is a scenario of decisional overdetermination just in case there is a certain outcome that would occur as an effect of any of the agent’s options.

By comparing expected utilities, it can be shown that, on the basis of causal decision theory, Suzy should remain indifferent with regard to her options. Let T represent the option of throwing a rock against the bottle and S represent the fact that the bottle shatters. Suppose that Suzy assigns a value of 2 to the fact that the bottle shatters and a value of 1 to the fact that it does not. We also assume, changing Hitchcock’s scenario a bit, that Suzy assigns a probability of 0.9 to the fact that the bottle would shatter if she threw her rock at it, as well as to the fact that it would shatter if she did not throw. The expected utility of throwing the rock at the bottle would be the following:

EU(T) = P(T → S)V(T&S) + P(T → ¬S)V(T&¬S)
EU(T) = (0.9)(2) + (0.1)(1) = 1.9

Now consider the expected utility of not throwing:

EU(¬T) = P(¬T → S)V(¬T&S) + P(¬T → ¬S)V(¬T&¬S)
EU(¬T) = (0.9)(2) + (0.1)(1) = 1.9

Since the expected utility is the same for both options, Suzy should adopt an attitude of indifference in this scenario. It should be noticed that this attitude should be adopted, as Hitchcock makes clear (2013, 138), when the following holds:
V(T&S) = V(¬T&S) > V(T&¬S) = V(¬T&¬S) That is, the value of the fact that Suzy throws and the bottle shatters is the same as the value of the fact that Suzy does not throw and the bottle shatters. This value must be greater, as assumed, than the value of the fact that she throws and the bottle does not shatter, which is equal to the value of the fact that she does not throw and the bottle does not shatter. In other words, neither throwing nor refraining from throwing has any considerable value to Suzy. It should be mentioned that Suzy should adopt an attitude of indifference also on the basis of evidential decision theory. The expected utility of throwing would be the following, considering the same values and probabilities that were assumed before: EU(T) = P(S|T)V(T&S) + P(¬S|T)V(T&¬S) EU(T) = (0.9)(2) + (0.1)(1) = 1.9 And the expected utility of refraining from throwing is the following: EU(¬T) = P(S|¬T)V(¬T&S) + P(¬S|¬T)V(¬T&¬S) EU(¬T) = (0.9)(2) + (0.1)(1) = 1.9 As in the evaluation of the options on the grounds of causal decision theory, the expected utility of throwing equals the expected utility of not throwing. Thus, according to evidential decision theory indifference is also recommended. Now, is indifference the most rational attitude in every case of decisional overdetermination? I think that it is not. Consider the following decisional overdetermination scenario: Suppose that Robert wants to survive (as one would expect of most animals) and also desires that the bottle be shattered. He has no preference for throwing a rock against a glass bottle or not throwing. Assume further that if he throws his rock at the bottle, the bottle will be shattered by it, and if he does not throw, the bottle will be shattered by his friend Billy. Additionally, if the bottle does not shatter, Robert will be assassinated by a psychopath. Suppose also that he knows this. 
On the grounds of causal decision theory, Robert should be indifferent between throwing his rock at the bottle and not throwing it. But Robert would be a fool if he did not throw his rock at the bottle and decided to put all his confidence in Billy. His life is at stake; he should ensure that the bottle shatters by throwing anyway. Indifference does not
seem to be a rational attitude in this case. Scenarios with these features can be characterised as follows: Decisional overdetermination involving survival: A scenario of decisional overdetermination involves survival if one of the possible outcomes implies the death of the deliberating agent and if that outcome can be avoided by her. Not all decision scenarios that involve circumstances of survival have negative consequences for causal decision theory though. We might consider a simple case, similar to the one described above, in which Robert will be assassinated if the bottle is not shattered and Billy is not present. That is, if Robert throws, the bottle will be shattered by his rock, and if he does not throw, the bottle will not shatter. This scenario involves survival circumstances, but does not seem problematic for causal decision theory. Note that, as in any decisional overdetermination scenario, an agent basing her deliberation on evidential decision theory should also be indifferent in cases of decisional overdetermination involving survival. This means that, considering these cases, we may not only suggest that there is a problem with the two particular accounts of expected utility that are associated with those theories. We may further suggest that cases of decisional overdetermination involving survival are problematic for a general notion of expected utility. The definitions of expected utility on which we have focused here establish something about how desires and degrees of belief determine the best option for a given agent. However, they do not say anything about the ways in which relevant desires and relevant degrees of belief might influence each other.
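That both frameworks yield indifference here can be seen by re-running the arithmetic. A sketch using the probabilities assumed for Suzy; in this scenario the evidential and causal probabilities coincide, so one function serves both theories.

```python
# Decisional overdetermination: the bottle shatters (value 2) with probability
# 0.9 whether or not Suzy throws, so both theories assign both options the
# same expected utility.

def expected_utility(p_shatter, v_shatter=2, v_intact=1):
    # Serves both theories here: P(S|A) and P(A → S) coincide in this scenario.
    return p_shatter * v_shatter + (1 - p_shatter) * v_intact

eu_throw = expected_utility(0.9)    # throwing the rock
eu_refrain = expected_utility(0.9)  # not throwing (Billy's rock shatters it)

print(round(eu_throw, 3), round(eu_refrain, 3))  # 1.9 1.9
```

Both options come out at 1.9, so expected-utility maximisation, causal or evidential, leaves the agent indifferent. The same equality holds in the survival variant, which is what makes that case look problematic for a general notion of expected utility.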
Newcomb’s Problem and the Prisoners’ Dilemma

The classic version of a Newcomb problem can be formulated as follows, on the basis of Robert Nozick’s (1969) original description. Suppose that Suzy has two options: She can either take the content of an opaque box, which may contain either $1,000,000 ($M) or nothing, or she can take both the opaque box and a transparent box, which contains $1,000. The content of the opaque box depends on the action of a very reliable predictor according to the following conditions:

1) If the predictor predicts that Suzy will take both boxes, he will not put the $M in the opaque box.
2) If the predictor predicts that Suzy will take only the opaque box, he will put the $M in the opaque box. Suppose further that the content of the opaque box is determined before Suzy starts deliberating. It is highly probable that the opaque box contains the $M, given the fact that Suzy takes just one box, and it is highly probable that the opaque box is empty, given the fact that Suzy takes both boxes. Thus, according to evidential decision theory, Suzy should take only the opaque box. However, the predictor made his prediction and determined the content of the opaque box before Suzy started deliberating, which means that she cannot causally influence with her action what is inside the box. According to causal decision theory, Suzy should take the two boxes. Consider now Lewis’s (1979) formulation of the Prisoners’ Dilemma: You and I, the “prisoners,” are separated. Each is offered the choice: to rat or not to rat. […] Ratting is done as follows: one reaches out and takes a transparent box, which is seen to contain a thousand dollars. A prisoner who rats gets to keep the thousand. […] If either prisoner declines to rat, he is not at all rewarded; but his partner is presented with a million dollars, nicely packed in an opaque box. (Lewis 1979, 235)
Suppose that Suzy and Billy are the prisoners. Here is the decision problem that both are facing: (1) I am offered a thousand—take it or leave it. (2) Perhaps also I will be given a million; but whether I will or not is causally independent of what I do now. Nothing I can do now will have any effect on whether or not I get my million. (3) I will get my million if and only if you do not take your thousand. (Lewis 1979, 236)
Thus, if both Suzy and Billy rat, both get a thousand and if both decide not to rat, both will get a million dollars. There are two more possibilities: If Suzy rats and Billy does not, Billy gets nothing and Suzy gets a million dollars. If Billy rats and Suzy does not, Suzy gets nothing and Billy gets a million dollars. Lewis argues that the Prisoners’ Dilemma is a Newcomb problem, claiming that a Newcomb problem also involves conditions (1) and (2). Additionally, it involves the following third condition (Lewis 1979, 300): (3') I will get my million if and only if it is predicted that I do not take my thousand.
We can assume that Billy is a replica of Suzy. Observing Billy’s behaviour might serve, depending on how similar both are, as a reliable prediction process about Suzy’s actions. Thus, on the basis of proposition (3'), we might say that Suzy will get a million dollars if and only if Billy does not take his thousand. This is equivalent to proposition (3). Thus, a Newcomb problem satisfies the same conditions according to which the Prisoners’ Dilemma can be described. On this basis, Lewis (1979, 239) shows that the Prisoners’ Dilemma is a Newcomb problem. José Luis Bermúdez (2013) argues that Lewis’s argument fails, fundamentally because epistemic factors are not being considered. What matters in Newcomb’s problem is not simply that there be a two-way dependence between my receiving $1,000,000 and it being predicted that I not take the $1,000. That two-way dependence only generates a problem because I know that the contingency holds. So, if Lewis is correct that the PD [Prisoners Dilemma] is an NP [Newcomb problem], then comparable knowledge is required in the PD. (Bermúdez 2013, 427)
Thus, according to Bermúdez, we should focus on what Suzy believes about proposition (3) in the description of the Prisoners’ Dilemma. Let Cp be a function applied to propositions to which a given player p assigns a high degree of confidence (Bermúdez, 2013, 427). The epistemic version of proposition (3) is the following: (4) Cp (I will get my million if and only if you do not take your thousand) In other words, Suzy believes with a high degree of confidence that she will get her million just in case Billy does not take his thousand. As Bermúdez argues (2013, 427), we should also consider what the agent that is confronted with the decision problem believes about the relation between the predictive process and the fact that the other player might not take the thousand dollars. Following Bermúdez, we may state this as follows: (5) Cp (A predictive process warrants a prediction that I do not take my thousand if and only if you do not take your thousand) According to Bermúdez, this proposition is a crucial point in order to arrive at Lewis’s conclusion. By focusing on the epistemic position of each player, a new feature of the argument might be considered. Suzy can be confident that the fact that Billy does not take his thousand is predictive
of Suzy not taking her thousand. As well, Suzy’s not-taking her thousand is predictive of Billy’s not-taking his thousand. This is the weak part of Lewis’s argument, as Bermúdez argues: At the limit, where I believe that the other person is a perfect replica of me, I am committed to thinking that the two scenarios just envisaged are impossible. But if I think that two of the four available scenarios in the pay-off matrix of the PD are to all intents and purposes impossible, then I cannot believe that I am in a PD. In effect, what I am committed to believing is that my only two live alternatives are [...] the scenarios where we both cooperate or we both fail to cooperate. In other words, I am committed to thinking that I am in a completely different decision problem—in particular, that I am in a decision problem that is most certainly not a PD, because it lacks precisely the outcome scenarios that make the PD so puzzling. So, there is an inverse correlation between (5) (which tracks my degree of confidence in the similarity between me and the other player) and my degree of confidence that I am in a PD. Since Lewis’s argument that the PD and NP are really notational variants of a single problem rests upon (5), this means that Lewis’s argument effectively undermines itself. (Bermúdez 2013, 428)
Thus, according to Bermúdez, the weakness of Lewis’s argument lies in the fact that each player should, in an extreme case, rule out the possibility that the other player does something different. In moderate cases, each agent should assign a low degree of confidence to that possibility. I disagree with Bermúdez about the claim that the epistemic point of view of the agent should be considered, in the way he does, as relevant in order to conclude that the Prisoners’ Dilemma is a Newcomb problem. This shifts the discussion directly to the reliability of the replica. Bermúdez bases part of his counterargument on the extreme case in which the agent believes that the other prisoner is a perfect replica of him. However, as Lewis argues, there are replicas and replicas; that is, the other prisoner need not be a perfect copy of the agent:

As Newcomb’s Problem is usually told, the predictive process involved is extremely reliable. But that is inessential. The disagreement between conceptions of rationality that gives the problem its interest arises even when the reliability of the process, as estimated by the agent, is quite poor—indeed, even when the agent judges that the predictive process will do little better than chance. (Lewis 1979, 238)
Whether the replica is a perfect replica or not is irrelevant. As Arif Ahmed (2015, 113) remarks, it is simply not true that in a genuine Newcomb problem the agent must be almost certain that the other player will perform the same action. The only condition is that the reliability of the predictive process has to imply that one of the outcomes is more likely than the other, given the fact that one of the agent’s actions is carried out. From this it follows that an agent should, according to evidential decision theory, take the thousand. Although the reliability of the predictive process is not relevant with regard to whether a decisional scenario is a Newcomb problem or not, it could be relevant, one may argue, with regard to whether the Prisoners’ Dilemma is a Newcomb problem. Bermúdez argues for this latter option, claiming that the agent’s degree of belief about whether the other player will make the same decision as himself is relevant:

In a typical game each player tries to formulate their best response to what the other players might do. But the other players are independent variables, in the sense that their actions are not determined by my actions. In a Lewis-style PD, however, this crucial condition is not met. The whole basis of his argument is that player A knows that player B will make the same choice as her. It is true that she doesn’t know, in advance of making her own choice, which choice the other player will make. But she does know what the other player’s response to each of her possible choices will be. She knows, in effect, that the other player’s action will duplicate her own action. For that reason she has enough information to know that she only needs to take into consideration possible outcomes where she and player B make the same choice. (Bermúdez 2015)
I disagree with considering the other prisoner’s decision as a “response” to the agent’s decision, as Bermúdez does. I also disagree with the claim that the agent knows that the other prisoner is going to perform the same action as her. In the Prisoners’ Dilemma, what the agent knows is that the other player is a reliable replica of her. Analogously, what the agent knows in a Newcomb problem is that her decision was predicted or that people having a particular lesion in the arteries tend to increase their cholesterol intake. Anyhow, the fact that an agent knows this in those situations does not imply that she knows what is going to occur after she performs one of her possible actions. Even if we focus on the extreme case in which the other player’s action is considered as an almost perfect predictive process of the agent’s action, the possibility that both players make different choices is still open. Analogously, in the original version of Newcomb’s problem, even if the
agent thinks that the predictor is almost perfect, she should not discard the possibility that she might get the million if she decides to take the two boxes. In both decision scenarios, whatever action the agent chooses to perform, the outcome will not counterfactually depend on it. Contrary to what Bermúdez claims, I think that the ground of the argument is the following:

(6) The agent knows that her decision has been predicted by a process p with a determinate degree of reliability, if, according to the predictor, the probabilities assigned to the performance of each of the agent’s possible actions, given p, are not equal.

Note that the agent only knows that the predictive process is relatively reliable, but she does not have to assign any degree of confidence to it. Thus, proposition (5) is not relevant with regard to whether the Prisoners’ Dilemma is a Newcomb problem. Proposition (5) is similar to the following, which is the basis of the Prisoners’ Dilemma:

(7) The agent knows that there is a replica r of her (the other prisoner), such that, depending on how exact the replica is, the instantiations of the agent’s options are not equiprobable, given whatever action r performs.

Note that the agent does not have to know how exact the replica is. On the grounds of propositions (6) and (7), one does not have to assume proposition (5) in order to accept the conclusion drawn by Lewis regarding the Prisoners’ Dilemma and the structure of a Newcomb problem. If one agrees that, in a decision scenario that satisfies conditions (1) and (2), proposition (7) can play the same role as proposition (6), then one should agree that proposition (3') can play the same role as (3). Thus, it seems right to conclude that the Prisoners’ Dilemma is a Newcomb problem.
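The divergence between the two theories in the classic Newcomb problem can also be put in numbers. In this sketch, the predictor’s reliability (0.99) and the causal prior that the million is already in the opaque box (0.5) are illustrative assumptions, not figures from the text.

```python
# Classic Newcomb problem (Nozick 1969): $1,000,000 opaque box, $1,000
# transparent box. Reliability 0.99 and the prior 0.5 are assumed for
# illustration only.
M, K = 1_000_000, 1_000
reliability = 0.99
p_million = 0.5  # for CDT: fixed chance the money is already in the box

# Evidential: taking one box is strong evidence that the million was placed.
edt_one_box = reliability * M
edt_two_box = (1 - reliability) * (M + K) + reliability * K

# Causal: the box's content is causally independent of the choice.
cdt_one_box = p_million * M
cdt_two_box = p_million * (M + K) + (1 - p_million) * K

print(edt_one_box > edt_two_box)  # True: EDT says take only the opaque box
print(cdt_two_box - cdt_one_box)  # 1000.0: CDT says two-boxing always gains $1,000
```

Whatever the fixed prior, two-boxing dominates by exactly $1,000 on the causal computation, while one-boxing wins evidentially for any sufficiently reliable predictor.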
BIBLIOGRAPHY
Ahmed, A. 2015. Evidence, decision and causality. Cambridge University Press. Adler, S. & Orprecio, J. 2006. “The eyes have it: visual popout in infants and adults”. Developmental Science 9 (2) 189–206. Baumann, P. 2008. “Contrastivism rather than something else? On the limits of epistemic contrastivism”. Erkenntnis 69 (2):189 – 200. —. 2012. “Response to Schaffer’s Reply”. In Stefan Tolksdorf (ed.), Conceptions of Knowledge. De Gruyter 425-431. Bach-y-Rita, P. 1972. Brain Mechanisms in Sensory Substitution. Academic Press. Bak, P. 1996. How Nature Works. Springer. Bermúdez, J. L. 2013. “Prisoner’s dilemma and Newcomb’s problem: why Lewis’s argument fails”. Analysis 73 (3):423-429. —. 2015. “Strategic vs. Parametric choice in Newcomb’s Problem and the Prisoner’s Dilemma: Reply to Walker”. Philosophia 43 (3):787-794. Chaitin, G. J. 1969. “On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers”. Journal of the ACM 16 (3): 407–422. Clayton, P. & Davies, P. 2006. The Re-Emergence of Emergence. Oxford University Press. Comesaña, J. & Sartorio, C. 2014. “DifferenceMaking in Epistemology”. Noûs 48 (2):368-387. Davies, P. 2006. “Preface”. In Clayton & Davies (eds.). The Re-Emergence of Emergence. Oxford University Press. de Back, D. Z., Kostova, E. B., van Kraaij, M., Van den Berg, T. K., & Van Bruggen, R. 2014. “Of macrophages and red blood cells; a complex love story”. Regulation of red cell life-span, erythropoiesis, senescence and clearance, 79. de Cristofaro, R. 2008. “A New Formulation of the Principle of Indifference”. Synthese 163 (3):329-339. Deacon, T. 2003. “The Hierarchic Logic of Emergence: Untangling the Interdependence of Evolution and Self-Organization”. In Weber and Depew (eds.). Evolution and Learning: The Baldwin Effect Reconsidered. MIT Press. Dowe, P. 2000. Physical Causation. Cambridge University Press. Dretske, F. 1981. Knowledge and the Flow of Information. MIT Press.
Ellis, G. 2006. "On the nature of emergent reality". In Clayton & Davies (eds.). The Re-Emergence of Emergence. Oxford University Press.
Emmeche, C. 2004. "Causal processes, semiosis, and consciousness". In Seibt, J. (ed.). Process Theories: Crossdisciplinary Studies in Dynamic Categories. Dordrecht: Kluwer.
Fuentes, M. 2014. "Complexity and the Emergence of Physical Properties". Entropy 16 (8): 4489-4496.
Fuhrmann, A. 2002. "Causal exclusion without explanatory exclusion". Manuscrito 25 (3): 177-198.
Gärdenfors, P. 2000. Conceptual Spaces. MIT Press.
Gärdenfors, P. & Warglien, M. 2015. "Meaning Negotiation". In Gärdenfors & Zenker (eds.). Applications of Conceptual Spaces. Springer.
Gatto, M. & Campbell, U. 2010. "Redundant causation from a sufficient cause perspective". Epidemiologic Perspectives & Innovations 7 (5).
Gell-Mann, M. & Lloyd, S. 1996. "Information Measures, Effective Complexity, and Total Information". Complexity 2 (1): 44-52.
Gibbard, A. & Harper, W. 1978. "Counterfactuals and Two Kinds of Expected Utility". In A. Hooker, J. J. Leach & E. F. McClennen (eds.). Foundations and Applications of Decision Theory. D. Reidel.
Hall, N. 2004. "Two concepts of causation". In Collins, Hall & Paul (eds.). Causation and Counterfactuals. MIT Press.
Halpern, J. & Pearl, J. 2005. "Causes and explanations: A structural-model approach. Part I: Causes". British Journal for the Philosophy of Science 56 (4): 843-887.
Hawthorne, J., Landes, J., Wallmann, C., & Williamson, J. 2015. "The Principal Principle Implies the Principle of Indifference". British Journal for the Philosophy of Science: axv030.
Hitchcock, C. 1996. "The role of contrast in causal and explanatory claims". Synthese 107 (3): 395-419.
—. 2013. "What is the 'Cause' in Causal Decision Theory?" Erkenntnis 78 (1): 129-146.
Hume, D. 1748/2008. An Enquiry concerning Human Understanding. Oxford University Press.
Hutto, D. & Myin, E. 2012. Radicalizing Enactivism: Basic Minds Without Content. MIT Press.
Jeffrey, R. 1965. The Logic of Decision. University of Chicago Press.
—. 2004. Subjective Probability: The Real Thing. Cambridge University Press.
Keynes, J. M. 1921/2013. A Treatise on Probability. Dover.
Kim, J. 1998. Mind in a Physical World. MIT Press.
Kolmogorov, A. N. 1965. "Three Approaches to the Quantitative Definition of Information". Problemy Peredachi Informatsii 1 (1): 3-11.
Lewis, D. 1969. Convention: A Philosophical Study. Harvard University Press.
—. 1973a. "Causation". Journal of Philosophy 70 (17): 556-567.
—. 1973b. Counterfactuals. Blackwell Publishers.
—. 1979. "Prisoners' dilemma is a Newcomb problem". Philosophy and Public Affairs 8 (3): 235-240.
—. 1980. "A subjectivist's guide to objective chance". In Jeffrey (ed.). Studies in Inductive Logic and Probability. University of California Press.
—. 1986. "Postscripts to 'Causation'". In Philosophical Papers Vol. II. Oxford University Press.
Lyre, H. 2011. "Is structural underdetermination possible?" Synthese 180 (2): 235-247.
Mackie, J. L. 1974. The Cement of the Universe. Oxford: Clarendon Press.
Northcott, R. 2008. "Causation and contrast classes". Philosophical Studies 139 (1): 111-123.
Norton, J. 2008. "Ignorance and Indifference". Philosophy of Science 75 (1): 45-68.
Nozick, R. 1969. "Newcomb's problem and two principles of choice". In Rescher (ed.). Essays in Honor of Carl G. Hempel. Reidel, 114-146.
Ortix, C., Rijnbeek, J., & van den Brink, J. 2011. "Defect formation preempts dynamical symmetry breaking in closed quantum systems". Physical Review B 84 (14): 144423.
Paul, L. A. 2007. "Constitutive Overdetermination". In J. K. Campbell, M. O'Rourke & H. S. Silverstein (eds.). Causation and Explanation. MIT Press, 4-265.
Paul, L. A. & Hall, N. 2013. Causation: A User's Guide. Oxford University Press.
Russell, B. 1921. The Analysis of Mind. London: George Allen and Unwin.
Sakai, T. & Haga, K. 2012. "Molecular genetic analysis of phototropism in Arabidopsis". Plant & Cell Physiology 53 (9): 1517-34.
Schaffer, J. 2005. "Contrastive knowledge". In Gendler & Hawthorne (eds.). Oxford Studies in Epistemology Vol. 1. Oxford University Press.
—. 2012. "Contrastive Knowledge: Reply to Baumann". In Tolksdorf (ed.). The Concept of Knowledge. Walter de Gruyter.
Scheibe, E. 1994. "On the mathematical overdetermination of physics". In Rudolph & Stamatescu (eds.). Philosophy, Mathematics and Modern Physics.
Serences, J. T. & Kastner, S. 2014. "A multi-level account of selective attention". In The Oxford Handbook of Attention. Oxford University Press.
Shapiro, L. & Sober, E. 2012. "Against proportionality". Analysis 72 (1): 89-93.
Skyrms, B. 1980. Causal Necessity: A Pragmatic Investigation of the Necessity of Laws. Yale University Press.
—. 2010. Signals: Evolution, Learning, and Information. Oxford University Press.
Smith, M. 2015. "Evidential Incomparability and the Principle of Indifference". Erkenntnis 80 (3): 605-616.
Standish, R. K. 2001. "On complexity and emergence". Complexity International 9 (1): 1-6.
van de Poel, I., Fahlquist, J. N., Doorn, N., Zwart, S., & Royakkers, L. 2012. "The problem of many hands: Climate change as an example". Science and Engineering Ethics 18 (1): 49-67.
Woodward, J. 2003. Making Things Happen: A Theory of Causal Explanation. Oxford University Press.
—. 2016. "The problem of variable choice". Synthese: 1047-1072.
Ylikoski, P. 2013. "Causal and Constitutive Explanation Compared". Erkenntnis 78 (2): 277-297.
INDEX
actual causation, 110
attention, 24-26, 46
capacity, 23-24, 49, 100-103, 105
causal model, 109-116
chance, 50, 81-83, 131
communication, vii, ix, 5, 24, 36-38, 42, 46
complexity, viii, 1-3, 6-12, 33, 49, 105
connectionism, ix, 17-19, 30-33
consciousness, 48-52
content, 6-10, 19-26, 43-44, 48, 51-52
context, x, 5-9, 12, 48, 53-69, 72-75, 80-81, 96, 98, 101, 105-107, 116-120
contrastive causation, 118
contrastive knowledge, 71-74
counterfactual, x, 53, 56, 59-60, 89-94, 101, 104-105, 111, 117, 124-126, 133
defeater, 82-83
disposition, 100, 103
embodiment, 21-22
emergence, viii-ix, 1-13, 17-19, 24, 31-33, 36, 43, 45, 48-51
enaction, 15, 18-26, 51
environment, vii, ix, 4-5, 9, 15-17, 20-26, 33-34, 41, 45-46, 49-52
evidential support, x-xi, 55, 58-63, 74-75, 77-83, 87, 107, 124
experience, ix, 18-21, 28-29, 36, 45, 48-52, 73
explanation, 4-5, 12, 29, 54, 98, 100-103, 109
indifference, vii-xi, 2, 36-39, 44-45, 48, 51-52, 53, 57, 77-87, 126-127
intentionality, 16, 22-23, 26, 42, 106
intervention, 56, 60, 62, 99, 104, 110, 112-113
invariance of ignorance, 85, 87
justification, 12, 58-61
macrolanguage, 10-12
meaning, ix, 5, 18, 21, 30-31, 34-38, 41-45, 48-51
mental causation, 96-97
microlanguage, 9-13
mind, 20, 24-25, 28, 58, 60-61, 98
perception, vii, ix, 19-28, 36, 42-43, 51
phenomenal interpretation, 28-29
preemption, x, 90-96, 104-106
proportionality, 116-117
proposition, viii, 30, 35, 53-75, 79-83, 87, 90-92, 94, 117-120, 130, 133
redundancy, ix-x, 32, 41, 46-48, 52, 89-93, 97-107
regularity, 7-8, 10, 12-13, 32, 45, 50-52, 90-94
relevance, vii, ix, xi, 3-4, 9, 15, 24-26, 32, 37, 46-48, 52, 55-57, 59-65, 67-69, 73, 75, 78, 82-83, 92, 95, 104, 106, 112, 116-119, 126, 128, 131-133
representation, ix, 15-33, 37-38, 43, 111
responsibility, 105-107
scientific interpretation, 28-29
signal, ix, 41-52
survival, 42, 127-128
system, viii-ix, 1-5, 9-12, 15, 17-20, 25, 28, 30-34, 41, 44-46, 49-52, 60-62, 90, 97-104
variable, viii, xi, 16-17, 33, 67, 97-99, 104, 109-116, 132