Reason Better: An Interdisciplinary Guide to Critical Thinking



Table of Contents

1 | Reasoning
    What it takes
        · Specific vs. general skills
        · The right mindset
    Our complex minds
        · Two systems
        · Direct control
        · Transparency
        · Effort
        · Clarifications
        · Systems in conflict
        · A metaphor
    Guiding the mind
        · Distracted minds
        · Shortcuts
        · Motivated minds
        · A closing caveat

2 | Mindset
    Curious
        · Defense or discovery?
        · Accurate beliefs
    Thorough
        · Search for possibilities
        · Search for evidence
    Open
        · Decoupling
        · The bias blindspot
        · Considering the opposite
        · Openness to revision

3 | Clarity
    Clear inferences
        · The two elements
        · Suppositional strength
        · Implicit premises
        · Deductive vs. inductive
        · The tradeoff
        · The ground floor
    Clear interpretation
        · Standard form
        · Interpretive charity
        · Reconstruction
    Clear language
        · Ambiguity
        · Vagueness
        · Sharp Borders Fallacy

4 | Entailment
    Deductive validity
        · Step by step
        · Flipping the argument
    Logical form
        · Argument recipes
        · Some valid sentential forms
        · Some valid predicate forms
        · The limits of logical form
    Pitfalls
        · Overlooking validity
        · Biased evaluation
        · Some invalid forms

5 | Evidence
    What is evidence?
        · The evidence test
        · The strength test
        · Evidence & probability
    Selection effects
        · Survival & attrition
        · Selective recall
        · Selective noticing
    Media biases
        · News and fear
        · Echo chambers
        · Research media

6 | Generalizations
    Samples as evidence
        · Selection effects
        · Sample size
        · The law of large numbers
    Better samples
        · Big enough
        · Sampling methods
        · Survey pitfalls
    The big picture
        · Measures of centrality
        · The shape of the data
        · Misleading presentations
    Thinking proportionally
        · Loose generalizations
        · Representativeness heuristic

7 | Causes
    Causal thinking
        · An instinct for causal stories
        · One thing after another
        · Complex causes
    Causes and correlations
        · The nature of correlation
        · Illusory correlations
        · Generalizing correlations
    Misleading correlations
        · Reverse causation
        · Common cause
        · Side effects
        · Regression to the mean
        · Mere chance
        · Evidence & experiments

8 | Updating
    How to update
        · The updating rule
        · The die is cast
        · More visuals
        · The detective
    Probability Pitfalls
        · One-sided strength testing
        · Base rate neglect
        · Selective updating
        · Heads I win; tails we're even

9 | Theories
    Compound claims
        · Conjunctions
        · Disjunctions
    Criteria of theory choice
        · Coherence
        · Simplicity
        · Breadth
        · A case study
    The best explanation
        · When the best explanation is probably false
        · IBE and statistical generalization
    The scientific method
        · The order of observation
        · Ad hoc hijinks

10 | Decisions
    The logic of decisions
        · Possible outcomes
        · Expected monetary value
        · Mo' money, less marginal utility
        · The value of everything else
        · Expected utility
    Decision Pitfalls
        · Outcome framing
        · New vs. old risks
        · The endowment effect
        · The possibility and certainty effects
        · Honoring sunk costs
        · Time-inconsistent utilities

Thanks

Many people were helpful in the development of this text. I'd like to especially thank Anna Edmonds and Eduardo Villanueva, who were amazing head graduate student instructors in my critical reasoning courses, gave incredibly helpful feedback on the text, and helped develop innovative section materials that they've allowed me to share with users of the text. I'd also like to thank several graduate students at the University of Michigan who helped me with these materials, especially Mara Bollard, Brendan Mooney, Sumeet Patwardhan, and Laura Soter. Their feedback and implementation have been invaluable. I'd also like to thank some early adopters who have sent very useful comments on the text and their experiences in their classes, especially Philip Robbins, David Fisher, and Timothy Houk. Others, too, have read and provided very helpful feedback on (at least parts of) the text, including Eric Berlin and Ian Fishback.

Reason Better: An Interdisciplinary Guide to Critical Thinking © 2019 David Manley. All rights reserved. Version 1.4


Chapter 1. Reasoning

Introduction

Thinking is effortless enough when we're just letting our minds wander. But reasoning—especially good reasoning—is hard. It requires us to cut through all the irrelevant noise and form our beliefs in ways that reliably reflect how things are.

Good reasoning helps us acquire accurate beliefs and make good decisions. But, instead, we often use our reasoning skills to justify our prior beliefs and actions, which just cements our original mistakes. As the evidence from cognitive psychology shows, we're especially likely to do this when we're thinking about the things that are most important to us—all the while believing ourselves to be free of bias.

This text is not intended to help you build a more persuasive defense of the beliefs you already have and the actions you've already chosen. Rather, it's intended to help you reason better, so you can develop more accurate beliefs and make better choices. To this end, we will draw from several disciplines to identify the most useful reasoning tools. Each discipline will shed light on a different aspect of our subject:

· cognitive psychology: how systematic cognitive errors unfold, and how they can be fixed
· philosophy: how to clarify our inferences and understand the nature of evidence
· statistics: how to make reliable generalizations
· probability theory: how to adjust our confidence in response to evidence
· decision theory and behavioral economics: how the logic of decision-making works, and why we so often make choices that are predictably irrational

The process of gathering and organizing this material has made me better at reasoning; my hope is that reading this book will bring the same benefits to you.

Learning objectives

By the end of this chapter, you should understand:

· the difference between specific and general reasoning skills
· why strong reasoning requires facts and skills, but also the right mindset
· which features distinguish the processes of Systems 1 and 2
· how cognitive illusions arise from conflicts between these systems
· the broad outlines of the availability heuristic, the evidence primacy effect, belief perseverance, confirmation bias, and motivated reasoning

1.1 What it takes

This book contains a lot of information about good and bad reasoning, but the main point is not to learn facts about good reasoning: it's to acquire the skill of reasoning well.

It would be great if we could improve our reasoning ability just by learning how good reasoning works. But unfortunately that's not enough, and it often doesn't help at all. Some of the systematic errors we make are so ingrained that we continue to fall prey to them even when we are tested immediately after learning about them! [1]  So we need more than new facts; we need new skills. And as in any other area of life, developing skills takes discipline and practice. Athletic ability is a useful analogy. You can know all about the moves, strategies, and training regimen needed to excel at a sport. But all of that knowledge isn't enough. Being good at the sport also requires training hard to develop a certain skill set. The same is true for good reasoning. Here's another way that reasoning skills are like any other skill: if we've developed bad habits, we must start by unlearning those habits first. People who have been swinging golf clubs badly or playing the cello without any training must retrain their muscle memory if they want to excel. Likewise, the science of cognition has uncovered many bad habits and instincts that we all exhibit in our reasoning. And since we've been forming beliefs and making decisions all our lives, we've been reinforcing some of those bad habits and instincts. We'll have to unlearn them first before we can become good reasoners.

Specific vs. general skills

We've all been told that a major reason to go to college is that doing so makes us better at reasoning. But the reality is that college students spend almost all their time developing a number of very specific ways of reasoning. They take specialized courses and are taught how to think like accountants, biologists, art historians, engineers, or philosophers. Do these "domain-specific" reasoning skills have anything in common? And do they really help our general reasoning ability?

The answer appears to be a qualified yes: a standard college education, with its array of narrow subject areas, does improve our general reasoning ability to some degree. At the same time, there is evidence that explicit instruction in critical reasoning is even more effective at improving our general reasoning ability. [2]

If our goal is to become skilled at reasoning, then, it's useful to focus on the general features of good thinking. Many of the reasoning errors we'll discuss are made by people in all walks of life—even by experts in areas like medicine and law. If we can fix those errors, we'll reason better, no matter what our career or area of interest.

A similar thing holds true in sports. Training for almost any sport will enhance one's general athleticism to some degree. But well-rounded athletes make sure to have the foundations of physical fitness in place—endurance, flexibility, balance, and coordination—and so can't spend all their time on specific skills like scoring three-pointers or taking corner kicks.

So, to supplement your more specialized courses, this course will help you develop the foundations of reasoning. Here are some of the things it will help you do:

· understand the structure of our reasons for beliefs and actions
· draw appropriate conclusions using logic and probability
· search for evidence in a way that avoids selection effects
· understand when and how to generalize from our observations
· identify correlations and distinguish them from causal relationships
· assess theories for plausibility and explanatory power
· make good decisions by comparing the value and probability of outcomes

These general reasoning skills should be of use to you regardless of how you specialize.

The right mindset

Even more important than the skills on this list, however, is a mindset that allows for good reasoning. [3] Once again, an analogy to sports is helpful. Think of a highly skilled athlete who only cares about looking good on the field rather than helping the team win or growing as an athlete. This person might be highly skilled but is held back by the wrong mindset. Similarly, if I have great reasoning skills but I only use them to justify my pre-existing views or to persuade others, I'm not putting those skills to good use. And if my views happen to be wrong to begin with, I'll just compound my original errors as I become more confident in them.

The right mindset for reasoning requires honestly wanting to know how the world really is. This leads us to actively seek evidence and continuously revise our own beliefs. (These themes will recur throughout the text.)

In the rest of this chapter, we'll consider insights from cognitive psychology regarding how we form beliefs, and why we make certain predictable mistakes. Then, in Chapter 2, we'll look more closely at the kind of mindset needed to resist those systematic errors.

Section Questions

1-1 Learning all about how to reason well...

A

is not enough to become a good reasoner; the right skills and mindset are also necessary

B

is enough to become a good reasoner as long as your knowledge is paired with the right skills

C

is not important because, just like in sports, the only thing that matters is skill

D

is unnecessary: all you need is the right mindset of curiosity, openness, and perseverance

1-2 Focusing on general reasoning skills and not just specific reasoning skills...

A

is important because acquiring specific reasoning skills does not improve general reasoning skills

B

is the only way to become better at reasoning

C

is a more effective way to improve general reasoning skills

D

is unhelpful because acquiring specific reasoning skills is just as effective a way to become a good reasoner in general

1.2 Our complex minds

The ancient Greek philosopher Plato compared the human soul to a team of horses driven by a charioteer. The horses represent the desires and emotions that motivate us, pulling us along. The charioteer represents our Reason, which controls the horses and reins in bad impulses. (A similar analogy can be found in the Hindu Upanishads.)

Unfortunately, this analogy is misleading. Plato lumps together all of the aspects of thinking and deciding as though performed by a single "part" that is fully under our control. But thinking and deciding are actually performed by a complex web of processes, only some of which are conscious and deliberate.

Certainly, it may feel like our souls are directed by a unified charioteer of Reason that reflects on our perceptions and makes deliberate choices. In reality, though, our conscious mind is only doing a fraction of the work when we form beliefs about the world and decide how to act.

Two systems

Using a common theme from cognitive psychology, we can divide our thought processes into two groups, based on various features they share. [4] On the one hand, we have a set of conscious and deliberative processes that we primarily use when we:

· figure out how to fix a sink;
· work on a difficult math problem;
· decide whether to bring an umbrella; or
· weigh the pros and cons of a policy proposal.

On the other hand, we have a set of cognitive processes that operate beneath our awareness, but that are no less essential to how we form beliefs and make decisions. They interpret sensory information for us, provide us with impressions and hunches, and help us make "snap judgments". We're primarily using this set of processes when we:

· recognize people's faces;
· sense that someone is angry, using cues like body language and tone;
· get the sudden impression that something is scary or disgusting; or
· instantly sense the sizes and locations of objects around us through vision.

These specialized processes seem to operate automatically in the background. For example, if you have normal facial recognition abilities, you can recognize people you know well without consciously trying. You don't need to actively think about their identity; your facial recognition system just instantly tells you who each person is. By contrast, some people have a severe impairment of this process called prosopagnosia. [5] When they see the faces of people they know, they just don't experience that flash of instant recognition. Because of this, they need to compensate by memorizing distinctive features and then consciously working out whose face they're looking at. But this is much harder, slower, and less accurate than relying on subconscious facial recognition. Just take a moment to reflect on how different it would feel if you had to consciously recall the features of a loved one's face in order to recognize him or her. This should give you a good sense of the difference between these two types of processes. So what should we call these two types of processes? The convention in cognitive psychology is to bundle together the faster, more automatic processes and call them System 1, and to bundle together the slower, more deliberate processes and call them System 2. (Cognitive psychology is not known for its creative labels.) To help remember which is which, note that System 1 has its name for two reasons: first, it is more primitive, since it evolved earlier and is common to many non-human animals; and second, it reacts faster to sensory input, so it is the first process to respond. Because we are far more aware of System 2's activities, it's tempting to assume that we form our beliefs and make our decisions using only our conscious and deliberate system. In addition, whenever we actually reflect on or explain why we believe something, we have to use System 2. But that doesn't mean the beliefs were formed by System 2. For example, suppose I make a snap judgment that someone is angry, based on something about her facial expression or behavior. It's likely that I don't have direct access to the factors that gave rise to this snap judgment. But if you ask me why I think she's angry, the explanation I provide will be a kind of reconstruction that makes it sound like I was using System 2 the whole time.  Let's dig a little deeper into three features (aside from speed) that distinguish the two types of processes: direct control, transparency, and effort.

Direct control

We consciously decide whether to work on a difficult math problem, or figure out how to fix the sink. By contrast, System 1 processes operate automatically and can't be turned on or off at will.

For example, if I see someone I know well, I can't just choose not to recognize them. Likewise, when I hear someone speaking English, the sounds coming from their mouths are automatically recognized as meaningful words. I can't just choose to turn this interpretive system off and hear words in my own language as merely noise, the way a completely unknown language sounds to me. (This is why it's much harder to "tune out" people speaking in a language you understand!) Here's another example. If you look straight-on at the checkerboard on the left, your visual system will tell you that area A is darker than area B.

But, in fact, they are exactly the same shade. Your eyes are receiving the same wavelength and intensity of light from those two places. (You can verify this using the image on the right, or by covering up the rest of the image with paper.) However, even when you realize that those squares on your screen are the same color, you can't make your visual system stop seeing them as different. You can't directly control how your brain interprets the image, not in the same way you can control how you go about solving a math problem.

Transparency

When we work on a difficult math problem or figure out how to fix the sink, we are aware of all the individual steps of our reasoning. The process we use to reach our conclusion is open to our conscious inspection. In other words, System 2 is fairly transparent: we can "see" into it and observe the reasoning process itself.

System 1 processes, by contrast, are not very transparent. When you look at the checkerboard image, your visual system also goes through a process to reach a conclusion about the colors of the squares. The process it performs is complex: starting with raw visual information from your retinas, combining it, and automatically adjusting for environmental cues. The illusion arises from a sophisticated way of taking into account the apparent distribution of shade in the image. Your visual system adjusts for the fact that if square B is in the shade, and square A is in direct light, the only way your retinas would get the same input from them is if the squares were different shades of gray. So it concludes that they are different shades, then tells you that by making them look different, even though your retinas are getting the same input from those two squares.

But all of this adjustment and interpretation happens below the threshold of awareness. You can't turn it off, and you can't even sense it happening, let alone look inside the process to see what complex cues your visual system is using to conclude that the squares are different shades. Likewise for the processes that recognize faces, interpret sounds as meaningful words, and figure out how to catch a ball on a parabolic trajectory. The outputs of these processes may reach your conscious awareness as impressions, but all the information processing happens under the hood.

Sometimes even the outputs of these processes aren't worth bringing to your attention. For example, there are processes that monitor sounds while you're sleeping, and don't bother waking you unless the sounds seem to matter. Even when you're unconscious, these processes monitor words people are saying and respond differently to important words, such as your name. [6]

The fact that System 1 operates below the threshold of awareness might seem like a weakness, but it's actually crucial to our sanity. Reading this text, you have tuned out all kinds of sensory inputs: background noise, things in your peripheral vision, and the sensations in your feet that you didn't notice until I mentioned them just now. You have System 1 processes that monitor these inputs in case any of them deserves to be brought to your attention while you focus on reading. We may, for example, sense danger before we have any conscious idea of a specific threat. The same goes for the numerous other social and contextual cues that are being interpreted all the time by System 1. [7] If we constantly had to pay conscious attention to all of our sensory inputs, we would be overwhelmed and completely unable to function.

Effort

For most of us, multiplying two-digit numbers in our heads is hard. We don't notice it, but our pupils dilate and our heart accelerates with the effort. (In one set of studies, subjects were told to wait a while and then, at a time of their choosing, solve a multiplication problem in their heads. Meanwhile, psychologists who were measuring their pupils could tell exactly when the subjects started solving the multiplication problem.)

Unlike System 1 tasks, all System 2 tasks require effort. Consider the last time you weighed the pros and cons of a decision. How long did you actually think in a single sitting? It's one thing to sleep on a problem, idly turning it over for a few days. It's another thing to actually sit down and think hard, even for five minutes.

Seriously. Try setting a timer, closing your eyes, and actually thinking hard about a problem for five minutes without distraction. You probably have some life problems that would benefit from this exercise. But my guess is that you won't actually try this because it sounds too difficult; and if you do try, you'll find it very hard not to let your mind wander.

Very few of us regularly force ourselves to think hard. This means we are avoiding the effort of really engaging System 2: it's just too strenuous and we don't have the mental endurance. We are what psychologists call cognitive misers: averse to the energy expenditure required to keep System 2 active for very long. [8] In other words, we avoid thinking hard about things for the same reason we avoid exercising: we're lazy. This is a great shame because thinking hard (like exercise) brings great benefits.

Of course, we don't get this feeling of effort when we use System 1 processes. Imagine that someone throws you a ball. Using only a couple of seconds of visual input, your automatic processes can work out where you need to position your hands to catch it. This involves approximating some complex equations in Newtonian physics to calculate the ball's trajectory, but it all happens effortlessly. When we consciously work out difficult math problems, our brains fatigue very quickly. But no one ever said, "I'm so tired from sitting here while my visual system interprets the two visual feeds from my eyes, integrates them into a three-dimensional field of objects, and calculates the trajectory of objects in that field."

A great strength of System 1 processes is that, though limited to performing specific tasks, they perform those tasks extremely quickly and well. This is remarkable because, as a sheer matter of processing complexity, recognizing faces and words is much harder than multiplying three-digit numbers. This is why, even though the first mechanical calculator was invented in the 17th century, we have only recently managed to create software that can recognize speech and faces as accurately as humans do, after billions of dollars in investment.

Clarifications

Now that we grasp the basic distinction between the two systems, I want to make some crucial clarifications.

First, it's important not to be misled by the "system" labels. What we're describing are not two discrete units in the mind, but rather two types of processes. (In particular, System 1 is not very unified, being composed of many specialized processes in different areas of the brain.) Furthermore, the distinction is really one of degree: in reality, there is an array of processes that are more or less slow, transparent, controlled, and effortful. We call something a System 1 process if it leans to one side of this continuum, and a System 2 process if it leans to the other. But there will also be cases in the middle where neither term fits.

Second, I've used certain examples of tasks that are primarily performed by processes of each type. But again, things are more complex than that. For one thing, a given cognitive task might be performed in a controlled, conscious way on some occasions, and automatically on others (like breathing). A musician who starts off needing to consciously focus on a tough fingering task might perform it automatically after enough practice. This kind of muscle memory allows us to off-load some tasks onto System 1, freeing up System 2 for other work.

In addition, many cognitive tasks involve a complex combination of both types of processes. For example, when considering the pros and cons of a public policy, you are primarily using conscious deliberation. But System 1 is also at work, giving you subtle impressions of positivity and negativity, some of which have their source in subconscious associations. As we will see, one of the most important reasoning skills you can develop is the ability to notice such impressions and moderate their influence with specific strategies.

Systems in conflict

Consider again the checkerboard above. Even once you've become convinced that the two gray squares are the same shade, your visual system will keep telling you they're different. In this sense, your conscious, deliberative reasoning system and your automatic visual system are each telling you something different.

Likewise, if you let your eyes wander over the image below, you'll get the impression that the dots in the image are gently moving. In fact, the image is completely still. (If you freeze your gaze for several seconds, you should stop seeing movement. Or, if you need more proof, take a screenshot!) But even once you've realized this, you can't just command your visual system to stop seeing movement. As long as you let your eyes explore the image, your visual system will keep telling you that the dots are moving. How it interprets the image is not under your direct control. (Getting the motion to stop by freezing your gaze only counts as indirect control, like staying still so that a squirrel doesn't run away.)

When our visual system keeps getting things wrong in a systematic way even after we realize that it's doing so, we call that a visual illusion. But other System 1 processes can also get things wrong in a systematic way even after we realize that they're doing so. We can call this more general category of errors cognitive illusions. [9] For example, it's common for people to be afraid of flying even though they know perfectly well that commercial jets are extremely safe—about a hundred times safer than driving the same distance. [10] Our automatic danger-monitoring process can keep telling us that we are in mortal danger even when we know at a conscious level that we are safe, and we can't turn off that System 1 process. We can't just reason with a low-level process that evolved over millions of years to keep animals from falling from trees or cliffs. And it doesn't care about safety statistics; it just feels very strongly that we should not be strapped into a seat forty thousand feet above the ground. Here is a useful interview of the Nobel Prize-winning psychologist Daniel Kahneman, in which he describes the strengths and weaknesses of the two systems, and introduces the notion of a cognitive illusion:

Video: Please visit the textbook on a web or mobile device to view video content.

As we will see throughout this text, System 1 is subject to many other kinds of systematic and predictable errors that impair our thinking. Good reasoning requires us to notice the influence of subconscious processes and discern which are trustworthy and which are not.

A metaphor

Despite their limitations, metaphors can be useful. At the outset of this section, we encountered a picture of the mind as made up of horses and a charioteer. But the psychologist Jonathan Haidt suggests an alternative that better suits what we've learned about the mind. To describe the relationship between System 1 and System 2, he uses the analogy of an elephant and rider. The rider, he says, is "conscious, controlled thought," whereas the elephant includes "gut feelings, intuitions, and snap judgments." [11]

The elephant and the rider each have their own intelligence, and when they work together well they enable the unique brilliance of human beings. But they don't always work together well.
                —Jonathan Haidt, The Happiness Hypothesis

This metaphor aptly captures the elements of control, transparency, and effort. Unlike a charioteer, whose reins attach to bits in the horses' mouths, an elephant rider has no chance of directly controlling the animal as it walks a given path. The elephant has a mind of its own, the inner workings of which are hidden from the rider. Moreover, from the rider's perspective, the elephant's actions are automatic and effortless. (If the elephant is well-trained, the rider can even doze off now and then!)

The rider's role is to know the ultimate destination, to make plans, and to guide the elephant in that direction. But an elephant can't be steered like a car. Elephants are extremely intelligent and will follow a path all on their own, making good decisions along the way about where to step, and so on. If the elephant knows the path well, having been trained over many occasions to walk that same path, it needs little to no input from the rider.

In many situations, however, the rider must be alert and ready to correct the animal. The elephant may not know the final destination, and may even have its own preferences about what to do along the way. It is liable to get distracted, spooked, or tired. It may wander towards a different goal, or choose a treacherous shortcut. In such situations, the rider can't rely on brute force: the elephant must be guided, coaxed, or even tricked. It takes repetitive training for the elephant to become responsive to the rider's corrections.

Our minds, then, are both rider and elephant. On the path of good reasoning, we need a well-trained and responsive elephant as well as a discerning rider who knows when to trust the animal and when to nudge it in a different direction.

Section Questions

1-3 In this section, the example of prosopagnosia was primarily used to illustrate...

A

the difference between the process that recognizes faces and the process that interprets emotions

B

the difference between transparency and effort in facial recognition

C

the fact that facial recognition occurs in a specialized region of the brain

D

how different it would feel if we had to use System 2 to recognize faces

1-4 System 1 has the name it does because...

A

it is the most important system, and therefore considered primary

B

it is older and responds more quickly in a given situation

C

it was the first to be identified by cognitive psychologists who study thought processes

D

it is more accurate and effective and therefore considered primary

1-5 The "transparency" of System 2 refers to the fact that

A

its processes can be turned on or off at will

B

its reasoning process itself is open to our awareness

C

our threat-detection system has innate knowledge of several ancient threats to humans

D

it cannot be monitored because it is invisible

1-6 Visual illusions are like cognitive illusions in that...

A

they illustrate how System 1 can be trained to become more accurate in automatic judgments

B

the illusions do not arise at all for people who are sufficiently careful to monitor their System 1

C

they show us that we can't know the truth about how the world really is

D

it is hard to shake the incorrect impression even after we are aware that it is incorrect

1.3 Guiding the mind

Although much of our everyday reasoning is very good, we are sometimes faced with reasoning tasks that the human mind is not well-suited to perform. In such situations, we may feel that we are reasoning perfectly well, while actually falling prey to systematic errors. Avoiding these pitfalls requires an attentive rider who is ready to correct the elephant when needed. Eventually, with enough training, the elephant may get better at avoiding them all on its own.

The errors that I call cognitive pitfalls include not only mental glitches uncovered by cognitive psychologists, but also errors in probabilistic reasoning, common mistakes in decision making, and even some logical fallacies. In short, when we tend to mess up reasoning in a systematic way, that counts as a cognitive pitfall.

We will encounter many more cognitive pitfalls in subsequent chapters, but it is worth introducing a few examples in order to illustrate why we run into them. We can break these down into three sources of error: we like to take shortcuts rather than reason effortfully; we hold onto beliefs without good reason; and we have motivations for our beliefs that conflict with accuracy. All of these errors tend to go unnoticed by our conscious thought processes. Moreover, thinking in a hurried and impulsive manner makes us more susceptible to them.

Shortcuts

The elephant often knows where to go, but there are points along the way where its inclinations are not reliable. In those cases, if the rider isn't careful, the elephant may decide on a "shortcut" that actually leads away from the path. One of the most important skills in reasoning is learning when we can trust the elephant's impulses and when we can't.

Too often, when faced with a question that we should answer through effortful reasoning, we simply allow System 1 to guess. This is a cognitive shortcut we use to avoid the effort of System 2 thinking. But for the sort of question that System 1 is not naturally good at answering, it'll hand us an answer using a quick process that's ill-suited to the situation.

For example, we answer extremely simple math problems using System 1. If I ask you what 2 + 2 equals, the answer just pops into your head effortlessly. Now consider this question:

A bat and a ball together cost $1.10. The bat costs a dollar more than the ball. How much does the ball cost?

Take a moment to answer the question before reading on. The answer might seem obvious, but what if I told you that most Ivy League students, and 80% of the general public, get it wrong? If the answer took virtually no effort on your part, that's a sign that you just let your System 1 answer it, and you might want to go back and check your answer.

Nearly everyone's first impulse is to say that the ball costs 10 cents. This answer jumps out at us because subtracting $1 from $1.10 is so easy that it's automatic. But some people question that initial reaction, at which point they notice that if the bat costs $1 and the ball costs 10 cents, the bat does not actually cost a dollar more than the ball: it only costs 90 cents more.

We don't get the answer wrong because the math is difficult. Instead, the problem is that a plausible answer jumps out, tempting us to use no additional effort. The tendency to override that temptation is what psychologists call cognitive reflection, and the bat-and-ball question is one of a series of questions used to measure it. What makes the question so hard is precisely that it seems so easy. In fact, when people are given versions of the bat-and-ball problem with slightly more complex numbers, they tend to do better. Faced with a trickier math problem in which no answer seems obvious, we have to actually engage System 2. And once we've done that, we're more likely to notice that we can't solve the problem simply by subtracting one number from the other. [12]

Here's another example. Two bags are sitting on a table. One bag has two apples in it, and the other has an apple and an orange. With no idea which bag is which, you reach into one and grab something at random. It's an apple. Given this, how likely is it that the bag you reached into contains the orange? Think about this long enough to reach an answer before moving on.

The answer that jumps out to most people is that the probability is 50%, but it's easy to see why that can't be right. You were more likely to pull out an apple at random from a bag if it only had apples in it. So pulling out an apple must provide some evidence that the bag has only apples. Before you reached into the bag, the probability it contained only apples was 50%, so that probability should be higher now. (As we'll see, this is one of the central principles of evidence: when you get evidence for a hypothesis, you should increase your confidence in that hypothesis.)

If you are like most people, this explanation still won't dislodge your intuition that the right answer should be 50%. It might help to consider the fact that now there are three unknown fruits left, and only one of them is in the bag you reached into. In Chapter 8, you'll learn the principles of probability that explain how to work out the right answer. For now, though, the point is that our brains are not very good at even simple assessments of probability. This is important to know about ourselves, because probability judgments are unavoidable in everyday life as well as many professional contexts.

These two examples illustrate why we should be suspicious of answers handed to us by System 1. In some cases—such as the bat-and-ball example—we can start by second-guessing our initial reaction and then simply calculate the correct answer using System 2. In other cases—such as the bags-of-fruit example—we may not know how to calculate the correct answer. But at least we should think twice before assuming that our intuition is correct! The first and most important step is noticing when we don't really know the answer.

Here is a third kind of case where System 1 tends to get thrown by a tempting answer. Suppose you're wondering how common some kind of event is—for example, how often do people die from shark attacks, tornados, or accidents with furniture? An easy way to guess is by asking ourselves how easily we can bring to mind examples from our memory. The more difficult it is to summon an example, the more uncommon the event—or so it might seem. As far as System 1 is concerned, this seems like a pretty good shortcut to answering a statistical question. Any cognitive shortcut that we commonly use to bypass effortful reasoning is a heuristic, and this particular one is called the availability heuristic.

But, of course, the ease with which we can recall things is affected by irrelevant factors, including the vividness and order of our memories. So the availability of an example to our memory is not always a good indication of how common it is in reality. If the examples we remember come from what we've seen or heard in the media, things are even worse. There's no systematic relationship between how many media reports cover a given type of event and how many events of that type actually occur. Shark attacks and tornados are rare and dramatic, while drownings and accidents with furniture are usually not considered newsworthy. In a world of sensational news media, System 1's availability heuristic is an extremely unreliable way to evaluate the prevalence of events.

In contemporary life, it's crucial for us to be able to make accurate judgments in cases where our automatic intuitions may not be reliable. Even if System 1 is not well-suited to answer the kind of question we face, it might strongly suggest an answer. And it requires a concerted effort to check that impulse. Situations like those described in this section should raise our suspicions, because System 1 is not very good at assessing costs, probabilities, or risks. Better to set aside the tempting answer, grit our teeth, and actually think through the problem.
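For readers who want to check both answers concretely, here is a minimal, purely illustrative Python sketch: it solves the bat-and-ball equation directly and estimates the bags-of-fruit probability by simulating many random draws. The simulated value comes out near one in three rather than 50%; the exact calculation behind that number uses the principles of probability covered in Chapter 8.

```python
import random

# Bat and ball: ball + (ball + 1.00) = 1.10, so 2 * ball = 0.10.
ball = (1.10 - 1.00) / 2
print(f"The ball costs ${ball:.2f}")  # $0.05, not $0.10

# Bags of fruit: estimate P(your bag contains the orange | you drew an apple)
# by simulating the situation many times and keeping only the apple draws.
trials = 1_000_000
drew_apple = 0
orange_in_bag = 0
for _ in range(trials):
    bag = random.choice([["apple", "apple"], ["apple", "orange"]])  # pick a bag at random
    if random.choice(bag) == "apple":                               # draw one fruit at random
        drew_apple += 1
        if "orange" in bag:
            orange_in_bag += 1

print(f"P(bag has the orange | you drew an apple) is about {orange_in_bag / drew_apple:.3f}")
# Prints roughly 0.333: one chance in three, not 50%.
```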

Stubbornness

It takes effort to change our minds, so we tend not to. Once a belief has lodged itself in our heads, it can be hard to shake, even if we no longer have any support for it. In our analogy, when the elephant is marching happily along, effort is required to get the animal to change course. If the rider is not paying attention or is too lazy to make a correction, the elephant will just keep marching in the same direction, even if there is no longer any good reason to do so.

This phenomenon is known as belief perseverance, and it is well-established in cognitive psychology. For example, people allow their beliefs to be influenced by news and journal articles even after learning that those articles have been debunked and retracted by the source. [13] The same effect has been found in all kinds of cases where people discover that their original grounds for believing something have been completely undercut.

In a standard kind of experiment testing this phenomenon, subjects first answer a series of difficult questions. Afterward, they receive scores that are given completely at random—though at first they do not know this. Eventually, they are told the truth: that their scores had nothing to do with their performance. Still, when asked later to assess themselves on their ability to answer the relevant kind of question, those scores continued to impact their self-evaluations. [14]

Another type of study involves two groups being told opposite made-up "facts". For example, one group of people is told that people who love risk make better firefighters, while another group is told that they make worse ones. Later, everyone in the study is informed that the two groups were told opposite things, and that the claims had been fabricated. Even after the original claims were debunked, though, when asked about their own personal views regarding risk-taking and firefighting ability, subjects continued to be influenced by whichever claim they had been randomly assigned. [15]

Once a piece of misinformation has been incorporated into someone's belief system, significant cognitive effort is required to dislodge it! This is true not only in cases where people don't have the opportunity to seek out additional evidence, like those in the studies just described. It's also true even when people do have that opportunity. This is because we are subject to confirmation bias: we tend to notice and focus on potential evidence for our pre-existing views while neglecting or discounting evidence to the contrary. [16]

Confirmation bias is perhaps the best known and most widely accepted notion of inferential error to come out of the literature on human reasoning.
                —Jonathan Evans, Bias in Human Reasoning

The impact of confirmation bias on our reasoning is hard to overstate. Again and again, and in a wide range of situations, psychologists have noted that people find ways to confirm their pre-existing beliefs. Sometimes this is because we are emotionally attached to those beliefs and thus are motivated to seek out only sources of evidence that support them. But confirmation bias also occurs when the issue is something we don't care much about. This is because our expectations color how we interpret ambiguous or neutral experiences, making them seem to fit well with our pre-existing views. [17] In other words, we tend to see what we'd expect to see if our views were true, and not to notice details that would conflict with our views.

Confirmation bias leads to a related cognitive pitfall involving cases where we form beliefs by making a series of observations over time. In such cases, we tend to develop opinions early and then either interpret later evidence in a way that confirms those opinions (due to confirmation bias), or simply pay less attention to new information out of sheer laziness. The result is known as the evidence primacy effect: earlier evidence has greater influence on our beliefs. [18]

Imagine you are asked to judge a murder case based on a series of facts. You are given a description of the case followed by about 20 relevant pieces of evidence, half supporting guilt and the other half supporting innocence. (For example, the defendant was seen driving in a different direction just before the murder; on the other hand, the defendant and the victim had recently engaged in a loud argument.) After considering the evidence, you are asked to assess the probability of the defendant's guilt. Would it matter in what order you heard the evidence?

Most of us would like to think that the order of evidence would have little effect on us. As long as we see all of the evidence, we expect to be able to weigh it fairly. But in a study where subjects were put in exactly this scenario, those who saw the incriminating evidence first assigned an average probability of 75% to the defendant's guilt, while those who saw the exonerating evidence first assigned an average probability of 45%. [19] It is hard to escape the unfortunate conclusion that the order of evidence in court could determine whether or not someone is thrown in jail for the rest of their life.

The importance of first impressions goes beyond the courtroom, of course. Suppose, for example, I see Bob being friendly to someone, and form the impression that he is friendly. If I subsequently see Bob interacting normally with people, I am more likely to notice things about those interactions that fit with my image of Bob as friendly. If I then see Bob being unfriendly, I am more likely to assume that he had a good reason to behave that way. But, if I had witnessed all of these scenes in reverse order, I would probably have ended up with a very different impression of Bob. I would have interpreted all of the later experiences through the lens of the unfriendly interaction, which I would have seen first. Either way, I'd be making a mistake: the first time you meet someone is no more representative of their true nature than any other time. Perhaps less so, if they are trying especially hard to make a good impression!

To sum up: when we have formed beliefs, we often cling to them even when our original reasons are debunked. And then we go on to interpret subsequent bits of evidence in a way that confirms them. Even when our beliefs concern things we don't particularly care about, they are hard to shake. So what happens when our beliefs do concern things we care about? In that case, they are even harder to shake, as we're about to find out!

Motivated reasoning

Mostly, we want to have accurate beliefs. But other motivations are involved as well, often beneath the threshold of our awareness. In other words, even if the rider's goal is to form accurate beliefs, the elephant may have goals of its own. In forming and maintaining beliefs, we are often at some level motivated by how we would like things to be rather than merely by how they actually are. This is called motivated reasoning.

For example, we like to have positive opinions of ourselves, and at least in cases where there is plenty of room for interpretation, we are inclined to believe that we have personal traits that are above average. Studies have found that 93% of American drivers rated themselves better than the median driver; a full 25% of students considered themselves to be in the top 1% in terms of the ability to "get along with others"; and 94% of college professors thought they did "above-average work". [20] Many studies have found a human tendency to consider oneself better than average on a wide range of hard-to-measure personality traits.

Of course, we can't just choose to believe something because we want it to be true. I can't just convince myself, for example, that I am actually on a boat right now, however much I may want to be on a boat. Motivated reasoning doesn't work like that: we can't just override obvious truths with motivated beliefs. [21] So our motivated reasoning has to be subtle, tipping the scales here and there when the situation is murky. We typically don't let ourselves notice when we are engaging in motivated reasoning, since that would defeat the purpose. We want to feel like we've formed our beliefs impartially, thereby maintaining the illusion that our beliefs and desires just happen to line up. (It's important to stress that many of our beliefs are "motivated" in this sense even though we're unaware that we have the relevant motivations: most of our motivated reasoning is entirely subconscious and non-transparent.)

For this reason, we become more susceptible to motivated reasoning the less straightforward the evidence is. For example, it is fairly easy to just believe that we are better than average when it comes to traits that are hard to measure, like leadership ability and getting along with others. This is much harder when it comes to beliefs about our trigonometry skill or ability to run long distances. [22] We may have no strong evidence that we are better leaders than other people, but then we don't have strong evidence against it either. When we want to believe something, we are tempted to apply lower standards for how much evidence is required. That goes a long way towards allowing us to adopt the beliefs we like, without the process being so obvious to us that it defeats the purpose. In fact, we tend not to notice the process occurring at all.
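As a quick check on that first statistic, here is a tiny, purely illustrative Python sketch (the driving-skill scores are invented for the example): however skill is distributed, only about half of a group can actually sit above its own median, so a group in which 93% place themselves above the median must contain a great deal of overconfidence.

```python
import random
import statistics

# Invented "driving skill" scores for a large group; any distribution would do.
skills = [random.gauss(50, 10) for _ in range(10_000)]

median_skill = statistics.median(skills)
share_above = sum(score > median_skill for score in skills) / len(skills)

# By definition of the median, this share can never rise much above 50%,
# so 93% of drivers cannot all be right that they are above the median.
print(f"Share actually above the median: {share_above:.1%}")
```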

It is neither the case that people believe whatever they wish to believe nor that beliefs are untouched by the hand of wishes and fears.
                —Peter Ditto and David Lopez, 'Motivated Skepticism'

Motivated reasoning can also work behind the scenes to influence our response to evidence. For example, since we have limited time and energy to think, we often focus on the bits of evidence that we find interesting. But, without our realizing it, the "interesting" bits of evidence tend to be those that happen to support our favored views. This general tendency has been found in dozens of studies of human reasoning. Here are just a few:

· When people with opposing views examine the same evidence (which provides support for both sides), most people end up more confident in whatever view they started with. [23]
· People who are motivated to disbelieve the conclusions of scientific studies look harder to find problems with them. [24]
· In a series of studies, subjects were "tested" for a made-up enzyme deficiency. Those who "tested positive" considered the test less accurate and the deficiency less serious than those who "tested negative", even though everyone was given the same information about the deficiency and the test. [25]

Together, motivated reasoning and confirmation bias make a powerful mix. When our opinions are not only preconceived but also motivated, we are not only more likely to notice evidence that confirms our motivated opinions, but also to apply selective standards when evaluating evidence. This means we tend to accept evidence that supports our views uncritically, while also seeking ways to discredit evidence that conflicts with them.

Because the way motivated reasoning works is typically not transparent to us, we are often unaware of the real causes of our beliefs. Of course, if we're challenged by someone else, we can often come up with justifying reasons for holding the belief. And we can rehearse some justifying reasons to ourselves if we need to help ourselves feel like we have good reasons. But they may not be our real reasons—in the sense of actually explaining why we have the belief. The real explanation might be a preference followed by a subconscious process of noticing primarily evidence that confirms this preference.

A caveat in closing

Having considered the cognitive pitfalls involving System 1, you might get the impression that System 1 is extremely unreliable, and that we should never "listen to our gut". Or, as the main character puts it in High Fidelity:

"I've been thinking with my guts since I was fourteen years old, and frankly speaking, between you and me, I have come to the conclusion that my guts have shit for brains."
                —Nick Hornby, High Fidelity

But this isn't the right way to think about the reliability of System 1 in general, for two reasons. First, we have deliberately been focusing on situations that tend to lead to bad outcomes. Our automatic processes are generally far better and faster at processing information than our deliberate ones. In fact, the sheer volume of the sensory data that System 1 handles would entirely overwhelm our conscious minds. No one could consciously perform enough real-time calculations to approximate the amount of processing that System 1 must do when, for example, we catch a baseball.

The second reason not to disparage System 1 is this. Even for situations where our automatic judgments tend to get things wrong, they can sometimes become more reliable over time, while remaining much faster than deliberate reasoning. This means that with enough training of the right sort, we can sometimes develop our gut reactions into genuinely skilled intuition: the ability to make fast and accurate judgments about a situation by recognizing learned patterns in it. But this only works under the right set of conditions: in particular, it requires a great deal of experience in an environment that offers clear and reliable feedback about the accuracy of one's judgments. [26]

Unfortunately, it is common for people to think they have developed skilled intuition when they actually have not. One of the most important skills of rationality is knowing when the elephant can be trusted to find its way, and when it needs to be guided by an attentive rider.

Section Questions

1-7 The bat-and-ball example and the bags-of-fruit example both illustrate...

A

that in certain cases we should be wary of our immediate intuitions

B

that our System 1 is not very good at calculating probabilities

C

that we are "cognitive misers" when it comes to answering very difficult numerical problems

D

that under the right conditions, our System 1 can be trained to provide quick and reliable intuitions

1-8 The murder case was used to illustrate...

A

that motivated reasoning can color how we interpret ambiguous evidence

B

that our System 1 is not very good at estimating probabilities

C

that our beliefs are often affected by which pieces of evidence we get first

D

that we are more likely to judge a person as being guilty than as being innocent when we are given evidence on both sides

1-9 When we interpret evidence in a biased way due to motivated reasoning, we tend to...

A

simply decide that we want to believe something and then figure out ways to convince ourselves that it is true

B

knowingly apply selective standards in order to discredit conflicting evidence

C

deliberately ignore evidence on the other side so that we can bolster our own view

D

think we are actually being unbiased and fair

1-10 If System 1 is not naturally skilled at a certain kind of reasoning task, ...

A

it may still be possible, under the right conditions, to train it to improve

B

it is easy to tell that it is not skilled and avoid trusting its responses when faced with that kind of task.

C

then that task is not the sort of task that System 1 performs, because there is a clear division between System 1 tasks and System 2 tasks

D

the only way that reasoning task can be performed reliably is with effortful and deliberate thought processes

Key terms

Availability heuristic: judging the frequency or probability of an event or attribute by asking ourselves how easily we can bring examples to mind from memory.

Belief perseverance: the tendency to continue holding a belief even if its original support has been discredited, and in the face of contrary evidence.

Cognitive illusions: involuntary errors in our thinking or memory due to System 1, which continue to seem correct even if we consciously realize they're not.

Cognitive pitfalls: common, predictable errors in human reasoning. Cognitive pitfalls include mental glitches uncovered by cognitive psychologists, as well as logical fallacies.

Cognitive reflection: the habit of checking initial impressions supplied by System 1, and overriding them when appropriate.

Confirmation bias: the tendency to notice or focus on potential evidence for our pre-existing views, and to neglect or discount contrary evidence. Confirmation bias can be present with or without an underlying motive to have the belief in the first place.

Evidence primacy effect: in a process where information is acquired over time, the tendency to give early information more evidential weight than late information. This tendency arises when we develop opinions early on, leading to confirmation bias when interpreting later information, or simply a failure to pay as much attention to it.

Heuristic: a cognitive shortcut used to bypass the more effortful type of reasoning that would be required to arrive at an accurate answer. Heuristics are susceptible to systematic and predictable errors.

Motivated reasoning: forming or maintaining a belief at least partly because, at some level, we want it to be true. This manifests itself in selective standards for belief, seeking and accepting evidence that confirms desired beliefs, and ignoring or discounting evidence that disconfirms them.

Skilled intuition: the ability to make fast and accurate judgments about a situation by recognizing learned patterns in it. This requires training under specific kinds of conditions.

System 1: the collection of cognitive processes that feel automatic and effortless but are not transparent. These include specialized processes that interpret sensory data and are the source of our impressions, feelings, intuitions, and impulses. (The distinction between the two systems is one of degree, and the two systems often overlap, but it is still useful to distinguish them.)

System 2: the collection of cognitive processes that are directly controlled, effortful, and transparent. (The distinction between the two systems is one of degree, and the two systems often overlap, but it is still useful to distinguish them.)

Transparency: the degree to which information processing itself (rather than just its output) is done consciously, in such a way that one is aware of the steps being taken.

Footnotes

[1] Some researchers explain this in terms of a "bias blindspot," a phenomenon we will examine further in Chapter 2. People tend to see others, but not themselves, as subject to cognitive biases, in part because they assign more weight to their own introspections in evaluating whether they are biased (Wetzel, Wilson & Kort 1981; Pronin, Lin & Ross 2002; Pronin, Gilovich & Ross 2004; Pronin & Kugler 2007).

[2] For the effect of college education in general on reasoning skills, see Huber & Kuncel (2016) and McMillan (1987). (However, there are some contrary findings: Arum & Roksa (2011) find that many students do not improve in various domains after attending college, including critical thinking and complex reasoning.) For meta-analyses of the effectiveness of critical thinking instruction, see Abrami et al. (2008); Abrami et al. (2015). Some examples of studies of the effects of critical reasoning training include Fong et al. (1986); Kosonen & Winne (1995); Larrick, Morgan, and Nisbett (1990); Nisbett et al. (1987).

[3] See the discussion in Facione, Sanchez, Facione, & Gainen (1995).

[4] For an overview, see Evans (2010), Stanovich (2011), Evans & Stanovich (2013).

[5] For general information about this condition, see the Prosopagnosia Information Page. For a detailed, technical review, see Corrow, Dalrymple, and Barton (2016).

[6] For evidence that humans respond to their own names during sleep, see Perrin et al. (1999); Holeckova et al. (2006); Blume et al. (2017); Blume et al. (2018). For other examples of how the brain can detect and respond to auditory stimuli while sleeping, see Portas et al. (2000); Kouider et al. (2014). For examples of the limits of responses to external stimuli while sleeping, see Andrillon et al. (2016); Makov et al. (2017).

[7] For evidence of automatic processing and rapid responses to fear-relevant stimuli, see Fox et al. (2000); Lang, Davis & Öhman (2000); Morris, Öhman & Dolan (1999); New & German (2015); Öhman, Lundqvist, and Esteves (2001); Öhman & Soares (1993; 1994); Öhman (2005); Whalen et al. (2004). There is even evidence that infants detect and respond to images of threat stimuli like spiders and snakes (Hoehl et al. 2017). These rapid responses to fear-relevant stimuli are triggered via a neural pathway from the thalamus to the amygdala, which can respond quickly (LeDoux 1996, pp. 163-165; LeDoux 2000). For a review of the literature on nonconscious cues in social contexts, see Wegner & Bargh (1998).

[8] For further discussion, see Kahneman (2011, p. 45) or Stanovich (2009, p. 71).

[9] See, for example, Kahneman & Tversky (1996) and the papers in Pohl (2004).

[10] Per billion passenger miles, in the United States, the number of fatalities between 2000 and 2009 was 7.28 for driving or being a passenger in a car or light truck; it was 0.07 for passengers on commercial aviation. See Savage (2013).

[11] Haidt (2006, pp. 17-21).

[12] See Mastrogiorgio & Petracca (2014) and Bourgeois-Gironde & Van Der Henst (2009). The original "bat and ball" problem is from Frederick (2005).

[13] See Nisbett & Ross (1980) and Anderson (1983) for overviews, and Chan et al. (2017) for a recent meta-analysis of relevant studies. For belief perseverance in the case of the retraction of an academic article, see Greitemeyer (2014). For false beliefs that continue to affect political attitudes, see Thorson (2016).

[14] Ross et al. (1975).

[15] Anderson, Lepper & Ross (1980) and Anderson & Kellam (1992).

[16] See Nickerson (1998) for a review.

[17] Ross & Anderson (1982).

[18] For one example, see Peterson & DuCharme (1967). See also Nisbett & Ross (1980), p. 172 ff.

[19] See Tetlock (1983). This effect went away almost completely when subjects were told before reading the evidence that they would have to justify their decisions to the experimenters. Of course, real jurors are not required to do this!

[20] See Svenson (1981); Dunning, Heath, & Suls (2004); Cross (1977).

[21] See the discussion of self-deception in section 2.3 and citations there.

[22] Consider, for example, the hard-to-measure personality traits on which Pronin et al. (2002) found people subject to the "better-than-average effect", while subjects did not consider themselves better than average at procrastination, public speaking, or avoiding the planning fallacy: all issues with salient feedback.

[23] See Lord et al. (1979).

[24] See Sherman and Kunda (1989); see also Pyszczynski, Greenberg, & Holt (1985).

[25] See Croyle and Sande (1988), Ditto, Jemmott, & Darley (1988), Ditto & Lopez (1992).

[26] See Kahneman and Klein (2009).

References

Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., & Zhang, D. (2008). Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 78(4), 1102-1134. Abrami, P. C., Bernard, R. M., Borokhovski, E., Waddington, D. I., Wade, C. A., & Persson, T. (2015). Strategies for teaching students to think critically: A meta-analysis. Review of Educational Research, 85(2), 275-314. Anderson, C. A. (1983). Abstract and Concrete Data in the Conservatism of Social Theories: When Weak Data Lead to Unshakeable Beliefs. Journal of Experimental Social Psychology. 19 (2): 93–108. Anderson, C. A., & Kellam, K. L. (1992). Belief perseverance, biased assimilation, and covariation detection: The effects of hypothetical social theories and new data. Personality and Social Psychology Bulletin, 18(5), 555-565. Anderson, C. A., Lepper, M. R., & Ross, L. (1980). Perseverance of social theories: The role of explanation in the persistence of discredited information. Journal of Personality and Social Psychology, 39(6), 1037. Andrillon, T., Poulsen, A. T., Hansen, L. K., Léger, D., & Kouider, S. (2016). Neural markers of responsiveness to the environment in human sleep. Journal of Neuroscience, 36(24), 6583-6596. Arum, R., & Roksa, J. (2011). Academically adrift: Limited learning on college campuses. University of Chicago Press. Blume, C., Del Giudice, R., Lechinger, J., Wislowska, M., Heib, D. P., Hoedlmoser, K., & Schabus, M. (2017). Preferential processing of emotionally and self-relevant stimuli persists in unconscious N2 sleep. Brain and Language, 167, 72-82 Blume, C., del Giudice, R., Wislowska, M., Heib, D. P., & Schabus, M. (2018). Standing sentinel during human sleep: Continued evaluation of environmental stimuli in the absence of consciousness. NeuroImage, 178, 638-648. Bourgeois-Gironde, S., & Van Der Henst, J. B. (2009). How to open the door to System 2: Debiasing the bat-and-ball problem. In Watanabe, S. et al. (eds.). Rational animals, irrational humans. Keio University. pp.235-252. Chan, M. P. S., Jones, C. R., Hall Jamieson, K., & Albarracín, D. (2017). Debunking: A meta-analysis of the psychological efficacy of messages countering misinformation. Psychological science, 28(11), 1531-1546.

Corrow, S. L., Dalrymple, K. A., & Barton, J. J. (2016). Prosopagnosia: current perspectives. Eye and brain, 8, 165. Cross, K. P. (1977). Not can, but will college teaching be improved?. New Directions for Higher Education, 1977(17), 1-15. Croyle, R. T., & Sande, G. N. (1988). Denial and Confirmatory Search: Paradoxical Consequences of Medical Diagnosis 1. Journal of Applied Social Psychology, 18(6), 473-490. Ditto, P. H., Jemmott III, J. B., & Darley, J. M. (1988). Appraising the threat of illness: A mental representational approach. Health Psychology, 7(2), 183. Ditto, P. H., & Lopez, D. F. (1992). Motivated skepticism: Use of differential decision criteria for preferred and nonpreferred conclusions. Journal of Personality and Social Psychology, 63(4), 568. Dunning, D., Heath, C., & Suls, J. M. (2004). Flawed self-assessment: Implications for health, education, and the workplace. Psychological science in the public interest, 5(3), 69-106. Evans, J. St. B. (1989). Bias in human reasoning: Causes and consequences. Hillsdale, NJ: Erlbaum  Evans, J. St. B. (2010). Thinking twice: Two minds in one brain. Oxford University Press. Evans, J. St. B., Stanovich K.E. (2013). Dual-Process Theories of Higher Cognition: Advancing the Debate. Perspectives on Psychological Science, 8(3), 223-241. Facione, P. A., Sanchez, C. A., Facione, N. C., & Gainen, J. (1995). The disposition toward critical thinking. The Journal of General Education, 1-25. Fong, G.T., Krantz D.H., & Nisbett, R.E. (1986). The effects of statistical training on thinking about everyday problems. Cognitive Psychology, 18(3), 253-292. Fox, E., Lester, V., Russo, R., Bowles, R. J., Pichler, A., & Dutton, K. (2000). Facial expressions of emotion: Are angry faces detected more efficiently?. Cognition & Emotion, 14(1), 61-92. Frederick, Shane. (2005). "Cognitive Reflection and Decision Making." Journal of Economic Perspectives, 19 (4): 25-42. Greitemeyer, T. (2014). Article retracted, but the message lives on. Psychonomic bulletin & review, 21(2), 557-561.

Haidt, J. (2006). The happiness hypothesis: Finding modern truth in ancient wisdom. Basic Books. Hoehl, S., Hellmer, K., Johansson, M., & Gredebäck, G. (2017). Itsy bitsy spider…: infants react with increased arousal to spiders and snakes. Frontiers in Psychology, 8, 1710. Holeckova, I., Fischer, C., Giard, M. H., Delpuech, C., & Morlet, D. (2006). Brain responses to a subject's own name uttered by a familiar voice. Brain Research, 1082(1), 142-152. Hornby, N. (2005). High fidelity. Penguin UK. Huber, C. R., & Kuncel, N. R. (2016). Does college teach critical thinking? A meta-analysis. Review of Educational Research, 86(2), 431-468.  Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux. Kahneman, Daniel, and Gary Klein. Conditions for intuitive expertise: a failure to disagree. American Psychologist 64.6 (2009): 515. Kahneman, D., & Tversky, A. (1996). On the Reality of Cognitive Illusions. Psychological Review, 103(3), 582-591. Kouider, S., Andrillon, T., Barbosa, L. S., Goupil, L., & Bekinschtein, T. A. (2014). Inducing task-relevant responses to speech in the sleeping brain. Current Biology, 24(18), 2208-2214. Kosonen, P., & Winne, P. H. (1995). Effects of teaching statistical laws on reasoning about everyday problems. Journal of Educational Psychology, 87(1), 33-46. Lang, P. J., Davis, M., & Öhman, A. (2000). Fear and anxiety: animal models and human cognitive psychophysiology. Journal of Affective Disorders, 61(3), 137-159. Larrick, R. P., Morgan, J. N., & Nisbett, R. E. (1990). Teaching the use of cost-benefit reasoning in everyday life. Psychological Science, 1(6), 362-370. LeDoux, J. (1996). The Emotional Brain: The Mysterious Underpinnings of Emotional Life. Simon & Schuster. LeDoux, J. E. (2000). Emotion circuits in the brain. Annual Review of Neuroscience, 23(1), 155-184.

Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37(11), 2098. Makov, S., Sharon, O., Ding, N., Ben-Shachar, M., Nir, Y., & Golumbic, E. Z. (2017). Sleep disrupts high-level speech parsing despite significant basic auditory processing. Journal of Neuroscience, 0168-17. Mastrogiorgio, A., & Petracca, E. (2014). Numerals as triggers of System 1 and System 2 in the ‘bat and ball’ problem. Mind & Society, 13(1), 135-148. McMillan, J. H. (1987). Enhancing college students’ critical thinking: A review of studies. Research in Higher Education, 26, 3–29. Morris, J. S., Öhman, A., & Dolan, R. J. (1999). A subcortical pathway to the right amygdala mediating “unseen” fear. Proceedings of the National Academy of Sciences, 96(4), 1680-1685. New, J. J., & German, T. C. (2015). Spiders at the cocktail party: An ancestral threat that surmounts inattentional blindness. Evolution and Human Behavior, 36(3), 165-173. Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175. Nisbett, R. E., Fong, G. T., Lehman, D. R., & Cheng, P. W. (1987). Teaching reasoning. Science, 238(4827), 625-631. Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Prentice-Hall. Öhman, A. (2005). The role of the amygdala in human fear: automatic detection of threat. Psychoneuroendocrinology, 30(10), 953-958. Öhman, A., Lundqvist, D., & Esteves, F. (2001). The face in the crowd revisited: a threat advantage with schematic stimuli. Journal of personality and social psychology, 80(3), 381. Öhman, A., & Soares, J. J. (1993). On the automatic nature of phobic fear: Conditioned electrodermal responses to masked fear-relevant stimuli. Journal of abnormal psychology, 102(1), 121. Öhman, A., & Soares, J. J. (1994). "Unconscious anxiety": phobic responses to masked stimuli. Journal of abnormal psychology, 103(2), 231.

Perrin, F., García-Larrea, L., Mauguière, F., & Bastuji, H. (1999). A differential brain response to the subject's own name persists during sleep. Clinical Neurophysiology, 110(12), 2153-2164. Peterson, C. R., & DuCharme, W. M. (1967). A primacy effect in subjective probability revision. Journal of Experimental Psychology, 73(1), 61. Pohl, R. (Ed.). (2004). Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory. Psychology Press. Portas, C. M., Krakow, K., Allen, P., Josephs, O., Armony, J. L., & Frith, C. D. (2000). Auditory processing across the sleep-wake cycle: simultaneous EEG and fMRI monitoring in humans. Neuron, 28(3), 991-999. Pronin, E., Gilovich, T., & Ross, L. (2004). Objectivity in the eye of the beholder: Divergent perceptions of bias in self versus others. Psychological Review, 111(3), 781-799. Pronin, E., & Kugler, M. B. (2007). Valuing thoughts, ignoring behavior: The introspection illusion as a source of the bias blind spot. Journal of Experimental Social Psychology, 43(4), 565-578. Pronin, E., Lin, D. Y., & Ross, L. (2002). The bias blind spot: Perceptions of bias in self versus others. Personality and Social Psychology Bulletin, 28(3), 369-381. Prosopagnosia Information Page. (2018, July 2). National Institute for Neurological Disorders and Stroke. Retrieved from nih.gov. Pyszczynski, T., Greenberg, J., & Holt, K. (1985). Maintaining consistency between self-serving beliefs and available data: A bias in information evaluation. Personality and Social Psychology Bulletin, 11(2), 179-190. Ross, L., & Anderson, C. A. (1982). Shortcomings in the attribution process: On the origins and maintenance of erroneous social assessments. In Judgment Under Uncertainty: Heuristics and Biases, edited by D. Kahneman, P. Slovic, & A. Tversky, Cambridge University Press, 129-152. Ross, L., Lepper, M. R., & Hubbard, M. (1975). Perseverance in self-perception and social perception: biased attributional processes in the debriefing paradigm. Journal of personality and social psychology, 32(5), 880. Savage, I. (2013). Comparing the fatality risks in United States transportation across modes and over time. Research in Transportation Economics, 43(1), 9-22.

Stanovich, K. E. (2009). What intelligence tests miss: The psychology of rational thought. Yale University Press. Stanovich, K. E. (2011). Rationality and the reflective mind. Oxford University Press. Sherman, B. R., & Kunda, Z. (1989, June). Motivated evaluation of scientific evidence. In Conference of the American Psychological Society, Arlington, VA. Sherman, S. J., Zehner, K. S., Johnson, J., & Hirt, E. R. (1983). Social explanation: The role of timing, set, and recall on subjective likelihood estimates. Journal of Personality and Social Psychology, 44(6), 1127. Svenson, O. (1981). Are we all less risky and more skillful than our fellow drivers?. Acta psychologica, 47(2), 143-148. Tetlock, P. E. (1983). Accountability and the perseverance of first impressions. Social Psychology Quarterly, 285-292. Thorson, E. (2016). Belief echoes: The persistent effects of corrected misinformation. Political Communication, 33(3), 460-480. Wetzel, C. G., Wilson, T. D., & Kort, J. (1981). The halo effect revisited: Forewarned is not forearmed. Journal of Experimental Social Psychology, 17(4), 427-439. Wegner, D. M., & Bargh, J. A. (1998). Control and automaticity in social life. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology, 446-496. Whalen, P. J., Kagan, J., Cook, R. G., Davis, F. C., Kim, H., Polis, S., ... & Johnstone, T. (2004). Human amygdala responsivity to masked fearful eye whites. Science, 306(5704), 2061-2061.

Image Credits

Banner image of craftsman with tools: image licensed under CC0 / cropped from original; Cello close-up: image licensed under CC0; Running man: image licensed under CC0; Chariot statue: image by Richard Mcall licensed under CC0; Elderly man with beard: image by Aamir Mohd Kahn, licensed under CC0 / cropped from original; Checkerboard and shadow illusion: image by Edward Adelson, reproduced by permission; Bed with pillows: image by akaitori, licensed under CC2.0-BY-SA; Guitar close-up: image licensed under Pexels license; Illusion with blue dots and green background: image by Cmglee licensed under CC BY-SA 3.0; Elephant with children: image by Sasin Tipchai licensed under CC0; Rocky path with clouds and mountains: image licensed under CC0; A baseball bat and two baseballs: image by Chad Cooper licensed under CC2.0-BY; Tornado on water: image by Johannes Plenio licensed under Pexels license / cropped from original; Firefighters with fire as backdrop: image licensed under CC0; Empty courtroom: image licensed under CC0; Sailboat at sunset: image licensed under Pexels license.

Reason Better: An Interdisciplinary Guide to Critical Thinking © 2019 David Manley. All rights reserved. Version 1.4



Chapter 2. Mindset

Introduction

What are the attributes of a person who reasons well? Perhaps being clever, smart, wise or rational. Notice, though, that these are importantly different things. For example, you can probably think of people who are very clever but not very curious—effective at persuading people of their own views but not driven by the desire to know the truth. Being a good reasoner requires more than the kind of intelligence that helps us impress others or win arguments.

This idea is strongly supported by cognitive psychology. In fact, the kind of intelligence measured by IQ tests doesn't help much with some of the worst cognitive pitfalls, like confirmation bias. Even very clever people can simply use their thinking skills as a kind of lawyer to justify their pre-existing views to themselves. Reasoning well requires a mindset that is eager to get things right, even if that means changing our mind [1].

Many highly intelligent people are poor thinkers. Many people of average intelligence are skilled thinkers. The power of a car is separate from the way the car is driven.                 —Edward de Bono

The cognitive psychologist Jonathan Baron has divided the process of good reasoning into what he takes to be its three most important aspects: (i) confidence in proportion to the amount and quality of thinking done; (ii) search that is thorough in proportion to the question's importance; and (iii) fairness to possibilities other than those that we initially favor [2]. My goal in this chapter is to characterize the mindset that best promotes all three of these attributes. In slogan form, that mindset is curious, thorough, and open:

1. Curious. The goal is for our beliefs to reflect how things really are; this is best achieved when our confidence in a belief matches the strength of the evidence we have for it.
2. Thorough. It takes patience and effort to push past what initially seems true and thoroughly search for alternative possibilities and any available evidence.
3. Open. This means evaluating evidence impartially, considering weaknesses in our initial view, and asking what we'd expect to see if alternative views were true.

We'll pay special attention to how these attributes can help us overcome confirmation bias (motivated or not), which is arguably the most ubiquitous and harmful of the cognitive pitfalls.

Learning Objectives

By the end of this chapter, you should understand:

- the importance of aiming for discovery rather than defense
- what is meant by "accuracy," as it applies to binary beliefs and degrees of confidence
- how confirmation bias operates at the search and evaluation stages of reasoning
- the use of decoupling to overcome biased evaluation
- how the bias blindspot and introspection illusion operate, leading to the biased opponent effect
- two ways of "considering the opposite" to help overcome biased evaluation

2.1 Curious

Curiosity, in our sense, is not just a matter of being interested in a topic, but of wanting to discover the truth about it. When Aristotle wrote that "all humans by nature desire to know," he was surely right—up to a point [3]. Much of the time, we are genuinely curious, but our minds are complex and many-layered things. Our beliefs, as we saw in Chapter 1, are also influenced by other motives that may conflict with our curiosity.

Defense or discovery?

Suppose I'm in a heated disagreement with Alice. She's challenging a belief in which I am emotionally invested, so I try to find every fact that supports my case and every flaw in her points. If I think of anything that might detract from my case, I ignore or dismiss it. When she makes a good point, I get upset; when I make a good point, I feel victorious. Obviously, this would not be the ideal setting to avoid cognitive pitfalls, because my reasoning is clearly motivated. If I'm very skilled at reasoning, this might help me successfully defend my position, but it certainly won't help me reason well, because that's not even my goal.

Think of the militaristic language we use for talking about debates: we defend our positions and attack or shoot down or undermine our opponent's statements. If I have a defensive mindset, any evidence that can be used against my opponents is like a weapon to defeat them. And any point that might challenge my side is a threat, so I must ignore it, deflect it, or defuse it. Since my goal is to defend myself, it doesn't matter if I make mistakes in reasoning, as long as my opponent doesn't notice them! Facts and arguments are really just tools or weapons for achieving that goal.

Now contrast the goal of defense with the goal of discovery. If I am an explorer or a scout, my goal is not to defend or attack, but simply to get things right. My job is not to return with the most optimistic report possible; it's to accurately report how things are, even if it's bad news. So even if I secretly hope to discover that things are one way rather than another, I can't let that feeling muddle my thinking. I have a clear goal—to find out how things really are [4].

Of course, it's one thing to decide abstractly that our goal is accuracy; it's another to really feel curious, especially when we are motivated to have a particular belief. The defensiveness we feel is not directly under our control, since it comes from System 1. The elephant requires training if it is actually going to adopt a different mindset. This involves learning how to feel differently. As Julia Galef puts it: "If we really want to improve our judgment... we need to learn how to feel proud instead of ashamed when we notice we might have been wrong about something. We need to learn how to feel intrigued instead of defensive when we encounter some information that contradicts our beliefs." Genuine curiosity matters more than any specific reasoning skills we can acquire from academic training. When we are really curious, we allow ourselves to follow the evidence wherever it leads without worrying whether we were right to begin with. We should even welcome evidence that conflicts with our beliefs, because ultimately we want to change our beliefs if they're wrong. 

A man should never be ashamed to own he has been in the wrong, which is but saying, in other words, that he is wiser today than he was yesterday.
                —Alexander Pope

Finally, curiosity transforms how we interact with people who disagree with us. It's actually intriguing when people have different views, because it might mean that they have information that we lack. And we no longer try to find the easiest version of their view to attack—we wouldn't learn anything from that! Instead, we want to find the most knowledgeable and reliable people who disagree, since they might know things we don't. If they also share our goal of discovery, then discussing the issue becomes cooperative rather than adversarial. We fill in the gaps in each other's knowledge and feel curious about the source of any remaining disagreements.

A single slogan sums it all up: don't use evidence to prove you're right, use it to become right.

Accurate beliefs

At this point, it's worth taking a step back and reflecting on what exactly the goal of curiosity is. What does it mean to have accurate beliefs? The simple answer is that the more accurate our beliefs, the more closely they reflect how things actually are. So the goal is to believe things only if they match reality—for example, to believe that the cat is on the mat only if the cat is actually on the mat—and so on for all of our other beliefs.

A helpful analogy here is to consider the relationship between a map and the territory that it represents. An accurate map is one that represents a road as being in a certain place only when there actually is a road there. And the more accurate the map, the more closely its marks match the actual positions of things in the territory. So if our goal is to draw an accurate map, we can't just draw a road on it because we want a road there. Likewise, when we're genuinely curious, we aren't secretly hoping to arrive at a particular belief. Rather, we want our beliefs to reflect the world the way a good map reflects its territory [5].

One complication for our simple account of accuracy is that it treats beliefs as though they were entirely on or off—as though the only two options are to believe that the cat is on the mat, or to believe that the cat is not. That simple picture of binary belief fits the map analogy well, since maps either have a mark representing some feature, such as a road through the hills, or do not. Most maps have no way of indicating that there's probably or possibly a road through the hills. But our beliefs about the world come in degrees of confidence. We might be pretty confident that there's a road, or suspect that there's a road, or doubt that there's a road. (We sometimes express a moderate degree of confidence in X by saying "Probably X", or "I think X, but I'm not sure.")

How confident we are makes a big difference to our decisions. For example, suppose we are thinking about planning a road trip through the hills. If there is no road through the hills, we risk getting lost or being unable to complete our trip. So, we need to be pretty confident that a road exists before planning or embarking on the trip. If we are not sufficiently confident, we can gather more evidence until we are confident enough to take action, one way or the other.

So what does it mean to be accurate with beliefs like these? Suppose I think that a road probably cuts through the hills, but in fact there's no road. This means I'm wrong. But I'm not as wrong as I would have been if I had been certain about the road; I was only fairly confident. Or consider the example of weather forecasting. One weather forecaster predicts rain tomorrow with 90% confidence, while another predicts rain tomorrow with only 60% confidence. If it doesn't rain tomorrow, there's a sense in which both are wrong. But the lack of rain counts more strongly against the accuracy of the first forecaster than against the second.

In short, the accuracy of a belief depends on two factors: how confidently it represents things as being a certain way, and whether things actually are that way. The more confidence we have in true beliefs, the better our overall accuracy. The more confidence we have in false beliefs, the worse our overall accuracy.
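The forecaster comparison can be made concrete with a scoring rule. The text does not specify one, but a squared-error rule (often called a Brier score) is one standard way to measure the accuracy of degrees of confidence; the short sketch below is only an illustration of that idea, not part of the author's framework, and the function name is my own. Lower penalties mean better accuracy.

# A minimal sketch (illustrative only): scoring a degree of confidence by its
# squared distance from what actually happened, so that confident errors are
# penalized more heavily than tentative ones.

def accuracy_penalty(confidence, it_happened):
    """Squared distance between stated confidence (0 to 1) and the outcome (1 if it happened, 0 if not)."""
    outcome = 1.0 if it_happened else 0.0
    return (confidence - outcome) ** 2

# Two forecasters predict rain; it does not rain.
print(round(accuracy_penalty(0.90, False), 2))  # 0.81 -- very confident and wrong
print(round(accuracy_penalty(0.60, False), 2))  # 0.36 -- less confident, so less wrong

Over many predictions, a forecaster whose confidence tracks the evidence will accumulate a lower total penalty than one who is habitually overconfident.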

This conception of accuracy fits with our goals when we are genuinely curious. When we really want to get things right, we don't allow ourselves to feel confident in a claim unless we have sufficiently strong evidence for it. After all, the more confident we are in something, the more wrong we could be about it! This means we need higher standards of evidence to support our confident beliefs. When our goal is accuracy, we only get more confident when we gain more evidence, so that our degree of confidence matches the strength of our evidence. The next step is to start consciously thinking in degrees of confidence. The trick is to let go of the need to take a side, and to start being okay with simply feeling uncertain when there isn't enough evidence to be confident. It's also okay to just suspect that one view is correct, without the need to harden that suspicion into an outright belief. We can take time to gather evidence and assess the strength of that evidence. It can be somewhat liberating to realize that we don't need an outright opinion about everything. In a world full of brash opinions, we can just let the evidence take our confidence wherever it leads—and sometimes that's not very far.

Section Questions

2-1 In the sense used in this text, curiosity is primarily about...

A

having degrees of confidence rather than binary beliefs that are entirely "on" or "off"

B

having the right goal--namely, that our beliefs reflect how the world really is

C

not letting ourselves be affected by strong feelings in the midst of a disagreement

D

having a high degree of interest in rare and unusual things or occurrences

2-2 According to the text, the initial "map and territory" analogy has to be adapted for degrees of confidence because...

A

maps don't make decisions, but our degree of confidence makes a big difference to our decisions

B

unlike a map, we are capable of revising our beliefs when we encounter more evidence

C

marks on a map don't represent things as being probably or possibly a certain way

D

we have beliefs about things that are not represented in maps, like bikes and non-existent mountains

2.2 Thorough

In general, we can divide the process of reasoning about an issue into three stages [6]:

- At the search stage, we identify a range of possible views, as well as potential evidence for each of them.
- At the evaluation stage, we assess the strength of the evidence we've identified.
- At the updating stage, we revise our degrees of confidence accordingly.

It can be easy to forget the first stage. Even if we're completely impartial in our evaluation of evidence for alternative views, there may be some alternative views that haven't occurred to us, or evidence that we've never considered. In that case, our reasoning will be incomplete and likely skewed, despite our fair evaluation of the evidence that we do identify. We need our search to be thorough at the outset. This means, as far as is reasonable given the importance of the issue: (i) seeking out the full range of alternative views, and (ii) seeking out the full range of potential evidence for each view. Failure in either one of these tasks is a cognitive pitfall known as restricted search. In this section, we'll examine these two forms of restricted search more carefully.

Search for possibilities

Suppose we hear that a commercial jet plane recently crashed, but we are not told why. There is a range of possible explanations that we can consider about the cause of the crash. (Two that often jump to mind are bad weather and terrorism.) Now suppose we were to guess, based on what we know about past crashes, how likely it is that the primary cause of this crash was bad weather. Go ahead and write down your guess.

Now set aside this initial guess and list three additional possible explanations for the crash, aside from bad weather and terrorism. Write these down on a scrap of paper, and don't stop at two. (Go on: the exercise doesn't work unless you actually do this!) Now, at the bottom of your list, write "other" to remind yourself that there are probably even more alternatives you haven't considered. (Having trouble? Did you mention pilot error, engine failure, electrical system failure, maintenance error, fire on board, ground crew error, landing gear failure, or loss of cabin pressure?) Now, as the final step of the exercise, ask yourself whether you want to revise your initial estimate of the probability that the recent crash was due to bad weather.

In a wide variety of examples, the natural tendency is to think of just a couple of explanations when not prompted for more. As a result, we tend to be overly confident that one of the initial explanations is correct, simply because we have not considered the full range. When prompted to generate more alternative explanations, our probability estimates become much more accurate. (Bad weather, incidentally, is considered the primary cause in less than 10% of commercial jet crashes.)

Note that the need to search thoroughly for alternative possibilities doesn't just apply to explanations. Whether we are considering an explanation or some other kind of claim, there will typically be a wide range of alternative possibilities, and we often fail to consider very many. For example, suppose I am wondering whether my favored candidate will win in a two-party race. I might only think about the two most obvious possible outcomes, where one of the candidates straightforwardly receives the majority of votes and then wins. But there are other possibilities as well: a candidate could bow out or become sick. Or, in a U.S. presidential election, one candidate could win the popular vote while the other wins the electoral college. Or an upstart third-party candidate could spoil the vote for one side. If I don't consider the full range of possible outcomes, I'm likely to overestimate the two most obvious ones. Let's call this cognitive pitfall possibility freeze [7].

The simple solution—if we're considering an issue that matters—is to make the effort to brainstorm as many alternatives as possible. (Of course, the hard part is actually noticing that we may not have considered enough options.) And it's best to not only list alternatives but to linger on them, roll them around in our minds, to see if they might actually be plausible. Even if we can't think of very many, studies show that just imagining alternatives more vividly, or thinking of reasons why they could be true, makes them seem more likely. This helps to counteract confirmation bias, since it loosens the mental grip of the first or favored possibility [8]. (Confirmation bias, as you'll recall from the previous chapter, is the tendency to notice or focus on potential evidence for our pre-existing views, while neglecting or discounting any evidence to the contrary.)
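To see why merely listing alternatives tends to deflate confidence in the first explanation that came to mind, here is a small, purely hypothetical sketch. The hypotheses and the equal-split assumption are mine, not data from the text: if we are roughly equally unsure among the causes we can name, each gets about 1/n of our confidence, so the share left for "bad weather" shrinks as the list grows.

# A hypothetical illustration of "possibility freeze": the more alternatives we
# actually put on the table, the smaller the share of confidence any one of
# them deserves (assuming we have no strong reason to favor one over another).

few = ["bad weather", "terrorism"]
many = few + ["pilot error", "mechanical failure", "maintenance error",
              "fire on board", "ground crew error", "other"]

for hypotheses in (few, many):
    share = 1 / len(hypotheses)
    print(f"{len(hypotheses)} possibilities considered -> 'bad weather' gets about {share:.0%}")
# 2 possibilities considered -> 'bad weather' gets about 50%
# 8 possibilities considered -> 'bad weather' gets about 12%

Real estimates are rarely equal splits, of course; the point is only that an estimate made before brainstorming has nowhere to put the probability of the options we never thought of.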

Search for evidence

The second component of the search stage, once all available views have been considered, is the search for potential evidence that can support them. Failure to search thoroughly and fairly for evidence is a major pitfall that hinders accuracy [9].

It's worth being clear about how we're using the term evidence in this text, because we're using it in a slightly specialized way. Some people associate the word "evidence" only with courts and lawsuits. Others associate "evidence" only with facts that provide strong but inconclusive support for a claim. But we're using the term in a very broad sense. What we mean by "evidence" for a claim is anything we come to know that supports that claim, in the sense that it should increase our degree of confidence in that claim. (A more rigorous definition awaits us in Chapter 5.) In this sense, a piece of evidence might provide only slight support for a claim, it might provide strong support, or it might even conclusively establish the claim. Evidence is anything we know that supports a claim, weakly or strongly, whether in court, in science, in mathematics, or in everyday life.

(Two quick examples. Learning that I have two cookies gives me evidence that I have more than one. It may sound odd to put it this way, because the evidence in this case is conclusive—but in our broad sense, a conclusive proof still counts as evidence. At the other extreme, we may also find extremely weak evidence. For example, in a typical case, the fact that it's cloudy out is at least some evidence that it's going to rain. It might not be enough evidence to make us think that it will probably rain. But however likely we thought rain was before learning that it was cloudy out, we should think that rain is at least slightly more likely when we learn that it's cloudy. You may be doubtful about this last statement, but after reading Chapter 5 you should understand why it's true.)

Now, back to the search for evidence. For decades, researchers have studied how people select and process information. Overall, in studies where people were given the opportunity to look at information on both sides of controversial issues, they were almost twice as likely to choose information that supported their pre-existing attitudes and beliefs [10]. Since this is our natural tendency, restoring balance requires deliberately searching for facts that may support alternative views, as well as for weaknesses in our own view. And that search, if it is really fair, will tend to feel like we are being far too generous to the alternative view.

It is also crucial to seek out the strongest potential evidence for alternative views, articulated by the most convincing sources. Encountering sources that provide only weak evidence for an opposing viewpoint can just make us more confident in our own view [11]. It lets us feel like we've done our duty, but we end up thinking, "If this is the sort of point the other side makes in support of their view, they must be really wrong!" This is a mistake; after all, there are weak points and unconvincing sources on both sides of every controversy, so we learn nothing at all by finding some and ridiculing them.

How can we get ourselves in the right mindset to seek out the best evidence for an opposing view? Let's focus on two key questions that we can ask ourselves.

First, what would things look like if the opposing view were true? We tend to focus on the way things would look if our beliefs were correct, and to notice when they actually do look that way [12]. This means we often fail to notice when the evidence fits equally well with some alternative views. It's a mistake to treat our experiences as supporting our view if in fact those experiences are equally likely whether or not our view is true. We often miss this because we don't really ask what we'd expect to see if other views were true.

Second, which observations don't fit quite right with my first or favored view? An open search for evidence means paying special attention to facts that stick out—that is, facts that don't seem to fit quite right with our favored hypothesis. Those are the facts that are most likely to teach us something important, because sometimes even very well-supported theories will begin to unravel if we pull on the threads that stick out. Moreover, if we're honest with ourselves, these are the very facts that we'd be focusing on if we had the opposite view! If we were motivated to seek out problematic facts, they would jump out much more easily. But instead, we tend to flinch away from them, or rehearse "talking points" to ourselves about why they're unimportant. As we'll see below, the ability to notice when we're doing this is a key skill of good reasoning.
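The first question above ("what would things look like if the opposing view were true?") can be given a simple numerical gloss. The sketch below is my own illustration, anticipating the fuller treatment of evidence promised in Chapter 5, and the scenarios and numbers are invented: an observation supports a view only to the extent that it is more likely if the view is true than if it is false.

# A minimal sketch (numbers invented): comparing how likely an observation is
# under a view versus under its denial. A ratio of 1 means the observation is
# no evidence at all, however well it "fits" the view.

def support_ratio(p_if_view_true, p_if_view_false):
    """How many times likelier the observation is if the view is true than if it is false."""
    return p_if_view_true / p_if_view_false

# "My candidate's rallies look crowded" -- crowds are just as likely either way.
print(support_ratio(0.8, 0.8))               # 1.0  -> fits my view, but provides no support
# "A well-run national poll shows my candidate ahead" -- likelier if I'm right.
print(round(support_ratio(0.7, 0.3), 2))     # 2.33 -> genuine, though modest, support

Asking for the second number, the probability of the observation if the view were false, is precisely the step that confirmation bias tempts us to skip.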

So, how much searching for evidence is enough? The answer is not that more searching is always better: there is such a thing as spending too much time gathering evidence. The answer should depend on the importance of the issue, and not on whether we happen to like our current answer. Our natural tendency is to search lazily when we are happy with the view supported by the evidence we already possess, and to be thorough only when we hope to cast doubt on it.

In one study, subjects were asked to dip strips of test paper in their saliva to assess whether they had a mild but negative medical condition. One group of subjects was told that the test paper would eventually change color if they did have the condition, while the other group was told that the color would eventually change if they did not. (In reality, the "test paper" was ordinary paper, so it did not change color for either group.) Subjects who thought that no color change was a bad sign spent much longer waiting for the change before deciding that the test was over, and were three times more likely to re-test the paper (often multiple times). In other words, when the evidence seemed to be going their way, they stopped collecting evidence. But when the evidence was not going their way, they kept searching, in hopes of finding more support for their favored view [13].

When a search for evidence can be halted on a whim, it's called optional stopping. Scientific protocols for experiments are designed to make sure we don't do this, but in our everyday lives we may not even notice we're doing it. It can be a powerful way of skewing even neutral observations so that they seem to favor our view, for the same reason that you can bias the outcome of a sequence of fair coin tosses by choosing when to stop. Here's how: if the coin comes up heads the first time, stop. If not, keep tossing and see if you have more heads than tails after three tosses. If not, try five! This way, you have a greater than 50% chance of getting more heads than tails. But if you decided in advance to stop after one toss, or three, or five, you'd have only a 50% chance that most of the tosses come up heads. In other words, even when the method you're using for gathering evidence is completely fair (like a fair coin), you can bias the outcome in your favor just by deciding when to stop looking.

So, when we are comfortable with our current belief, our tendency is to stop looking for more evidence. But when we feel pressured to accept a belief that we're uncomfortable with, we often seek more evidence in the hope of finding some that will allow us to reject the belief. Of course, whether we seek out more evidence should not turn on whether we want our beliefs to change or to stay the same. Instead, the amount of evidence we seek should simply match the importance of the issue.

Another tactic we use to avoid uncomfortable beliefs is refusing to believe something unless an unreasonably high standard of evidence has been met. Meanwhile, when we want to believe something, we apply a much more lenient standard. In other words, we systematically cheat by applying different thresholds for what counts as "enough evidence" for believing something: we require less evidence for things we want to believe than for things we do not.

In fact, if we are sufficiently motivated to believe something, we might even require little or no search for evidence at all. We might be satisfied if we can tell ourselves that it "makes sense," meaning that it fits with the most obvious bits of evidence. But this is an absurdly weak standard of support. On almost every important issue, there are many contradictory views that "make sense." The point of seeking more evidence is to differentiate between all the views that "make sense"—to discover things that fit far better with one of them than with the others. The standard for forming reliable beliefs is not "can I believe this?" or "must I believe this?" but "what does the evidence support?" If we don't properly search for evidence, we won't have a good answer to this question.
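The coin-toss claim a few paragraphs back is easy to check with a quick simulation. The sketch below is my own illustration of the stopping rule described there (stop after 1, 3, or 5 tosses, whichever first shows more heads than tails); it is not code from the text.

# A small simulation of optional stopping with a fair coin. Committing to a
# fixed number of tosses gives "more heads than tails" about half the time;
# stopping whenever heads happen to be ahead pushes that well above half.
import random

def majority_heads_fixed(n_tosses):
    """Toss a pre-committed number of times; report whether heads outnumber tails."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads > n_tosses - heads

def majority_heads_optional_stopping():
    """Check after 1, 3, and 5 tosses; stop (and declare success) as soon as heads lead."""
    tosses = []
    for checkpoint in (1, 3, 5):
        while len(tosses) < checkpoint:
            tosses.append(random.random() < 0.5)
        heads = sum(tosses)
        if heads > len(tosses) - heads:
            return True
    return False

trials = 100_000
print(sum(majority_heads_fixed(5) for _ in range(trials)) / trials)             # about 0.50
print(sum(majority_heads_optional_stopping() for _ in range(trials)) / trials)  # about 0.69

The exact number matters less than the moral: a perfectly fair evidence-gathering method becomes biased the moment the running tally is allowed to decide when to stop.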

For desired conclusions, it is as if we ask ourselves ‘Can I believe this?’, but for unpalatable conclusions we ask, ‘Must I believe this?’                   —Thomas Gilovich, How We Know What Isn't So

Note that the tactic of selectively applying a threshold for "enough evidence" really only works with a binary conception of belief: if we think of belief as a matter of degree, there isn't a special threshold of evidence that we have to cross in order to "adopt" a belief. We are always somewhere on a continuum of confidence, and the evidence simply pushes us one way or another. So we can mitigate this cognitive error by forcing ourselves to think in terms of degrees of confidence.  If we are reasoning well, our goal in seeking evidence is not to allow ourselves to believe what we want and avoid being forced to believe anything else. Instead, the extent of our search should match the importance of the question, and our degree of confidence in our views should rise and fall in correspondence with the evidence we find.

Section Questions

2-3 Failing to think of sufficiently many possibilities...

A

leads to having over-confidence in the possibilities we do think of

B

leads us to not imagine our first or favored possibility with sufficient vividness

C

makes us almost twice as likely to choose information that supports our pre-existing attitudes and beliefs

D

leads us to revise our estimate of the first view that occurred to us

2-4 Asking what we'd expect to observe if our first or favored view were true...

A

helps balance our natural tendency to focus on how things would look if alternative views were true

B

should not be the focus of our search because it's already our natural tendency

C

is important because those are the facts we're likely to learn the most from

D

helps us notice that different views can do an equally good job of explaining certain facts

2-5 Our standard for how much effort we put into a search...

A

should be that additional search for information is always better

B

should be based on the importance of the issue under investigation

C

should be that we search for evidence until we have enough to support our favored belief

D

should be that we search for evidence until every view has equal support

2.3 Open

The third element of the right mindset for reasoning is genuine openness. This means being open not only to evidence that supports alternative views but also to revising our own beliefs, whether they are initial reactions or considered views. As we'll see, research indicates that being consciously open in this way can help us to overcome confirmation bias and motivated reasoning.

Decoupling

If we encounter facts that fit with our beliefs, we immediately have a good feeling about them. We expect them, and we take them to provide strong support for our beliefs. However, if we encounter facts that fit better with alternative views, we tend to ignore them or consider them only weak support for those views. As a result, it feels like we keep encountering strong evidence for our views, and weak evidence for alternatives. Naturally, this makes us even more confident.

The problem with this process is that it lends our initial reasons for a belief far more influence than all subsequent information we encounter, thereby creating the evidence primacy effect that we encountered in Chapter 1. If we listen to the prosecutor first and start believing that the defendant is guilty, we allow that belief to color our assessment of the defendant's evidence. If we listen to the defense lawyer first, it's the other way around. We start off in the grip of a theory, and it doesn't let go, even in the face of further evidence to the contrary.

What this means is that confirmation bias affects us not only at the search stage of reasoning, but also at the evaluation stage. And like other instances of confirmation bias, this can occur whether or not our initial belief is motivated. Suppose there is an election approaching and my first inclination is to expect a certain political party to win. Even if I don't care who wins, the outcome that seemed plausible to me at first is the one on which I'll focus. Sources of evidence that support that view will tend to seem right, since they agree with what I already think. And things only get worse if I really want that political party to win. In that case, the belief that they will win is not only my first view but also my favored view, so I will actively seek flaws in sources of evidence that might undermine it [14].

By contrast, good reasoning requires that we evaluate potential evidence on its own merits. This means keeping the following two things separate: (1) our prior degree of confidence in a claim, and (2) the strength of potential evidence for that claim. This is called decoupling. Evaluating the strength of potential evidence for a claim requires that we set aside the issue of whether the claim is actually true and ask what we'd expect to see both if it were true and if it were not true. Being able to sustain these hypotheticals is a highly abstract and effortful activity [15].

When someone presents what they take to be evidence for a claim, they are giving an argument for that claim. (In this text, an argument is a series of claims presented as support for a conclusion.) There are many strong arguments for false conclusions, and many bad arguments for true ones. Every interesting and controversial claim has some proponents who defend it with irrelevant points and others who appeal to genuine evidence. To distinguish these, we have to temporarily set aside our prior beliefs and evaluate the arguments on their own merits. Only then can we decide whether (and to what degree) our prior beliefs should be revised.

In short, we can't fairly assess the whole picture if we immediately discredit potential evidence whenever it fails to support our initial view. But decoupling doesn't mean we set aside our prior beliefs forever. When we get a piece of evidence against a belief we have, we should first decouple and assess the strength of the evidence on its own merits. Having done that, we are in a position to weigh that new evidence against our previous reasons for holding the belief. Some of the later chapters in this textbook present a rigorous framework that explains exactly how this should be done.
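One way to picture decoupling, sketched below in a form I have chosen for illustration (the book's own rigorous framework arrives only in later chapters), is to keep two numbers separate: a prior degree of confidence, and an evidence-strength factor saying how many times likelier the evidence is if the claim is true than if it is false. The strength factor is assessed the same way no matter what the prior is; the prior matters only when the two are finally combined. The function and example numbers are mine.

# A minimal sketch (illustrative only): combining a prior degree of confidence
# with the strength of a new piece of evidence, while keeping the two separate.

def updated_confidence(prior, evidence_strength):
    """Convert the prior to odds, scale by the evidence-strength factor, convert back."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * evidence_strength
    return posterior_odds / (1 + posterior_odds)

# The same evidence (4x likelier if the claim is true) is evaluated identically
# by a doubter and a believer; only their starting points differ.
print(round(updated_confidence(prior=0.2, evidence_strength=4.0), 2))  # 0.2 -> 0.5
print(round(updated_confidence(prior=0.6, evidence_strength=4.0), 2))  # 0.6 -> 0.86

On this picture, decoupling is the discipline of estimating the evidence-strength factor without peeking at the prior; only afterwards are the two multiplied together.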

The bias blindspot So to counteract confirmation bias we just need to remember to evaluate potential evidence on its own merits, right? Unfortunately, it turns out that this doesn't help much. In various studies about how people evaluate evidence, subjects were instructed to be "as objective and unbiased as possible," to "weigh all the evidence in a fair and impartial manner," or to think "from the vantage point of a neutral third party." These instructions made hardly any difference: in fact, in some studies they even made matters worse. People don't actually decouple even when they are reminded to [16]. The explanation for this is simple: we genuinely think we're already being unbiased. We think confirmation bias happens to other people, or maybe to ourselves in other situations. Even those who know about a cognitive bias rarely think that they are being biased right now, even when they know they are in precisely the kinds of circumstances that usually give rise to the bias. This effect is known as the bias blindspot [17].

It is not our biases that are our biggest stumbling block; rather it is our biased assumption that we are immune to bias.                  —Cynthia Frantz, 'I AM Being Fair'

One reason for this is that we tend to assume that bias in ourselves would be obvious—we expect it to feel a certain way. In other words, we expect bias to be transparent in the sense of Chapter 1. But it simply is not! Many of the cognitive processes that are biasing our evaluation of evidence are not accessible to introspection at all [18], and others require training and practice to recognize.

We can see this in some of the follow-up interviews for the studies on confirmation bias. One famous study involved a large group of Stanford University students who already held opinions about the effectiveness of capital punishment in deterring murders. They were given two studies to read—one supporting each side of the issue—as well as critical responses to the studies. The students showed a significant bias toward whichever study supported the conclusion they happened to agree with in the first place. As we'd expect, given our knowledge of confirmation bias, they also became more confident in their original view.

What's striking about this study is the care with which the students examined the opposing evidence: they offered criticisms of sample size, selection methodology, and so on. They just happened to apply much stricter criteria to studies that threatened their initial view. Many subjects reported trying especially hard to be completely fair and give the other side the benefit of the doubt. It just happened that there were glaring flaws in the research supporting the other side! Several remarked that "they never realized before just how weak the evidence was that people on the other side were relying on for their opinions" [19]. This "completely fair" process consistently led both sides to greater certainty that they were right to begin with.

The point is that even motivated confirmation bias operates under the radar. We don't consciously decide to apply selective standards. When faced with new evidence, it really seems like an honest assessment just happens to favor the view that we already started with. In other words: at the subconscious level we are skewing the evidence, while at the conscious level we are blithely unaware of doing so. This is very convenient: it lets us believe what we want, while also considering ourselves to be fair and balanced in our assessment of the evidence.

This suggests an element of self-deception in motivated reasoning. We can't actually let ourselves notice that we are deceiving ourselves—or we wouldn't be deceived anymore! The psychology of self-deception shows a fascinating pattern of covering up our own tracks so as to keep our self-image intact. For example, in a series of studies, subjects were asked to decide whether a positive outcome (a cash bonus, an enjoyable task, etc.) would be given to a random partner or to themselves. They were given a coin to flip if they wanted to, and then left alone to choose [20]. The researchers, however, had a way to tell whether they had flipped the coin.

Unsurprisingly, the vast majority of people chose the positive outcome for themselves. What's more surprising is that whether people flipped the coin had no effect on whether they chose the positive outcome. In fact, people who had earlier rated themselves as most concerned about caring for others were more likely to use the coin, but just as likely to choose the positive outcome! Since they were alone when they flipped the coin, it seems they flipped it to convince themselves that they were being fair. If you flip a coin and it comes up the right way, you can just continue to think you made the decision fairly, and not think too much about how you would have responded had it come up the other way! If the coin comes up the wrong way, you can always try "double or nothing," or "forget" which side was supposed to be which. If all else fails, you can remind yourself that the coin could just as easily have come up the other way, so you might as well take the positive outcome for yourself.

In other words, it seems people tossed the coin simply to preserve their own self-image. (Of course, none of them would realize or admit this!) A similar thing happens with motivated reasoning. If we knew we were reasoning in a biased way, that would defeat the whole purpose. We can only be biased in subtle ways that allow us to maintain the self-conception of being fair and impartial. This doesn't mean that it's impossible to catch ourselves showing little signs of motivated reasoning—for example, a slight feeling of defensiveness here, a flinch away from opposing evidence there. But it takes vigilance and practice to notice these things for what they really are.

This has important implications for how we think about people who disagree with us on an issue. Because we expect cognitive biases to be transparent, we simply introspect to see whether we are being biased, and it seems like we're not. The assumption that we can diagnose our own cognitive bias through introspection is called the introspection illusion. Meanwhile, since we can't look into other people's minds to identify a bias, we can only look at how they evaluate evidence! Since we think our own evaluation is honest, and theirs conflicts with ours, it's natural to conclude that they're the ones being biased. As a result, we systematically underestimate our own bias relative to theirs. Call this

phenomenon the biased opponent effect. Worse, since we assume that biases are transparent, we assume that at some level the other side knows that they're biased: they're deliberately skewing the evidence in their favor. This makes us think that they are engaging  with us in bad faith. But, of course, that's not what's going on in their minds at all. It really does feel to both sides like they're the ones being fair and impartial [21]. Having learned all this about the bias blindspot, a natural response is to think: "Wow, other people sure do have a blindspot about their various biases!" But this is exactly how we'd react if we were also subject to a blindspot about our own bias blindspot—which we are.  

Considering the opposite As we've seen, being told to be fair and impartial doesn't help with biased evaluation. But the good news is that there are some less preachy but more practical instructions that actually do help. In particular, there are two related mental exercises we can go through to help reduce biased evaluation when faced with a piece of evidence. The first is asking how we'd have reacted to the same evidence if we had the opposite belief. The second is asking how we would have reacted to opposite evidence with the same  belief. Psychologists use the phrase considering the opposite for both strategies. Let's consider these strategies one at a time. The first is to ask: How would I have treated this evidence if I held the opposite view?  We can implement this by imagining that we actually hold the opposite view, and asking ourselves how we would treat the evidence before us. This simple change of frame substantially impacts our reaction to potential evidence, making us more aware of—and receptive to—evidence that supports the opposing view. For example, suppose we believe to begin with that capital punishment makes people less likely to commit murder, so getting rid of it would increase the murder rate. We are given a study showing that in three recent cases where a state abolished capital punishment, the murder rate subsequently did increase. This would feel to us like a very plausible result—one that we would expect to be accurate—and so we are not on the lookout for potential reasons why it might be misleading. As a result, we take the study as strong evidence for our view.  But now ask: how would we have reacted  to this evidence if we had the opposite view, namely that capital punishment does not deter murders? The fact that the murder rate went up after abolishing capital punishment would have felt like an unexpected or puzzling result, and we'd immediately have

begun searching for other reasons why the murder rate might have gone up. For example, perhaps violent crime was rising everywhere and not just in those states. Or perhaps other changes in law enforcement in those states were responsible for the increase in murders. Or perhaps the authors of the study have cherry-picked the data and ignored states where capital punishment was abolished and the murder rate didn't change. And now it starts to feel like the study isn't providing such strong evidence that capital punishment deters murders.

One dramatic way to get people to consider the opposing perspective is to have them actually argue for the opposite side in a debate-like setting. Studies using this method have found that it helps mitigate biased evaluation and even leads some participants to actually change their minds [23]. (This is striking because changing our minds, especially when it comes to a topic we feel strongly about, is very difficult to do.)

But it isn't necessary to engage in an actual debate from the opposing point of view. Even just evaluating the evidence while pretending to take the opposing side can have a profound effect. In one study, subjects had to read extensive material from a real lawsuit, with the goal of guessing how much money the actual judge in that case awarded to the plaintiff. They were told they'd be rewarded for the accuracy of their guesses. But they were also randomly assigned to pretend to be the plaintiff or the defendant before reading the case materials. The result was that those who were in the mindset of "plaintiffs" predicted an award from the judge that was twice as large as that predicted by the "defendants". Even though they were trying to be accurate in guessing what the judge actually awarded the plaintiff (so they could get the reward), they couldn't help but allow their assumed roles to influence their interpretation of the case materials as they read them [24].

A second version of this study helps show that the bias was operating on the evaluation of evidence, rather than just as wishful thinking on behalf of their assumed roles. In the second version, subjects were first asked to read the case materials, and only then assigned to their roles. This time, the roles made very little difference to their predictions, indicating that the subjects were actually aiming for accuracy and not just "taking sides". For example, taking on the role of the plaintiff after assessing the evidence didn't cause subjects to become more favorable towards the plaintiff's case. This suggests that the bias in the first study was affecting how subjects were assessing the evidence as they encountered it.

The good news comes from a third version of this study, in which subjects were first assigned their roles, but were then instructed to think carefully about the weaknesses in their side's case and actually list them [25]. Remarkably, this simple procedure made the discrepancy between the "defendant" and "plaintiff" disappear. Listing weaknesses in our own view requires employing System 2 to

examine our view carefully and critically—just as we would if we actually held the opposite view.  The second strategy is to ask: How would I have treated this evidence if it had gone the other way? This one is a bit trickier to think about. Suppose again that we start off believing that capital punishment deters murders, and then learn that in three states that got rid of it, the murder rate went up. The second strategy tells us to imagine that the study had a different result: in those three states, the murder rate did not change. Again, we would find this puzzling and unexpected and start to look for reasons why the study might be misleading. Maybe the study authors are cherry-picking data and ignoring states where capital punishment was abolished and the murder rate went up. Maybe the murder rate was decreasing nationally, but did not decrease in those states because they abolished capital punishment.  In a variation of the Stanford experiment about views on capital punishment, this way of thinking had a profound effect. In the original study, the two sides looked at the very same material and found reasons to strengthen their own views. Even explicitly instructing the subjects to make sure they were being objective and unbiased in their evaluations had no effect. However, a follow-up study found one method that completely erased the students' biased evaluation: instructing them to ask themselves at every step whether they "would have made the same high or low evaluations had exactly the same study produced results on the other side of the issue" [22]. The simple mental act of imagining that the study was confirming the opposite view made them look for ways in which this sort of study might be flawed. So why does telling people to consider the opposite help, but reminding them to be fair and unbiased does not? The answer is that people don't know how to carry out the instruction to "be unbiased" in practice. They think it means they should feel unbiased—and they already do! But instructing people to consider the opposite tells them how to be unbiased. After all, reasoning in an unbiased way is not a feeling; it's a set of mental activities. The exercise of considering the opposite might feel like a trick—but it's a trick that actually works to short-circuit the unnoticed bias in our evaluation.

Openness to revision If it feels like we're evaluating potential evidence fairly, but somehow our favored beliefs always remain untouched, then it's very likely that we're not really being open to alternative views. (For example, if we've never changed our minds on any important issue, it's worth asking ourselves: what are the odds that we happened to be right on all of these issues, all along?) After all, the only reason impartial evaluation is useful for our goal of accuracy is that it helps us identify the strongest evidence and revise our beliefs accordingly. But it can be hard, in the end, to push back against belief perseverance, grit our teeth, and actually revise our beliefs.

Why is this so hard? The psychologist Robert Abelson has noted that we often speak as though our beliefs are possessions: we talk about "holding," "accepting," "adopting," or "acquiring" views. Some people "lose" or "give up" a belief, while others "buy into" it. To reject a claim, we might say, "I don't buy that." Abelson also points out that we often use beliefs in much the same way that we use certain possessions: to show off good taste or status. We can use them to signal that we are sophisticated, that we fit in with a certain crowd, or that we are true-blue members of a social or political group. As with possessions, too, we inherit some of our beliefs in childhood and choose other beliefs because we like them or think people will approve of them—as long as the new beliefs don't clash too much with those we already have. "It is something like the accumulation of furniture," Abelson remarks. This explains our reluctance to make big changes: our beliefs are "familiar and comfortable, and a big change would upset the whole collection" [26].

If, at some level, beliefs feel like possessions to us, then it's understandable why we get defensive when they're criticized. And if giving up a belief feels like losing a possession, it's no wonder that we exhibit belief perseverance. (This will be especially true if we acquired that belief for a social purpose like signaling group membership.) If our goal is accuracy, though, then beliefs that linger in our minds with no support shouldn't be cherished as prized possessions. They're more like bits of junk accumulating in storage; they're only there because effort is needed to clear them out.

We sometimes have trouble letting go of old junk we've kept in our attics and garages, even if it has no sentimental value. At some level, we are often just averse to the idea of giving something up. If that's the problem, then reframing the question can be useful. Rather than asking whether we should keep an item, we can ask whether we'd take it home if we found it for free at a flea market. If not, it has no real value to us. Likewise for beliefs. Instead of asking, "Should I give up this belief?", we can ask, "If this belief weren't already in my head, would I adopt it?" Framing things this way helps counteract the sense that we'd be losing something by letting it go.

Now remember that the language of "holding" or "giving up" beliefs suggests a binary picture of belief. But of course, we can have a degree of confidence anywhere from certainty that a claim is false to certainty that it's true. When we discover that an old belief has less support than we initially thought, we may only need to gently revise our degree of confidence in it. Loosening our grasp on a belief is easier if we remember that the choice between "holding it" and "giving it up" is a false one.

Unfortunately, changing our minds is often perceived as a kind of failure, as though it means admitting that we should not have had the belief we gave up. But as we'll see in Chapter 5, this is a mistake. If our goal is accuracy, the beliefs we should have at any given time are the beliefs that are best supported by our evidence. And when we get new evidence, that often changes which beliefs are best supported for us. So the fact that we used to hold the most reasonable belief, given our evidence, doesn't mean that it's still the most reasonable belief now. Revising our old beliefs in response to learning new facts is not "flip-flopping" or being "wishy-washy": it's called updating on the evidence, and it's what experts in every serious area of inquiry must do all the time. The real failure in reasoning is not changing our minds when we get new evidence.

Sometimes, of course, we realize that we've been holding onto a belief that was actually unreasonable for us to accept. In that case, changing our minds really does involve admitting that we made a mistake. But if we don't do so, we're just compounding our original error—which is especially bad if the belief has any practical importance to our lives or those of others.

A man who has committed a mistake and doesn't correct it, is committing another mistake.               —Confucius

Section Questions

2-6 Match each item with the effect that it causes (not its definition). Take care to choose the best match for each answer.

Premise
1. introspection illusion
2. possibility freeze
3. pretending to take the other side
4. being reminded to avoid bias in our evaluation
5. keeping in mind that our beliefs don't need to be on/off

Response
A. assuming we are being more honest than those who disagree with us
B. no change
C. too much confidence in our first or favored view
D. finding it easier to revise our beliefs
E. reduction in biased evaluation

2-7 Which best describes how confirmation bias operates at the evaluation stage of reasoning?
A. when we are motivated to believe something, we construe potential evidence as favoring it
B. our first or favored beliefs influence our assessment of the strength of potential evidence
C. we assume that people on the other side of a controversial issue are evaluating information in a biased way, but we are not
D. we decouple our prior degree of confidence in a claim from the strength of a new piece of evidence

2-8 The text discusses studies in which people could flip a coin to make a decision in order to illustrate...
A. the lengths we go to believe that we're being fair even when we're not
B. that we too often allow ourselves to be influenced by random factors like coin tosses
C. that we tend to choose positive outcomes (e.g. cash bonuses) for ourselves
D. that we should never use random factors like coin tosses to make fair decisions

2-9 Subjects assessing studies that provided evidence about capital punishment...
A. were asked to guess how much money an actual judge awarded to the plaintiff, and predicted a higher award when pretending to be the plaintiff
B. successfully decoupled after being instructed to be fair and impartial in their assessment of evidence
C. overcame biased evaluation by taking great care to examine the evidence offered by the study that challenged their view
D. successfully decoupled after asking what they would have thought of a study if its result had gone the other way

2-10 Pretending to take the opposing side of an issue...
A. is a bad idea because it triggers a confirmation bias in the direction of the side we're pretending to take
B. does not work as well as thinking carefully about weaknesses in our own case
C. will allow us to perceive bias in ourselves through introspection
D. helps counter the confirmation bias we already have in favor of our side

Key terms

Accuracy: the extent to which our beliefs reflect the way things actually are, much like a map reflects the way a territory is. The concept of accuracy applies not only to binary beliefs but also to degrees of confidence. For example, if the cat is not on the mat, then believing that the cat is definitely on the mat is less accurate than believing that it's probably on the mat.

Argument: a series of claims presented as support for a conclusion.

Bias blindspot: the tendency not to recognize biases as they affect us, due to the fact that the processes that give rise to them are not transparent, even when we recognize them in others.

Biased opponent effect: a result of the introspection illusion. Given that we think our own reasoning is unbiased, and that our opponent comes to very different conclusions, we commonly conclude that their reasoning must be biased.

Binary belief: treating beliefs as if they are on/off. For example, we either believe that the cat is on the mat or that the cat is not on the mat, without allowing for different degrees of confidence.

Considering the opposite: a technique to reduce biased evaluation of evidence, where we ask ourselves one of two questions: (i) How would I have treated this evidence had it gone the opposite way? or (ii) How would I have treated this evidence if I held the opposite belief?

Decoupling: separating our prior degree of confidence in a claim from our assessment of the strength of a new argument or a new piece of evidence about that claim.

Degrees of confidence: treating beliefs as having different levels of certainty. Just as we can be absolutely sure that x is true or false, we can have every level of certainty in between, e.g. thinking that x is slightly more likely to be true than false, very likely to be true, etc.

Evaluation stage: the second stage in the reasoning process, when we assess the strength of the potential evidence we've gathered.

Evidence: a fact is evidence for a claim if coming to know it should make us more confident in that claim. The notion of evidence is more rigorously defined in Chapter 5.

Introspection illusion: the misguided assumption that our own cognitive biases are transparent to us, and thus that we can diagnose these biases in ourselves through introspection.

Optional stopping: allowing the search for evidence to end when convenient; this may skew the evidence if (perhaps unbeknownst to us) we are more likely to stop looking when the evidence collected so far supports our first or favored view.

Possibility freeze: the tendency to consider only a couple of possibilities in detail, and thereby end up overly confident that they are correct.

Restricted search: the tendency not to seek out the full range of alternative views or the full range of evidence that favors each view. Along with biased evaluation, this is an instance of confirmation bias.

Search stage: the first stage of the reasoning process, where we identify a range of possibilities and any evidence that may support them.

Updating on the evidence: revising our prior beliefs in response to new evidence, so that our confidence in a belief will match its degree of support.

Updating stage: the third and final stage of the reasoning process, when we revise our degree of confidence appropriately.

Footnotes

[1] See Stanovich, West, & Toplak (2013; 2016). In the literature, motivated confirmation bias is sometimes called "myside bias"; however, to avoid proliferating labels, I'll just use the more informative term "motivated confirmation bias."

[2] Baron (2009), pg. 200. I've re-ordered the three attributes.

[3] Aristotle, Metaphysics, Book 1. I've altered Kirwan's translation.

[4] The contrast between these two mindsets is based on the distinction between the "soldier mindset" and the "scout mindset" in Galef (2017). For a few reasons I have changed the labels.

[5] The map/territory metaphor of the relationship between mental and linguistic representation and the world comes from Korzybski (1958).

[6] See Baron (2009), pp. 6-12. In line with Baron's three attributes of good reasoning (pg. 200), I have split his inference stage into evaluation of the strength of evidence and updating one's beliefs appropriately.

[7] This is a riff on Julia Galef's term "explanation freeze" and is intended to apply to cases beyond explanations. For evidence of this effect, see Kohler (1991); Dougherty et al. (1997); Dougherty and Hunter (2003).

[8] See Carroll (1978); Gregory et al. (1982); Levi et al. (1987); Koriat et al. (1980).

[9] See, for example, Haran, Ritov, & Mellers (2013).

[10] Hart et al. (2009).

[11] The combination of a strong argument and a credible source appears to be especially important: weak arguments from credible sources have been found in some studies to be less persuasive than weak arguments from less credible sources, perhaps based on the expectation that the credible source should have better arguments to make if any were available.

[12] One effective way of getting people to focus on an alternative is to come up with reasons why that alternative might be true. See Slovic & Fischhoff (1977); Anderson (1982, 1983); Hoch (1985); Anderson & Sechler (1986); Hirt & Markman (1995). Koriat, Lichtenstein, & Fischhoff (1980) found that overconfidence was reduced when subjects listed counterarguments to their predictions.

[13] See Ditto & Lopez (1992).

[14] The term "biased assimilation" is sometimes used. See Kunda (1990); Ditto & Lopez (1992); Dunning, Leuenberger, & Sherman (1995); Munro & Ditto (1997); Zuwerink & Devine (1996); Munro, Leafy, & Lasane (2004).

[15] See Stanovich, West & Toplak (2013) for a discussion of decoupling in the context of motivated confirmation bias ("myside bias"), and Stanovich (2011), ch. 3, for a discussion of the more general cognitive ability.

[16] See Lord, Lepper, & Preston (1984); Thompson (1995); Babcock and Loewenstein (1997); Frantz & Janoff-Bulman (2000).

[17] See Ross, Ehrlinger, & Gilovich (1998); Wetzel, Wilson, & Kort (1981); Pronin, Lin, & Ross (2002); Pronin, Gilovich, & Ross (2004); Frantz (2006); Pronin & Kugler (2007). This research has found that the bias blindspot applies to a wide range of biases, though interestingly not the planning fallacy, which we'll encounter in Chapter 10. In one study, for example, subjects were told that people generally overestimate the amount of money that they'd give to charity if asked. They were then asked how much they would give to a charity if asked. Knowing about the bias didn't help: the estimates were still far higher than what was actually donated by similar groups who had actually been asked to give (Hansen et al. 2014).

[18] See Banaji & Greenwald (1995); Lieberman et al. (2001); Wilson, Centerbar, & Brekke (2002); Kahneman (2003); Ehrlinger, Gilovich, & Ross (2005); Pronin & Kugler (2007).

[19] The original study is Lord et al. (1979), which provides selections from the interviews; the quoted text is from Lord & Taylor (2009), pg. 834, which provides additional information on the interviews. See also Sherman & Kunda (1989).

[20] Batson et al. (1997); Batson et al. (1999); Batson, Thompson, & Chen (2002).

[21] See Ross & Ward (1995); Pronin, Gilovich, & Ross (2004); Pronin & Kugler (2007); Lord & Taylor (2009). For evidence that we may be more cynical about others' biases than necessary, while being far too optimistic about our own, see Kruger & Gilovich (1999).

[22] See Lord et al. (1984).

[23] Budesheim & Lundquist (1999); Green & Klug (1990), pg. 466; De Conti (2013).

[24] Babcock & Loewenstein (1997); Babcock, Loewenstein, & Issacharoff (1997).

[25] See Babcock, Loewenstein, & Issacharoff (1997); see also Koriat, Lichtenstein, & Fischhoff (1980).

[26] Abelson (1986).

References

Abelson, R. P. (1986). Beliefs are like possessions. Journal for the Theory of Social Behaviour, 16(3), 223-250.

Babcock, L., Loewenstein, G., & Issacharoff, S. (1997). Creating convergence: Debiasing biased litigants. Law & Social Inquiry, 22(4), 913-925.

Babcock, L., & Loewenstein, G. (1997). Explaining bargaining impasse: The role of self-serving biases. Journal of Economic Perspectives, 11(1), 109-126.

Bacon, F. (1620; translation 1869). Novum Organum. From The Works of Francis Bacon, Vol. 8, eds. Spedding, J., Ellis, R., & Heath, D.: Hurd and Houghton.

Baron, J. (2000). Thinking and deciding. Cambridge University Press.

Batson, C. D., Kobrynowicz, D., Dinnerstein, J. L., Kampf, H. C., & Wilson, A. D. (1997). In a very different voice: Unmasking moral hypocrisy. Journal of Personality and Social Psychology, 72(6), 1335.

Batson, C. D., Thompson, E. R., & Chen, H. (2002). Moral hypocrisy: Addressing some alternatives. Journal of Personality and Social Psychology, 83(2), 330.

Batson, C. D., Thompson, E. R., Seuferling, G., Whitney, H., & Strongman, J. A. (1999). Moral hypocrisy: Appearing moral to oneself without being so. Journal of Personality and Social Psychology, 77(3), 525.

Bond, R., & Smith, P. B. (1996). Culture and conformity: A meta-analysis of studies using Asch's (1952b, 1956) line judgment task. Psychological Bulletin, 119(1), 111.

Budesheim, T. L., & Lundquist, A. R. (1999). Consider the opposite: Opening minds through in-class debates on course-related controversies. Teaching of Psychology, 26(2), 106-110.

Carroll, J. S. (1978). The effect of imagining an event on expectations for the event: An interpretation in terms of the availability heuristic. Journal of Experimental Social Psychology, 14(1), 88-96.

De Conti, M. (2013). Dibattito regolamentato e sua influenza sugli atteggiamenti dei partecipanti. Psicologia dell’Educazione, 7(1), 77-95.

Ditto, P. H., & Lopez, D. F. (1992). Motivated skepticism: Use of differential decision criteria for preferred and nonpreferred conclusions. Journal of Personality and Social Psychology, 63(4), 568.

Dougherty, M. R., Gettys, C. F., & Thomas, R. P. (1997). The role of mental simulation in judgments of likelihood. Organizational Behavior and Human Decision Processes, 70(2), 135-148.

Dougherty, M. R., & Hunter, J. E. (2003). Hypothesis generation, probability judgment, and individual differences in working memory capacity. Acta Psychologica, 113(3), 263-282.

Dunning, D., Meyerowitz, J. A., & Holzberg, A. D. (1989). Ambiguity and self-evaluation: The role of idiosyncratic trait definitions in self-serving assessments of ability. Journal of Personality and Social Psychology, 57(6), 1082.

Ehrlinger, J., Gilovich, T., & Ross, L. (2005). Peering into the bias blind spot: People’s assessments of bias in themselves and others. Personality and Social Psychology Bulletin, 31(5), 680-692.

Frantz, C. M. (2006). I AM being fair: The bias blind spot as a stumbling block to seeing both sides. Basic and Applied Social Psychology, 28, 157–167.

Galef, Julia (2017). Why you think you’re right, even when you’re wrong. Accessed at Ideas.ted.com. (Exact quote in this chapter is from the video version.)

Gilovich, T. (2008). How we know what isn't so. Simon and Schuster.

Green III, C. S., & Klug, H. G. (1990). Teaching critical thinking and writing through debates: An experimental evaluation. Teaching Sociology, 462-471.

Gregory, W. L., Cialdini, R. B., & Carpenter, K. M. (1982). Self-relevant scenarios as mediators of likelihood estimates and compliance: Does imagining make it so? Journal of Personality and Social Psychology, 43(1), 89.

Hansen, K., Gerbasi, M., Todorov, A., Kruse, E., & Pronin, E. (2014). People claim objectivity after knowingly using biased strategies. Personality and Social Psychology Bulletin.

Haran, U., Ritov, I., & Mellers, B. A. (2013). The role of actively open-minded thinking in information acquisition, accuracy, and calibration. Judgment and Decision Making, 8(3), 188.

Hart, W., Albarracín, D., Eagly, A. H., Brechan, I., Lindberg, M. J., & Merrill, L. (2009). Feeling validated versus being correct: A meta-analysis of selective exposure to information. Psychological Bulletin, 135(4), 555.

Hirt, E. R., & Markman, K. D. (1995). Multiple explanation: A consider-an-alternative strategy for debiasing judgments. Journal of Personality and Social Psychology, 69(6), 1069.

Hoch, S. J. (1985). Counterfactual reasoning and accuracy in predicting personal events. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11(4), 719.

Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 107.

Korzybski, A. (1958). Science and sanity: An introduction to non-Aristotelian systems and general semantics. Institute of General Semantics.

Kruger, J., & Gilovich, T. (1999). "Naive cynicism" in everyday theories of responsibility assessment: On biased assumptions of bias. Journal of Personality and Social Psychology, 76, 743-753.

Kundu, P., & Cummins, D. D. (2013). Morality and conformity: The Asch paradigm applied to moral decisions. Social Influence, 8(4), 268-279.

Latané, B., & Wolf, S. (1981). The social impact of majorities and minorities. Psychological Review, 88(5), 438.

Levi, A. S., & Pryor, J. B. (1987). Use of the availability heuristic in probability estimates of future events: The effects of imagining outcomes versus imagining reasons. Organizational Behavior and Human Decision Processes, 40(2), 219-234.

Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37(11), 2098.

Lord, C. G., Lepper, M. R., & Preston, E. (1984). Considering the opposite: A corrective strategy for social judgment. Journal of Personality and Social Psychology, 47(6), 1231.

Lord, C. G., & Taylor, C. A. (2009). Biased assimilation: Effects of assumptions and expectations on the interpretation of new evidence. Social and Personality Psychology Compass, 3(5), 827-841.

Mullen, B., Brown, R., & Smith, C. (1992). Ingroup bias as a function of salience, relevance, and status: An integration. European Journal of Social Psychology, 22(2), 103-122.

Munro, G. D., Leafy, S. P., & Lasane, T. P. (2004). Between a rock and a hard place: Biased assimilation of scientific information in the face of commitment. North American Journal of Psychology, 6(3).

Pronin, E., Lin, D. Y., & Ross, L. (2002). The bias blind spot: Perceptions of bias in self versus others. Personality and Social Psychology Bulletin, 28(3), 369-381.

Pronin, E., & Kugler, M. B. (2007). Valuing thoughts, ignoring behavior: The introspection illusion as a source of the bias blind spot. Journal of Experimental Social Psychology, 43(4), 565-578.

Ross, L., Ehrlinger, J., & Gilovich, T. (1998). The bias blindspot and its implications. Contemporary Organizational Behavior in Action.

Stanovich, K. (2011). Rationality and the reflective mind. Oxford University Press.

Stanovich, K. E., West, R. F., & Toplak, M. E. (2013). Myside bias, rational thinking, and intelligence. Current Directions in Psychological Science, 22(4), 259-264.

Stanovich, K. E., West, R. F., & Toplak, M. E. (2016). The rationality quotient: Toward a test of rational thinking. MIT Press.

Thompson, L. (1995). "They saw a negotiation": Partisanship and involvement. Journal of Personality and Social Psychology, 68(5), 839.

Wilson, T. D., Centerbar, D. B., & Brekke, N. (2002). Mental contamination and the debiasing problem. In T. Gilovich, D. W. Griffin, & D. Kahneman (eds.), Heuristics and biases: The psychology of intuitive judgment, 185-200.

Image Credits

Banner image of person looking over landscape: rdonar/Shutterstock.com, cropped from original; Green toy soldiers: image licensed under CC0; Map in book with glasses: image by Dariusz Sankowski licensed under CC0; Bicyclist with hills in background: image licensed under CC0; Broken chain-link fence close-up: image by Johannes Plenio licensed under CC0; Decoupled train wagons: Anderl/Shutterstock.com, cropped from original; Rearview mirror: image by J.W.Vein, licensed under CC0 / cropped from original; Coin on table: image by Konstantin Olsen, licensed under Pexels license; Glass ball on wooden boards inverting light: image by Manuela Adler licensed under Pexels license; Statue of Justice: image by Edward Lich licensed under Pixabay license; Vintage items at flea market: image by Thomas Ulrich licensed under CC0 / cropped from original; Plane seen through rainy window: image licensed under CC0; Silhouette of two people on hillside, one pointing: image by Mircea Ploscar licensed under CC0/ cropped from original; One bright ovoid object among many dark ones: image by JoBisch licensed under CC0.

Reason Better: An interdisciplinary guide to critical thinking © 2019 David Manley. All rights reserved. version 1.4


Reason Better An interdisciplinary guide to critical thinking 

Chapter 3. Clarity

Introduction

In the first two chapters, we considered some factors that operate below the level of awareness to affect our reasoning, causing us to make systematic errors. These factors have the strongest influence when our minds are cloudy. For example, when we're stressed or tired, it's harder to reason carefully. In such situations, we're more likely to form beliefs or make decisions without understanding why we did so. Mental clarity helps us resist these effects. The purpose of this chapter is to introduce three kinds of clarity essential to good reasoning:

Clear inferences: how exactly do our reasons support our beliefs?
Clear interpretation: how should we understand the reasons people give for their beliefs?
Clear language: how can we avoid confusion when giving reasons for our beliefs?

The task of philosophy is to make thoughts clear.             —Ludwig Wittgenstein

Learning Objectives

By the end of this chapter, you should understand:

the two criteria for good inferences and arguments
the difference between deduction and induction
the trade-off between plausibility of premises and suppositional strength
what sorts of beliefs belong at the "ground floor" of a structure of beliefs
how to use interpretive charity in reconstructing arguments
the differences between ambiguity, vagueness, and generality
why real and important categories can still have borderline cases

3.1 Clear inferences

When our reasoning is muddled, we're not sure exactly why we have the beliefs that we do, or make the decisions that we make. When our reasoning is crystal-clear, we can easily identify how our beliefs are supported, and which facts we take to support them. Suppose I come to believe that a certain star in the sky is the North Star because I have two other beliefs:

      •  Alice said so; and
      •  Alice is an expert on stars

When I arrive at a belief because I take it to be supported by other beliefs, I've made an inference. In this case, neither of my supporting beliefs taken alone provides much support for the new belief. Alice's opinion does not lend strong support unless her views on stars are reliable. And her

expertise is irrelevant unless she said something about that particular star. It is only taken together that these two beliefs provide strong support for my new belief that the star I have in mind is the North Star. If someone asks me why I have this belief, I can express my inference in the form of an argument—which, as we saw in the previous chapter, presents a series of claims as support for a conclusion. I could say, "Alice said that's the North Star. And she's an expert on stars. So: that's the North Star." The statements that express my supporting beliefs are called premises, and the statement that expresses the belief they support is called the conclusion. If we want to express an inference clearly,  we need to make it clear what the premises and conclusion are, and also how the premises support the conclusion.  Note that the word "argument" is often informally used to describe a heated discussion, but an "argument" in our sense needn't even be an attempt to persuade. We might write down our conclusion and the support we think it has simply in order to get clear about one of our own inferences. Or we might be explaining to someone why we have a belief, without the goal of persuading them to accept that belief. In such cases, we are expressing an argument for the purposes of clarity rather than persuasion. Sometimes, when we share one of our inferences with other people in the form of an argument, they find the argument unpersuasive. But with the right mindset, this shouldn't make us defensive; it should genuinely interest us to learn why others aren't persuaded. If they can show us a weakness in the argument, then our reasons for believing were not as strong as we supposed. They might even save us from an inaccurate picture of the world!

The two elements Once we are clear about the reasons we have for a belief, we are in a better position to assess the quality of the inference: how well is our belief supported by those reasons? Or, to use the language of arguments: how well do the premises of the argument support its conclusion?  Think of a climber suspended by ropes. She is well-supported as long as: (1) she has good ropes; and (2) they are adequately tied to her harness.  Both things need to be true for the climber to be well-supported. Even the best ropes are no use if they're not securely tethered to her harness; and expert knots are no use if the ropes are too weak to hold the climber's weight. It's crucial to note that the two conditions are independent from each other. If

the climber falls because the ropes break, you can't blame the person who tied the knot. But if the climber falls because the knots were badly tied, the ropes themselves aren't to blame. Similarly, the premises of an argument must support the weight of its conclusion in two ways. If you know that the premises are true, they are like dependable ropes: you know you can trust them. But you can only trust them to hold up a particular conclusion if they are also properly connected to it!  So, a good argument has two features:  (1) the argument has true premises; and (2) they are adequately tied to the conclusion  Again, these conditions are independent. It's no use knowing that your premises are true if they aren't relevant to the conclusion. And false premises can't properly support a conclusion, even if they are so closely tied to it that their truth would guarantee the conclusion. Logic is the study of the connection between premises and conclusions. In focusing on this aspect of an argument, logicians set aside the question of whether the premises are actually true. Just as the quality of a knot is a separate matter from the quality of the rope, how well a premise is tied to its conclusion is a separate matter from its truth. Logicians study the knot and not the rope. Climbing knots are adequate if they would support a climber—supposing the ropes are also adequate! Likewise, an argument's premises are adequately tied to their conclusion if they would support the conclusion—supposing the premises are true!

Suppositional strength

This feature of an argument—whether its premises support its conclusion supposing they are true—we will call its suppositional strength. To assess an argument's suppositional strength, we ask:

      •  How much evidence do its premises provide for its conclusion if we suppose they are true?

As we will see in later chapters, evaluating the suppositional strength of an argument helps us figure out how much to adjust our confidence in the conclusion if we learn that the premises actually are true. If we know that our argument is suppositionally strong and we are sure that its premises are true, then we are

fairly safe in accepting its conclusion. (This is because, as we have seen, a good argument is both suppositionally strong and has true premises.)

Just as bad ropes can be well-tied, it's easy to see that an argument with false premises can still be suppositionally strong. All it takes is for the argument to have premises that would support the conclusion, supposing they are true. Consider this example:

         All giraffes can fly.
         I am a giraffe.
         So: I can fly.

The premises are obviously false. But the argument is suppositionally as strong as any argument can be—if we suppose the premises are true, we have to accept the conclusion as well. There would be no way to avoid the conclusion if the premises were true. So, even if we happen to know that the premises of an argument are false, we have to set that aside and remember that the argument can still be suppositionally strong, as long as the premises, if they were true, would provide good reason to accept the conclusion.

Of course, the giraffe argument is still completely hopeless because we know the premises are false. A good argument—one that adequately supports its conclusion—needs true premises as well as suppositional strength. And having just one feature isn't enough to make an argument sort of good or even slightly good.

Next, just as good ropes can be poorly tied, an argument can have obviously true premises that don't support its conclusion at all. Such an argument will have no suppositional strength and also be hopeless:

         All cats are mammals.
         Some mammals bark.
         So: some cats bark.

In this case, it's easy to see that the argument is suppositionally weak. We know that the premises are true, and yet they don't give us any reason to think that some cats bark. Even if we had no idea whether cats bark, learning that cats are mammals and that some mammals bark wouldn't give us a reason to conclude that cats are among the mammals that bark.

Now consider this similar argument with a true conclusion:

         All cats are mammals.
         Some mammals bite.
         So: some cats bite.

This argument is just as bad as the previous one, but it's easy to get distracted by the fact that the conclusion, this time, happens to be true. In a good argument, the premises must adequately support the conclusion. And in this argument, they don't. In fact, the bite argument is suppositionally weak for exactly the same reason that the bark argument is. If we had no idea whether any cats bite, learning that cats are mammals and that some mammals bite wouldn't give us a reason to conclude that cats are among the mammals that bite.

We saw in the previous chapter that decoupling is critical for properly evaluating the reasons we have for a belief. When we look at the two arguments above, the bite argument looks better only because of biased evaluation: we already accept the conclusion, and that makes the rest of the argument seem pretty good. (We might even fall into the trap of thinking that the conclusion is guaranteed by the premises!) But in fact, the arguments are equally bad: in this case, the bark is no worse than the bite.

So studying logic can help us fix some of our systematic errors. By teaching us to separate truth from suppositional strength, for example, it helps us decouple our prior beliefs about a conclusion from our assessment of the reasons being given for it.
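For readers who want to see the shared pattern spelled out, here is the common form of the bark and bite arguments, written with schematic letters of the kind the next chapter introduces; the letters C, M, and B are placeholders chosen just for this sketch:

\[
\text{All } C \text{ are } M. \qquad \text{Some } M \text{ are } B. \qquad \text{So: some } C \text{ are } B.
\]

Reading C as "cats," M as "mammals," and B as "things that bark" yields true premises and a false conclusion, which shows that this form offers no guarantee. The bite argument has exactly the same form, so its premises provide no better support, even though its conclusion happens to be true.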

Implicit premises Sometimes the suppositional strength of an argument depends on information that is taken for granted but not stated. For example, consider this one-premise argument:          Fido is a dog.          So: Fido has a tail. Learning that Fido is a dog gives us a reason to think that Fido probably has a tail, but that's only because we already know that most dogs have tails. In a conversation where we can assume that everyone knows that most dogs have tails, someone presenting this argument can leave that fact unspoken. When some claim is being taken for granted in this way, we can treat it as part of the argument for purposes of assessing the argument's suppositional strength. Things do not always work out so well, however. In making an argument, we might take something for granted without realizing that it's not common knowledge to the intended audience. Or we might skip over a premise because it's questionable and we're hoping no one notices! In any case, when we leave a

claim unspoken but take it for granted when making our argument, that claim is called an implicit premise. Of course, the exact content of an implicit premise is not entirely clear. For example, in the argument above, the implicit premise could be "Most dogs have tails," or "Almost all dogs have tails," or "Dogs typically have tails." For that reason, and because we are often wrong about whether other people share our assumptions, it's best to be as explicit as possible when presenting an argument. Spelling out all of our premises explicitly might feel long-winded, but it can be revealing for us as well as our partners in communication. It's common to discover that something we were taking for granted suddenly begins to seem much less obvious as soon as we start trying to state it in clear and explicit terms.

Deductive vs. inductive So suppositional strength comes in degrees. The highest degree of suppositional strength is when the connection between the premises and conclusion is so tight that there is no wiggle room at all. We say that the premises of an argument entail the conclusion when they guarantee it completely: if the premises were true, the conclusion would have to be true. No exceptions: there is no way that the conclusion could be false and the premises true. So if you know that the premises of an argument entail its conclusion, then the argument is as suppositionally strong as it can be—learning the premises would make the conclusion inescapable. But again, even arguments with the highest level of suppositional strength can be terrible arguments. Recall the argument from above:          All giraffes can fly.          I am a giraffe.          So, I can fly. There's no way for the premises to be true and the conclusion false: the premises entail the conclusion. But the argument is still not going to convince anyone, because its premises are obviously false. The process of arriving at a conclusion because we accept premises that entail that conclusion is called deduction. When an argument is presented as entailing its conclusion, we say that the argument is deductive. (We call it that even if the premises don't actually entail the conclusion—for example, if the speaker is making a logical mistake.) And when an argument's premises actually do entail its conclusion, we'll say that the argument is deductively valid.

However, most arguments that people give in real life are not intended as deductive. Usually, the truth of the premises is not supposed to guarantee the truth of the conclusion. Instead, the premises are just intended to provide support for the conclusion—perhaps a great deal of support, but not so much that the conclusion follows with the certainty of mathematical proof. When we arrive at a conclusion because we accept some premises that merely support the conclusion, we call that induction. And any arguments presented as merely supporting their conclusions are called inductive arguments. (The Fido argument above is one example.)  Correspondingly, the study of suppositional strength for deductive arguments is called "deductive logic", while the study of suppositional strength for inductive arguments is called "inductive logic". 
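As a preview of the symbolic notation developed in the next chapter, the giraffe argument can be written in first-order form; the predicate names Giraffe and CanFly and the constant m (for "me") are labels chosen just for this sketch:

\[
\forall x\,(\mathrm{Giraffe}(x) \rightarrow \mathrm{CanFly}(x)), \quad \mathrm{Giraffe}(m) \;\;\therefore\;\; \mathrm{CanFly}(m)
\]

Any way of making both premises true forces the conclusion to be true as well; that is what entailment amounts to. Whether the premises are in fact true is a separate question entirely, which is why deductive validity alone does not make an argument good.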

The tradeoff

In many cases, we could offer either a deductive or an inductive argument for a conclusion. For example, consider these two arguments:

        All ravens are black.
        So, if there is a raven in that tree, it's black.

        All ravens I have observed so far have been black.
        So, if there is a raven in that tree, it's black.

The first argument is deductively valid; the second argument is not. However, the second argument may have been meant as an inductive argument rather than a deductive one. So which argument is better? We might be tempted to think that it's always better to have an argument whose premises entail its conclusion. But, on the other hand, the first argument's premise is extremely hard to establish as true. If someone were to challenge it, we would probably have to provide an additional argument to back it up—perhaps even using the very premise that's used in the second argument!

In real life, most of us have only observed black ravens but are also aware that most species have some albino members. So neither argument will convince us with certainty, but for different reasons. One has a doubtful premise that would guarantee its conclusion if it were true, while the other has a well-supported premise that does not guarantee its conclusion. This comparison illustrates the fact that,

when we are deciding how to formulate an argument, there is often a trade-off between suppositional strength and how much support the premises have. Navigating this tradeoff requires careful attention to the argument's context. For example, suppose the speaker thinks that all ravens are black, based on a limited number of observations. In that case, the second argument is more informative because it provides the reason for the speaker's generalization. An argument should articulate the structure of reasoning in the speaker's mind, and the observational premise is more fundamental in that structure than the premise that all ravens are black. In other words, the second argument gets to the real reason why the speaker expects the raven in the tree to be black. However, even the second argument is presented in an overconfident manner. It would be better to add "probably" to the conclusion, to clarify that the argument is not meant to be deductively valid.
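One way to see why this tradeoff matters can be put in probabilistic terms that the text has not yet introduced, so treat this as a hedged aside rather than the book's own formulation. When an argument is deductively valid, our confidence in the conclusion should be at least as great as our confidence that all of its premises are true together:

\[
P(\text{conclusion}) \;\geq\; P(\text{all premises are true})
\]

So a valid argument whose premise we give only, say, 60% credence guarantees no more than a 60% floor for the conclusion, while an inductive argument from a premise we are nearly certain of may, despite its weaker link, leave us more confident in the conclusion overall.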

The ground floor

As we have seen, evaluating an argument involves asking:

      •   whether the premises are true
      •   whether the premises are adequately tied to the conclusion

We've been focusing on the second requirement, but haven't said much about the first. How do we know the premises are true? If our premises support our conclusion, what supports our premises? For example, consider again the argument that a certain star is the North Star. The premises were: Alice said so; and she's an expert on stars. But why should I think the premises are true? Maybe she wasn't talking about the same star. Or maybe she's only pretending to be an expert. So, now my two supporting beliefs also require support.

Many of our beliefs are supported by multiple tiers of more fundamental beliefs. Picture a building with several floors, where the top floor is directly supported by the floor below, and that floor is in turn supported by the floor below it. But, of course, the lowest floor of such a structure is supported directly by the ground. This raises the question: what would correspond to the ground floor of a well-built structure of beliefs? These would be beliefs that don't require support from other beliefs, or maybe don't require any support at all. Let's consider two sorts of beliefs that seem to fall in this category.

First, there are beliefs that are supported directly by perception, rather than by other beliefs. These are directly perceptual beliefs. If I look up and see an apple on the table in front of me, I will form the belief that an apple is there. If someone asks, "Why do you think an apple is there?", I will say "Because I see

it." My belief that there's an apple on the table doesn't seem to be based on any other beliefs—except maybe the belief that apples look like this and that I seem to see this. Of course, it's possible that there's no apple at all. Maybe someone spiked my cornflakes with LSD, and I'm hallucinating. In that case, I seem to see an apple but there is no apple at all. But I would have to be pretty unlucky for that to happen. If I have a history of reliable sense-perception, and no reason to suspect anything strange is going on, then just accepting what I clearly perceive with my senses is a fairly reliable way to form beliefs. I will set aside some tricky questions about what counts as a directly perceptual belief. For example, if I'm looking at an apple, is my belief that there is an apple right there directly perceptual, or is it supported by the belief that I seem to see an apple? Either way, my belief that there is an apple ultimately rests on a perceptual process, so the reliability of that belief depends on the reliability of that perceptual process. Another tricky question has to do with exactly what sorts of cognitive processes count as directly perceptual, in the sense that they don't require further support from other beliefs. For example, would a strong feeling that someone is watching me count as a directly perceptual belief? Should we treat religious experience as a kind of direct perception, offering a foundation of support for religious beliefs? And if so, how can we assess the reliability of these ways of forming beliefs? Again, I will have to set such questions aside in this text; but if you are interested in pursuing them more deeply, I encourage you to take a class in epistemology, the branch of philosophy that studies belief, knowledge, and rationality. Another traditional category of beliefs that can be treated as being on the ground floor of our belief structures are self-evident beliefs. These beliefs are so obviously true that we don't know how we would even go about supporting them with evidence. For example: 1 + 1 = 2. Green is a color If someone was murdered, they died.  If x is taller than y, and y is taller than z, then x is taller than z. What would evidence for these claims even look like? Suppose we want to check whether everyone who was murdered has died. This would involve identifying all the murders and seeing if the victims died. But death is one of the criteria we would use to identify cases of murder in the first place!

There are hard philosophical questions about how we know self-evident things. One view is that we know some of them simply by understanding certain concepts. According to this view, understanding the concepts green and color is all it takes to know that green is a color. There are also hard questions about which beliefs are really self-evident. For example, consider a clear moral truth like inflicting pain for no reason is bad. Some philosophers hold that obvious moral truths are self-evident, while others think they are known through a kind of moral sense that is more like perception.

Whatever the answers to these questions, we should be very careful when treating a claim as self-evident, because, as we will see, we have a tendency to think that things are far clearer than they really are. We humans have a bad track record when it comes to things we consider to be completely obvious. For example, there have been many studies in which people are asked to rate their level of confidence in various ordinary and verifiable claims. Sadly, this research finds that when we say we know something with 100% certainty, we are correct only about 80% of the time; and when we claim 90% confidence, we are correct only about 70% of the time. [1]

Section Questions 3-1 If I believe something based on the support of other beliefs, then...

A

I have made an inference

B

I have made a claim

C

I have made a deductive argument

D

I have made an inductive argument

3-2 If the truth of the premises in an argument does not guarantee the truth of the conclusion, then...

A

the argument is not an acceptable argument

B

the argument is not suppositionally strong enough to make an inference

C

the argument should not be presented as deductive but may still be a good argument

D

there is no reason to accept the conclusion even if you know the premises are true

3-3 For an argument to be suppositionally strong means...

A

that its premises are convincing and would give us good reason to accept the conclusion

B

that its premises are well-supported and provide strong evidence for the conclusion

C

that its premises would give us good reason to accept the conclusion if they were true

D

that it is a good inductive argument but should not be presented as deductive

3-4 If we have support for a claim that we can present either as an inductive or a deductive argument...

A

the inductive version is likely to have better-supported premises

B

the deductive version will be suppositionally stronger and thus a better argument

C

there can't be a deductive version whose premises entail its conclusion

D

it never matters which type of argument we choose because both can be good arguments

3-5 Which of the following is not true?

A

Directly perceptual beliefs need not be supported by other beliefs

B

It is reasonable to treat any claim as self-evident as long as we are clear in our argument that we are doing so

C

Usually, believing what our senses clearly present to us is a fairly reliable way of forming beliefs

D

If we think we know something with certainty, there is often a decent chance we are wrong.

3.2 Clear interpretation

Getting clear about our own reasoning requires understanding exactly what it is that we believe, as well as the structure of our beliefs. Likewise, we should be clear when we express our reasoning in the form of arguments. This means we should:
    •   state every premise and conclusion in clear language
    •   make the structure of the argument clear

If you are reading or hearing an argument, it can take some effort to figure out exactly what the premises and conclusion are, and what the structure of the argument is supposed to be. This can be equally hard if you are the one putting the argument forward!

It's often useful to begin by identifying the conclusion. Ask yourself: what is the argument driving at? Which claim is supposed to be supported, and which other claims are intended to support it? What, if anything, does the author or speaker want to convince me of? Next, try to find all of the claims being made in support of the conclusion. Sometimes a premise provides support for the conclusion all on its own; sometimes it only provides support for the conclusion after being linked with other premises. And, in a bad argument, a premise might provide no support for the conclusion at all. What makes it a premise is that it is intended as support for the conclusion. Certain expressions often function to indicate the presence of a premise: since, given that, seeing as, after all. These are called premise indicators. But they are not very reliable, because they can be used in other ways as well. For example, in the sentence: Since Alice is happy, her team must have won the game the word "since" is used as a premise indicator. The fact that Alice is happy is being offered as a reason to conclude that her team must have won the game. But in the sentence: Since yesterday, I've had a terrible headache the word "since" is not a premise indicator; it just indicates the duration of the headache. The only  way to tell when such words are being used as premise indicators is to consider what role they're playing in their context. Likewise for conclusion indicators like therefore, so, thus, and as a result. For example, "so" is obviously not being used as a conclusion indicator when it is used as an intensifier, as in "I am so-o-o done with this crap about indicators." In short, these indicators are worth paying attention to, but they don't take the place of asking what claim the argument is supposed to establish, and what reasons are being offered for that claim.

Standard form Often we are not very clear in our own minds about exactly what our reasons for a belief are, or how exactly they are supposed to support that belief. Forcing ourselves to write down our reasons in the form of an argument can greatly help clarify our own thinking.

Getting clear on the structure of someone else's argument presents special challenges. Sometimes people fail to make their arguments clear because they are sneakily trying to convince us of something by means of a bad argument. But often they are just confused about the structure of the reasons in their own minds. If we can carefully re-state the argument, it helps everyone—even the person putting the argument forward—to assess whether the premises adequately support the conclusion.

To make an argument as clear as possible, it's useful to write it down as an ordered series of sentences, with each premise and the conclusion labeled, as follows:

    P1. If it's morning, then the direction of the sun is east.
    P2. It's morning.
    C1. The direction of the sun is east. (from P1 & P2 by deduction)
    P3. We should be going east.
    C2. We should be going in the direction of the sun. (from C1 and P3 by deduction)

This way of laying out an argument is called standard form. As you can see, this argument has two steps. First there's a mini-argument establishing C1, and then there's another mini-argument that uses C1 as a premise to argue for C2. This makes C1 an interim conclusion. For each conclusion, there's a remark in parentheses that tells us which premises (or previous conclusions) directly support that conclusion, and whether that support is inductive or deductive.

Note that while P1, P2 and P3 all support C2 when taken together, the first two do so only indirectly by way of supporting C1. Since their support for C1 has already been noted, we don't list them again as directly supporting C2. Like final conclusions, interim conclusions are treated differently from premises. Premises should either be acceptable on their own or supported from outside the argument. But conclusions must always be supported by premises within the argument itself. Note also that if any of the inferences in a multi-step argument is inductive, the argument as a whole is inductive, because the final conclusion is not guaranteed by the argument's premises taken collectively.
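(For readers who find it helpful to see the bookkeeping made explicit, here is one optional way to sketch the argument above as data. This is only an illustration in Python, not part of the method being described; the labels and field names are simply choices made for this example. Each conclusion records which earlier lines directly support it and whether that support is deductive or inductive.)

```python
# A sketch of the sun-direction argument in standard form, with each
# conclusion recording its direct support.  (Illustrative only; the field
# names "from" and "by" are just choices made for this example.)
argument = [
    {"label": "P1", "text": "If it's morning, then the direction of the sun is east."},
    {"label": "P2", "text": "It's morning."},
    {"label": "C1", "text": "The direction of the sun is east.",
     "from": ["P1", "P2"], "by": "deduction"},
    {"label": "P3", "text": "We should be going east."},
    {"label": "C2", "text": "We should be going in the direction of the sun.",
     "from": ["C1", "P3"], "by": "deduction"},
]

# Premises carry no "from" entry: they must be acceptable on their own or
# supported from outside the argument.  Conclusions list their direct support,
# and the whole argument counts as inductive if any single step is inductive.
overall = "inductive" if any(step.get("by") == "induction" for step in argument) else "deductive"
print(overall)  # prints: deductive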

Interpretive charity When interpreting someone's argument, we should always try to identify the best version of the argument that the author could plausibly have intended to put forward. This is known as the principle of charity. The point is not just that finding the best version of the argument is a nice thing to do. If we

are genuinely curious about the truth, then good arguments are far more useful to us than bad ones. We are far more likely to learn something from a good argument.

In order to be as charitable as possible when reconstructing someone's argument in standard form, we sometimes have to look beyond what they explicitly state. In ordinary conversation, we often take things for granted because we think they are obvious or unremarkable. As a result, interpretive charity requires us to try to identify the most helpful implicit premises that could plausibly have been intended by the speaker. Often it requires careful attention to the context of an argument to figure out what a speaker or writer might have been assuming that would improve the quality of the argument.

Another interpretive task arises when we must assess whether an argument is intended as deductive or inductive. It can be very hard to tell which type of argument a speaker intended. People often present arguments overconfidently, as though the premises prove the conclusion, when actually they just provide some evidence for the conclusion. In that case, even if the argument fails as a deductive argument, it may have enough suppositional strength to be taken as a good inductive argument. Unless we are confident that the speaker intends their argument to be deductive, it is often more cooperative to treat it as inductive. (Often people present an argument without giving enough thought as to whether they intend it as deductive or inductive.)

Suppose Alice says, "My pets are acting scared. And they get scared whenever it's thundering. So it must be thundering out." Alice may intend this as a deductive argument, but it's not suppositionally strong enough to be a good deductive argument: the premises could be true even if the pets are actually afraid of something else instead. Maybe the pets get scared when it's thundering and also when the mail carrier comes to the door. (Alice might be confused by the fact that the premises would entail the conclusion if her second premise were reversed: "Whenever they are scared, it's thundering".)

On the other hand, it's also possible that Alice intended her argument as an inductive argument. Maybe she didn't mean to suggest that the conclusion is absolutely guaranteed by the premises: she just thinks the premises provide good reason to accept the conclusion. If we interpret her argument that way, it is a much better argument. The premises do provide some inductive support for the conclusion. (How can we tell? That's a question about inductive logic, which we will return to in Chapter 5.)

Reconstruction Let's consider two examples of arguments involving implicit premises. Suppose someone argues as follows:

"God must be the cause of the universe. After all, everything with a beginning has a cause. And what else could have caused the universe except God?" To reconstruct this in standard form requires first identifying the conclusion. The argument is attempting to establish that God created the universe, so even though that is the first line, it is still the argument's conclusion. The first premise is also clear: everything with a beginning has a cause. Next, the author asks a rhetorical question. A question isn't a statement about how things are, so it can't be a premise in that form. But from context, we know that we are supposed to give the answer "nothing," which gives us a premise we can write down: "Nothing could have caused the universe except God." Now how does this relate to the first premise? The author is taking for granted the premise that the universe has a beginning. That is an implicit premise that should be explicitly stated in our reconstruction. We now have:           Everything with a beginning has a cause.           The universe has a beginning.          ...           Nothing could have caused the universe except God.           So, God is the cause of the universe. But something else is missing. If everything with a beginning has a cause, and the universe has a beginning, then we can conclude that the universe has a cause. This time the implicit item is an interim conclusion, and we should state it. This gives us our full form:         P1. Everything with a beginning has a cause.         P2. The universe has a beginning.         C1. The universe has a cause. (from P1 & P2 by deduction)         P3. Nothing could have caused the universe except God.         C2. So, God is the cause of the universe. (from C1 & P3 by deduction) As you can see, this looks much more complex than the original argument. But it is a plausible reconstruction of what was intended by that argument, and it clarifies the structure of the argument so that we can begin to assess its strengths and weaknesses. The premises clearly entail the conclusion, so its suppositional strength is

impeccable. Those who reject the conclusion typically reject at least one of the three premises, usually the first and third. Let's consider one more example of reconstruction. Suppose someone says: "An infinitely good and powerful being wouldn't allow children to die horribly. So God does not exist." Again, there are some implicit premises. First, the speaker is taking for granted the well-known fact that some children die horribly. And second, the speaker is taking for granted the assumption that if God exists, God is an infinitely good and powerful being. (This is accepted by most people who believe in God, but is not entirely uncontroversial.) With these implicit premises made explicit, we can put the argument into standard form like this:      P1. If God existed, God would be infinitely good and powerful.      P2. An infinitely good and powerful being would not allow any children to die horribly.      C1. If God existed, no children would die horribly. (from P1 & P2 by deduction)      P3. But some children do die horribly.      C2. God does not exist. (from C1 and P3 by deduction) As stated, the premises entail the conclusion, and people can examine the premises to determine whether they are plausible. Usually, in this case, people end up disagreeing about whether an infinitely good and powerful being could have good reasons to allow such horrible events to occur. As with some of the previous arguments, it may be that the most charitable reconstruction of the speaker's original argument treats it not as a deductive argument, but as an inductive one. For example, perhaps the second premise should really be stated as "Probably such a being would not allow any children to die horribly." In that case, (C1) would be noted as following inductively rather than deductively. And we would construe the argument as intending to offer support—though not conclusive support—for the conclusion that God does not exist. Again, there is a trade-off here, because the revised premise is easier to accept, but the argument is less suppositionally strong because it doesn't guarantee the conclusion.

Section Questions

3-6 When someone presents an argument that seems unconvincing, we should proceed as though...

A

the premises are intended to guarantee the conclusion

B

there are no implicit premises that the speaker is taking for granted

C

the argument should not be treated as inductive

D

the argument was not explicitly presented in its most convincing form

3-7

Multiple answers: Multiple answers are accepted for this question

Suppose someone says: _ "Imposing taxes takes money from people without their consent; and any action that takes money from people without their consent is theft. So, you can't avoid the conclusion that raising taxes is theft!" _ Two of the answers below would NOT be lines included in a correct reconstruction of the argument. Which ones are they?

A

P1. Imposing taxes takes money from people without their consent.

B

P2. Any action that takes money from people without their consent is theft. (from P1 by induction)

C

P2. Any action that takes money from people without their consent is theft.

D

P3. You can't avoid the conclusion

E

C1. So raising taxes is theft (from P1 & P2 by deduction)

3-8

Multiple answers: Multiple answers are accepted for this question

Someone says, _ "I think dogs are better pets than cats. After all, it's easier to teach dogs to do tricks, which require intelligence. And also, since dogs care more about humans, they're more fun to be around. So that's two reasons why they're better pets." _ Suppose we want to reconstruct this argument, as it was intended by the speaker, using standard form. This is a complex and tricky problem; take care and make sure that the improvements you choose fit with a charitable reconstruction of what the speaker intended. _ Choose the two best ways to improve the following reconstruction. (To earn the point, you must get both answers right.) _
P1. It is easier to teach dogs to do tricks than to teach cats to do tricks.
P2. Tricks require intelligence.
C1. So dogs are smarter than cats (from P2 by deduction)
P4. Dogs care more about humans than cats do
C2. So dogs are more fun to be around than cats are (from P4 by induction)
C3. So dogs are better pets than cats (from C2 by induction)

A

P2 should be labelled as directly supported by P1

B

C3 should be labelled as directly supported by P4 & C2

C

C2 should be labelled as a premise and not as a conclusion

D

C1 should be labelled as directly supported by P1 & P2 by induction

E

C3 should be labelled as directly supported by C1 & C2 by induction

F

C1 should be labelled as directly supported by P1 & P2 by deduction

3.3 Clear language

When we express our own inferences in the form of arguments, or reconstruct arguments offered by others, we should also take care to be as clear as we can with our language. This means trying to use language that is unambiguous and precise—at least, as far as possible.

Ambiguity

If someone says they spent all day "at the bank," it might not be clear whether they mean the edge of a river or a financial institution. It's tempting to say "the word bank has two meanings," but that's not exactly correct. A linguist would not say that bank is one word with two meanings, but that there are two words that happen to be spelled b-a-n-k and pronounced the same way. Those two words share a written and spoken form: the same marks on paper or voiced syllables are used to represent both. There are many cases where two or more words share the same written and spoken forms: consider bat, trunk, racket, cross, fan, and many other examples. These written and spoken forms are ambiguous.

"Ambiguous" is used informally for any situation lacking clarity; we often say that the end of a movie is ambiguous if it's unclear what really happened. But we are using "ambiguity" in a more technical sense here to refer to cases where two words or sentences share the same written and/or spoken form.

When ambiguity occurs at the level of individual words, it's called lexical ambiguity. This can make for some terrible jokes (here I've put the ambiguous form in italics):

        The absent-minded professor was arrested for not finishing his sentence.
        Cannibals were not so bad; they only wanted to serve their fellow men.
        I still miss my ex-husband, but my aim is improving.

Lexical ambiguity can also arise when we just leave a word out, and it is unclear which word was supposed to fill the gap. A major source of this kind of ambiguity occurs in English when our claims about groups of people or objects omit terms that would clarify number or quantity, like all, some, the, a, most, or three. A common structure for simple sentences in English is this:

        [determiner] [noun phrase] [verb phrase]

For example: "All dogs bark," "Most cats are mean," "Twenty Canadians started laughing." If we leave off the determiner, we get "Dogs bark," "Cats are mean," and "Canadians started laughing." Nouns used in this way are called bare plurals and they are extremely common in English. They are also highly prone

to ambiguity. "Birds have wings" means that most or typical birds have wings, whereas "Birds are singing outside my window" only means that some birds are singing outside my window. Meanwhile, being told "Local birds carry avian flu" gives us very little sense of what proportion of local birds are afflicted.

Unfortunately, people often use ambiguity to their advantage in rhetorical settings. They may, for example, say something ambiguous between a general claim and a much more limited claim. That way, the audience gets the impression that the general claim is true, but the speaker can always back off if challenged and pretend that only the more limited claim was intended.

Bare plurals are a particularly common source of this nasty rhetorical trick. For example, suppose someone says:

        Young men have been turning to crime because they can't find jobs.

The bare plural makes it highly unclear what is being claimed. No one would find it surprising if there are some young men who turn to crime because they can't find jobs. But because the speaker thought this claim was worth making, we assume they mean more young men than usual or an alarming number of young men. However, if the speaker is challenged for evidence, they can back down to the more limited claim that some young men have turned to crime, and then support it with a few anecdotes.

In practice, this means that people can use claims like this in order to strongly suggest something general and unsupported, while being able to back off to a specific and uncontroversial claim when challenged. If you look for bare plurals, you will start to notice this ambiguity everywhere: when you see claims about immigrants or women or Republicans, ask yourself exactly what is being claimed and exactly what is supported by the evidence. (We'll return to this issue in Chapter 6.)

Even when it's clear which individual words are being used in sequence, there can be ambiguity about which sentence structure is intended. This is called syntactic ambiguity. Jokes based on this kind of ambiguity were a favorite of the Marx Brothers—for example, "One morning I shot an elephant in my pajamas. How he got in my pajamas, I don't know." The ambiguity here is not a matter of which words are being expressed; instead, it concerns whether the phrase "in my pajamas" applies to the shooter or the elephant. Newspaper headlines are a major source of this kind of ambiguity, because they tend to leave out short words like articles that help clarify the sentence structure:

        Complaints about NBA referees growing ugly;
        Man shoots neighbor with machete;

        Crowds rushing to see Pope trample man to death.

Finally, of course, there are cases where both lexical and syntactic ambiguity are involved, as illustrated by another well-known line from Groucho Marx: "Time flies like an arrow, but fruit flies like a banana." The first clause sets you up to expect that the second will have the same structure, so it takes a moment to understand that the second clause could be about the preferences of fruit flies rather than about the way fruit flies through the air.

Vagueness Some land formations are clearly mountains, like the Matterhorn, pictured below. Very small hills, on the other hand, are clearly not mountains. To put it differently, the word mountain clearly applies to some things, and clearly not to others.  But somewhere between the Matterhorn and small hills, there are land formations that fall into a grey area. They don't fall clearly into the category of mountains, but they don't fall clearly into the category of non-mountains, either. And importantly, this is not because we need to know more about them: we might know their exact size and shape and still decide that they are on the fuzzy border of what we mean by mountain.  The problem is not that we lack sufficient information about these land formations. The problem is that we don't use the word mountain in a way that is precise enough to sharply distinguish mountains from non-mountains. As a result, there are borderline cases where the word doesn't clearly apply, but also doesn't clearly not apply. 

Imagine a formation that is clearly a mountain, eroding over hundreds of millions of years until it is just a very small hill. It would be absurd to point to one day in particular (or even year) as the very day when it stopped being a mountain. Instead, there are periods in which it's clearly a mountain, followed by periods in which it is a borderline case, and eventually periods in which it's clearly not a mountain. 

When a word has borderline cases like this, we say it's vague: and the greater the range of its borderline cases, the more vague it is. As we will see, a great many descriptive terms, whether adjectives, common nouns, or verb phrases, are vague in this sense—and likewise for the concepts, categories, and distinctions that we use these terms to express. (The word mountain is vague, but so is the concept of a mountain, as well as the distinction between mountains and non-mountains.)  Even though vagueness is everywhere, we seldom notice it. That's because vague words and concepts are perfectly fine and useful when we use them to apply to clear cases. For example, we can call a ripe tomato red without worrying that the word red is vague. But, of course, the word red is in fact vague: if you start with something that is red and then slowly fade out the color, you will eventually end up with something that is not red, but rather pink or even white. 

And you won't be able to identify a particular millisecond at which the object switched from being red to being not-red. In other words, there are plenty of borderline cases of red. But this doesn't matter when we are saying that tomatoes are red and limes are not red: those are clear cases, and the fact that there are also borderline cases doesn't affect the truth or even usefulness of our claims.

It's worth noting that I've been simplifying things by using one-dimensional graphs. Red doesn't only shade off into white; it shades off into other colors as well. And likewise, there isn't really a one-dimensional scale of being more or less mountainous: being a mountain involves a complex combination of factors, including size, shape, and distinctness from surrounding formations.

A more obvious example of multi-factored vagueness is chair: there are several factors that make something count as a chair, and we can find borderline cases for each. For example, a chair must be a seat for a single person, so we can find objects that are borderline cases of chair due to their width, like the object on the left. But a chair must also have a back, so we can find borderline cases of chair due to only sort of having backs, like the item on the right.

Note that our definition of vagueness is a technical one. In everyday life, we use words like vague or ambiguous pretty loosely to describe a variety of situations in which communication is not clear. But experts on language have isolated different kinds of explanations for a lack of clarity in communication. And in particular, what they call ambiguity is a very different thing from what they call vagueness. In the interests of clarity, I've adopted the more precise terminology here. To illustrate the difference between ambiguity and vagueness, imagine we have many different objects of all kinds depicted on a piece of paper, and we are supposed to draw a circle around all the things that bat applies to. Due to the ambiguity of bat, we want to draw two circles: one around all the baseball bats, and another around all the little flying mammals. But if we're given a vague term like red, we want to draw a single circle with a fuzzy border—at least if the paper has both clear and borderline cases of red objects. Some objects clearly belong inside the circle while others clearly belong outside it, but there are also objects on the fuzzy border between them.  It's also worth distinguishing both ambiguity and vagueness from a third feature of words—namely, generality. What makes an expression more general than another is simply that it applies to more things. So, for example, land formation is more general than mountain, because all things that are mountains are land formations, but not vice versa. Likewise, cutlery is more general than fork, and child is more general than toddler, etc. But just because one word is more general than another doesn't mean that it's more vague. To return to our example of drawing circles around objects: generality has to do with how many objects fall within the circle, while vagueness has to do with how many objects fall on the borderline, and ambiguity has to do with how many circles we need to draw [2].

Sharp borders fallacy

Although we should try to minimize vagueness in our language, we can't avoid it entirely. Even many scientific categories are at least slightly vague, if you look closely enough. For example, distinctions

between species in biology are not so precisely defined that every node in an evolutionary tree specifies a single generation where one species of organisms branched off from another. But that doesn't detract from the usefulness of those distinctions.

So vague terms are often unavoidable; but this need not be a big problem, because we are not always dealing with borderline cases. Many sentences containing vague terms are clearly true and informative, like "The Matterhorn is a mountain" or "My boots are red." Depending on our goals, being more precise in these cases might be unnecessarily verbose.

However, vague terms can generate confusion if we fail to understand how they work. In particular, there is a temptation to assume that every useful or important category must have completely precise borders. This is mistaken, because the categories of mountains, chairs, and red things are useful and important despite being vague.

Assuming that real and useful categories can't have borderline cases is a fallacy we'll call the sharp borders fallacy. For example, consider this argument:

        There is no such thing as friendship really, because you can't draw the line where two people suddenly become friends as they get to know and like each other more.

The mistake here is to assume that a sharp border must distinguish relationships that are friendships from those that are not. And that would be to neglect the fact that the word friend is vague, as is the category it expresses. But friendship is still a useful category that we can use to make distinctions and convey information, and there are many clear examples of people who are friends with each other and people who are not. Likewise, suppose someone says:

        It makes no sense to talk about dangerous weapons, because there is just a range of weapons that are more and less capable of harm.

If the speaker were simply saying that we should be more precise in classifying weapons, then this would not be a case of the sharp borders fallacy. But if the speaker really means that it makes no sense to talk about dangerous weapons because there is no sharp cutoff point between dangerous and non-dangerous weapons, then he is making a mistake. Again, there are items on either side of the distinction that can correctly and informatively be described as "dangerous" or "non-dangerous." Finally, consider this kind of argument:

Alice isn't rich. She started off poor and only made a little money each week. You can't say that one of those weeks she suddenly became rich. The assumption here seems to be that you can't go from non-rich to rich by increments, because that would require a precise point at which the person crossed the line from non-rich to rich. But again, this ignores the fact that with vague categories there can be clear cases on either side, despite the absence of a sharp borderline. So going from non-rich to rich does not require the crossing of any clear boundary. 

Section Questions 3-9 Which of the following statements is not true?

A

In the metaphor about drawing category boundaries, ambiguity has to do with how many circles we need, while generality has to do with how many objects are inside the circle

B

If we choose to use a descriptive word, we should always be able to say exactly where to draw the line between things the word applies to and things it doesn't apply to

C

Even distinctions that are fairly well-defined, like species distinctions, are vague if you look closely enough.

D

Syntactic ambiguity has to do with more than one possible sentence structure that might have been meant by the speaker or writer

E

Some sentences containing vague words are clearly true.

3-10 It is good to minimize vagueness in our language...

A

because if we use vague terms we are committing the sharp borders fallacy

B

when we might be dealing with borderline cases

C

because real and important categories have sharp borders

D

because vagueness is a kind of ambiguity

Key terms Ambiguity: when it is unclear which word or sentence structure a speaker intends to express, because two words look or sound the same. (That is, the same spoken or written forms express different meanings.) See lexical ambiguity and syntactic ambiguity.  Bare plurals: the use of a plural noun phrase without a determiner (e.g., “Canadians started laughing” instead of “Some Canadians started laughing”). Borderline cases: cases where it is unclear whether a category applies to an individual, and not because we need more facts about the individual (e.g., you might sit on something that seems somewhat like a chair and somewhat like a stool). Conclusion: the statement expressing the belief that our argument is meant to support. Conclusion indicators: expressions that often signal the presence of a conclusion (e.g., therefore, so, thus, as a result). Deductive argument: an argument whose premises are presented as entailing its conclusion.  Deductive validity: an argument is deductively valid when its premises entail its conclusion. Directly perceptual beliefs: beliefs that are supported directly by perception, rather than by other beliefs. Implicit premise: a claim that is left unstated but which is taken for granted in making an argument. Inductive argument: an argument whose premises are presented as providing support for its conclusion but not entailing it. In these cases, the truth of the premises is not supposed to guarantee the

truth of the conclusion with certainty. Instead, the premises are just intended to provide support for the conclusion Inference: a single step in a line of reasoning. We take these steps in our reasoning to arrive at a belief, because we take that belief to be supported by other beliefs. Suppositional strength: a measure of how well the premises connect to the conclusion of an argument. The suppositional strength of an argument is a matter of how much evidence the premises would provide for the conclusion if we suppose them to be true. Interim conclusion: a conclusion that is supported by some premises in the argument, and that also provides support for a further conclusion of the argument. Lexical ambiguity: when it is unclear which of many possible words, which share the same written or spoken form, the speaker is intending to express (e.g., the absent-minded professor was arrested for not finishing his sentence). Premise indicators: expressions that often signal the presence of a premise (e.g., because, since, as, given that, seeing as). Premises: the statements expressing the supporting beliefs in an argument; i.e. those intended to support the conclusion. Principle of charity: when interpreting an argument, we should always try to identify the best version of the argument that the author could plausibly have intended. Self-evident beliefs: beliefs that are so obviously true that we don't know how we would even go about supporting them (e.g., “1+1=2” or “green is a color”). Sharp borders fallacy: assuming that real and useful distinctions with clear cases on either side can't also have borderline cases. Standard form: a representation of an argument in terms of an ordered series of declarative sentences, with each premise and the conclusion (or conclusions) labeled. Next to each conclusion, it should be noted which premises directly support that conclusion, and whether that support is inductive or deductive.  Syntactic ambiguity: when it is unclear which sentence structure a speaker intends to express (e.g., man shoots neighbor with machete).

Vagueness: when a word, concept, or distinction has borderline cases (i.e., where it is unclear whether the word/concept/distinction applies).

Footnotes

[1] See Fischhoff, Slovic, & Lichtenstein (1977); Lichtenstein, Fischhoff & Phillips (1982); Russo and Schoemaker (1992); Klayman, Soll, Gonzalez-Vallejo & Barlas (1999).

[2] As linguists and philosophers use the terms, the opposite of generality is specificity, the opposite of vagueness is precision, and the opposite of ambiguity is... non-ambiguity.

References

Fischhoff, B., Slovic, P., & Lichtenstein, S. (1977). Knowing with certainty: The appropriateness of extreme confidence. Journal of Experimental Psychology: Human Perception and Performance, 3(4), 552.

Klayman, J., Soll, J. B., Gonzalez-Vallejo, C., & Barlas, S. (1999). Overconfidence: It depends on how, what, and whom you ask. Organizational Behavior and Human Decision Processes, 79(3), 216-247.

Lichtenstein, S., Fischhoff, B., & Phillips, L. (1982). Calibration of probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under Uncertainty: Heuristics and Biases.

Russo, J. E., & Schoemaker, P. J. (1992). Managing overconfidence. Sloan Management Review, 33(2), 7-17.



Reason Better: An interdisciplinary guide to critical thinking 

Chapter 4. Entailment

Introduction

As we have seen, an argument has the highest degree of suppositional strength when we know that its premises entail its conclusion. If we learn that the premises not only entail the conclusion but also are true, then the conclusion is inescapable.

However, it's not always obvious whether an argument's premises entail its conclusion. In this chapter, we will consider some methods and tests that can help us identify entailment, and investigate some common reasons why entailment gets misdiagnosed.

Learning Objectives

By the end of this chapter, you should understand:
    •   what entailment is, and what logical forms are
    •   how to evaluate an argument by flipping it
    •   how to identify the sentential and predicate form of simple arguments
    •   why there are deductively valid arguments without deductively valid forms
    •   the relevance of biased evaluation to assessing entailment
    •   the forms modus ponens, modus tollens, hypothetical syllogism, disjunctive syllogism, affirming the consequent, and denying the antecedent

4.1 Deductive validity  As we saw in the last chapter, we say that some premises entail a conclusion when they meet this condition: if the premises were true, the conclusion would also have to be true. Also, we saw that an argument in which the premises are presented as entailing the conclusion is called a deductive argument.  Let's add one more piece of terminology: an argument in which the premises actually do entail the conclusion (rather than just being presented as if they do) is called deductively valid.  So, in a deductively valid argument, the truth of the premises guarantees the truth of the conclusion. It's important to stress, though, that the kind of guarantee we are talking about is an absolute guarantee, not just a strong support. For example, consider these two arguments:           Every raven ever encountered by anyone has been black.           So, the next raven we see will be black.           Only 1 in a million ravens is not black.           So, the next raven we see will be black. Neither argument is deductively valid. These arguments may have great suppositional strength, but in neither case does the premise absolutely guarantee the truth of the conclusion. In both cases, it could happen that the premise is true and the conclusion false. That may be very unlikely, but it's still possible.

When premises entail a conclusion, there is no possible situation in which the premises are true and the conclusion false. So, in evaluating whether premises entail a conclusion, we don't take for granted anything else we happen to know about the world. Instead, we pretend that all we know about the world is that the premises are true. For example, "Some cars are red" does not entail "Some people drive red cars" because that would mean assuming that people drive red cars. It would be a very strange world if there were red cars but nobody ever drove any of them—but that situation is not impossible. Often, of course, when someone presents an argument, they do want us to take for granted something they take to be an obvious truth about the world. In that case, it's not helpful to point out that the  premises they actually spoke out loud don't entail the conclusion. Instead, we should treat them as having intended an additional implicit premise; the resulting argument may be deductively valid after all. And this is important to realize, because we may really disagree about the truth of the implicit premise, not whether the explicit premises entail the conclusion. 

Step by step If an argument's premises entail its conclusion and its premises are true, then the conclusion is inescapable. But some hard thinking may be needed to realize that the conclusion is inescapable, because it is sometimes hard to tell whether the premises are true, or whether they entail the conclusion. One kind of case where it's difficult to see that the premises entail the conclusion is when the argument is very complex. Many "logic puzzles," including those on some standardized tests, are like this. For example, consider this argument:         Some monkeys are cute.         Only friendly animals are eating veggies.         No friendly animals are cute.         So, some monkeys are not eating veggies. For most of us, some effort is required to work out that this is a case of entailment. We can do this by actively imagining that all the premises are true, which requires holding several different claims in mind at the same time, and then seeing if they guarantee the conclusion. With a complex argument, it's often too hard to hold all the premises in mind at once in order to suppose that they are true. So it's often easier to take the argument in several steps. In the monkey

argument, for example, we can take the premises two at a time and see if any pair guarantees an interim conclusion. In particular, take the first and third premises.

        Some monkeys are cute.
        No friendly animals are cute.

From these two premises, we can conclude that some monkeys are not friendly animals. We can now try using this interim conclusion along with the second premise, and seeing what they entail together:

        Some monkeys are not friendly animals.
        Only friendly animals are eating veggies.

It's now easier to see that these premises entail the conclusion, namely that some monkeys are not eating veggies. Even if it took us multiple steps to get here, we now know that the premises of the original argument do entail its conclusion.

Flipping the argument

In addition to taking an argument step-by-step, it sometimes helps to try flipping the argument. This means starting with the conclusion instead of the premises. When we consider an argument in the forwards direction, we suppose all of its premises to be true, and ask whether the conclusion must then be true. When we flip the argument, we suppose its conclusion to be false, and ask whether it's possible for all the premises to be true. If they can still be true even though the conclusion is false, they don't really entail the conclusion.

The key point to understand here is that the following two features of an argument amount to the same thing:
    •   If all the premises were true, the conclusion would have to be true.
    •   If the conclusion were false, the premises could not all be true.

If you don't see why these amount to the same thing, it's worth taking a minute to ponder until you do. Imagine a single premise that entails a conclusion—for example, This horse is a mare entails This horse is female. We know that if a horse is a mare, then that horse must be female. But that also means that if the horse is not female, the horse is not a mare. If the conclusion is false, the premises can't all be true—and in this case, there's just one premise.

Flipping the argument can be useful when the argument is hard to evaluate in the forwards direction. It can also be useful as a way of double-checking for entailment. Let's consider a simple example:

        All mosses are plants that Bob loves.
        Some plants that Bob loves are ferns.
        So, some ferns are mosses.

Let's try flipping the argument. Suppose the conclusion is false. That means no ferns are mosses. Could both premises still be true? The premises say that all mosses—and some ferns—are plants that Bob loves. But that could be true even if none of the ferns are mosses. (After all, the premises don't say that all the plants Bob loves are mosses!) So the premises could be true even if the conclusion were false. But that means the premises don't entail the conclusion.
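Flipping an argument is, in effect, a hunt for a counterexample: a situation in which all the premises are true and the conclusion is false. For readers who like to see this done mechanically, the following Python sketch (my own illustration, not part of the text; the property names are just labels chosen for this example) searches small imaginary situations for exactly such a counterexample to the mosses argument. Finding one shows that the premises do not entail the conclusion; failing to find one in small situations would not, by itself, prove entailment.

```python
from itertools import product

# Each candidate situation assigns three yes/no properties (moss, loved by
# Bob, fern) to every thing in a small domain.  "Flipping" the argument
# amounts to hunting for a situation where both premises are true and the
# conclusion is false -- a counterexample.

def premises_true(things):
    all_mosses_loved = all(t["loved"] for t in things if t["moss"])
    some_loved_plants_are_ferns = any(t["loved"] and t["fern"] for t in things)
    return all_mosses_loved and some_loved_plants_are_ferns

def conclusion_true(things):
    return any(t["fern"] and t["moss"] for t in things)

def find_counterexample(max_size=2):
    for size in range(1, max_size + 1):
        for combo in product(product([False, True], repeat=3), repeat=size):
            things = [dict(zip(("moss", "loved", "fern"), c)) for c in combo]
            if premises_true(things) and not conclusion_true(things):
                return things
    return None

print(find_counterexample())
# Result: [{'moss': False, 'loved': True, 'fern': True}] -- a single plant
# that is a fern Bob loves but not a moss.  Both premises hold (the first
# vacuously), yet the conclusion fails, so there is no entailment.
```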

Section Questions 4-1 Consider this argument: _ Bob is 13 feet tall So, Bob is taller than most other people _ Suppose there are no implicit premises: this is the whole argument. Which of the following is true about this argument?

A

The premise entails the conclusion because, if the premise is true, the conclusion has to be true

B

The premise does not entail the conclusion because, if the premise is true, the conclusion could still be false (setting aside facts not supplied by the premise)

C

The premise entails the conclusion because, setting aside facts not supplied by the premise, there is no way the premise could be true and the conclusion false.

D

The premise does not entail the conclusion because it has no suppositional strength.

4-2 Consider this argument: _ All dogs bark No fish are dogs So, no fish bark _ What would it mean to "flip" this argument?

A

Suppose that no fish bark, and ask whether it could also be true that all dogs bark and no fish are dogs. The answer is "yes", so the premises do not entail the conclusion.

B

Suppose that at least some fish bark, and ask whether it could also be true that all dogs bark and no fish are dogs. The answer is "no", so the premises entail the conclusion.

C

Suppose that at least some fish bark, and ask whether it could also be true that all dogs bark and no fish are dogs. The answer is "yes", so the premises do not entail the conclusion.

D

Suppose that no fish bark, and ask whether it could also be true that all dogs bark and no fish are dogs. The answer is "no", so the premises entail the conclusion.

4.2 Logical form

It would be nice if there were an easy way to test for entailment, so we wouldn't have to think hard about whether the premises could be true and the conclusion false. Can't we just identify entailment by the general shape of an argument?

Well, I have some good news and some bad news. The good news is that there are some easy tests that can sometimes identify entailment. These involve checking whether an argument has a certain kind of form—a form such that any argument fitting it has premises that entail its conclusion. The bad news is that this won't help us identify every case of entailment. In fact, there is no easy test for entailment that will identify every case.

Argument recipes

Let's start by focusing on the good news. When we say that two arguments have the same form, we mean they both follow a kind of general recipe for constructing arguments. For example, consider these two arguments:

        If mice can talk, then I'm a wizard.
        Mice can talk.
        So, I'm a wizard.

        If the moon is up, the birds are singing.
        The moon is up.
        So, the birds are singing.

In both arguments, the premises entail the conclusion. But they have something else important in common—in both, the entailment has to do with the sentential connective if...then.... A sentential connective combines two sentences to form a larger one; and in the case of if...then..., we call the larger sentence that is formed a conditional. In a conditional, the sentence that immediately follows if is called the antecedent, and the sentence that immediately follows then is called the consequent.

The mice argument and the moon argument both contain a conditional premise, but they also fit a particular recipe for constructing arguments, which we can describe as follows:

Components:
    •   Any two sentences
    •   The sentential connective if... then...

Method:
    •   For the first premise, link the two sentences with the connective to form a conditional.
    •   For the second premise, use the antecedent of the conditional.
    •   For the conclusion, use the consequent of the conditional.

This recipe allows for any two sentences to be combined this way to form an argument. And any argument that follows this particular recipe will share certain general features, which we can illustrate by using the variables P and Q to stand for the two sentences:

         If P then Q
         P
         _________
         Q

This schema of an argument, or its logical form, is really just a simple way of presenting the recipe given above. (Note that although we've arbitrarily chosen an order for these two premises, the order doesn't matter.) Any argument we can create by substituting two sentences for P and Q has this logical form.

This particular logical form is called modus ponens. And it has an important feature: regardless of which two sentences we substitute for P and Q, the premises will entail the conclusion. We can see this just by knowing the positions of those two sentences, and understanding what the words if and then mean. So we can tell that any argument that has the form of modus ponens will be deductively valid. Consider a case where we don't even understand the sentences that substitute for P and Q. Here's an argument with the form of modus ponens using real but seldom-known English words:          If the road is anfractuous, I shall divagate.          The road is anfractuous.          So, I shall divagate.  We don't need to know what the words anfractuous and divagate mean in order to see that the premises entail the conclusion—as long as the premises aren't nonsense. All we really need to know is that the argument is made from two meaningful sentences combined in the right way using if, then, and so.  If it has the form of modus ponens, it's deductively valid.  When we have a logical form like modus ponens—one that guarantees deductive validity—we call it a deductively valid form. As we will see, there are many deductively valid forms, all of which can help us identify entailment. Before looking at other deductively valid logical forms, though, let's return briefly to the bad news. Even though logical forms can help us identify many deductively valid arguments, they can't help us identify them all. And that's because some arguments are deductively valid even though they lack a deductively valid form. That is, sometimes premises entail their conclusion, even when they don't have any form that is a general and reliable recipe for entailment. We'll look at some examples once we've become more familiar with various logical forms. The point for now is that logical forms can often be used to diagnose entailment, but can never be used to diagnose the absence of entailment.

Some deductively valid sentential forms Now let's look at some other deductively valid forms that involve replacing whole sentences with variables. The next form uses two conditionals in a row, linked by a single sentence that is the consequent of one premise and the antecedent of the other. This one is called hypothetical syllogism:          If P then Q          If Q then R

         _________          If P then R This is pretty straightforward, and we can see why it has to be a deductively valid form because of modus ponens. If both premises are true, then we can see that if P were also true, we would have to conclude R—simply by applying modus ponens twice. The conclusion captures this fact about the two premises: knowing that they are true would allow us to conclude R from P.  Here's an argument with the form of a hypothetical syllogism:         If Bob studies hard, he'll do well on the test         If Bob does well on the test, he'll be smiling         So, if Bob studies hard, he'll be smiling. Note that the conditionals we're using in this chapter must be unqualified in order for their forms to be deductively valid. They can't be hedged with probably, as in, for example, If P then probably Q, or even If P then almost certainly Q. This is especially clear in the case of hypothetical syllogism. It could be that if P, there is a greater than 50% chance of Q; and if Q, there is a greater than 50% chance of R; and yet it's not true that if P, there is a greater than 50% chance that R. (Maybe P is 'it's raining here', Q is 'it's raining one town over', and R is 'it's raining two towns over'.) As we'll see in a later chapter, this is due to the way that probabilities combine. Despite this, if an argument genuinely has the form of a hypothetical syllogism— which requires that its conditionals are not hedged—we can be sure that if all the premises are true, the conclusion is also true. Next, here is a deductively valid logical form called modus tollens:         If P then Q            Not Q         _________         Not P Like the recipe for modus ponens, the recipe for modus tollens calls for any two sentences and the sentential connective if... then.... But it also calls for not, which is shorthand in this logical form for any way of forming the negation of another sentence. The negation of a sentence is true when the original sentence is false, and false when the original sentence is true. Sometimes the negation of a sentence can simply be formed by putting the word not in

front. For example, the negation of the sentence All poppies are red can be formed simply by putting not in front: Not all poppies are red.  But sometimes negating a sentence doesn't work that way. For example, to form the negation of some poppies are red we can't just put not in front. Not some poppies are red is not even grammatical! Instead, we have No poppies are red or There are no poppies that are red, or All poppies are not red. (But be very careful about that last construction, because in informal English it is syntactically ambiguous between meaning that no poppies are red and meaning that not all poppies are red—i.e., that some poppies are not red.) When the logical form of modus tollens calls for Not P, it's really calling for any way of negating P. So, for example, here's an argument with the form of modus tollens:         If the poppies are in bloom, Spring is over         Spring is not over         So, the poppies are not in bloom In this example, not is used to form the negation of P and Q, but not just by putting it in front of the whole sentence. And if you give it enough thought, you'll see that the argument's premises entail its conclusion. Indeed, any argument with the form of modus tollens will entail its conclusion: so modus tollens is a deductively valid logical form. So far, we've seen logical forms involving conditionals, but our next form involves disjunctions. The disjunction of two sentences P and Q is a sentence that is true as long as either P is true or Q is true, and false only when both P and Q are false. Sometimes we form the disjunction of two sentences by joining them with either... or..., but disjunctions can be formed in other ways. For example, to form a disjunction from Bob danced and Alice danced, we could say Either Bob danced or Alice danced, but we could also use the simpler sentence Bob or Alice danced. Either way, the disjunction is true as long as at least one of them danced, and false only if neither danced. (Depending on how you say it, this disjunction can also suggest that they didn't both dance; but let's ignore that and assume that P or Q is true when P and Q are both true.) [1] A common deductively valid logical form using disjunction is illustrated by the following argument:          The sun is either rising or setting          The sun is not rising          So, the sun is setting.

You can tell that the argument above entails its conclusion. But another way to identify the entailment is to notice that it has the deductively valid form known as disjunctive syllogism. The form of disjunctive syllogism is this:          Either P or Q          Not P          _________          Q Note that this also works if the negated premise is Q and the conclusion is P: if either component sentence in the disjunction is false, you can conclude that the other is true.
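The same brute-force idea can be packaged as a small reusable check and pointed at the other sentential forms in this section. As before, this is only a sketch: it assumes the material-conditional reading of if...then and the inclusive reading of or, and the function name valid is just a label chosen for illustration.

from itertools import product

def implies(p, q):
    return (not p) or q   # material conditional

def valid(premises, conclusion, n):
    """True if no assignment of truth values to n sentence letters makes
    every premise true and the conclusion false."""
    for values in product([True, False], repeat=n):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False
    return True

print(valid([lambda p, q: implies(p, q), lambda p, q: not q],
            lambda p, q: not p, 2))                                # modus tollens
print(valid([lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)],
            lambda p, q, r: implies(p, r), 3))                     # hypothetical syllogism
print(valid([lambda p, q: p or q, lambda p, q: not p],
            lambda p, q: q, 2))                                    # disjunctive syllogism

Each call prints True, meaning no assignment of truth values makes the premises true and the conclusion false.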

Some deductively valid predicate forms So far, we've only looked at deductively valid logical forms containing variables that take the place of entire sentences. But consider this argument:         All dogs are mammals.         Fido is a dog.         So, Fido is a mammal. The premises entail the conclusion. In fact, any argument formed by replacing the words dog, mammal, and Fido with other words in the same grammatical category would also entail its conclusion. For example:         All toasters are time-travel devices.         Fred is a toaster.         So, Fred is a time-travel device. Of course, this argument is a weird one: the first premise is obviously false, and who names their toasters? Still, if the premises were true, the conclusion would have to be true. The two arguments we just considered entail their conclusions for the same reason: they seem to share a form. But if we replace all the non-compound sentences in these arguments with variables, the form we get is:

        P         Q         ______         R This form is absolutely not deductively valid. So, to illustrate the structural feature that the two arguments above share, we can replace parts of each sentence with variables. In particular, we'll hold fixed the words they share and replace the common nouns with variables, using F, G, H... rather than P, Q, R... in order to clarify that they are not substituting for whole sentences. And for any proper name, we'll use a lower-case n. This gives us the following deductively valid logical form:          All F are G          n is an F          _________          n is a G Any argument formed by replacing F and G with common nouns, and n with a proper name, will be one in which the premises entail the conclusion. But wait: what happens when we're not just using common nouns? For example, All dogs are mammals contains two nouns, but what if the first premise had been All dogs bark or All dogs are happy? Logicians solve this by recasting those sentences slightly to fit the mold:         All dogs bark —> All dogs are barking things         All dogs are happy —> All dogs are happy things This way, barking thing and happy thing count as substitutions for G in the deductively valid form given above. In other words, we can treat them as predicates in our logical form. (This is why logical forms that use variables for predicates and names are called predicate forms, and logical forms that use variables for entire sentences are called sentential forms.) There are many deductively valid predicate forms, and it's worth looking at some of them and making sure you understand why they're deductively valid. (Two important points for how these forms are used:

as used in these forms, some F are G means at least one F is G, and does not guarantee that some F are also not G. Also, every name is assumed to refer to a real object.)

These are all deductively valid logical forms, and there are many others. We won't be giving them names in this text, because it's unrealistic to memorize a long list of deductively valid logical forms in the hopes  of recognizing arguments with those forms in everyday life. If we do encounter arguments with these forms, it's easier just to assess whether the premises guarantee the conclusion! However, it is useful to acquire the skill of quickly identifying entailment. And one aspect of that skill is being able to notice when an argument has a deductively valid form. The best way to acquire this ability is not to memorize a list of deductively valid forms: it's to repeatedly engage with arguments and logical forms until you begin noticing more easily which forms are deductively valid—that is, to practice. The forms above are listed so that you can spend some time trying to understand why each is deductively valid. (In some cases, it may be easier to understand why a form is deductively valid if you flip it: assume the conclusion is false, then see why the premises could not then be true.) It's good practice.
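For predicate forms we can't check truth tables, but we can at least hunt for counterexamples over a small, made-up domain. The sketch below illustrates that idea for the form All F are G; n is an F; so, n is a G. The three-element domain and the representation of predicates as Python sets are simplifying assumptions for the sake of the example.

from itertools import product

domain = [0, 1, 2]

def subsets(xs):
    """All subsets of xs, represented as Python sets."""
    return [{x for x, keep in zip(xs, mask) if keep}
            for mask in product([False, True], repeat=len(xs))]

counterexamples = []
for F in subsets(domain):
    for G in subsets(domain):
        for n in domain:
            all_F_are_G = F <= G          # premise 1: every F is a G
            n_is_F = n in F               # premise 2
            n_is_G = n in G               # conclusion
            if all_F_are_G and n_is_F and not n_is_G:
                counterexamples.append((F, G, n))
print(counterexamples)   # [] -- no counterexample exists in this small domain

Finding no counterexample in one small domain doesn't by itself prove validity, but coming up empty even after an exhaustive search of these assignments is what we'd expect from a deductively valid form.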

Counterexamples If we are evaluating the claim All dogs bark, all that we need in order to show that the claim is false is to find a dog that doesn't bark. More generally, when faced with a universal claim—one that says about a class of things that they all have some feature—all it takes for that claim to be false is for one of the things in that class not to have that feature. We call this a counterexample. If Fido is a dog that doesn't bark, then Fido is a counterexample to the claim that all dogs bark.

So how is this relevant to logical forms? If a logical form is deductively valid, then every instance of it with true premises has a true conclusion. So all it takes to show that a logical form is not deductively valid is a single instance in which the premises are true and the conclusion false. That would be a counterexample to the claim that the logical form is deductively valid. For example, consider this logical form:          All F are G          Some G are H          _________           Some F are H  Is this form deductively valid? One way to assess this question is to see if we can come up with a counterexample to it. If so, then it's not deductively valid. If we can't—well, we might be missing something. Our inability to come up with a counterexample doesn't prove the absence of counterexamples. Luckily, in this case our job is easy. We can find a simple example of an argument that is an instance of this form in which the premises are true and the conclusion false:  All dogs are mammals. Some mammals are wombats. _________ Some dogs are wombats.  Clearly, the premises  are true and the conclusion false. So this argument is not deductively valid. And it's an instance of the logical form above. Any time we can provide a counterexample like this, it demonstrates that the logical form is not deductively valid.  We can do this  with sentential forms as well. Consider:          If P and Q, then R          Not R          _________           Not P Is this a valid deductive form? Well, let's try to come up with an instance in which the premises are true and the conclusion false. It's not very difficult to do:

         If Fido is a dog and Fido is good, then Fido gets treats.          Fido does not get treats.          _________           Fido is not a dog. Something has gone badly wrong here: the premises can be true even if the conclusion is false (e.g., in the case where Fido is a bad dog and doesn't get treats). So the argument is not deductively valid. But it's an instance of the form above, so it's a counterexample to the deductive validity of the form.
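The same kind of small-domain search can also be run in the other direction, to hunt for counterexamples to a suspect form. Here is a sketch, again with an invented three-element domain and predicates represented as sets, aimed at the form All F are G; some G are H; so, some F are H.

from itertools import product

domain = ['dog', 'wombat', 'cat']

def subsets(xs):
    return [{x for x, keep in zip(xs, mask) if keep}
            for mask in product([False, True], repeat=len(xs))]

for F, G, H in product(subsets(domain), repeat=3):
    if not F:
        continue                      # skip the vacuous case where nothing is an F
    premise1 = F <= G                 # All F are G
    premise2 = bool(G & H)            # Some G are H
    conclusion = bool(F & H)          # Some F are H
    if premise1 and premise2 and not conclusion:
        print("F =", F, " G =", G, " H =", H)   # an assignment with true premises, false conclusion
        break

The search quickly prints an assignment on which the premises come out true and the conclusion false, which is just the kind of counterexample we constructed by hand with the dogs and wombats.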

The limits of logical form Let's recap this section. While seeking an easy way to identify entailment, we noticed that some arguments share structural features that guarantee entailment—namely, deductively valid logical forms. We then considered two types of deductively valid logical forms: some that replace entire sentences with variables, and some that replace only predicates and names with variables. We can now be clearer about the "good news" and the "bad news" mentioned at the start of this section. The good news is that deductively valid forms can often help us identify entailment. And it's possible, with practice, to get good at recognizing deductively valid logical forms. The bad news is that not every case of entailment involves an argument with a deductively valid form. Consider, for example, these two arguments:         Bob was killed         So, Bob died            Alice is older than I am         So, I am younger than Alice is In both cases, the premise entails the conclusion, but neither argument seems to be an instance of a general reliable recipe for deductive validity. In other words, they don't have deductively valid forms—at least not forms that can be constructed using the two methods of generating logical forms that we've encountered [2].  So using a deductively valid form, if you do it properly, will always ensure that the premises entail the conclusion. And checking arguments for deductive validity, if you do it properly, will never lead you to falsely identify an argument as deductively valid. But that test will miss any deductively valid arguments that don't have deductively valid forms. 

Section Questions 4-3 Consider this argument: _ If there's a rainbow in the sky, I'll have good luck tomorrow There's a rainbow in the sky. So, I'll have good luck tomorrow. _ This argument has the form of:

A

Modus ponens

B

Modus tollens

C

Disjunctive syllogism

D

Hypothetical syllogism

4-4 Consider this argument: _ If I'm tall, then I'm a good basketball player I'm not a good basketball player So, I'm not tall _ Is this argument deductively valid? Does it have any of the deductively valid forms given in the text?

A

The argument is not deductively valid

B

The argument is deductively valid, but it doesn't have any of the deductively valid forms given in the text

C

The argument is deductively valid, and has the form of modus ponens

D

The argument is deductively valid, and has the form of modus tollens

4-5 Which of the following statements is true?

A

If an argument has a deductively valid form, its premises entail its conclusion

B

If an argument's premises entail its conclusion, the argument has a deductively valid form

C

The disjunction of two sentences is only true if both sentences are true

D

Even if an argument has the form of modus ponens, its premises may not entail its conclusion

4-6 Consider this argument: _ Some F are G No G are H _ Which conclusion would make this a deductively valid logical form?

A

No F are H

B

Some H are F

C

All G are F

D

Some F are not H

4-7 Consider this argument: _ If Jane likes Hugo, she will text him. She will text him. So, she likes him. _ This argument has the form "If P, then Q. Q. So, P". Which of the following is a counterexample clearly showing that this form is not deductively valid?

A

If the history books are right about his height, Lincoln was tall. Lincoln was tall. So the history books are right about his height.

B

If the history books are right about his height, Lincoln was tall. The history books are right about his height. So Lincoln was tall.

C

If Hillary Clinton is president, then someone is president. Someone is president. So Hillary Clinton is president.

D

If Hillary Clinton is president, someone is president. Hillary Clinton is president. So, someone is president.

4.3 Pitfalls  In this section, we'll mainly be considering cases where we think an argument's premises entail its conclusion even though they don't. But first, let's consider the opposite kind of case, in which we think an argument's premises do not entail its conclusion even though they do.

Overlooking deductive validity Let's consider three reasons why entailment might get overlooked. First, we might check to see if the argument has a deductively valid logical form, and if it doesn't, conclude that its premises don't entail its conclusion. As we just saw at the end of the previous section, though, that would be a mistake. Second, if an argument has false or poorly supported premises, we might be tempted to assume that it can't be good in any other respect either. The key here is to remember that there are two separate elements of a good argument: how well the premises link to the conclusion, and how well-supported

the premises are. Entailment only concerns the first of these, so even obviously false premises can entail a conclusion. Third, sometimes arguments are just complex or tricky. Even some of the simple deductively valid forms we've been considering are easier to recognize as deductively valid than others. (For example, when it comes to recognizing entailment, people have a much easier time with modus ponens than with modus tollens.) [3] To avoid cases like this, it's crucial to slow down. If an argument is complex, take it step by step. If it's unclear whether the premises guarantee the conclusion, try flipping it. And if you have a hard time recognizing whether a particular type of argument has a deductively valid form, try repeatedly thinking through arguments of that kind until the pattern sinks in.

Biased evaluation  Let's turn now to cases where we mistakenly take an argument to entail its conclusion. One reason for this is our old friend biased evaluation, which we noticed in the last chapter can apply to deductive arguments. For example, is the form of this argument deductively valid?         All blueberries are brightly colored         Some healthy things are brightly colored         So, some blueberries are healthy It's easy to be fooled. We already accept the conclusion, which in turn makes us inclined to find every aspect of the argument good, even though we're consciously aware that entailment is unrelated to the truth of the conclusion. If we look closely, though, it should be clear that the premises don't actually guarantee the conclusion. (They allow for the possibility that none of the healthy and brightly colored things are blueberries.) Interestingly, when the same form of argument is presented with nonsense predicates, people are far less likely to think the premises entail the conclusion [4]:          All snorgs are approbines          Some jamtops are approbines          So, some snorgs are jamtops With no subconscious tug telling us that the argument is good because its conclusion is true, we can now assess whether the premises actually entail the conclusion. Using nonsense predicates forces us to

decouple our evaluation of the argument from our assessment of its conclusion—and the same is true for the evaluation of logical forms, since they abstract away from any actual conclusions.  Of course, it becomes more obvious that this form of argument is not deductively valid when we consider this example:           All blueberries are brightly colored           Some oranges are brightly colored           So, some blueberries are oranges Not only is it possible for the premises to be true and the conclusion false, it's actually the case here! So this is a clear counterexample to the deductive validity of the form. 

Some deductively invalid forms  Despite this, logical forms can sometimes seem deductively valid when they're not. Many of the errors in reasoning that are traditionally called "fallacies" involve cases where we get fooled into thinking that a logical form is deductively valid. Let's consider a few well-known examples. The logical form known as affirming the consequent can be deceiving, because it looks a lot like modus ponens: in fact, it's modus ponens in reverse, which is not a deductively valid form:
        If P then Q
        Q
        _________
        P

Here is an example that might seem good at first:         If this plant is a fern, it has spores.         It has spores.         So, this plant is a fern. But this argument actually has true premises and a false conclusion. As it happens, ferns are not the only plants that have spores. Mosses do too, for example! So just from these premises, we can't conclude that the plant is a fern.

Another well-known deductively invalid form is denying the antecedent, which is like modus tollens in reverse:
        If P then Q
        Not P
        _________
        Not Q

An example would be:         If this plant is a fern, it has spores.         It's not a fern.         So, it doesn't have spores. It's true that ferns have spores, but (for example) mosses have spores too. So if the plant at hand turns out to be a moss, the premises will be true and the conclusion false. The argument form is not deductively valid.
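If you ran the brute-force truth-table check from earlier in the chapter on these two forms, it would find counterexample rows rather than coming up empty. Here is a sketch, with the same material-conditional assumption as before.

from itertools import product

def implies(p, q):
    return (not p) or q   # material conditional

# rows where the premises are true but the conclusion is false
affirming_consequent = [(p, q) for p, q in product([True, False], repeat=2)
                        if implies(p, q) and q and not p]
denying_antecedent = [(p, q) for p, q in product([True, False], repeat=2)
                      if implies(p, q) and (not p) and q]
print(affirming_consequent)   # [(False, True)]: premises true, conclusion P false
print(denying_antecedent)     # [(False, True)]: premises true, conclusion "not Q" false

In both cases the row where P is false and Q is true makes the premises true and the conclusion false, which is why neither form is deductively valid.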

Section Questions 4-8 Identify which of the following is a case of entailment:

A

If Napoleon got married, he conquered Alsace. Napoleon did not conquer Alsace. So, he got married

B

All foxes are mammals. Some mammals are hedgehogs. So, some foxes are hedgehogs

C

If Napoleon got married, he conquered Alsace. Napoleon did not get married. So, he did not conquer Alsace.

D

Every mouse is larger than my cat. Freddie is a mouse. So Freddie is larger than my cat.

4-9 This chapter discusses how biased evaluation can incline us to...

A

think an argument is deductively valid because it has a deductively valid form

B

think an argument is deductively valid because we already accept the conclusion

C

look closely and notice that an argument's premises don't entail its conclusion

D

evaluate a logical form by looking for an argument with that form that actually has true premises and a false conclusion

4-10 Given the premises: _ If groks are tibbs, then snurfs are tibbs Snurfs are not tibbs. _ Which of the following can we conclude?

A

Groks are not tibbs

B

Groks are snurfs

C

Groks are tibbs

D

none of the above

Key terms Affirming the consequent: a deductively invalid logical form where P is inferred from the premises If P then Q and Q. Antecedent: in a conditional, the clause that expresses the condition—that is, the clause immediately following if. Conditional: a statement composed of two sentential clauses joined by if...then.... (Sometimes then is omitted, as in: If Bob starts dancing, I will leave.) Consequent: in a conditional, the clause that expresses what is said to follow if the antecedent is true. The consequent usually comes right after "then". Counterexample: an example that shows that a universal claim is false: for example, if Betty is happy, then Betty is a counterexample to the claim Everyone is unhappy—assuming the speaker is really talking about everyone! As we know, for a logical form to be deductively valid, every instance must be deductively valid. So a logical form can sometimes be shown to be deductively invalid by providing a counterexample to this universal claim—namely, an instance of it that is not deductively valid. Such a counterexample will be particularly vivid if the premises are actually true and the conclusion false.   Deductively valid argument: what makes an argument deductively valid is that its premises entail its conclusion: if the premises were true, the conclusion would have to be true. Deductively valid logical form: what makes a logical form deductively valid is that every argument with that form is deductively valid. Denying the antecedent: a deductively invalid logical form where not Q is inferred from the premises If P then Q and not P. Disjunction: the disjunction of two sentences P and Q is a sentence that is true as long as either P is true or Q is true, and false only when both P and Q are false. It can be formed by combining P and Q with either...or.... Disjunctive syllogism: deductively valid logical form where Q is inferred from the premises Either P or Q and not P.

Flipping the argument: assuming that the conclusion is false, and asking whether all of the premises could still be true. If so, the premises do not entail the conclusion. Hypothetical syllogism: deductively valid logical form where If P then R is inferred from If P then Q and If Q then R. Logical form: a structure that can be shared by different arguments; it can be illustrated by replacing certain words or sentences with variables until the arguments are the same. Modus ponens: deductively valid logical form where Q is inferred from the premises If P then Q and P. Modus tollens: deductively valid logical form where not P is inferred from the premises If P then Q and not Q. Negation: the negation of a sentence is true when the original sentence is false, and false when the original sentence is true. Often formed by inserting not into the sentence that is being negated.

Footnotes [1] This is the definition of what is called an inclusive disjunction, which is true only when either one or both of the two disjuncts is true. An exclusive disjunction would be true only when one but not both of the two disjuncts is true. [2] Of course, if we let ourselves substitute only the name and keep the predicates, there is nothing to stop us from pointing out that any argument with the form "n was killed; so, n died" entails its conclusion. But this "form" is so specific that it only applies to an extremely narrow range of arguments: it's not a very useful generalization about arguments! Note also that we haven't considered all the formal systems that people use to generate logical forms: e.g., some that generalize on the deductive validity of sentences like "Alice lifted that box; So, it is possible to lift that box". [3] See Marcus & Rips (1979). [4] See e.g., Evans, Barston & Pollard (1983), Stanovich & West (2008).

References Evans, J. S. B., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11(3), 295-306. Marcus, S. L., & Rips, L. J. (1979). Conditional reasoning. Journal of Memory and Language, 18(2), 199. Stanovich, K. E., & West, R. F. (2008). On the relative independence of thinking biases and cognitive ability. Journal of Personality and Social Psychology, 94(4), 672.

Image credits Banner image of wooden steps with moss: image by Sasin Tipchai licensed under CC0, cropped from original; Lake with pastel colors: image licensed under CC0; Monkey: image by Jimmy Chan licensed under Pexels license, cropped from original; Moon and tree: image by David Besh licensed under Pexels license; Winding road: image by Tobias Aeppli licensed under Pexels license, cropped from original; Poppies: image by Suzanne Jutzeler, licensed under CC0; Desert and sunrise: image by Patricia Alexandre licensed under CC0; Toaster: image licensed under CC0; Puppy: image licensed under Pexels license; Wombat: image by pen_ash licensed under Pixabay license. Blueberries: image by Beth Thomas, licensed under CC0; Fern: Image by Albina, licensed under CC0.

Reason Better: An Interdisciplinary Guide to Critical Thinking © 2019 David Manley. All rights reserved. Version 1.4


Reason Better An Interdisciplinary Guide to Critical Thinking

Chapter 5. Evidence

Introduction This chapter explores the nature of evidence, and why we sometimes don't see the whole picture when we make observations. In §5.1, we'll learn how to use two tests: one to assess whether something is evidence for a claim, and the other to assess how much evidence it provides. We'll also reflect on the nature of probability used in our judgments about evidence. In §5.2, we'll look at some common ways that we are misled by our evidence—in particular, when the evidence available to us is being filtered in a way that makes it unreliable. And in §5.3, we'll consider how our picture of the world can be skewed by our sources of information. In particular, we'll examine the biases inherent in the news media, social media, and academic journals.

Learning Objectives
By the end of this chapter, you should understand:
        how to assess the strength of evidence provided for a hypothesis
        what is meant by "E", "H", "P", the vertical bar, and the tilde in the notation of probability
        what is meant by "probability" in the evidence and strength tests
        what selection effects are, and how to identify specific kinds, such as survivor and attrition bias, selective recall, selective noticing, and media biases
        why our standard news diet gives rise to an inaccurate picture of the prevalence of scary events, and the probability that we will be harmed by various threats

5.1 What is evidence?
The word "evidence" might call to mind a detective examining fingerprints through a magnifying glass, and that's a good example of someone learning from evidence. But picture these other scenarios too:
        a physicist observes how light behaves as it passes by very massive objects, and notes that it fits better with a certain theory of space-time than with rival theories
        wondering why an engine is overheating, a mechanic checks for exhaust in the radiator, a sign that the head gasket is blown
        a literary scholar notices a similarity in imagery between a novel and the poetry of Hesiod, indicating that the novelist is alluding to classical mythology
        a paleontologist examining sharp stones near a Neanderthal site spots chip marks that suggest they were being used as tools
        walking by my friend's home, I see his bike locked up outside. He usually takes his bike when he goes out, so I decide he's probably home.
These are all examples of situations where evidence is identified, and a person's degree of confidence is adjusted in response. From archeology to zoology, as well as in everyday life, responsiveness to evidence is at the heart of reasoning. And regardless of the discipline, the same general rules of evidence

apply. These rules tell us what makes an observation count as evidence, what makes one piece of evidence stronger than another, and how we should update our beliefs when faced with new evidence.

The evidence test We have already seen how important it is to search for a full range of possibilities when reasoning. In this chapter, we'll assume that a thorough search has been conducted and we're now considering evidence for each of the competing possibilities. In each of the examples above, a possibility is under investigation—we'll call that a hypothesis—and a new fact is discovered that is potentially evidence for that hypothesis. In the example with the bike, my hypothesis is that my friend is home. And my evidence is something like I saw my friend's bike locked outside his home at 6pm. (When specifying our evidence, it's best to be as specific as possible.) Once we have identified a new fact that is potentially evidence for or against some hypothesis, we can run two tests to assess it.  The first test will determine whether a new fact is actually a relevant piece of evidence at all. Here, we must set aside the fact that we've already learned the evidence, and ask ourselves how likely that evidence would be if we supposed the hypothesis to be true. In the bike scenario, I first ask myself how likely it is that his bike would be here if I suppose that he's home. And then I ask how likely it is that his bike would be here if I suppose that he's not home. Comparing these two things  gives us our test.  Using "H" to stand for the hypothesis, and using the standard term "given" instead of "supposing," we simply ask:        The evidence test:  Is this more likely given H than given not-H? Suppose I know that my friend usually takes his bike with him when he goes out. So, given that he's home, I'm more likely to see the bike locked up outside. So, seeing his bike locked up counts as evidence that he's home—it supports that hypothesis.  Now suppose I knock on the door and my friend doesn't answer. That observation would be a piece of evidence that he's not home, because it's more likely given that he's not home. (This is true even if he rarely answers when he's home—he's even less likely to answer if he's not home at all!)   Sometimes an observation doesn't support either hypothesis. Then the answer to the evidence test is "neither." For example, suppose that my friend leaves his porch light on all the time, regardless of

whether he is home. In that case, the light is just as likely to be lit given either hypothesis. So, seeing the light on gives me no evidence at all regarding whether he's home. In a case like this, we say that the observation is independent of the hypothesis. Going back to our list of examples, we can apply the evidence test in each case: Is this behavior of light more likely given that the theory of space-time is true?  Are gases in the radiator more likely given that the head gasket is blown? Is this similarity more likely given that the novelist is alluding to classical mythology? Are these chip marks more likely given that the stones were being used as tools? If the observation is more likely given that the hypothesis is true, then it's at least some evidence for that hypothesis. Thus, learning that evidence should make us more confident in the hypothesis. In fact, that's our first rule of evidence:        The first rule of evidence:  If we get evidence for H, we should become more confident in H. Importantly, this does not mean we need to become far more confident in H! If the evidence only provides weak support for H, our degree of confidence may shift only very slightly. In general, how much more confident we should become depends on the strength of the evidence—and we have another test for that.
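Here is a minimal sketch of the evidence test as a comparison of two numbers. The probability values are invented for illustration; the text doesn't assign exact figures to these examples.

def evidence_direction(p_e_given_h, p_e_given_not_h):
    """Apply the evidence test: compare P(E|H) with P(E|~H)."""
    if p_e_given_h > p_e_given_not_h:
        return "evidence for H"
    if p_e_given_h < p_e_given_not_h:
        return "evidence against H"
    return "independent of H"   # the observation tells us nothing either way

# Bike locked outside: likelier if he's home than if he's out (made-up numbers)
print(evidence_direction(0.8, 0.08))   # evidence for H

# Porch light on: he leaves it on either way, so the two probabilities match
print(evidence_direction(0.95, 0.95))  # independent of H

The point of the sketch is simply that the direction of the evidence depends entirely on which of the two conditional probabilities is larger.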

The strength test Some pieces of evidence are stronger than others. For example, suppose we get one piece of evidence in favor of a hypothesis and then two independent pieces against it. What should we think? Well, it depends on the strength of the evidence. If the first piece is twice as strong as each of the others, we should be back where we started; the evidence on each side balances out. So how can we assess the strength of our evidence? We tend to just rely on a vague feeling about how strong our evidence is, but we can do better: we can actually quantify its strength, allowing us to determine in a rigorous way how much stronger one piece of evidence is than another. There is a simple test to measure the strength of evidence—one that underlies all the various ways that evidence is evaluated by different disciplines—and it's the most important tool in this book:        The strength test:  How much more likely is this given H than given not-H?

In the bike example, I have to ask myself first how likely it is that the bike would be locked up given that he's home, and then how likely it is that the bike would be locked up given that he's not home. Suppose he usually leaves his bike locked up there if he's home, but rarely if he's not home. If I had to put a number to this difference, I'd estimate that it's about ten times more likely that his bike would be locked up if he's home than if he's not home. This gives us the answer to the strength test for this observation: the answer is "ten times more likely." So we now have a measure for the strength of this evidence: its strength factor is 10. (This is usually called the Bayes factor, but you don't have to remember that name.)

Note the similarity between the strength test and considering the opposite from the chapter on mindset. We learned that, to avoid biased evaluation, it helps to ask ourselves how we would treat the evidence if we believed that the opposite view were true. One reason for this mental exercise is that we often make the mistake of treating something as evidence for our favored view just because it's what we would expect to observe if our view were true—forgetting to ask whether we'd also expect to see it if our view were false! Considering the opposite is a useful exercise for getting closer to the proper evaluation of evidence using the strength test, which requires assessing the likelihood of our observation given both H and its negation.

To clarify our two tests, let's introduce just a bit of notation:
        "P" followed by a claim in parentheses indicates the probability/likelihood of that claim
        "E" stands for our observation, which is potential evidence
        The upright bar ( | ) means assuming/given
        The tilde ( ~ ) means not
We can restate our two tests and our first rule of evidence using this notation:
       The evidence test:  Is P( E | H ) greater or less than P( E | ~H )?
       The first rule of evidence:  If P( E | H ) > P( E | ~H ), learning E should make us more confident in H.
       The strength test:  How many times greater is P( E | H ) than P( E | ~H )?
Note that we need a comparative answer for the strength test: to get the strength factor, we divide P( E | H ) by P( E | ~H ). The greater the resulting number, the stronger the evidence it provides. (If you get a

strength factor of less than 1, then that means the observation was less likely given H, so it's actually evidence against H.) A traditional but arbitrary threshold for "strong evidence" is a strength factor of approximately 10. Although our answer to the strength test tells us how much evidence E provides for H, it doesn't directly tell us how confident we should become in H after learning E. That question will be addressed in Chapter 8. (Except when the probability of the evidence given H or ~H is zero, in which case we can be sure the alternative is true.) So don't make the mistake of assuming that if E has a strength factor of 10 for H, we should become 10 times more confident in H. That wouldn't make sense: for example, if we started off 80% confident in H, we can't become 800% confident in H! For now, we'll just use strength factors to assess the relative strength of pieces of evidence. In the most extreme case, E could only have occurred if H were true, so learning E should make us certain that H is true. For example, suppose I knock on my friend's door and he opens it from inside. Then I know for sure that he's home! However unlikely it was that he would open the door if he was home, he couldn't have done it at all if he wasn't. Treating the fact that he answered the door as E, in this case P(E | ~H) is zero, but P(E | H) is greater than zero. Technically, if we apply the strength test, we get an undefined answer because we can't divide by zero. So we need to add the condition that in such a case E counts as maximally strong evidence for H.  It's worth stepping back to see how we can apply these ideas when we are evaluating an argument. As we saw in Chapter 3, when we consider premises offered in support of a conclusion, we have to evaluate whether they are true, and also how much support they provide for the conclusion. We can now see how this relates to the strength of evidence. When we know that some premises are true, we can treat them as evidence and then evaluate how strongly that evidence supports the conclusion using the strength test. The stronger the evidence provided by the premises (assuming we know them to be true), the greater the argument's strength.  To take a case in point, suppose we are certain of the premises, and together they entail the conclusion. When premises are known with certainty, we can treat them as evidence. And if they could only be true if the conclusion (H) is true, then they provide maximally strong evidence for it, according to the strength test. And this is exactly what we should expect: an argument with known premises that entail its conclusion is maximally strong. 
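To see the arithmetic of the strength test laid out, here is a minimal sketch using invented numbers for the bike example; the function name strength_factor is just a label for the division described above.

def strength_factor(p_e_given_h, p_e_given_not_h):
    """Bayes factor: how many times more likely E is given H than given ~H."""
    if p_e_given_not_h == 0:
        return float('inf')     # treat this as maximally strong evidence for H
    return p_e_given_h / p_e_given_not_h

p_bike_if_home = 0.8        # he usually leaves the bike there when he's home (invented number)
p_bike_if_not_home = 0.08   # ...but rarely when he's out (invented number)

factor = strength_factor(p_bike_if_home, p_bike_if_not_home)
print(round(factor, 2))     # 10.0 -- around the traditional threshold for "strong evidence"
print(factor > 1)           # True: by the evidence test, E supports H

Note that the sketch returns a ratio, not a new level of confidence in H; as the text says, how confident to become after learning E is a further question.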

Evidence & probability

The evidence and strength tests both have to do with how likely or probable the potential evidence is. But what exactly do we mean by these terms? For the purposes of this text, the likelihood of a claim is just our degree of confidence in that claim, though I will assume that our degree of confidence is guided by a variety of factors, including relevant statistical facts. Let me explain.

As you'll recall from §2.1, we can have various levels of confidence in a claim. For instance, if we haven't looked out the window, we can have different degrees of confidence that it's raining, and we can express our state of mind by saying things like:
        It's probably raining.
        I'm pretty sure it's raining.
        I suspect it's raining.
        I'm fairly confident that it's raining.
... and so on. We can also compare our levels of confidence in different claims: for example, we might be more confident that it's cloudy than we are that it's sunny. We might even get specific and decide we're twice as confident that it's cloudy as we are that it's sunny. To represent all of these degrees of confidence, we can pretend they are super-precise and give them a real value between 0 and 1, where 0 is certainty that the claim is false, and 1 is certainty that the claim is true. So, if we're equally confident that it's raining as we are that it's not raining, then we can represent our degree of confidence as 0.5, or we could say we're 50% confident that it's raining.

One way that we express degrees of confidence is with the language of likelihood or probability. We say things like "It's probably raining" or "It's unlikely that my friend is out." We can also use that language to compare degrees of confidence, as in "It's more likely to be cloudy than sunny". And, although we don't do this often, we can use this language to be even more precise, and say things like, "The probability that it's raining is 0.7 or 70%". We can also ask about our confidence that something is the case given or assuming that something else is the case. For example, assuming my friend is home, I'm confident that his bike would be locked up outside. This is what is meant by the vertical bar in "P( E | H )".

It's also worth stressing what is not meant by "probability" in this text. For example, suppose I say, before looking out the window, "It's probably raining." Consider these two replies:
"That makes no sense. Probability only applies to future events that are still up in the air. But whether it's raining now is already completely settled. It's either raining now or it's not. If it is, then

the probability that it's raining is 1; and if it's not, then the probability that it's raining is 0." "That makes no sense. Probability only applies to classes of events, not individual events. For example, we can look at every March 1st in history and ask how often rain has fallen on those days. Or we can look at every day where it was 62.5° at noon, and ask how often it rained. But for a unique event—like whether it is raining on this particular day of this particular year—it makes no sense to talk about probability." The problem with these replies is that they use the word "probability" to express different ideas from the one I have in mind when I say "It's probably raining." I know perfectly well that it's either raining or it's not, and I also know perfectly well that I'm talking about a unique event. But what I'm saying still makes sense—I still say, "It's probably raining!" I'm just expressing a degree of confidence that it's raining, and that's the notion of probability we'll be using in this text.

The true logic of this world is in the calculus of probabilities.         —James Clerk Maxwell So the probability of a claim, in our sense, is just our degree of confidence in that claim. But this doesn't mean it can be anything we like. For example, the probability we assign to an ordinary coin toss should reflect the fact that we know coins come up heads half the time and tails half the time. In the absence of any other relevant evidence, we should be 50% confident that this particular coin toss will come up heads. Likewise, if we know a card has been randomly selected from a standard deck of fifty-two, then our degree of confidence that it's the ace of spades should be 1/52—unless we have some other relevant evidence [1]. This is not the only rule that our probability assignments should follow. For example, something is going badly wrong if I'm 80% confident that it's raining but somehow also 60% confident that it's not raining at all. (Maybe it's not even possible to make a mistake this blatant, but there are certainly trickier errors of probability that we can make, as we'll see in later chapters.) So one very simple rule is that, if I assign a probability of n to H, I should assign a probability of 1 − n to not-H. For example, if I am 80% confident that it's raining, I should be 20% confident that it's not raining (1 − .8 = .2). We will encounter some more complex rules of probability in Chapter 9.
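As a tiny illustration of that last rule, here is a sketch (with invented numbers) that checks whether a pair of confidence levels in H and in not-H fits together:

def coherent(p_h, p_not_h, tolerance=1e-9):
    """Check the simple rule that P(H) and P(~H) must sum to 1."""
    return abs((p_h + p_not_h) - 1) < tolerance

print(coherent(0.8, 0.2))   # True: 80% confident it's raining, 20% that it isn't
print(coherent(0.8, 0.6))   # False: these two confidence levels can't both be right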

Section Questions 5-1 If E is more likely given H than given ~H, then...

A

E is independent of H, and learning E should not change our confidence in H

B

E is evidence against H, and learning E should make us less confident in H

C

E is evidence for H, and learning E should make us more confident in H

D

We must apply the strength test before we know whether we should change our confidence in H at all

5-2 The strength test...

A

doesn't tell us exactly how confident to be in H, unless the probability of E given H (or not-H) is zero

B

tells us how strong the evidence is for H, and how confident we should be in H after learning E

C

tells us the strength factor of the evidence, which is how many times more confident in H we should be after learning E

D

tells us what value we should assign to P ( E | H )

5-3

In the sense of "probability" used in this text...

A

only future events can have a probability between 0 and 1

B

probability only applies to classes of events

C

there are no restrictions about what probabilities we can reasonably assign to things

D

the probability we assign to a claim is our degree of confidence in that claim

5.2 Selection effects Imagine a fisherman casting his net into the lake for the first time of the season and drawing it back full of fish. Every time he pulls the net back, he notices that all the fish he pulls out are longer than about 5 inches. He finds this very odd, since at this point in the season he usually pulls out minnows and fry (baby fish) along with the larger fish. So he becomes more confident that there are fewer tiny fish this season... until he notices a 5-inch hole in his net. Any smaller fish would have simply slipped through the hole! This story illustrates a selection effect: a factor that selects which observations are available to us in a way that can make our evidence unreliable if we are unaware of it. The fisherman's net was systematically selecting which fish he could observe and which fish he could not. Before he was aware of the hole, he reasoned as follows. "I'd be more likely to catch no tiny fish if there are fewer of them this season. So catching no tiny fish is evidence that there are fewer of them." And he was exactly right. Because he had no reason to think there was a hole in his net, his observation gave him a good reason to think there were fewer tiny fish this season. This means it really was evidence for him—just evidence that turned out to be misleading. But after he discovered the hole, his catch no longer provided any evidence about how many tiny fish there were in the lake. 

Note that whether something is evidence for us at a given time depends on what else we have reason to believe at that time. Suppose that a family always leaves all the lights on, whether or not they are home. For someone who doesn't know this, seeing all the lights on is still evidence—albeit, misleading evidence—that someone is home. It should increase their degree of confidence that someone is home. But, of course, that same observation would not be evidence for someone who knows that the lights are always on even when no one's home. Like the fisherman, we are interested in the world beneath the surface of our immediate experiences, and we have ways of gathering information about it. But we are not always aware of the factors that filter the information we get. A source might provide only true pieces of information while still being misleading, just as the net with a hole brought up only real fish from the lake. By systematically screening out certain other pieces of information, even a source that only says true things can still leave us with a very inaccurate picture of how things are.

Survival & attrition The classic example is from World War II and involves Abraham Wald, a mathematician tasked by the US Air Force with helping them decide where to place heavy armor on their bombers. The goal was to place it strategically, because every square foot of armor increases the plane's weight, costing fuel and limiting its range. So the Air Force surveyed the bombers returning from flights over Europe and kept track of which areas had been pierced by artillery fire and intercepting fighters. They turned that data over to Wald and asked him to find the optimal locations for armor. Abraham Wald examined the data and gave the officers exactly the opposite recommendation from what they expected. The natural thought was to recommend additional armor for locations that showed the most damage on returning planes. But Wald recommended putting more armor in locations that showed the least damage! This is because he was taking into account something that the officers had overlooked: all the bombers that didn't make it back. The survey of bullet holes had found more damage to the fuselage than to the engines or fuel system. But the explanation for this was not that bombers weren't being hit in those places—in fact, damage from enemy fire was likely to be pretty randomly distributed. The reason those locations showed fewer bullet holes is that bombers hit in those locations were more likely to be lost. In contrast, the areas with heavy damage were the areas where bombers could take hits without going down.

Writing in 1620, Francis Bacon described the case of a skeptical man who was taken to a temple and shown portraits that had been painted of people who had made vows to the gods and subsequently escaped shipwrecks. "Surely now you must admit the power of the gods!" they said. He answered with a question that reveals an important insight about evidence: "But where are the portraits of those who drowned after making their vows?" [3]. The question was rhetorical: naturally, the temple hadn't memorialized anyone who had made vows but then died in a shipwreck anyway. The problem is that if we can only observe those who made vows and survived, we are not seeing the whole picture. The temple's bias in favor of happy endings created a selection effect. 

But where are the portraits of those who drowned after making their vows?        —Francis Bacon, Novum Organum, 1620.

We can apply the evidence test to this case. If we'd expect some people to survive shipwrecks whether or not the gods were saving anyone, then painting the survivors who had made vows provides no evidence for protection by the gods. To test the hypothesis, we'd need to know the proportion of vow-takers who survive, and compare that to the proportion of people who survive in general.   Examples of survivor bias abound even in ordinary life. Imagine looking over an investment company's portfolio of mutual funds; each fund shows returns that generally outperform the market over the life of the fund. Is this good evidence that the company's funds tend to outperform the market? Well, it depends on whether you're looking at all the funds or just all the surviving funds! Poorly performing funds tend to be retired and replaced with new funds that don't have a negative record. We've all heard complaints about how poorly buildings are constructed these days, or how well-built old tools are in comparison to "flimsy modern garbage." Maybe these claims are true, but pointing to sturdy ancient buildings or tools that have survived for generations is, by itself, not very good evidence that "they don't make 'em like they used to." After all, these are only the buildings and tools that managed to survive. The Parthenon still stands, but how many thousands of poorly-built ancient hovels are long forgotten? Or consider the common conception that, to create a successful company, we should model it after the most successful companies. Or the idea that we should study the habits and strategies of highly successful people, because adopting them is likely to increase our own chances of success. But what if paying attention to the most successful companies and individuals creates a survivor bias? For example, what if certain risky strategies lead both to a higher probability of extreme success and a

higher probability of failure, as in the figure below. If so, and we focus only on cases of success, it looks like the risky strategy worked better than the careful one—companies that followed the risky strategy are more successful. 

The problem is that we may not have heard of the companies that failed, either because they didn't survive the process, or because we ignore them when we are looking for models to emulate. But if we could see the whole picture, we'd realize that the risky strategy also has a much bigger downside. Even scientific studies can be subject to survivor bias. For example, suppose we want to study how people's lives change over a decade. We survey a random selection of people and then contact them

again after a decade to see how their lives have changed. This raises two problems. The first is a simple survivor bias: some people will have died, and they are more likely to be older, sicker, and less wealthy. As a result, the answers provided by the survivors may not represent how things have gone for people in general. But the second problem is that some people may just drop out of the study, creating what's called an attrition bias. For example, maybe people whose lives are not going very well are more likely to drop out, in which case the survey's results will look more positive than they should. Or maybe people whose lives are going extremely well are just too busy with their happy lives to fill out the survey! These possibilities weaken the evidence provided by the survey, since they increase the probability that the results don't reflect the experiences of people in the country as a whole. For this reason, any study with attrition should be wary of attrition bias.
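A small simulation can make the survivor-bias worry about risky strategies concrete. Everything in the sketch below is invented for illustration: the payoffs, the 20% success rate, and the assumption that failed companies simply vanish from view.

import random

random.seed(0)

def payoff(risky):
    if risky:
        # invented numbers: big success 20% of the time, outright failure 80% of the time
        return 100 if random.random() < 0.2 else 0
    return 30  # careful strategy: a modest but reliable payoff

risky = [payoff(risky=True) for _ in range(10_000)]
careful = [payoff(risky=False) for _ in range(10_000)]

def mean(xs):
    return sum(xs) / len(xs)

print(mean(risky), mean(careful))   # roughly 20 vs 30: the careful strategy does better on average

# But suppose failed companies vanish from view, so we only ever study the survivors:
risky_survivors = [x for x in risky if x > 0]
careful_survivors = [x for x in careful if x > 0]
print(mean(risky_survivors), mean(careful_survivors))   # 100 vs 30: now risky *looks* better

On average the careful strategy does better, but if we only ever examine survivors, the risky strategy looks like the clear winner.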

Selective recall
Our own memory is a crucial source of information. But when we dip into our memory to find evidence for claims, certain kinds of memories are more likely to surface than others. As you'll recall from Chapter 1, we often treat the ease with which things come to mind as an indication of how common or probable those things are. (This cognitive shortcut is known as the availability heuristic.) But ease of recall can be affected by a variety of factors, including:
        the intensity of the emotion we felt at the time of the memory
        exactly how the question we're trying to answer is framed
        the position of the memory within a larger series of events
To illustrate, suppose I am asked, "Are most Italians friendly?" As a non-Italian, my encounters with Italians are subject to a number of selection effects: the sample of Italians I have encountered is surely unrepresentative in various ways. But my memory itself will add an additional layer of selection effects too. First, any encounters with Italians that involved intense emotions will tend to stand out—even if those emotions have nothing to do with the encounter. For example, suppose that during a trip to Italy, an election occurred in my home country that I felt strongly about. In that case, I am more likely to remember the Italians I was with when I heard the result of the election, as opposed to the dozens of Italians I met under more mundane circumstances. Second, the ease with which friendly Italians come to mind can be influenced by how the question was framed. For example, ideally it should make no difference whether someone asked me, "Are most

Italians friendly?" or "Are most Italians unfriendly?" But this difference in framing can affect my answer. Faced with the first question, I will tend to search my memory for friendly Italians; faced with the second, I will tend to search my memory for unfriendly Italians. If I've known enough Italians to easily think of both kinds, the way these questions are framed will incline me towards a "yes" answer to both questions [4]. Finally, our memories can be influenced by their position in a series of events. Aside from the emotional peak of the series, we tend to remember the very first event and last event more than the rest—or else the first and last parts of an extended event. So I'm more likely to remember the first and last Italians I met on my trip. This is called serial position effect.  In one study, patients undergoing an uncomfortable procedure were split into two groups. One had the ordinary procedure done, while the other had exactly the same procedure, extended by three additional minutes that were less uncomfortable than the rest. When they later evaluated their experience, the patients who received the extended procedure remembered less total pain and generally rated the whole experience as less unpleasant. How their procedure ended made a big difference to the overall impression, even though they had all experienced the same amount of discomfort [5].

Selective noticing The idea that the phase of the moon affects human behavior is startlingly common, and there is a long list of events believed to be more likely when the moon is full: violent acts, suicides, psychotic episodes, heart attacks, births, emergency room visits, outpatient admissions, and car accidents—to name just a few. Even many health professionals believe that the phases of the moon affect human behavior [6]. However, the belief in a mysterious lunar force affecting human behavior has no scientific basis and has been strongly disconfirmed by careful studies [7]. The only mystery is why so many people continue to believe in such a force. Psychologists think part of the answer is that we tend to selectively notice examples that fit with hypotheses we've heard of—even if they are not our own pre-existing views. Once an idea like the full moon affects human behavior gets into our heads, even if we don't initially believe it, we are far more likely to notice examples that confirm it than examples that disconfirm it. Let's look at how this works. To test the hypothesis that there are more medical emergencies during a full moon, for example, we'd have to track the number of emergencies both when the moon is full and when it's not. The days that fit the hypothesis best are days when the moon is full and there are more

emergency cases than usual, as well as days when the moon is not full and there are fewer cases than usual. These are the confirming instances. Meanwhile, days when the moon is full and there are fewer cases, or the moon isn't full and there are more cases, are disconfirming instances.

But our minds don't naturally test hypotheses with as much rigor as well-run scientific studies. Suppose you work in an emergency room, and you've heard of the full-moon hypothesis but don't believe it. On a night when you have lots of visits to the ER, you probably won't go out of your way to check the moon. In fact, you probably won't even think about the hypothesis. Nor will you remember it if you just happen to notice that the moon is full on an ordinary day. But if you notice both things on the same night, you're more likely to remember the hypothesis, and then you'll register the day as a confirming instance. The problem is: if you only ever think about the hypothesis when you notice a confirming instance, this creates a selection effect on your perception of the evidence. Your mind tallies up cases in the top left-hand box of the chart but fails to fill in the other boxes at all. So it feels like the evidence you're gathering supports the hypothesis.

There are many other examples of this effect. Some people think they see repeated digits on digital clocks (like "11:11") more often than they should. (A quick Google search about this will take you down a depressing rabbit hole of bad reasoning.) Again, once this idea is lodged in our brains, it is hard to avoid falling into the trap of selectively noticing instances. When we look at the clock and there are repeated digits, we are reminded of the hypothesis and register a confirming instance. But every other time we glance at the clock, we don't even think about the hypothesis. If we were somehow forced to actually write down all the times we glanced at the clock, we would find that we see repeating digits no more often than we should by chance. (It's maybe worth noting that 11:11 is about when I start checking the clock to see if it's lunchtime in the morning, and to see if it's bedtime at night.)
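A quick simulation shows how badly this selective tally can mislead. In the sketch below, busy nights and full moons are generated independently, so a complete record shows no connection at all; the specific probabilities are invented for illustration.

import random

random.seed(1)

tally = {('full', 'busy'): 0, ('full', 'quiet'): 0,
         ('not full', 'busy'): 0, ('not full', 'quiet'): 0}
noticed = 0

for night in range(3650):                    # ten years of ER shifts
    full_moon = random.random() < 1 / 29.5   # roughly one full-moon night per lunar month
    busy = random.random() < 0.5             # busy nights happen half the time, moon or no moon
    tally[('full' if full_moon else 'not full', 'busy' if busy else 'quiet')] += 1
    if full_moon and busy:
        noticed += 1                         # the only cell our memory reliably records

full_rate = tally[('full', 'busy')] / (tally[('full', 'busy')] + tally[('full', 'quiet')])
other_rate = tally[('not full', 'busy')] / (tally[('not full', 'busy')] + tally[('not full', 'quiet')])
print(round(full_rate, 2), round(other_rate, 2))   # both near 0.5: no real connection
print(noticed)                                     # dozens of remembered "confirming" nights

The full tally shows that busy nights are no more common under a full moon, but the only number our memory reliably keeps is the pile of remembered "confirming" nights.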

Section Questions 5-4 In the story about the net with the hole in it,

A

The fisherman was subject to the serial position effect, because of the order with which he pulled fish out of the water

B

The fisherman was subject to a survivor bias, because he killed the fish and ate them

C

The fisherman's observation was never evidence for him that there were fewer tiny fish in the lake this season, because there was a hole in his net

D

The fisherman initially got evidence that there were fewer tiny fish in the lake this season, but he later realized his evidence was unreliable

5-5 The point of the example about the World War II bombers was to illustrate...

A

that without seeing all the bombers, including those that did not survive, there was no evidence about the best place to put armor

B

that the distribution of bullet holes on surviving bombers was subject to a selection effect

C

that the officer's memories were selective because they failed to recall the locations of all the bullet holes

D

that the officers had been engaging in a risky strategy with the bombers that was unlikely to lead to success

5-6

The example about patients who underwent an extended version of an uncomfortable procedure illustrates...

A

that we are more likely to remember a procedure the longer it takes

B

that we selectively notice experiences that confirm our expectations—for example, that the extended procedure is uncomfortable

C

that our impression of the last part of an extended event has an outsized influence on our impression of the whole event

D

that the ease with which we recall events is influenced by the way in which the question was framed: for instance, how pleasant/unpleasant was the extended procedure?

5-7 Selective noticing is when we notice...

A

confirming instances more than disconfirming instances of a hypothesis, and it only applies to our preexisting views

B

confirming instances more than disconfirming instances of a hypothesis we've heard of, whether or not it is a pre-existing view

C

disconfirming instances more than confirming instances of a hypothesis, and it only applies to our preexisting views

D

disconfirming instances more than confirming instances of a hypothesis we've heard of, whether or not it is a pre-existing view

5.3. Media biases Our last type of selection effect deserves its own section because it's arguably the most important selection effect of all, at least in contemporary life. The term media bias is usually used to mean political bias in the popular media. But there is also a bias common to all news outlets, magazines, social media, and even academic journals—namely, the bias towards content that is engaging, that catches and holds

readers' or viewers' attention. This bias alone is enough to radically skew our perception of the world, even when the media outlets themselves are not intending to support any particular views.

News and fear Even a 24-hour news channel can only cover a minuscule fraction of the day's events. So which do they choose? For the most part, the stories deemed "newsworthy" are simply those most likely to engage viewers—in other words, those that have an emotional impact, even if this means eliciting fear or rage [8]. For the most part, media outlets don't consider it their job to decide which events are actually going to impact viewers' lives. What matters is getting people's eyeballs to stay on the page or the screen. Unsurprisingly, then, the number of scary news stories about a particular threat bears no relation to the probability that we ourselves will be affected by it. A news story about a shark attack—or plane crash, or mass shooting, or bear mauling—will be watched by many people due to fear or morbid fascination. But coverage of these causes of death is far out of proportion with the number of people who die from them. In the United States, a single person dies from a shark attack about every two years. But the most dangerous animal in the US, it turns out, is the deer, which causes about 300 times as many deaths—mostly due to road accidents. Of course, this statistical detail has little emotional impact. System 1 is not wired to think statistically, but it does understand enormous shark teeth. The threat posed by sharks feels much scarier than the threat posed by deer, and that gut reaction is what the media responds to. The media's coverage then creates even more fear, causing a feedback loop between the public's emotional reaction and the amount of coverage devoted to the issue.

Deep down, we all know that news sources are subject to an enormous selection effect. But because so much of our information about the world comes from the news, it's difficult to resist forming judgments about how common things are by how much news coverage they receive. For example, Americans have a highly inflated sense of the violent crime rate. Overall, violent crime has dropped dramatically since the 1970s, but every single year a solid majority of Americans reports that they believe violent crime has

risen in the past year [9]. This is not surprising given the source of their information; media outlets seldom report on trends or statistics. They just continue to report the scariest violent crimes of the day, with the intensity of coverage only increasing due to the 24-hour news cycle and access to screens at all hours.

Consider mass school shootings. These tragic incidents push all of our fear buttons: they are sneak attacks in public places with screaming, chaos, and the murder of innocent children. As a result, they have become a central part of the public discussion around gun control and gun rights. (As one activist group declared in their mission statement: "Our schools are unsafe. Our children and teachers are dying.... Every kid in this country now goes to school wondering if this day might be their last. We live in fear" [10].) But has the media coverage skewed our perception of the threat? The truth is that schools are just as safe as ever—indeed, the safest place for children to be, overall. Children spend a great deal of their lives at school, but less than 2% of child homicides occur at school, and an even lower percentage of child deaths in general occur at school. Nor has the overall threat from violent crime to children increased in recent years, at school or elsewhere [11].

What about the threat of public mass shootings in general? Going by the news coverage, they would seem to represent a huge slice of the total number of gun deaths in the U.S. But in fact, the average number of gun deaths is more than 30,000 per year (about two thirds of them suicides), while the number of deaths in public mass shootings is less than a hundred [12]. The coverage of public mass shootings is vastly disproportionate to the number of people killed with guns, let alone those who die from far more common causes of death, like heart disease.

Another central focus of coverage in the US media over the last two decades has been acts of foreign-born terrorists. Terrorism, of course, is intended to generate as much fear as possible, so terrorist acts tend to be spectacular and terrifying. Anyone who was old enough to be watching the news during the tragic World Trade Center attack has images from news reports seared into their minds forever. It was a perfect storm of fear-eliciting elements: planes, skyscrapers, fire, explosions, foreign enemies, and nearly 3,000 innocents dead. No wonder it dominated the news, the national discussion, and ultimately US foreign and domestic policy for so long.

A much less-known result of that attack is that it caused a significant drop in the rate of air travel over the following year, and a corresponding increase in the use of cars to travel long distances. Gerd Gigerenzer, a specialist in risk and decision-making, calculated that an additional 1,595 people died that year from motor vehicle accidents as a result of this shift to auto travel [13]. That's fully half the number of people who died in the attack itself, added again to the total casualties. But these additional

deaths went unreported. Perhaps that's because they are a very small fraction of the number of motor vehicle fatalities in a given year, which is usually between 30,000 and 40,000—ten times the number who died in the World Trade Center. But these deaths happen one at a time, they're not spectacular, and we've simply come to expect them. In short, they're not news.

We have an evolutionary tendency to fear situations in which many people die at one time.... it is very difficult to elicit the same fear for the same number of deaths spaced over a year.        —Gerd Gigerenzer

Our brains are suited to life in small tribes, so our intuitive heuristic for scary stories is simple: if something very bad happened to someone else, I should worry that it might happen to me. But we now have an information network that can find all the scariest things that have happened from among billions of people in the world, and funnel them all into our living rooms for the evening news. The picture we get is a scary one: there are tornados, new contagious diseases, violent gangs, cancer-causing pesticides, and serial killers out there. Which new threat should I fear? How many people does it have to kill in a country of 325 million before I should fear it? These are exactly the sorts of questions we would expect our System 1 processes to get terribly wrong, since they developed in a world where all of our news involved just a few hundred people and reached us through word-of-mouth. Faced with a denominator of hundreds of millions, our intuitive threat detectors have no sense of proportion, and the news media doesn't help. Even when we crunch the numbers with System 2, we don't really get a feel for the proportions.

This graph shows the actual relative proportions of various causes of death in the US, compared with how much media coverage each cause received. (And this was coverage in The New York Times, not exactly our most sensationalistic news source.) You can barely see homicide and terrorism on the graph showing what really causes deaths, but together they got more than half of all the coverage in the New York Times. Meanwhile, as you can see, all the most common ways for Americans to die are massively under-discussed in comparison with the threat they actually pose.

A more recent comparison is this: as of January 2021, the COVID-19 outbreak was killing about as many people in the US every day as were killed all together in the 9/11 terrorist attacks. But the latter dominated the nation's discourse and politics for a decade, ultimately leading to more lives lost, and trillions of tax dollars spent, prosecuting the "war on terror".

Or consider: this is approximately how many people die per year in the U.S. from selected causes [14]:
sharks: < 1
foreign-born terrorists (average since 1975, including 9/11): 70
public mass shootings: < 100
accidents with deer: 170
airliner accidents (globally): 300
unintentional drownings: 3,500
homicides: 17,000
radon: 20,000
motor vehicle accidents: 35,000
preventable accidents in hospitals: 100,000 - 200,000
premature deaths from air pollution: 200,000
cancer and heart disease: 600,000 each

It's hard to make these numbers really sink in. But if we can't come to really feel the difference in risk between public mass shootings and car accidents, we're stuck with a cognitive illusion where our two systems disagree about the extent of the threat. We could turn to visual aids: if we made an ordinary-sized bar chart of this data, only the items on the second half of the list would even be visible.

Now imagine a news program that allocates time to scary stories in proportion to the number of lives lost. That would give us a truly representative reporting of threats. But what would it actually be like? For every minute spent on a public mass shooting, our representative news program would have to spend 27 hours on car accidents, more than 3 entire days on air pollution, and more than 18 days on cancer. Sitting through a report like that might actually give our System 1 a good sense of the threat we face from various causes. After weeks of coverage on deaths from heart disease and cancer, many of which are preventable through lifestyle choices, we'd probably worry much less about unlikely risks and focus much more on living healthier lives.
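To get a rough feel for the proportionality arithmetic behind that thought experiment, here is a minimal sketch (not from the text) using the approximate annual death counts listed above and assuming roughly 20 public-mass-shooting deaths per year, the average cited in footnote 12. The exact ratios shift with the figures chosen, so the outputs only approximate the chapter's rounded numbers.

```python
# Approximate annual US deaths for selected causes, taken from the list above.
# The mass-shooting baseline assumes ~20 deaths/year (the average in footnote 12).
annual_deaths = {
    "car accidents": 35_000,
    "air pollution": 200_000,
    "cancer": 600_000,
}
mass_shooting_deaths = 20

# If a "representative" news program spent 1 minute per year on public mass
# shootings, proportional coverage of each other cause would take:
for cause, deaths in annual_deaths.items():
    minutes = deaths / mass_shooting_deaths
    print(f"{cause}: {minutes:,.0f} minutes (about {minutes/60:.0f} hours, or {minutes/(60*24):.1f} days)")
```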

Echo chambers

Although news organizations have an incentive to produce content that's engaging to people in general, they are especially concerned with content that engages their primary audience. When a news outlet is widely associated with a certain political inclination, its coverage naturally plays to the expectations, hopes, and fears of people on that part of the political spectrum. The resulting selection effect is extremely potent even if all the reporting from that outlet is perfectly accurate. Is your audience generally in favor of environmental causes? They may be especially interested in a story about animals getting caught in manmade litter. Is your audience generally in favor of stricter immigration enforcement? They may be interested in a story about violent crimes committed by immigrants. Is your audience generally in favor of stricter gun control? They may be interested in a story about firearm accidents involving children. And so on. Even if all of these news reports are perfectly accurate, they will tend to skew our impression of how common the reported events are. (What percentage of preventable animal deaths are caused by human litter? What percentage of violent crimes are committed by immigrants?) The overall effect is to reinforce the viewers' sense of which are the most important issues facing the country and the world. Watching news reports about specific events is like trying to get a sense of a landscape through a magnifying glass—and one guided by somebody else! Today's social media compounds this problem by tailoring the information we get even more precisely, down to the views and preferences of each individual user. As always, the goal is engagement, so the algorithms that are used to rank and filter our social media feeds are optimized to make us react with views and likes and shares. But what sort of content has that effect? If the content bears on a political issue, we react most positively to what we perceive as convincing arguments for views we  already agree with. Meanwhile we react angrily towards content illustrating the most ridiculous claims and arguments made by people we disagree with. We might even share these with our friends, as particularly egregious examples of how insane the "other side" is. As a result, the content most likely to show up on our feeds also happens to be that which is most likely to reinforce our pre-existing views. With this new kind of selection effect added to our natural tendency for confirmation bias, it's no wonder that political polarization has been rising sharply for years. The metaphor of an echo chamber is apt for describing the effect of tailored news and social media feeds: our own views are constantly reinforced as a result of reverberating around a closed system. This precludes us from discovering truths that conflict with our views, and leads us to a kind of stagnation in

our interests and concerns. When our news is tailored to cover issues we already care about, we don't get the chance to discover new issues that might be worth caring about even more. So what can we do? The problem is deep enough that it won't be remedied by occasionally checking news outlets and opinion pieces from the "other side." And most of us won't just abandon social media and politically biased news outlets entirely. So our only hope of curtailing the problem would be a strenuous effort to identify the very smartest, most convincing proponents of views we reject, and a commitment to reading their books and articles and tweets with an open mindset. And it would also require unfollowing everyone who shares enraging caricatures of the "other side" and trite arguments for our own views. These things would be a start, but they also strike most of us as very unpleasant. My prediction is that very few readers of this text, even those who agree with the need for these measures, will actually carry them out.

Research media  The bias for publishing things that are likely to engage consumers is not limited to public-oriented media: it also extends to scientific journals and books. In the context of scientific research, the most engaging results are those that somehow disrupt conventional wisdom. A study can do this by providing evidence against a standard view, or evidence for a surprising alternative. Meanwhile studies that support the conventional wisdom, or fail to provide support for alternatives, can be passed over because they're... well, boring. This is called the publication bias in research. Of course, researchers are aware of this bias, and it affects which papers and studies they choose to send to journals for publication, as well as which studies to actually turn into publishable papers. This creates a secondary bias that amplifies the publication bias, and it's known as the file drawer effect. You run a study to test a surprising hypothesis, but the results suggest the hypothesis is false, so your study simply reinforces the conventional wisdom. Is it even worth writing up if it will likely be rejected? There's a decent chance your data goes in a file drawer, never to be seen again by the scientific community at large. So why is this a problem? Suppose you're interested in what evidence there might be for a surprising claim. Let's say you find three or four studies that support it, and only one or two support the boring alternative that it's false. Overall, it may look like there is something to the hypothesis. But how many studies with evidence against the surprising claim are sitting in file drawers, or have been rejected by publishers for failing to be exciting? If there are ten, then analyzing all the data together might indicate that the surprising claim is false. But an analysis of the data available to

you might still indicate that the surprising claim is true. Even if the studies you're reading are models of scientific rigor, the evidence they provide is unreliable merely because there are other studies you're not seeing. It's worth noting that the standard threshold for a study to provide "statistically significant" evidence for some hypothesis H is roughly that if H were false, only one out of twenty studies performed in the same way would get results like this. The flip side is that if H is actually false and 20 studies like this are performed, we would expect one to provide "statistically significant" evidence for H. If that study is published and the other 19 go in a file drawer or a trash bin, all you have available to you is the one study providing the evidence that you'd expect if H were true. This means we've just selected our way from exactly the result we'd expect if the hypothesis were false, to exactly the result we'd expect if our hypothesis were true. For the most part, it's unlikely that the publication bias and file drawer effect are quite so strong—and plenty of studies show significance far beyond the threshold level—but the potential for misleading evidence is very worrisome. Happily, there is a movement to combat these effects by ensuring that studies are reported even if they are not published. Ideally, studies can even be pre-registered, meaning that the researchers announce their intent to perform a study with these protocols even before it begins. That way, if the study is written up but not published, or even abandoned midway, subsequent researchers can inquire about its results and expand their pool of data beyond the results that happened to be selected for publication.
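A small simulation (not from the text) can make the worry concrete. Suppose a surprising hypothesis H is in fact false and many independent studies test it; under the usual assumptions, each study's p-value is roughly uniform between 0 and 1, so about 1 in 20 clears the p < 0.05 bar by chance. If only those studies get published and the rest go in file drawers, a reader sees nothing but "significant" support for a false claim.

```python
import random

random.seed(0)

# Hypothesis H is false. Under the null, each well-run study's p-value is
# (roughly) uniform on [0, 1], so about 1 in 20 clears p < 0.05 by chance.
n_studies = 1000
p_values = [random.random() for _ in range(n_studies)]

published = [p for p in p_values if p < 0.05]     # "exciting" results get published
file_drawer = [p for p in p_values if p >= 0.05]  # "boring" results never appear

print("studies run:            ", n_studies)
print("published (p < 0.05):   ", len(published))    # roughly 50, i.e. about 1 in 20
print("left in the file drawer:", len(file_drawer))
# A reader who sees only the published studies sees 100% "significant" support
# for H, even though the full body of evidence points the other way.
```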

Section Questions 5-8

Multiple answers: Multiple answers are accepted for this question

In this section, we discussed how System 1 struggles to grasp the true probability of being harmed by a threat reported in the news, for three reasons: (you must choose all three correct answers)

A

the scary events we hear about have been selected from such a large pool of people

B

our System 1 reacts more intensely to some threats than others even if they are equally harmful

C

news outlets report inaccurate facts in support of whatever political agenda is most common among their viewers

D

the coverage of threats is not proportional to the number of deaths they cause

5-9

Multiple answers: Multiple answers are accepted for this question

Aside from great memes, on social media we should expect to encounter... (to get the point you must choose both correct answers)

A

convincing arguments supporting claims opposed to our own

B

unconvincing arguments supporting claims we agree with

C

especially unconvincing arguments supporting claims opposed to our own

D

especially convincing arguments supporting claims we agree with

5-10 Due to the file-drawer effect and research publication bias...

A

research with surprising results is more likely to become available to us than research with unsurprising results

B

we are more likely to remember when we read surprising research than when we read an unsurprising research

C

research with surprising results should not be treated as providing any evidence for those results

D

research with surprising results is published even though it does not meet professional standards

Key terms

Attrition bias: a selection effect (similar to survivor bias) in which some patients drop out of a research study, or data is lost in some other way that can make the evidence unreliable.

Echo chambers: the feedback loop that occurs when our sources of information and opinion have all been selected to support our opinions and preferences. This includes our own selection of media and friends with similar viewpoints, but also results from the fact that social media tailors what we see, based on an algorithm designed to engage us.

Evidence for H: when a fact is more probable given H than given ~H, it constitutes at least some evidence for H. By the first rule of evidence, this means we should increase our degree of confidence in H at least a tiny bit.

Evidence test: if we are wondering whether a new fact or observation is evidence for a hypothesis H, we can ask whether that fact or observation is more likely given H or given ~H. If the former, it's at least some evidence for H. If the latter, it's at least some evidence for ~H. If neither, it's independent of H. The formal version of the evidence test is "Is P( E | H ) greater or less than P( E | ~H )?"

File-drawer effect: this is a selection effect caused by the researchers themselves, who might not even bother to write up and send in a study that is unlikely to be published (viz. a boring study), but instead leave it in their file drawers. See the related entry for publication bias.

Hypothesis: this is any claim under investigation, often denoted with the placeholder letter "H".

Independent of H: see evidence test.

Media bias: although this term is generally used in reference to the political biases of the media, we use it here to cover the general bias towards engaging content, though this may manifest in content of special interest to viewers with a certain political orientation or even outright slanted content. The general category of media bias also includes the highly tailored algorithms of social media.

Publication bias: the tendency for academic books and journals to publish research that is surprising in some way. A piece of research can do this by providing evidence against conventional wisdom, or for a surprising alternative. Meanwhile studies that support conventional wisdom, or fail to provide support for alternatives, can be passed over.

Selection Effect: a factor that systematically selects which things we can observe. This can make our evidence unreliable if we are unaware it's happening.

Selective noticing: when observations that support a hypothesis bring that hypothesis to mind, causing us to notice that they support it, whereas observations that disconfirm it do not bring it to mind. The result is that we are more likely to think about the hypothesis when we are getting evidence for it, and not when we are getting evidence against it. So it will seem to us like we are mainly getting evidence for it. This can happen even if the hypothesis is just something we've considered or heard about—it needn't be something we already believed. (So selective noticing can happen without confirmation bias, although it seems to be exacerbated when we do antecedently accept the hypothesis.)

Serial position effect: the tendency to remember the very first and last events in a series (or the first and last parts of an extended event).

Strength factor: a measure of the strength of a piece of evidence—namely the result of dividing P( E | H ) by P( E | ~H )—where we are measuring the strength of the evidence that E provides for H. The higher the strength factor, the stronger the evidence provided by E. (A strength factor of less than 1 is also possible, when E is less likely given H than ~H: this means it's evidence against H.) A traditional but arbitrary threshold for "strong evidence" is about a strength factor of 10.

Strength test: a test of the strength of a piece of evidence. Informally, it involves asking: How much more (or less) likely is this if H is true than if H is false? Formally, the question is: How much greater (or less) is P( E | H ) than P( E | ~H )? Note that we need a comparative answer to the strength test, so we divide P( E | H ) by P( E | ~H ) to give us the strength factor.

Survivor Bias: this is a more specific term for bias arising from an extreme form of selection effect, when there is a process that actually eliminates some potential sources of information, and we only have access to those that survive. For example, suppose that I've met lots of elderly people who have smoked all their lives and are not sick, so I decide that smoking is fairly safe. I may be forgetting that the people who smoked and died are not around for me to meet.

Footnotes
[1] In short, probabilities are "subjective" in the sense that they express a state of mind, but not in the sense that they aren't governed by common criteria of rationality.
[3] Francis Bacon, Novum Organum, Book 1: 46.
[4] See Kahneman (2011) p. 81.
[5] Redelmeier, Katz & Kahneman (2003).
[6] Vance, D. E. (1995) and Angus (1973).
[7] See Rotton & Kelly (1985) for an exhaustive (and exhausting) review of research prior to 1985 attempting to test such a connection. See also Thompson & Adams (1996) for a more recent study.
[8] See for example Haskins (1984), Zillmann, Knobloch, & Yu (2001), Knobloch, Hastall, Zillmann & Callison (2003), Zillmann, Chen, Knobloch & Callison (2004). To see evolutionary explanations for our interest in themes like conflict and threat in news reports, see Shoemaker (1996), Davis & McLeod (2003), and Schwender & Schwab (2010).
[9] See this Pew Research Center Report.
[10] See the mission statement of March for Our Lives from early 2018.
[11] See the Indicators of School Crime and Safety Report, 2017, by the National Center for Education Statistics. A summary of some results can be found here. In particular, "between 1992 and 2016, total victimization rates for students ages 12–18 declined both at school and away from school. Specific crime types—thefts, violent victimizations, and serious violent victimizations—all declined between 1992 and 2016, both at and away from school." A table on homicides in particular can be found here.
[12] The exact definition of a "public mass shooting" is disputed but it is usually defined as a shooting in a public place in which four or more victims were killed, excluding gang violence and domestic or drug-related disputes. By this definition, the average is around 20, annualized between 1982 and 2018. Mother Jones has a frequently updated guide, and has made their database available here. Their definition includes more shootings than the one just given, but even by their definition, and even focusing only on recent years, the average is less than one hundred.
[13] See Gigerenzer (2006).
[14] Shark attacks: see the data here; Public mass shootings: see fn. 12; Foreign-born terrorism: the data is summarized in Nowrasteh 2016; Deer: see this discussion from the CDC; Airliner accidents: see the IATA 2017 Safety Performance report and accompanying fact sheet; Unintentional drowning: see the CDC's fact sheet; Homicides: see this CDC report and recent data; Radon: see Pawel & Puskin (2004), the EPA's assessment, and a summary here; Motor vehicle accidents: see the Insurance Institute for Highway Safety's report. Preventable accidents in hospitals: see Makary & Daniel (2016) and James (2013). Air pollution: for a major quantitative analysis see Caiazzo, Ashok, Waitz, Yim, & Barrett (2013). A summary can be found here. See also Hoek, Krishnan, Beelen, Peters, Ostro, Brunekreef, & Kaufman (2013), and Pope, Burnett, Thun, Calle, Krewski, Ito, & Thurston (2002). Cancer and heart disease: see the National Vital Statistics Report Vol 64 #2 (p. 5), and the summary by the CDC on leading causes of death. Note that several of the figures in this list are overlapping causes (e.g. radon and cancer).

References
Angus, M. D. (1973). The rejection of two explanations of belief in a lunar influence on behavior. Thesis, Simon Fraser University, B.C., Canada.
Bacon, Francis. Novum Organum, published online at thelatinlibrary.com.
Caiazzo, F., Ashok, A., Waitz, I. A., Yim, S. H., & Barrett, S. R. (2013). Air pollution and early deaths in the United States. Part I: Quantifying the impact of major sectors in 2005. Atmospheric Environment, 79, 198-208.
Davis, H., & McLeod, S. L. (2003). Why humans value sensational news: An evolutionary perspective. Evolution and Human Behavior, 24(3), 208-216.
Gigerenzer, G. (2006). Out of the frying pan into the fire: Behavioral reactions to terrorist attacks. Risk Analysis: An International Journal, 26(2), 347-351.
Gramlich, J. (2016). Voters’ Perceptions of Crime Continue to Conflict with Reality. Pew Research Center.
Haskins, J. B. (1984). Morbid curiosity and the mass media: A synergistic relationship. In Morbid Curiosity and the Mass Media: Proceedings of a Symposium (pp. 1-44). Knoxville: University of Tennessee and the Garnett Foundation.
Hoek, G., Krishnan, R. M., Beelen, R., Peters, A., Ostro, B., Brunekreef, B., & Kaufman, J. D. (2013). Long-term air pollution exposure and cardio-respiratory mortality: a review. Environmental Health, 12(1), 43.

James, J. T. (2013). A new, evidence-based estimate of patient harms associated with hospital care. Journal of Patient Safety, 9(3), 122-128.
Kahneman, D. (2011). Thinking, Fast and Slow. Macmillan.
Knobloch, S., Hastall, M., Zillmann, D., & Callison, C. (2003). Imagery effects on the selective reading of Internet newsmagazines. Communication Research, 30(1), 3-29.
Makary, M. A., & Daniel, M. (2016). Medical error—the third leading cause of death in the US. BMJ, 353, i2139.
Musu-Gillette, L., Zhang, A., Wang, K., Zhang, J., Kemp, J., Diliberti, M., & Oudekerk, B. A. (2018). Indicators of School Crime and Safety: 2017. National Center for Education Statistics.
Nowrasteh, Alex. 2016. “Terrorism and Immigration: A Risk Analysis.” Policy Analysis # 798. Cato Institute.
Pope III, C. A., Burnett, R. T., Thun, M. J., Calle, E. E., Krewski, D., Ito, K., & Thurston, G. D. (2002). Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. Jama, 287(9), 1132-1141.
QuickStats: Number of Homicides Committed, by the Three Most Common Methods — United States, 2010–2016. MMWR Morb Mortal Wkly Rep 2018;67:806.
Redelmeier, D. A., Katz, J., & Kahneman, D. (2003). Memories of colonoscopy: a randomized trial. Pain, 104(1-2), 187-194.
Rotton, J., & Kelly, I. W. (1985). Much ado about the full moon: A meta-analysis of lunar-lunacy research. Psychological Bulletin, 97(2), 286.
Schwender, C. & Schwab, F. (2010). The descent of emotions in media: Darwinian perspectives. In Doveling, K., von Scheve, C., & Konijn, E. A. (Eds.) The Routledge Handbook of Emotions and Mass Media (pp. 29-50). Routledge.
Shoemaker, P. J. (1996). Hardwired for news: Using biological and cultural evolution to explain the surveillance function. Journal of Communication, 46(3), 32-47.
Slattery, K. L., Doremus, M., & Marcus, L. (2001). Shifts in public affairs reporting on the network evening news: A move toward the sensational. Journal of Broadcasting & Electronic Media, 45, 290-302.

Thompson, D. A., & Adams, S. L. (1996). The full moon and ED patient volumes: unearthing a myth. The American Journal of Emergency Medicine, 14(2), 161-164.
Vance, D. E. (1995). Belief in lunar effects on human behavior. Psychological Reports, 76(1), 32-34.
Zillmann, D., Chen, L., Knobloch, S., & Callison, C. (2004). Effects of lead framing on selective exposure to Internet news reports. Communication Research, 31(1), 58-81.
Zillmann, D., Knobloch, S., & Yu, H. S. (2001). Effects of photographs on the selective reading of news reports. Media Psychology, 3(4), 301-324.

Image Credits Banner image of fisherman casting net: image by Quang Nguyen vinh licensed under CC0 / cropped from original. Bicycle locked up on the snow: image by Tony, licensed under Pexels license. Blue building with outdoor light on: image by Molly Champion licensed under Pexels license / cropped from original. Person with umbrella in city street: image licensed under

 CC0 / cropped from original.  Playing cards with ace of spades prominent: image licensed under CC0. Red door with handle: image by MabelAmber licensed under CC0. Small net with hole in it: image by Emily Hopper licensed under Pexels license. Wall clock: image by Buenosia Carol licensed under Pexels license; WWII bomber: image by the U.S. Air Force in the public domain. Survivor bias comic: used by permission from Zach Weinersmith. Venetian canal: image licensed under CC0. Moon with leafless trees: image by David Dibert, licensed under Pexels license. Deer on road in snow: image by Amarpreet Kaur licensed under CC0  / cropped from original. Car window shattering: image by Vladyslav Topyekha, licensed under CC0 / cropped from original. Magnifying glass with "facts": image by Gerd Altmann licensed under CC0  / cropped from original. File drawer: Image by Sara Franses licensed under CC BY-SA 4.0. 

Reason Better: An Interdisciplinary Guide to Critical Thinking © 2019 David Manley. All rights reserved. Version 1.4



Chapter 6. Generalizations

Introduction Our minds constantly form generalizations, often based solely on fleeting experiences, impressions, or anecdotes. And once the generalizations have been formed, they can be hard to dislodge from our collection of beliefs, even if they are grounded in very weak evidence. Being careful about our generalizations requires engaging System 2 and taking deliberate steps to avoid common pitfalls. But the skill of forming appropriate generalizations is very difficult to master, which is why an entire academic field has been dedicated to its study—namely, the field of statistics. A statistical inference uses specific observations as evidence for general claims of which they are instances—or vice versa. Metaphorically, if there's a large pool of facts in which we're interested, we can dip into it at various places and draw conclusions about the pool as a whole. Once we have those generalizations in place, we can start making predictions about what we'll see next time we dip into that same pool of facts.

In the simplest case, we're interested in a large group of items or individuals. A statistical generalization is an inference that moves outward from facts about a sample to draw a conclusion about the group at large. For example, maybe some proportion of individuals in our sample possess a certain trait. We then take this as some evidence that roughly the same proportion of individuals in the larger group possess that trait as well. (As we'll see, a lot hangs on how we take our sample, and what we mean by some evidence and roughly.) By contrast, a statistical instantiation moves inward from a fact about the larger group to draw a conclusion about a sample. For example, if we know that a certain proportion of individuals in a large group possess a certain trait, this should guide our degree of confidence that a random sample from the group will possess that trait. In this chapter, we'll consider some key features of good statistical inferences, and some common pitfalls to avoid.

Learning Objectives
By the end of this chapter, you should understand:
how to apply the strength test to statistical generalizations
why the size and representativeness of a sample are critical
how margins of error interact with the strength of evidence
the pros and cons of various measures of centrality
how to illustrate the shape of data and see through misleading graphs
what makes loose generalizations and stereotypes problematic

6.1  Samples as evidence  Suppose we want to know what percentage of cars in the United States are red. Checking every single car in the country would be far too hard, and we don't need a completely precise answer. A sensible method is to take a random sample of cars and see what proportion of them are red. If we are careful,

this can provide strong evidence for a fairly precise conclusion, such as 4%-6% of the nation's cars are red. So when does an observation that N% of cars in our sample are red provide strong evidence that roughly N% of cars in the whole country are red? Recall that the strength of a piece of evidence depends on a comparison of two probabilities: the probability that the observation would be made if the hypothesis were true, and the probability that the observation would be made if the hypothesis were false. In this case, we have:
        Evidence = N% of cars in the sample are red
        Hypothesis = roughly N% of cars in the country are red
So we need to ask how much more likely our evidence is if the hypothesis were true than if it were false. The answer to this question will depend primarily on two things: the size of our sample; and whether the sample is subject to any selection effects. If our sample is a convenience sample—i.e., small and carelessly selected—then it doesn't provide very much evidence for the hypothesis at all. As we'll see, the composition of such a sample could easily fail to match that of the population as a whole, which means we could easily have observed E even if H were not true.

Selection effects In the previous chapter, we saw how selection effects can render our evidence unreliable. This is especially apparent in cases where we are drawing statistical inferences. Suppose I want to know what proportion of cars in the country are red. I look out the window and watch until a thousand cars go by, and 6% of them were red. Can I now be confident that roughly 6% of the cars in the country are red? In this case, the problem is not the size of my sample; the problem is that I am only looking at cars on a particular road in a particular town. This area could easily have a higher or lower proportion of red cars than the country as a whole. Perhaps fancy cars are more likely to be red and this is a particularly wealthy part of town. Or perhaps convertibles are more likely to be red, and this area is especially sunny and likely to have a high proportion of convertibles. Or perhaps younger people are more likely to own red cars, and this area has

a higher proportion of young people. There are plenty of possible selection effects that could render my evidence misleading. When we create a selection effect as a byproduct of the manner in which we choose to sample a population, that is called sampling bias. To see exactly why this is problematic, remember the strength test for evidence, which asks about the ratio between P( E | H ) and P( E | ~H ). In this case:
        E = 6% of cars in my sample are red.
        H = roughly 6% of cars in the country are red.
There's one problem, though: it could easily be that 6% of the cars in my sample are red even if the national percentage of cars that are red is very different. For example, if folks around here are much more likely to own red cars than people in the rest of the country, then my sample will likely display a much higher percentage of red cars than the country as a whole. In other words, I'll probably observe something in my sample that does not hold true in the larger set. So the fact that 6% of cars in my sample are red is not a good indication that 6% of cars in the country are red. Put differently, my evidence bears little relationship to whether the hypothesis is true: E could easily be true if H is false, and E could easily be false if H is true. So the values of P( E | ~H ) and P( E | H ) are too close to each other. We want to design experiments where these values will be as far apart as possible: where we either get evidence that is far more likely to be observed if H is true than if it is false, or we get evidence that is far more likely to be observed if H is false than if it is true. Observations that are about equally likely to occur either way are useless.

Sample size Another clear pitfall is generalizing from a sample that's too small. To see why this is problematic, we need to return to the strength test. Suppose we have discovered a population of ravens on a remote island. We notice that at least some of them have a genetic mutation that distinguishes them from ravens on the mainland. To keep things simple, suppose we have narrowed the possibilities down to just two hypotheses: either all the ravens on the island have the mutation, or only 80% of them do. First, we randomly sample five ravens, and they all have the mutation. That evidence supports the hypothesis that all ravens on the island have the mutation, but not very strongly.
probability of this result if all ravens on the island have the mutation = 1
probability of this result if 80% of ravens on the island have the mutation = about .33

(For this chapter, you don't need to know how to work out these probabilities.) This means our sample gives us evidence with a strength factor of 3, which is very weak. We should try to get evidence with a higher strength factor: we want the values for P( E | H ) and P( E | ~H ) to be far more different from each other than these are. Importantly, this doesn't mean we want to design a test where P( E | H ) is much higher than P( E | ~H ): we aren't setting out to confirm H rather than ~H. What we want is for our test to maximize the difference between P( E | H ) and P( E | ~H ), because that will mean we've observed strong evidence for one of our two hypotheses. Next, we sample 20 ravens. Suppose all of them have the genetic mutation.
probability of this result if all the ravens have the mutation = 1
probability of this result if 80% of the ravens have the mutation = about .01
Now we're getting somewhere! This sample gives us strong evidence in favor of one hypothesis over the other. More precisely, our evidence has a strength factor of about 100. And this is true even if the total number of ravens on the island is in the millions: the key is not that we've sampled a high proportion of the ravens, but that we've given ourselves 20 separate chances to randomly pick a raven without the mutation [1]. If one in every 5 ravens lacks the mutation, we really should have found one after randomly sampling the population 20 times. (Of course, we can make our sample much stronger still if we can sample another 20 ravens. If they all possess the mutation as well, the strength factor of our evidence rises to about 10,000—again, regardless of how many ravens there are in total!)
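Here is a small sketch (not part of the text) that computes these strength factors directly. It assumes, as in the example, that P( E | H ) = 1 when all the ravens have the mutation, and that P( E | ~H ) = 0.8 raised to the power of the sample size; the chapter rounds the resulting factors to 3, about 100, and about 10,000.

```python
# Strength factor for the raven example: P(E | H) / P(E | ~H), where
#   E  = every raven in a random sample of size n has the mutation
#   H  = all ravens on the island have the mutation, so P(E | H) = 1
#   ~H = only 80% of them do, so P(E | ~H) = 0.8 ** n
def strength_factor(n, alt_rate=0.8):
    p_e_given_h = 1.0
    p_e_given_not_h = alt_rate ** n
    return p_e_given_h / p_e_given_not_h

for n in (5, 20, 40):
    print(f"all {n} sampled ravens mutated: strength factor = {strength_factor(n):,.0f}")
# all 5 sampled ravens mutated: strength factor = 3
# all 20 sampled ravens mutated: strength factor = 87     (rounded to "about 100" in the text)
# all 40 sampled ravens mutated: strength factor = 7,523  (rounded to "about 10,000" in the text)
```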

The law of large numbers In short, it might not seem like there's a huge size difference between a sample size of 5 and a sample size of 20—especially if we're sampling from millions of ravens—but there really is! Which sample size we use makes an enormous difference to the strength of the evidence we get. The general principle here is that the larger our random sample from a population, the more evidence it offers regarding that population. And that's because the larger our sample, the more likely it is that its proportions closely resemble those of the population as a whole. This is called the law of large numbers. Consider the age-old coin toss. The true proportion of heads out of all the coin tosses in the world is 1/2. But as you sample coin tosses (say, by tossing a coin yourself), your initial results will tend to be very

different from 1/2. For example, after three tosses, the closest you can get is 1/3 or 2/3 heads. You could also easily have all tails or all heads. But the more tosses you make, the closer your sample will tend  to approach the true distribution in the world. After dozens of coin tosses, the proportion of heads in your sample will almost certainly be very close to 1/2. More generally, this means that if you have several collections of samples, the most extreme proportions will tend to be from the smallest collections. This applies to small collections from the larger population that you might not automatically consider to be "samples."  For example, suppose we are studying a rare disease. We are wondering if it is randomly distributed throughout the population or if there is something about living in a particular area that affects one's chances of getting the disease. So we survey health data for all the counties in the US and look for the highest and lowest rates of the disease. If nothing about living in a particular area affects one's chance of getting the disease, then we should expect the disease to be randomly distributed across the counties. In other words, as far as the disease is concerned, counties are effectively random samples of the population.  So should we expect every county to have roughly the same percentage of people with the disease? No! Some counties only have a few hundred people, while others have millions. So, for example, if the national rate of the disease is 1 in 1000, we should expect some very small counties to have no one at all with the disease, and other very small counties to have two or more people with the disease just by chance. Both kinds of very small counties will have rates of the disease that differ vastly from the national average. So even if the disease is completely randomly distributed, we should  expect very small counties to have the very highest and lowest rates of the disease. And the larger counties get, the more closely we should expect them to reflect the true national rate, simply because they are larger samples. Similarly, if you survey all of the hospitals in the country and look for those with the highest and lowest survival rates, you'll probably find them both at very small hospitals. And if you survey all the schools in the country and look for the highest and lowest test scores, you will likely find them both at very small schools, and so on. If you only look at the very highest test scores or the very lowest, you may get the impression that small schools are especially good or bad at teaching students, merely because of the law of large numbers. 
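The county example can also be simulated directly. The sketch below is not from the text, and the county sizes are invented; it spreads a disease with a national rate of 1 in 1,000 completely at random and then compares small and large counties, showing that the most extreme rates reliably come from the small ones.

```python
import random

random.seed(1)
rate = 1 / 1000  # national rate of the disease

def county_rate(population):
    """Each resident independently has the disease at the national rate."""
    cases = sum(1 for _ in range(population) if random.random() < rate)
    return cases / population

# Purely random spread across many small counties and a few large ones
small_counties = [county_rate(500) for _ in range(200)]       # 200 counties of 500 people
large_counties = [county_rate(100_000) for _ in range(5)]     # 5 counties of 100,000 people

print("small counties: lowest rate =", min(small_counties), " highest rate =", max(small_counties))
print("large counties: lowest rate =", min(large_counties), " highest rate =", max(large_counties))
# The extreme rates (including 0) come from the small counties, while the large
# counties all sit close to the true rate of 0.001 -- the law of large numbers.
```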

Section Questions 6-1 Two major problems to avoid with statistical generalization are (i) selection effects and (ii) small sample size. These two problems have something in common—namely that each one, all by itself, can make it ...

A

too likely that we would have made this observation even if our hypothesis was false

B

too likely that we are making an error due to the law of large numbers

C

too likely that our method of selecting samples is systematically skewing the type of individual who gets sampled due to a sampling bias

D

too likely that we are making a statistical instantiation rather than a statistical generalization

6-2 Imagine that you are investigating ways of reducing the number of stray dogs in cities. You get solid data on stray animals for every city and find that most of those with the very lowest rates (per capita) of stray dogs are very small cities. You're tempted to conclude that small cities must do a better job of handling stray animals, but first, you decide to check something: might this result simply be due to the law of large numbers instead? Which of the following methods would most help you answer that question?

A

check to see if the cities with the highest rate of stray dogs are also among the smallest cities

B

ensure that your method of identifying cities does not create a selection effect

C

make sure you are looking at a large enough number of cities

D

it can't be the law of large numbers, because that would only affect large cities

 

6.2 Better samples As we've seen, samples provide strong evidence for generalizations only when they are big enough and identified in a way that avoids selection effects. Together, these features of a sample maximize the probability that it actually reflects the population as a whole.  Let's take a look at how we can achieve this in practice.

Big enough When it comes to samples, how big is big enough? Well, that depends on how precise we need our conclusion to be—that is, how narrow we need our margin of error to be. Suppose we are randomly polling the US population to see if people approve of a particular policy. A typical national opinion poll has a sample size of about 1,000 people. Suppose we survey a sample of that size and find that exactly 65% of the people in our sample approve of the policy. Now consider two different hypotheses:
Exactly 65% of the US population approves of the policy
Between 60% and 70% of the US population approves of the policy.
Even though the evidence is more likely given the first hypothesis, it actually supports the second hypothesis much more strongly. And that's because of how the strength test works. We would be far more likely to see this value in our sample if the true value is between 60% and 70% than if the true value is below 60% or above 70%. If the true value were that far away, it would be virtually impossible for our large random sample to have an approval rate of 65%. On the other hand, while the evidence we obtained is pretty likely if the true value is 65%, it's also fairly likely if the true value is only one or two percentage points away. If we call the first hypothesis H, P( E | H ) and P( E | ~H ) are just too close for E to provide very strong evidence for H. In short, it's easier to get strong evidence for wider ranges. This means that the more precise our conclusion needs to be, the larger the sample size must be to support that conclusion. According to a fairly arbitrary convention, surveys should aim for a 95% confidence interval with a 3% margin of error. Roughly, this means that if the true value were outside this interval, a sample like this would yield this value

only 5% of the time. (This is not exactly the same as saying that we can be .95 confident that the true value lies in this interval, but it's close enough for our purposes [2].) There are some standard equations that allow us to identify how large a sample size we would need in order to achieve our desired confidence interval and margin of error. But people are often surprised that, when it comes to an opinion poll of the entire US population, a sample of only about 1000 people will give us a 95% confidence interval with a 3% margin of error. (Note that when surveys are labeled with a margin of error but no confidence interval, the confidence interval is usually 95%.)
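For the curious, the standard back-of-the-envelope formula behind those sample sizes is the normal approximation for a proportion, with the worst case at p = 0.5. The sketch below is not from the text, and it ignores refinements such as the finite-population correction (which barely matters when sampling 1,000 people from a whole country).

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a simple random sample of size n
    (z = 1.96 corresponds to a 95% confidence interval; p = 0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(target_moe, p=0.5, z=1.96):
    """Smallest n whose margin of error is at most target_moe."""
    return math.ceil((z / target_moe) ** 2 * p * (1 - p))

print(f"n = 1000 -> margin of error of about ±{margin_of_error(1000):.1%}")  # about ±3.1%
print(f"±3% at 95% confidence -> n of about {required_sample_size(0.03)}")   # about 1,068
```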

Sampling methods We've seen that a large enough sample that is genuinely random is very likely to reflect the target population. But effective procedures for obtaining genuinely random samples are very hard to find. For example, how would we go about randomly sampling cars to find out what proportion of them are red? We could close our eyes and throw darts at a map, then visit each place that was hit by a dart, find the nearest road, and sample the first 100 cars we see on that road. That might sound like a pretty good method at first. But notice that it would over-sample cars on roads in sparsely populated areas, which are more likely to be near a dart, and for all we know, cars in those areas are more or less likely to be red than cars in general. Here's another method: we find an alphabetical list of all the roads in the country, randomly select roads, and then sample the first 100 cars from each road. But even this would over-sample cars on low-traffic roads, because we are randomly selecting roads rather than cars, and a given car on a low-traffic road is far more likely to be sampled than a given car on a high-traffic road. We could try to fix this problem by sampling an hour's worth of cars from each road rather than the first 100 cars. But there's still a problem: the whole method of sampling cars on roads will over-sample cars that are driven a lot. If red cars spend more time in garages, they're less likely to be sampled by our procedure. The more we think about the range of possible selection effects, the more we realize the difficulty in selecting a sample that's truly immune to them. What can we do if no sampling procedure can truly guarantee randomness? One approach that can help with sampling bias is to try to directly ensure that our sample is as representative as possible. This means that the sample has the same sort of variety as its target population, in respects that we think might affect the probability of possessing the feature being studied. For example, suppose we are concerned that our sample of cars might have too many convertibles, because convertibles are more likely to be

red. If we happen to know what proportion of cars across the country are convertibles, we can simply structure our sample so that it ends up with the same proportion of convertibles. If 2% of cars in the country are convertibles, we can sample convertibles until we have 2% of our sample and then sample only non-convertibles for the remaining 98% of our sample. This approach—called stratifying a sample—helps eliminate some potential sources of sampling bias. In a survey, for example, if we want to know what proportion of people in the country approve of policy X, and we think people's age or gender might affect their reaction to policy X, it would be a good idea to make sure that our sample matches the country in terms of age and gender. Of course, we can't guarantee that our sample matches the overall population in every relevant respect. There may be relevant subgroups we haven't considered—it might not have occurred to us, for example, that wealthy people are more likely to approve of policy X than poor people. Or there may be subgroups whose proportion in the population we don't independently know. So just because we are stratifying our sample doesn't mean we can forget about randomization and simply sample whatever cars are convenient. If there's a relevant subgroup that we've forgotten—and there often is—then sampling local cars will still leave us with a sample that is unrepresentative.  The solution is to randomize as much as possible within subgroups even if we are stratifying. For example, if we are sampling convertibles, sedans, and SUVs separately, we should still try to take a genuinely random sample of each kind of car. We can't just use local cars, even if we can't think of any reason why they would be unrepresentative. In short, stratification supplements—but does not substitute for—randomness.
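Here is a toy sketch of the stratifying idea for the car example. It is not from the text: the population data, the 2% convertible share, and the red-car rates within each stratum are all invented for illustration. The point is structural: strata sizes are fixed to match known population shares, while the draw within each stratum is still random.

```python
import random

random.seed(2)

def stratified_sample(cars, strata_shares, size):
    """Draw a sample whose strata match known population shares, while still
    sampling randomly within each stratum (stratification supplements randomness)."""
    sample = []
    for body_type, share in strata_shares.items():
        pool = [car for car in cars if car["type"] == body_type]
        sample.extend(random.sample(pool, round(size * share)))
    return sample

# Invented population: 2% convertibles (more often red) and 98% other cars
population = ([{"type": "convertible", "red": random.random() < 0.20} for _ in range(2_000)] +
              [{"type": "other", "red": random.random() < 0.05} for _ in range(98_000)])

sample = stratified_sample(population, {"convertible": 0.02, "other": 0.98}, size=1000)
print("proportion of red cars in the stratified sample:",
      sum(car["red"] for car in sample) / len(sample))
```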

Survey pitfalls If we are considering the results of a voluntary survey, it's important to be aware of any selection effects that can arise from who is willing to take the poll. Due to such participation biases, a sample may fail to be representative even if the initial selection of potential participants was perfectly random. For example, suppose we decide to knock on doors at random but not everyone is home, and not everyone who answers the door is willing to take our survey. In that case, our sample will likely overrepresent people who are retired, those who work from home, and those who have very strong views on the survey topic. (Generally speaking, participation bias in surveys does skew towards people with strong opinions.) We could try to increase the proportion of people who are willing to respond by offering cash for participation, but this would still skew the results unless it's enough to convince

everyone to participate; any fixed amount of money will represent a greater incentive to some than to others. Even if there's no selection effect arising from who responds and who doesn't, the way in which people respond to the poll can be systematically skewed. This kind of effect is called a response bias. For example, people may pretend to have a certain belief or opinion in order to avoid admitting ignorance; they may not be completely truthful if the truth is embarrassing; or they may give responses that are socially acceptable rather than true [3]. Relatedly, the exact wording of a poll can often strongly affect responses, even if the difference seems quite subtle. For example, a poll asking people about their view on the "death tax" is likely to get a much more negative result than one about the "estate tax," even though those terms refer to the same thing.

Section Questions

6-3 Imagine you want to run a survey on people's attitudes in the US towards nuclear power, so you send a survey to 5000 people from a randomly generated list of people in the US. About one-third of them respond. You tally up the data and find that almost everyone in the sample has made up their minds on the issue: only 10% answered "unsure". About 90% of U.S. residents, you conclude, have made up their minds on nuclear power. The clearest concern raised by this method as described is:

A

failure to stratify the sample

B

insufficient sample size

C

sampling bias

D

participation bias

6-4

In general, the narrower we need our margin of error to be (for a given confidence interval)...

A

the less important it is to stratify our sample

B

the larger our sample size needs to be

C

the less precise our conclusion will be

D

the more confident we will be that the true value lies in that margin

6.3 The big picture

Once our sample is sufficiently large and representative, and we know which generalizations it supports, we can start building a picture of the population as a whole. However, when faced with a large amount of statistical data, we may need to filter some of it out so we can understand or communicate the general shape of things. This calls for summary statistics. By its nature, a statistical summary will have to omit some of the facts. This means we can't avoid answering two difficult questions: What features of the data are most important to us? What's the clearest way to present those features? Since these questions can be answered differently, representations of a given body of data can vary widely. Indeed, by selectively choosing what to report, it's often easy to summarize data with claims that are true but highly misleading. At the same time, because statistical summaries tend to be precise and quantitative, people are often unaware of how deceptive they can be.

Measures of centrality

Here's a simple instance. When thinking about a group of people or things, we're often interested in what they are typically like, or what they are like on average. Of course, these terms can be used in different ways. If we're interested in a quantitative feature of the objects—one that we can measure using numbers—there are several so-called "measures of central tendency" that we can use. We get the

arithmetic mean by adding up all the values and dividing the sum by the number of values (we'll just call this the "mean"). We get the median by ranking all the objects and taking the value of the object that's halfway down the list. And lastly, the mode is obtained by asking which value shows up most often. These can give us very different results.

So which measure of central tendency is best? Well, that depends on what we want to know. Suppose a number of birds are sitting on a telephone line and we want to estimate how much weight the line is carrying. In that case, it would be most useful to use their mean weight. With that, we can easily estimate the total weight as long as we know how many birds there are. On the other hand, if we want a sense of how heavy a typical bird is, the median would be most suitable. (Imagine a case where there are many starlings and a single pelican: the average weight might be quite different from the actual weight of any of the birds.)

To take another example, suppose we are interested in the typical wealth for people in a large meeting room, and Jeff Bezos (the world's richest man) just happens to be present. His wealth will skew the mean so heavily that we'll end up with very little sense of how much wealth anyone else has. (His data point is an outlier in the sense of being very distant from the central tendency.) In that case, we'd want to use the median or a truncated mean, which is the mean of all the values that are not outliers. (The median and truncated mean are both highly resistant to being skewed by outliers.) On the other hand, if everyone in the room has pledged a certain percentage of their wealth to our cause, and our goal is to estimate how much money we will receive, we wouldn't want to ignore all the money we'd get from Jeff Bezos—so it would be more useful to use the mean.

As another example, suppose there are seven kids, with candies distributed as shown on this graph. Most have two or three candies, but one kid is very lucky and has 22. So what's the best way to summarize this data with a single number? The mode is 2, but that's not very helpful. The mean is about 5.7, which doesn't seem like a very central number; it gives too much weight to the one kid with the most candies. The median of 3 seems like a better measure since it resists the outlier. Now what happens if we take away three candies? Here are two ways we could do that:

· We take three candies from the kid who has the most
· We take one candy each from the three kids who have the fewest

These two methods feel very different in terms of their effect on the central tendency of how many candies each kid has. The first method only takes away a small proportion of one kid's candies. But the second method takes fully half the candies away from three kids. So intuitively, it seems like the second method has a greater effect on the central tendency. But the mean behaves the same either way: it drops to  5.3. The mean pays no attention to how the candies are distributed. How mean! (Meanwhile, the median doesn't change at all in either case, because it pays no attention to what happens to the kids on either side of the middle kid.) Luckily, there is a measure of centrality we can use if we want to treat these candy cases differently: the geometric mean. Rather than adding up the values and using the number of values as a divisor, we multiply the values and then use the number of values as a root. The original geometric mean of candies is about 3.7, which seems like a good central value of how many candies each kid has. But the manner in which the candies are distributed also matters a great deal. So if we take all three candies from the kid with the most candies, the geometric mean barely changes: it only drops to 3.6. This reflects the fact that losing three candies was not much of a change as a proportion of that kid's candies. But if we take one candy each from the three with the fewest, the geometric mean plummets to 2.7. This reflects the fact that we took half of their candies away, which is a big deal to them. In other words, changes in the geometric mean are a good way of tracking changes in proportions within each value. (In fact, taking half of any one kid's candies will have the same effect on the geometric mean.) Why might this be useful in the real world? Here's one example. Suppose you give someone $5,000. How much would that increase their well-being? The answer is: it depends a lot on how much money they already have. It is a well-established finding in social psychology that, while people's reports of wellbeing increase with income (at least up to an income of $80,000 or so), the increase is not linear. [4] This means that the richer someone is, the more money it takes to increase their well-being further. If someone starts out with an income of $5,000, getting an additional $5,000 will increase their well-being about ten times more than if they started off with an income of $50,000 and got an additional $5,000. Now, suppose we wanted to measure the overall effect of an economic policy on people of varied incomes. If we report an increase in the mean income, we have no way of figuring out how much difference that increase is actually making to people's lives. But if we report a change in the geometric mean of income, we are giving a much more informative answer to those who are interested in how the policy actually affects people's well-being.
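For readers who like to check the arithmetic, here is a short Python sketch. The exact candy counts appear only in the missing graph, so the list below is an assumption chosen to match the figures reported in the text: a mean of about 5.7, a median of 3, a mode of 2, and a geometric mean of about 3.7. (The `geometric_mean` function requires Python 3.8 or later.)

```python
from statistics import mean, median, mode, geometric_mean

# Hypothetical candy counts consistent with the figures in the text.
candies = [2, 2, 2, 3, 3, 6, 22]

print(round(mean(candies), 1), median(candies), mode(candies),
      round(geometric_mean(candies), 1))
# -> 5.7  3  2  3.7

# Take three candies from the kid with the most...
from_top = [2, 2, 2, 3, 3, 6, 19]
# ...or one candy each from the three kids with the fewest.
from_bottom = [1, 1, 1, 3, 3, 6, 22]

for scenario in (from_top, from_bottom):
    print(round(mean(scenario), 1), round(geometric_mean(scenario), 1))
# The mean drops to about 5.3 either way; the geometric mean falls only to
# about 3.6 in the first case but plummets to about 2.7 in the second.
```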

The shape of the data

Sometimes none of these measures is very useful for summarizing statistical facts. For example, consider how testicles and ovaries are distributed. The average person (in the sense of arithmetic mean) has one testicle and one ovary, while the median person has two testicles and zero ovaries, because there are slightly more males than females. (And unfortunately the geometric mean goes to zero any time there are values of zero!) Interestingly, none of these measures is very useful in this case, because there is no single central tendency. It's the shape of the data that matters: testicles and ovaries are distributed in a very specific way that these measures overlook.

Even if we have a decent way to summarize the data with a single value, it's often best to just illustrate the shape of the data visually. For example, suppose we plot the number of candies per person along the x-axis and the number of people with that many candies along the y-axis. The graph below shows some different ways in which candies might be distributed over numbers of people. It illustrates that different distributions of candy can share a central tendency while varying greatly in terms of how unequal they are. Looking at the mean or median number of candies won't help us to distinguish these shapes.

Is there a single measure that can give us a sense of these differences? Well, in the green scenario, most people have about the same number of candies as each other, while in the orange scenario lots of people are on the extremes. In other words, people in the green scenario are typically much closer to the average. We can quantify this idea by measuring how far each person is from the overall mean, and then taking the mean of those values. This is a rough first pass at the idea of a standard deviation. (For technical reasons, standard deviation is actually calculated by first squaring the differences, and then taking the square root of the mean result, but that's beyond the scope of this text.) The standard deviation also allows us to refine our definition of an outlier: for example, we can say that an outlier is any value that is at least two or three standard deviations away from the mean.
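As a rough sketch of the calculation just described, here is the standard deviation computed by hand for the hypothetical candy counts used earlier, along with the two-standard-deviation convention for flagging outliers.

```python
from statistics import mean, pstdev

# Hypothetical candy counts from the earlier example.
values = [2, 2, 2, 3, 3, 6, 22]
m = mean(values)

# Square each difference from the mean, average the squares, then take
# the square root of the result.
sd = (sum((x - m) ** 2 for x in values) / len(values)) ** 0.5
assert abs(sd - pstdev(values)) < 1e-6   # matches the library's population SD

# One common convention: call anything at least two standard deviations
# from the mean an outlier.
outliers = [x for x in values if abs(x - m) >= 2 * sd]
print(round(sd, 2), outliers)   # the kid with 22 candies is the only value flagged
```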

Misleading presentations

Summary statistics present lots of opportunities for spinning facts. For example, if the rich are getting richer but no one else is, we can still say that the average person is getting richer—if we are referring to the mean. When we have a story to promote, there's often a handy statistic we can use that is technically true but misleading. Even national news organizations often present graphs that are highly misleading: a quick search online for "misleading news graphs" will turn up many examples!

Graphs and charts offer their own opportunities to mislead. A standard way to skew how data looks on a page is to truncate one of the axes (usually the y-axis). For example, look at these two ways of displaying the very same set of values:
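(The two charts are images in the original. If you want to reproduce the effect yourself, here is a sketch using matplotlib; the four values are made up, but the left panel truncates the y-axis at 30,000 as in the example discussed next.)

```python
import matplotlib.pyplot as plt

# Hypothetical values in the spirit of the example in the text.
labels = ["A", "B", "C", "D"]
values = [31000, 32000, 33500, 34000]

fig, (ax_truncated, ax_full) = plt.subplots(1, 2, figsize=(8, 3))

ax_truncated.bar(labels, values)
ax_truncated.set_ylim(30000, 35000)   # truncated y-axis: differences look huge
ax_truncated.set_title("Truncated axis")

ax_full.bar(labels, values)
ax_full.set_ylim(0, 35000)            # axis starting at zero: differences look modest
ax_full.set_title("Axis starting at zero")

plt.tight_layout()
plt.show()
```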

The graph on the left truncates the y-axis at 30,000, making the four values look much more different from each other than they really are. Graphs are often deliberately truncated in this way to give a misleading impression about a change in values. Even worse, you may occasionally see a graph with no labels on the truncated axis, making it impossible to tell that the graph is truncated. (Such a graph is beyond misleading, and should be considered wholly inaccurate.)  This doesn't mean it's always misleading to truncate an axis. For example, if someone has a fever and we are monitoring small changes in their temperature, it wouldn't make sense to use a chart where the temperature axis goes all the way down to zero. There is no risk of misleading people in this case, and using such a chart would make important changes difficult to see. At the other end of the scale, it's possible to minimize the drama of a truly dramatic trend by showing more of the y-axis above the data than necessary. For example, the two graphs below show exactly the same trend, but one makes it appear far more extreme:

Another devious technique for making graphs fit the story you want to tell is to cherry-pick the data. One TV news network wanted to push the narrative that gas prices had increased steadily over the previous

year, when in fact prices had vacillated widely with no overall trend, as shown on the left below. Using the same data and covering the same period as the graph on the left, the network simply cherry-picked three points and drew straight lines between them, creating the appearance of an upward trend [5]. I've reconstructed their graph on the right below. Notice that it also treats the intervals between the points as equal, even though one is "last week" and one is "last year"!

By mendaciously cherry-picking the data and using an inconsistent scale on the x-axis, the network managed to take a meandering but largely downward trend over the year and make it appear to millions of viewers like a solid upward trend—while technically not lying. 

Section Questions

6-5 Suppose we are summarizing data about a numerical trait of individuals. A good reason to present data with a measure of central tendency that's resistant to outliers is...

A

To provide a number that is useful if we intend to combine the values and estimate a total

B

To indicate the overall shape of the data

C

To indicate how many standard deviations from the mean a data point is located

D

To provide a sense of what value the trait has for a typical individual

6-6 Give the geometric mean of these values, rounded to two decimal points: 3, 5, 5, 6, 10


6.4 Thinking proportionally

As we go through our daily lives, we are constantly making generalizations. And we usually don't have the time or inclination to ensure that our samples are random and our conclusions properly summarized. Instead, we tend to just encounter examples of a larger group and quickly form general impressions about that group. Not only are these generalizations often poorly supported, but they can be so unclear that we're not sure exactly what we believe. We even have a hard time keeping track of the difference between Most Fs are G and Most Gs are F.

Loose generalizations

Imagine walking down a city street past a billboard that shows two happy and carefree people wearing a certain brand of clothing. The goal of the ad is to elicit an automatic generalization: those people wear the brand and they are happy and carefree, so people in general who wear the brand are happy and carefree. This creates a subconscious urge to buy clothes from that brand. At a conscious level, of course, we know that the billboard provides zero evidence for this generalization. The people on it are just models—or if not, they're a tiny and very carefully selected sample of people who wear the brand. It would be absurd to treat them as evidence about people who wear the brand in general. But our System 1 is not very good at taking such considerations into account. That's why these kinds of billboards exist.

Many of the associations that we have hanging around in our heads are extremely fuzzy. For example, suppose I associate being Canadian with being polite. But I'm not sure whether I think the majority of Canadians are polite, or a surprising number of them are polite, or maybe just more of them are polite than other nationalities. This is a loose generalization that I might express by saying "Canadians are polite" or "Many Canadians are polite."

When a collection of loose generalizations about a social group is widely held, we call it a stereotype. My idea that Canadians are polite happens to be a stereotype, but not every loose generalization is. (Some people think that Canadians are impolite; that's also a loose generalization but it's not widely held so it's not part of a stereotype of Canadians.)

"So, what's the problem with stereotypes?" some people will ask. "Aren't they often accurate?" In fact, the question of their "accuracy" is a complex matter. For one thing, we know that stereotypes are highly influenced by in-group bias, the tendency towards positive beliefs about members of our own groups, and negative beliefs about members of groups we don't identify with. But even setting that kind of bias aside, stereotypes are simply loose generalizations, and this means they are often too unclear even to assess for accuracy. They often simultaneously convey some truths and some falsehoods.

For example, what exactly does "Canadians are polite" mean? Does it mean that most Canadians are polite? It can certainly convey that most Canadians are polite, but that's not exactly what it means. Suppose that 60% of people from every other country are polite and 51% of Canadians are polite. Then "most Canadians are polite" would be true but "Canadians are polite" seems false. So maybe it means that among Canadians, a higher proportion of people are polite? This doesn't seem exactly right either. Suppose that 2% of Canadians are polite, and 1% of people from other countries are polite. In that case, "Canadians are polite" seems wrong. It appears that the sentence "Canadians are polite" is just too slippery for us to nail down exactly what it means. If two people disagree about whether the sentence is true, that might just be because they're not using the words in the same way, not because they disagree about the proportion of Canadians who are polite. In that case, the disagreement is a bit silly: the problem is that the generalization "Canadians are polite" is too unclear to assess for accuracy.

Worse yet, unclear language is particularly open to manipulation by speakers whose goal is to deceive. As we saw in Chapter 3, bare plurals are particularly good examples. Studies show that claims of the form Fs are G frequently convey that the vast majority of Fs are G. Even worse, the sentence Fs are G is often interpreted as conveying that there is a causal relationship between being F and being G [6]. This means that speakers can use bare plurals to sneak in suggestions about causation while being able to retreat to mere correlations if they are pressed. For example, someone can say "Homeless people are alcoholics" in order to convey that the vast majority of homeless people are alcoholics, or that alcohol dependence is the primary cause of homelessness. These claims are both false. But if challenged for evidence, they can always pretend that all they meant was that some homeless people are alcoholics, or that homeless people have a higher

rate of alcoholism than the general population. So the sentence expressing the stereotype is far too murky to even be assessed for accuracy. But in a way it's worse than inaccurate, because it acts as a kind of Trojan Horse that can smuggle false ideas into people's minds under the cover of all that murkiness. Another favorite way to express a loose generalization is to use the word many. If I say "Many Fs are G," this can convey that an especially high proportion of Fs are G, or that there is a special connection between being F and being G. But if challenged I can always pretend that I meant more than several. (After all, we can say, "Many Americans died in the outbreak" even if it was only a tiny proportion of all Americans, and other nationalities had rates that were just as high.) In other words, loose generalizations can be used to convey an idea while not committing yourself to it. Think of how often in public discourse we hear claims about men or illegal immigrants or minorities or old people. When people express themselves with loose generalizations, often they're just using muddled language to give voice to muddled thinking. But sometimes they are using a shady rhetorical trick to deniably communicate a claim for which they lack good evidence.

Representativeness heuristic

One of the most important pitfalls uncovered by cognitive science is the representativeness heuristic. As you'll recall, a cognitive heuristic is a shortcut: rather than answering a hard question, our brains substitute an easier one. In the case of the representativeness heuristic, we're faced with a question about the statistical relationship between two or more features. Rather than answering that question, though, we ask ourselves about the strength of our mental association between those features. For example, we can ask two very different questions:

· What proportion of F things are G? (or, how likely is it that a random F thing is G?)
· What proportion of G things are F? (or, how likely is it that a random G thing is F?)

Unfortunately we have a tendency to replace both questions with: How closely do I associate the two features? In other words, we often end up using the strength of a symmetric mental association to answer a nonsymmetric question regarding proportions.

Let's consider an example. Is a random student more likely to be a Persian literature major or a business major? (That is, assuming your school even has a Persian literature major!) When asked that pair of questions, you probably see immediately that the answer should be "business major" because there are so many more business majors than Persian literature majors. (There is a higher base rate of business majors.) Now what if I describe the student as a thoughtful poetry lover and then ask which major that person is more likely to have? In that case, people tend to forget entirely about the difference in base rates and answer the question solely based on the strength of their association between the two majors and being a thoughtful poetry lover.

But the base rate still matters a great deal. For example, even if all the Persian literature majors are thoughtful poetry lovers, and only 1 in 20 business majors is a thoughtful poetry lover, it could still be that a random thoughtful poetry lover is far more likely to be a business major, as illustrated by the graph on the left. The point is that the following two questions are very different:

· what proportion of thoughtful poetry lovers are Persian literature majors?
· what proportion of Persian literature majors are thoughtful poetry lovers?

We should be answering the first question and not the second. But instead of carefully distinguishing between these questions, it's easier to just ask "How closely do I associate the two features?"

Here is a similar example from the literature [7]. Subjects were asked to read the following description of an individual:

Linda is thirty-one years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

Subjects were then asked to put a list of statements in order of probability, including:

· Linda is a teacher in elementary school.
· Linda is active in the feminist movement.
· Linda is a bank teller.
· Linda is an insurance salesperson.

· Linda is a bank teller and is active in the feminist movement.

The description of Linda, of course, was intended to bring up a stereotype that subjects would associate with the feminist movement, and not with being a bank teller. And this is exactly what happened: the vast majority answered the probability question in terms of the strength of association they felt. But this led to a clear error: more than 85% ranked Linda is a bank teller and is active in the feminist movement as more probable than Linda is a bank teller.

What's the problem with that? Well, take a thousand people who match Linda's description. Maybe a very high proportion of them are active in the feminist movement, and very few are bank tellers. Let's say there are only three bank tellers. In that case, how many are bank tellers who are active in the feminist movement? It can't be more than three, because there are only three bank tellers total, and there can't be more activist bank tellers than bank tellers! (That is, the overlap between yellow and orange can't be bigger than the entire yellow area!) In other words, we should be making the first comparison, and not the second:

· proportion of matches who are tellers vs. matches who are activist tellers
· proportion of tellers who are matches vs. activist tellers who are matches

But rather than get clear about the proportions, our brains just ask: does Linda seem more like a bank teller or an activist bank teller?

Here is a further example: suppose I'm a physician and a patient comes in with a set of symptoms that are exhibited by everyone with rare disease X, but are sometimes also caused by a common condition Y. The patient's symptoms are a better match for disease X, but it would be wrong to use the representativeness heuristic to conclude that she probably has disease X. (Specifically, suppose 20% of people with Y have these symptoms, and 100% of people with X have these symptoms. If Y is 10 times more common than X in general, then it's still twice as likely that someone with these symptoms has Y! We will look at the formula for working out the numbers in a later chapter.) The representativeness heuristic can cause us to neglect the base rate—that is, how common the two conditions are to begin with. This is why there's a common saying among doctors: "If you hear hooves, think horse and not zebra." Even though the hooves of horses and zebras sound the same, horses are much more common to begin with (unless you're in a zoo or the grasslands of Africa).
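To make the physician example concrete, here is the arithmetic with made-up round numbers (10,000 people with X, 100,000 with Y, and no one with both), matching the percentages given above:

```python
# Condition Y is ten times as common as rare disease X; every X patient has
# the symptoms, but only 20% of Y patients do.
with_x = 10_000
with_y = 100_000

symptomatic_x = with_x * 1.00    # 10,000 people with X and the symptoms
symptomatic_y = with_y * 0.20    # 20,000 people with Y and the symptoms

p_x_given_symptoms = symptomatic_x / (symptomatic_x + symptomatic_y)
print(round(p_x_given_symptoms, 2))
# -> 0.33: a patient with these symptoms is still twice as likely to have Y
```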

Section Questions

6-7 True or false: every loose generalization is a stereotype.

A

True

B

False

6-8

Multiple answers are accepted for this question.

Which of the following can be conveyed by a claim using bare plurals like "Fs are G?" (Choose every correct answer.)

A

The vast majority of Fs are G

B

Fs are more likely to be G than other things/people are

C

Most Fs are G

D

Being F is causally related to being G

6-9 Suppose I learn from an excellent source the true statistical fact that teenage drivers are more likely to be killed in car accidents than other people. For this reason, the next time I hear about a death in a car accident, I think it was

probably a teenager who was killed. Which sentence best characterizes this inference?

A

My conclusion that the accident probably involved a teenager is a loose generalization

B

The inference is well-supported because I am thinking proportionally by comparing the proportion of teenage drivers who are likely to be killed in car accidents with the proportion of other people who are.

C

It's a well-supported statistical instantiation.

D

I'm falling prey to the representativeness heuristic.

6-10 Chapter question: Suppose I learn from an excellent source the statistical fact that 35% of homeless people have alcohol dependence. I conclude that there is roughly a 35% probability that a randomly selected homeless person has alcohol dependence. What kind of reasoning is this?

A

representativeness heuristic

B

loose generalization

C

statistical generalization

D

statistical instantiation

Key terms:

Base rate: the overall proportion or probability of a feature in general or in the population at large.

Central tendency: a value meant to summarize a set of observations by reporting the typical, principal, or middle point in the distribution of value(s) observed in the dataset. Measures of central tendency include the arithmetic mean, truncated mean, median, mode, and geometric mean. These are defined

in §6.3. They differ with respect to their resistance to outliers, among other things. For example, if we are concerned about something like the average productivity of a large group, because we're interested in what the group can achieve taken together, we may not want to be resistant to outliers. So in this case, the arithmetic mean would be appropriate. But in other contexts, we might be concerned about something like the wealth of a normal person in a group where a very small number of people have enormous wealth. Here the median is likely a better measure.

Confidence interval: roughly, the interval such that there is a low probability (less than .05, in the case of a 95% confidence interval) that if the true percentage in the population were outside the interval, a sample like this would have yielded the value it did. The size of this interval in either direction from the given value is called the margin of error.

Convenience sample: a set of observations that is small and carelessly selected. Such samples generally do not provide much evidence for hypotheses because small samples can easily fail to match the population as a whole. Moreover, the examples that are conveniently available to us tend to be subject to selection effects.

Heuristic: a cognitive shortcut used to bypass the more effortful type of reasoning that would be needed to reach an accurate answer. Heuristics are susceptible to systematic and predictable errors.

Law of large numbers: the larger a sample, the more likely it is that its proportions closely reflect those of the population as a whole. Consider tossing a coin. After only a few tosses, you may have a very high or low percentage of heads. However, it would be extremely unlikely to have even 2/3 of coin tosses come up heads after many tosses. Instead, the series starts to converge on the true distribution of heads and tails in the "population of coins as a whole," which is 50/50.

Loose generalization: when we associate one kind of thing or person with an attribute but we are unclear what proportions we take to be involved. For example, we might believe that Canadians are polite without having much sense of what this means, statistically speaking. Loose generalizations can be expressed using bare plurals (see Chapter 3) or with "many" as in "Many Canadians are polite".

Margin of error: see confidence interval.

Outlier: an observation that is very distant from a dataset's central tendency, conventionally three standard deviations. For more on resistance to outliers, see central tendency.

Participation bias: a selection effect arising from differences in the target population with regard to willingness to participate in a survey. Those who choose to respond might be importantly different from

those who choose not to respond. For example, those with strong opinions and who are less busy are more likely to take part in a survey than those who lack strong opinions or who are busier.

Representative sample: a sample that has the same sort of variety as its target population, in respects that might affect the probability of possessing the trait being studied. That is, the proportion of every relevant subgroup in a sample matches the proportion of that subgroup in the overall population. A relevant subgroup is any subgroup defined by a feature that there is reason to think might be correlated with the property you are studying. For example, if one is conducting a national poll about political views, the sample should have a proportion of Democrats and Republicans that matches the proportion in the population as a whole. Likewise for any demographic feature that might be correlated with the views being surveyed. Evidence from an unrepresentative sample will have a lower strength factor, as these observations may still be very likely even if the target population does not match the sample.

Representativeness heuristic: a heuristic used to answer questions about the statistical relationship between two or more features. Rather than answering that question, when using this heuristic, we ask ourselves about the strength of our mental association between those features. For example, we might be wondering how common a feature F is among individuals that are G and, instead of answering that question, we determine how closely we associate being F with being G.

Response bias: an effect whereby responses reported by respondents to a survey differ from their true value due to their beliefs or expectations about what answers are expected of them. For example, respondents may not know the answer or may lack an opinion, but be embarrassed to say so. Another example is when respondents believe that a particular answer is expected. For these reasons, it is especially important to avoid loaded questions, which can skew participants' expectations, and then skew results. For example, referring to a "death tax" as opposed to an "estate tax" will likely lead to more negative participant responses, which may not reflect participants' underlying opinions.

Sampling bias: a selection effect in a sample created by the way in which we are sampling the population.

Statistical generalization: an inference made about a population based on features of a sample.

Statistical inference: an inference that uses specific observations as evidence for general claims of which they are instances—or vice versa.

Statistical instantiation: an inference made about a sample based on features of a population.

Stereotype: a widely held loose generalization about a social group. For example, "Canadians are polite." Stereotypes can pose problems because they are often too unclear to assess for accuracy, they commonly lead to failures of communication, and they are highly subject to in-group bias.

Stratified random sampling: a way of trying to achieve a representative sample by ensuring that the proportions of relevant subgroups in your sample match those of the corresponding subgroups within the population as a whole (see representative sample). First, the population to be sampled is divided into subgroups, one for each relevant feature. Then, a simple random sample is taken from each subgroup, with sizes proportional to the population in each group. These simple random samples are then combined to create the complete sample.

Summary statistics: the practice of summarizing and reporting statistical data. This involves making decisions about what the most important facts are, and how best to present them.

Footnotes:

1. Indeed, the size of the population makes no difference to the strength factor of the evidence as long as we are letting the ravens go into the wild before taking our next sample. (This is called sampling with replacement.) On the other hand, if we are sampling a sizable proportion of the total population, we can get even more evidence by holding onto each bird after sampling it. As an extreme example, imagine there are only 20 ravens in the whole population. If we sample without replacement, we have all the birds and know for sure how many possess the mutation. But if we sample with replacement, then each time we are randomly sampling from the entire population again.

2. What this is actually giving us is roughly a value for P( E | ~ H ) of less than .05, which is not the same as giving us 95% confidence that the hypothesis is true after updating on the evidence. This is the difference between the confidence interval and the credible interval, which requires more assumptions to calculate but corresponds more naturally to what almost all nonexperts think the confidence interval actually is.

3. See, e.g., Van de Mortel 2008.

4. See Jebb, Tay, Diener & Oishi (2018); Stevenson and Wolfers (2013); Kahneman & Deaton (2010). See also Chapter 10, section 1.

5. A discussion of that particular graph can be found here, and some similar graphs are discussed here. While internet

searches for misleading graphs tend to produce a disproportionate number from Fox News, misleading charts have been televised on every major network and I know of no careful study regarding whether they are more prevalent on one network than another.

6. See Cimpian et al. 2010; Cimpian and Erickson, 2012; Rhodes et al. 2012.

7. See Tversky & Kahneman 1983.

References

Cimpian, A., & Erickson, L. C. (2012). The effect of generic statements on children's causal attributions: Questions of mechanism. Developmental Psychology, 48(1), 159.

Cimpian, A., Gelman, S. A., & Brandone, A. C. (2010). Theory-based considerations influence the interpretation of generic sentences. Language and Cognitive Processes, 25(2), 261-276.

Jebb, A. T., Tay, L., Diener, E., & Oishi, S. (2018). Happiness, income satiation and turning points around the world. Nature Human Behaviour, 2(1), 33.

Kahneman, D., & Deaton, A. (2010). High income improves evaluation of life but not emotional well-being. Proceedings of the National Academy of Sciences, 107(38), 16489-16493.

Rhodes, M., Leslie, S. J., & Tworek, C. M. (2012). Cultural transmission of social essentialism. Proceedings of the National Academy of Sciences, 109(34), 13526-13531.

Stevenson, B., & Wolfers, J. (2013). Subjective well-being and income: Is there any evidence of satiation? American Economic Review, 103(3), 598-604.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293.

Van de Mortel, T. F. (2008). Faking it: Social desirability response bias in self-report research. The Australian Journal of Advanced Nursing, 25(4), 40.

Image Credits Banner photo of cabinet with many drawers: Image by Pexels licensed under CC0 / cropped from original; Red car: image by Daniel Nettesheim licensed under Pixabay license; Raven or crow: image licensed under Pixabay license; Blueberries: Image by Kristina Paukshtite licensed under CC0; Archer: image by Paul Barlow licensed under Pixabay license; Sunrise or sunset on rural road: image licensed under Pixabay license; Townhomes with colorful doors: image by TuendeBede licensed under Pixabay license / cropped from original; Woman feeding various birds by water's edge: image by George Desipiris licensed under CC0 / cropped from original; Four colorful normal distribution lines on a graph: image by Peter Hermes Furian licensed by Shutterstock.com / cropped from original; Billboards in city: image by Vlad Alexandru Popa licensed under Pexels license; Tent in front of graffiti: image by Alexas_Fotos licensed under Pixabay license / cropped from original. All other images are the authors' own.

Reason Better: An Interdisciplinary Guide to Critical Thinking © 2019 David Manley. All rights reserved. Version 1.4



Chapter 7. Causes

Introduction

Our minds are built to perceive causal connections, and we do so continually without even realizing it. But as our world grows increasingly complex, our penchant for simple causal stories becomes more and more likely to mislead us. As a result, jumping to causal conclusions is one of our most pervasive errors. In this chapter, we'll look at what it takes to get strong evidence about causal relationships.

Along the way, we'll clarify what correlations are, and focus on several ways that correlations can look like causal relationships even when they're not. We'll also see how news stories about scientific findings can be misleading, and why the best kind of evidence for causal relationships comes from double-blind randomized controlled trials.

Learning objectives

By the end of this chapter, you should understand:

· why stating a sequence of events often suggests a causal relationship
· how our proclivity for simple causal stories overlooks complex causal networks
· what correlations are, and what makes them statistically significant
· the many potential problems with reasoning from correlations to causal conclusions
· how to identify reverse causation, common cause, placebo effect, and regression to the mean
· why we must assess potential causal mechanisms before discarding a chance hypothesis
· why the strength of the evidence provided by a scientific study depends on its design

7.1 Causal thinking

Consider the last time you saw a fight in a movie. One actor's fist launches forward; the other actor staggers back. You didn't make a conscious inference that one event caused the other—that judgment feels almost as automatic as perception itself. You just seem to see the causal connection.

In reality, of course, you know that punches in movies are actually just choreographed with little or no actual contact. But knowing this doesn't stop System 1 from making automatic causal inferences. In other words, it's a kind of cognitive illusion: System 1 will continue to infer causation even when System 2 knows it isn't really there.

Now imagine a red dot on your computer screen. A green dot moves towards it, and as soon as they touch, the red dot moves away at the same speed and in the same direction. If the timing is right, you can't avoid feeling like you saw a causal connection, a transfer of force. But if, instead, they don't touch and the red dot only moves after pausing for a moment, it feels like the red dot moved itself. Consciously, we know the dots are just pixels on a screen that don't transfer force at all. But we can't shake the sense that we're seeing a direct causal connection in one case and not the other.

An instinct for causal stories

Our perceptual judgments are just one example of a more general fact about our minds: we can't help thinking about the world in terms of causal stories. This tendency is critical to one of our superpowers in the animal kingdom: our ability to understand our environment and shape it to our liking. However, the cognitive processes we use to arrive at these causal stories were honed during much simpler times. Our ancient ancestors noticed countless simple patterns in their environment: when they ate certain plants, they got sick; when they struck some flint, there were sparks; when the sun went down, it got darker and colder. And these things still happen in our world. But our natural bias towards simple causal stories can lead us to oversimplify highly complex things like diseases, economies, and political systems. In addition, our minds are so prone to perceiving patterns that we often seem to find them even in random, patternless settings—for example, we see faces in the clouds and animal shapes in the stars. In the environments of our ancestors, there was a great advantage to finding genuine patterns, and little downside to over-detecting them. (Is that vague shape in the shadows a predator, or nothing at all? Better to err on the side of caution!)  The result is a mind that's perhaps a bit too eager to find patterns. For example, if you consider the sequence "2...4...," the next number probably just pops into your head. But hang on—what is the next number? Some people think of 6, while others think of 8 or even 16. (Do we add 2 to the previous number, double it, or square it?) As soon as System 2 kicks in, we realize the answer could be any of these. For a moment, though, the answer may seem obvious. System 1 completes the pattern with the simple earnestness of a retriever bringing back a stick. Even when our observations have no pattern at all, we can't help but suspect some other factor at work. For example, suppose we map recent crimes across a city, yielding something like the picture on the left. Most of us would find it suspicious that there are so many clusters: we'd want to know what's causing the incidents to collect in some places and not others. But in fact this picture shows a completely random distribution. Real randomness generates clusters, even though we expect each data point to have its own personal space. This is called the clustering illusion, and it can send us seeking  causal explanations where none exist.
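If you'd like to see the clustering illusion for yourself, here is a small Python simulation: it scatters 100 points uniformly at random over a 5-by-5 grid and counts how many land in each cell. The grid size and number of points are arbitrary choices made for illustration.

```python
import random

random.seed(1)

# Scatter 100 points uniformly at random over a 5x5 grid and tally each cell.
# Genuine randomness still produces crowded cells and near-empty ones.
counts = [[0] * 5 for _ in range(5)]
for _ in range(100):
    x, y = random.randrange(5), random.randrange(5)
    counts[y][x] += 1

for row in counts:
    print(row)   # about 4 per cell on average, yet some cells are far above or below that
```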

One thing after another

The most rudimentary error we make about causation is to infer that B was caused by A just because B happened after A. This is known as the fallacy of post hoc ergo propter hoc—“after this, therefore because of this." To help you avoid this fallacy, just remember that everyone who commits the fallacy of post hoc ergo propter hoc eventually ends up dead! At some level, we know it's absurd to assume that one event caused another just because it happened first; but it's a remarkably easy mistake to make on the fly. Often the problem is one of communication. A report that two events occurred in sequence is often taken to convey that a causal relationship links them. For example, suppose I say "A fish jumped, and the water rippled." It's fairly clear that I'm suggesting the fish caused the ripples. Now suppose I say, "After meeting my new boyfriend, my grandma had a heart attack." The speaker might just be reporting a sequence of events, but it's still natural for an audience to seek a potential causal connection. As always, this lack of clarity in our language can be exploited. By reporting a sequence of events, you can insinuate a causal relationship without explicitly stating it. This is why politicians like to mention positive things that have happened since they took office, and negative things that have happened since their opponents took office. The message gets conveyed, even if they don't state the causal connections explicitly. They can leave that part up to the automatic associations of their listeners. And if challenged for evidence, they need only defend the claim they were making explicitly: "All I was saying is that one thing happened before the other! Draw your own conclusions!"

Complex causes

The causal stories that come naturally to us are often very simple. For example, "The cause of the fire was a match." But in our complex world, many of the things we want to understand don't arise from a single cause. There may be no answer to the question, "What was the cause?"—not because there was no cause, but because there were too many interconnected causes—each of which played a part. In fact, it's rare for anything to have a single cause.

When we talk about the cause of an event, we usually mean the one factor that is somehow most out of the ordinary. For example, if a fire breaks out, in most ordinary contexts, the cause is a source of ignition like a match. But there are other contexts where the presence of fuel or even oxygen might be the most out-of-the-ordinary factor. For example, imagine an experiment in a vacuum under extremely high temperatures. If the researchers were depending on a lack

of oxygen to keep things from burning up, then the unexpected presence of oxygen would count as the cause. We can also distinguish between the immediate causes of events and the distal causes that explain the immediate causes. For example, a certain drought might have a clear immediate cause, such as a long-term lack of rain. But it would be useful to know what caused that lack of rain: perhaps a combination of changes in regional temperatures and wind patterns. And these factors in turn may be part of a global change in climate that is largely due to rising levels of greenhouse gases. Each causal factor is a node in a network that itself has causes, and combines with other factors to bring about its effects. The real causal story is rarely ever a simple one.

7.2 Causes and correlations

Most errors in causal reasoning are subtler than simply assuming that two events that happen in sequence must be causally related. Instead, we start by sensing a pattern over time: maybe we've seen several events of one kind that follow events of another kind. Or we notice that a feature which comes in degrees tends to increase when another feature is present. In other words, we notice a correlation between two kinds of events, or two kinds of features. From this repeated pattern, we then infer a causal relationship.

In fact, this inference is so natural that the language we use to report a correlation often gets straightforwardly interpreted as reporting causation. When people hear that two things are "associated" or "linked" or "related," they often misinterpret that as a claim about a causal connection. But in the sciences, these expressions are typically used to indicate a correlation that may or may not be causal.

So how can we tell if two apparently correlated factors are causally related? (Factors include anything that can stand in causal relationships—events, situations, or features of objects.) Correlations can certainly provide evidence of causation, but we need to be very careful when evaluating that evidence. Because there are many ways that we can go wrong in making this kind of inference, it is important to isolate three inferential steps that must be made:

1. We observed a correlation between A and B;
2. There is a general correlation between A and B (inferred from 1); and
3. A causes B (inferred from 2).

Because we could go wrong at any step, we should become less confident with each interim conclusion. This means that a good argument of this sort requires very strong evidence at each step. (We'll look at the relevant rule for probability in Chapter 8, but just to get a sense of how inferential weakness can compound, suppose you're 80% confident that the first conclusion is true, and 80% confident that the second is true given that the first is true. In that case, you should only be 64% confident that they are both true. And if you're 80% confident that the third is true given that the first two are true, you should only be about 51% confident that all three conclusions are true [1].)
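The compounding is just repeated multiplication, as this tiny sketch with the illustrative 80% figures shows:

```python
# Compounding uncertainty across the three inferential steps.
p_step1 = 0.80            # confidence that we observed a correlation
p_step2_given_1 = 0.80    # confidence in a general correlation, given step 1
p_step3_given_12 = 0.80   # confidence in causation, given steps 1 and 2

print(p_step1 * p_step2_given_1)                      # 0.64
print(p_step1 * p_step2_given_1 * p_step3_given_12)   # 0.512, i.e. about 51%
```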

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there.'                 —Randall Munroe So a good causal argument from correlation requires that we establish three things with a high degree of confidence: (1) that a correlation exists in the cases we've observed; (2) that this means there is a general correlation that holds beyond the cases we've observed; and (3) that this general correlation is not misleading: it really results from A causing B.  We'll go through these steps one at a time, but first it's worth getting clear on exactly what correlations are.

The nature of correlation

So what is a correlation, exactly? Here's the definition. There is a positive binary correlation between factors A and B when, on average:

A occurs at a higher rate when B occurs than it does otherwise.

And if we replace "higher" in this definition with "lower," then there is a negative (or inverse) binary correlation between the two factors. (If we don't specify and just say that two things are "correlated," we mean they are positively correlated.) Three crucial things are worth clarifying.

1. The definition has to do not with absolute numbers but with rates. For example, is there a correlation in the world between being male and owning a cell phone? Suppose we learn that most males own a cell phone and also that most cell phone owners are male. Can we conclude that owning a cell phone is correlated with being male? It's tempting to think the answer is "yes." Understanding why that's the wrong answer is crucial to grasping what a correlation is. To establish a correlation, we need to know whether males own cell phones at a higher rate than females do. But that simply does not follow from the fact that most males own cell phones and most cell phone owners are male. Here's why: Most people in the world own cell phones. So we should expect most males to own a cell phone, even if males and females own cell phones at the same rate. Most people in the world are male (by a small margin). So we should expect most cell phone owners to be male even if males and females own cell phones at the same rate. Putting these two facts together still doesn't give us a correlation, because they could both be true even if males and females own cell phones at the same rate. A useful rule of thumb that can help identify correlations is to ask yourself whether learning that factor A is present provides you with any evidence that factor B is also present. In this case, the two bullet points above don't give us any reason to think someone is more likely to own a cell phone after learning that they are male.  2. Correlation is symmetrical—if it holds in one direction, it also holds in the other. In other words, if A occurs at a higher rate when B occurs than it does otherwise, then B occurs at a higher rate when A occurs than it does otherwise. This may not seem obvious at first, but it's true. Consider an example. Suppose a correlation exists between rainy days and Mondays. This means that rainy days tend to occur at a higher rate on Mondays than they do on other days. Let's say a quarter of Mondays are rainy days, but only a fifth of other days are rainy. (That's a small correlation, but a correlation nonetheless.) Does this mean that a correlation exists between Mondays and rainy days?  The answer is yes. It might sound strange to say, "Mondays occur at a higher rate on rainy days than they do otherwise," but it's true in our example. The proportion of rainy days that are Mondays will have to be higher than one in seven, and the proportion of non-rainy days that are Mondays will have to be lower than one in seven. (As it happens, more than a sixth of rainy days will have to be Mondays. If you know

how to work out these values, it's worth spending the time convincing yourself with examples that correlation is symmetrical.) This means that learning that it's a Monday is some evidence that it's rainy, and learning that it's rainy is also some evidence that it's a Monday (if you don't know what day it is).

3. Finally, in the definition above, the term "binary" specifies that the correlation we're talking about has to do with factors we're treating as all-or-nothing rather than as coming in degrees. The rate of a factor simply has to do with how often it's present and absent. For example, in the case above, it's either Monday or it's not, and we're treating whether it's rainy as a simple yes/no question. But some correlations have to do with the degree or intensity of a factor. For example, the height and diameter of trees both come in degrees. And at least on average, the greater a tree's diameter, the greater its height (and vice versa). In this sense, the two features are correlated. But it would make no sense to say that height occurs at a higher rate with diameter, because all trees have both height and diameter. Unlike a binary correlation, which relates all-or-nothing factors, this correlation is scalar. We say there is a positive scalar correlation between factors A and B when, on average:

A occurs to a greater degree when B occurs to a greater degree.

If we replace only one instance of the word "greater" with "lesser," then one factor increases as the other decreases, giving us a negative (or inverse) scalar correlation. (There are other possibilities—for example, A is binary and B is scalar—but let's not worry about that here.)

Unfortunately, the distinction between binary and scalar correlations is a bit trickier than it seems at first, because the same underlying correlation can often be measured in either a scalar or binary way. For example, although height and diameter are scalar, we could make our calculations simpler by selecting an arbitrary cutoff for what it takes to count as a tall tree and what it takes to count as a wide tree. That would give us two binary features, and we can then report a binary correlation: tallness occurs at a higher rate among wide trees than among non-wide trees. This isn't the best way to measure the correlation (it would be more precise to treat it as scalar), but even scientists take shortcuts.
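Here is a quick check of the rainy-Monday numbers in Python, using exact fractions. It confirms that if a quarter of Mondays are rainy and a fifth of other days are rainy, then more than a sixth of rainy days must be Mondays, so the correlation runs in both directions.

```python
from fractions import Fraction

# A quarter of Mondays are rainy; a fifth of other days are rainy.
p_monday = Fraction(1, 7)
p_rainy_given_monday = Fraction(1, 4)
p_rainy_given_other = Fraction(1, 5)

# Overall rate of rainy days.
p_rainy = p_monday * p_rainy_given_monday + (1 - p_monday) * p_rainy_given_other

# Rate of Mondays among rainy days, and among non-rainy days.
p_monday_given_rainy = p_monday * p_rainy_given_monday / p_rainy
p_monday_given_dry = p_monday * (1 - p_rainy_given_monday) / (1 - p_rainy)

print(p_monday_given_rainy, float(p_monday_given_rainy))  # 5/29, about 0.17 -- above 1/7
print(p_monday_given_dry, float(p_monday_given_dry))      # 5/37, about 0.14 -- below 1/7
```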

Illusory correlations

We turn now to the first of three ways in which we can wrongly conclude from an apparent correlation that a causal relationship exists between two factors. Recall the three inferential steps from above:

1. We observed a correlation between A and B;
2. There is a general correlation between A and B; and
3. A causes B.

The first kind of error is that we're wrong about (1): it only seems to us like A and B correlate in our observed sample. But why might we get the false impression that A and B correlate in our sample? Taking just the case of binary correlation, we may be overestimating the rate at which A occurs along with B in our sample, or underestimating the rate at which it occurs without B in our sample—or both. This might happen for various reasons, such as motivated reasoning, selective recall, and selective noticing.

For example, recall the idea that people behave strangely more often when the moon is full. That's a claim about the correlation between two factors. I might think that I have good evidence for that claim because I think a correlation exists between full moons and strange behavior in my experience, and then I can generalize from my experience. The problem is that if I'm subject to selective noticing, I might be wrong about the correlations in my own observations. Maybe the only time I think about the moon hypothesis is when I happen to notice that someone is behaving strangely and there's a full moon. I simply don't notice the times when there's a full moon and no one is behaving strangely, or the times when there's strange behavior and no full moon. As a result, it can feel like the rate of strange behavior is higher in my experience during a full moon, even though it's not. If I were to carefully tally up the occurrences of strange behavior that I observe, along with phases of the moon, I'd see that no correlation actually exists.

Another kind of mistake is simply that we fail to think proportionally. For example, suppose we've only observed Bob when it's cold and we notice that he has worn a hat 70% of the time. Can we conclude that there's a correlation in our observations between his wearing a hat and cold temperatures? Of course not! What if he wears a hat 70% of the time regardless of the temperature? In that case, there's no special correlation between his hat wearing and the cold: he just loves wearing hats.

If we are told, "Most of the time when it's cold, Bob wears a hat," it's easy to forget that this is not enough to establish a correlation. To infer a correlation, we have to assume that Bob doesn't also wear a hat most of the time even when it's not cold. Maybe this is a safe assumption to make, but maybe not. The

The point is that if we just ignore it, we are neglecting the base rate, a mistake we encountered in the previous chapter.

We can visualize the point this way. To establish a correlation between A and B, we must not only check the proportion of B cases in which A occurs, but also compare that with the proportion of non-B cases in which A occurs. In this chart, that means first asking what proportion of all the cases on the left are on the top-left, and then asking what proportion of all the cases on the right are on the top-right:

Consider a final example—this time, one of selective recall. As we saw in a previous chapter, when asked whether Italians tend to be friendly, we search our memory harder for examples of friendly Italians than for examples of unfriendly Italians. Where A is being friendly, and B is being Italian, that means we focus on the top left-hand side of the box, and do a poor job of estimating the proportion of B cases that are A. But things are even worse than that, because the loose generalization "Italians tend to be friendly" is plausibly a question of correlation—is the rate of friendly people among Italians higher than the rate of friendly people among non-Italians? In that case, we have to evaluate not only the proportion of A cases in the B area, but also the proportion of A cases in the non-B area. So our selective search for cases in the top left-hand side is absurdly inadequate. We need to check all four boxes of cases we've observed.
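The four-box check can be written out in a few lines of code. This is a sketch, not something from the text, and the tallies for the Bob example are made up: it compares the rate of hat-wearing on cold days with the rate on non-cold days, which is exactly the comparison that "most of the time when it's cold, Bob wears a hat" leaves out.

    # Hypothetical tallies for all four boxes (the counts are invented).
    hat_and_cold = 21     # cold days on which Bob wore a hat
    no_hat_and_cold = 9   # cold days on which he didn't
    hat_and_warm = 35     # non-cold days on which he wore a hat
    no_hat_and_warm = 15  # non-cold days on which he didn't

    rate_given_cold = hat_and_cold / (hat_and_cold + no_hat_and_cold)
    rate_given_warm = hat_and_warm / (hat_and_warm + no_hat_and_warm)

    print(f"Rate of hat-wearing on cold days:     {rate_given_cold:.0%}")
    print(f"Rate of hat-wearing on non-cold days: {rate_given_warm:.0%}")

    if rate_given_cold > rate_given_warm:
        print("Positive correlation in these observations")
    elif rate_given_cold < rate_given_warm:
        print("Negative correlation in these observations")
    else:
        print("No correlation: hat-wearing occurs at the same rate either way")

With these particular numbers, Bob wears a hat 70% of the time whether or not it's cold, so the two rates match and there is no correlation, despite the true claim that he wears a hat most of the time when it's cold.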

Generalizing correlations

Suppose we've avoided these errors and correctly identified a correlation in our experience. The next point at which our causal inference can flounder is when we generalize from our sample to conclude that a correlation exists in the general population. (After all, our observations usually only constitute a small sample of the relevant cases.) In the previous chapter, we saw several reasons why our sample might fail to match the wider set of cases.

All the same lessons apply when we're generalizing about correlations from a sample—for example, we need to be aware of sampling biases, participation biases, response biases, and so on. However, there is one important difference when we're dealing with correlations. When estimating the proportion of individuals with some feature, we said that a "sufficiently large" sample is one that gives us a sufficiently narrow confidence interval. But when we are interested in a correlation between two features, we want a sample large enough to make our correlation statistically significant. For example, if we're estimating the rate of respiratory problems in a country, we need a sample large enough to give us a narrow confidence interval; but if we want to know whether a correlation exists between respiratory problems and air pollution, we need a sample large enough that we have a good chance of finding a correlation that is statistically significant.

So what does this mean, exactly? A correlation is statistically significant when we'd be sufficiently unlikely to find a correlation at least this large in a random sample of this size without there being some correlation in the larger population. We can work this out by supposing that there is no correlation in the larger population, then simulating many random samples of this size, and working out what proportion of those samples would show a correlation of at least the size that we observe, merely by chance. As with confidence intervals, the threshold for a statistically significant correlation is somewhat arbitrary. By convention, "sufficiently unlikely" in the social sciences means there's less than a 5% chance of seeing a correlation of this size or larger in our sample without there being some correlation in the larger population. This corresponds to a p-value of .05. (In areas like physics, however, the threshold is often more stringent.)

So how strong is the evidence from a study that finds a statistically significant correlation? Note that if H = there is a correlation in the population and E = there is a correlation of at least this size in the sample, then statistical significance ensures a low value for P( E | ~H )—namely, at most .05. Usually we can also assume that we'd be much more likely to see this correlation if there really is a correlation in the population as a whole, meaning that P( E | H ) is fairly high in comparison. In that case, statistical significance translates into a fairly high strength factor for the evidence provided by our sample. But note that the strength factor is not exactly overwhelming. A sample correlation that is just barely statistically significant will have at best a strength factor of 20 in favor of the generalization.
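Here is a minimal sketch (not from the text) of that simulation idea for a binary correlation. The group sizes, counts, and number of simulated samples are invented; the point is just the procedure: suppose there is no correlation in the population, simulate many random samples of the same size, and see how often chance alone produces a rate difference at least as large as the one observed.

    import random

    # Hypothetical observed sample (the numbers are made up):
    # 40 people exposed to heavy air pollution, 18 with respiratory problems;
    # 60 people not so exposed, 15 with respiratory problems.
    exposed_n, exposed_cases = 40, 18
    unexposed_n, unexposed_cases = 60, 15

    observed_diff = exposed_cases / exposed_n - unexposed_cases / unexposed_n

    # Null hypothesis: no correlation, so both groups share one underlying rate.
    pooled_rate = (exposed_cases + unexposed_cases) / (exposed_n + unexposed_n)

    def simulate_diff():
        """One random sample of the same size, drawn as if there were no correlation."""
        sim_exposed = sum(random.random() < pooled_rate for _ in range(exposed_n))
        sim_unexposed = sum(random.random() < pooled_rate for _ in range(unexposed_n))
        return sim_exposed / exposed_n - sim_unexposed / unexposed_n

    trials = 10_000
    as_large = sum(abs(simulate_diff()) >= abs(observed_diff) for _ in range(trials))
    p_value = as_large / trials

    print(f"Observed rate difference: {observed_diff:.2f}")
    print(f"Estimated p-value: {p_value:.3f}")
    print("Significant at .05" if p_value < 0.05 else "Not significant at .05")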

To make the point about the strength factor more vivid, imagine we find a barely statistically significant correlation in our sample. As we've seen, this means roughly a 5% chance of seeing a correlation like this in our sample even if there's no correlation in the population as a whole. So if twenty studies like ours were conducted, we should expect about one of them to find a statistically significant correlation even if there's absolutely no correlation in the population!

If we also take into account the file drawer effect and the bias for surprising findings in scientific journals, we should be even more careful. When we see a published study with a surprising result and a p-value just under .05, we should keep in mind that many similar studies may have been conducted that found no exciting or significant results and went unpublished. This means that the evidence provided for a surprising correlation by a single study with that level of significance may be far from conclusive.

This selection effect only compounds for science reporting in the popular media. Studies with surprising or frightening results are far more likely to make their way into the popular media than those with boring results. In addition, such studies are often reported in highly misleading ways—for example, by interpreting correlations as though they established causation. For these reasons, if we're not experts in the relevant field, we should be very careful when forming opinions from studies we encounter in the popular media. It can help to track down the original study, which is likely to contain a much more careful interpretation of the data, often noting weaknesses in the study itself, and rarely jumping to causal conclusions. But even this will not erase the selection effect inherent in the fact that we're only looking at this study—rather than other less exciting ones—because we heard about it in a media report.

This is one of many reasons why there is really no substitute for consulting the opinions of scientific experts, at least if there is anything close to a consensus in the field. The experts have already synthesized the evidence from a wide variety of studies, so they're in a much better position to assess the real significance of new studies. Relatedly, we can look for a meta-analysis on the question—a type of study that tries to integrate the evidence from all the available studies on the topic.
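The arithmetic behind that warning is easy to simulate. The sketch below (my own illustration, not from the text) runs many batches of twenty studies in which there is nothing to find, with each study having a 5% chance of crossing the .05 threshold by luck alone.

    import random

    def run_batch(n_studies=20, alpha=0.05):
        """Number of 'significant' results in one batch of studies of a null effect."""
        return sum(random.random() < alpha for _ in range(n_studies))

    batches = 10_000
    avg_false_positives = sum(run_batch() for _ in range(batches)) / batches
    at_least_one = sum(run_batch() > 0 for _ in range(batches)) / batches

    print(f"Average 'significant' findings per 20 null studies: {avg_false_positives:.2f}")
    print(f"Share of batches with at least one 'significant' finding: {at_least_one:.0%}")
    # Roughly one false positive per batch on average, and well over half of all
    # batches contain at least one. If only the significant studies get published
    # (the file drawer effect), readers see a stream of "findings" even when there
    # is nothing to find.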

Section Questions 7-1

Which of the following is true? (Use your commonsense knowledge of the world.)

A

Cloudy days are correlated with rainy days, and rainy days are correlated with cloudy days

B

Cloudy days are correlated with rainy days, but rainy days are not correlated with cloudy days

C

Rainy days are correlated with cloudy days, but cloudy days are not correlated with rainy days

D

None of the above

7-2 Suppose Jasmine smiles a lot, even when she's unhappy. Which fact would guarantee that her smiling and happiness are correlated?

A

Most of the time when she smiles, she's happy.

B

The fraction of her smiling time in which she's happy is greater than the fraction of her time in general in which she's happy.

C

Most of the time when she smiles, she's happy and most of the time when she's happy, she smiles.

D

Most of the time when she's happy, she smiles.

7-3 A correlation in our sample is statistically significant when:

A

the probability that a random sample would show a correlation at least as large as the one in our sample if there were no correlation in the general population is < .05

B

the probability that the correlation we observe does not exactly match the correlation in the general population is < .05

                utility of Friday run > utility of Friday laziness

This situation is represented by the fact that the green line is higher than the blue line on the left-hand side of this graph:

Unfortunately, as time passes and I move rightward on this graph, the benefit of lazing around on Friday morning starts to feel more significant because Friday morning gets closer.

                Friday morning: utility of Friday laziness > utility of Friday run

I have now entered the fail zone, the area of the graph where one value curve pops up and temporarily overrides the other. For a short time—basically just while lazing around—I treat lazing around as more valuable than the benefits of running, then kick myself afterwards because I once again value the benefits of having gone for a run more than the benefits of having lazed around. In short, the only time I value lazing around on Friday more than running on Friday is the very time at which I actually get to decide what to do on Friday.

When we place a great deal of weight on benefits at the present moment, we end up discounting benefits in the future in such a way that we are inconsistent across time. This means that we end up disagreeing with our past and future selves [10]. This is called time-inconsistent discounting.

Another way to make this vivid is to test it with money. Suppose we ask people:

                "Do you want $100 now, or $120 in a month?"

Most would take the $100 now, even though 20% per month is an amazing rate of return. People have a discount curve for getting the money that drops very quickly during the first month. (A 'discount curve' is a chart like the first one about donuts above, showing the current utility of donuts at various future times.) But suppose we ask them:

                "Do you want $100 in 12 months, or $120 in 13 months?"

That 1-month difference doesn't feel so large any more, since it's far out in the future, along the nearly flat part of the discount curve. So they go for the $120 in 13 months.

This pair of decisions is problematic, because if I choose $120 in 13 months, then in 12 months I will disagree with my previous decision—at that point I suddenly decide that having the $100 after 12 months would have been a better choice. After all, at that point, I'll prefer $100 immediately to $120 in a month. The shape of our temporal discounting, revealed by our answers to these two questions, is time-inconsistent.
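One standard way to model this pattern (the text itself doesn't commit to a particular formula) is hyperbolic discounting, on which a reward of size A received after a delay of t months is valued now at A / (1 + k*t) for some discount rate k. The sketch below uses an invented value of k to show that this produces exactly the reversal described above, while exponential discounting, shown for contrast, never does.

    def hyperbolic(amount, delay_months, k=0.25):
        """Present value under hyperbolic discounting (k is an invented rate)."""
        return amount / (1 + k * delay_months)

    def exponential(amount, delay_months, monthly_factor=0.95):
        """Present value under exponential discounting, for contrast."""
        return amount * monthly_factor ** delay_months

    for label, discount in [("hyperbolic", hyperbolic), ("exponential", exponential)]:
        now_choice = "$100 now" if discount(100, 0) > discount(120, 1) else "$120 in a month"
        later_choice = "$100 in 12 months" if discount(100, 12) > discount(120, 13) else "$120 in 13 months"
        print(f"{label} prefers: {now_choice}, and {later_choice}")

    # With these numbers, the hyperbolic discounter prefers $100 now but $120 in
    # 13 months: a preference reversal, i.e., time-inconsistent discounting. The
    # exponential discounter makes the same call in both versions of the question,
    # so it never ends up disagreeing with its later self about this pair.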

Our many selves

It's sometimes useful to think of myself as composed of various temporary selves over time—time-slices of myself. Almost all my time-slices want me to go for a run and avoid donuts on Friday, except for the one that really matters, which is the time-slice in charge of what to do on Friday. That, in a nutshell, is the greatest impediment many of us face to more productive and successful lives. We have met the enemy, and he is us: or rather, whatever time-slice of us is in charge of what to do right now.

In a sense, all my time-slices form a community in which each time-slice gets to be a temporary dictator for a short time. Generally, all the time-slices are against donuts and in favor of exercising at almost every moment. But each time-slice also has a soft spot for having donuts and lazing around at one particular time—namely, the very time at which he happens to be in charge. The result is that every other time-slice looks on in horror as the dictatorship is continuously passed to the one time-slice with the worst possible values to be in charge at that very moment.

How could such a community ever successfully complete a long-term project? The guy in charge never cares as much about the long-term as everyone else: he's always the most short-sighted individual in the community. One strategy for solving this kind of problem is for past time-slices to reach into the future and influence the decisions of future time-slices. Here are four practical ways to do that:

          ·  restrictions
          ·  costs
          ·  resolutions
          ·  rewards

Let's consider these in order.

First, if I know I might be tempted by a donut tomorrow, I can take steps now to restrict my access to donuts tomorrow. My future self is likely to be annoyed by this, but he doesn't know what's best for us! A famous example of restricting one's future self comes from the Odyssey: when his ship is about to pass the Sirens, Odysseus makes sure he won't jump overboard in pursuit of their singing by having his men tie him to the mast, so that he can hear their song and still survive.

If I can't restrict my future self's access to donuts, I might still be able to add a cost to eating them, which might be enough to dissuade him. For example, if I know that there's one donut left and that I'll be tempted by it, I might promise someone that I will save it for them. Then when my future self considers eating it, he'll have to consider not only the health benefits of resisting the donut (which, in the moment, are not enough to outweigh the temptation) but also the additional cost of breaking that promise.

A similar strategy sometimes works using self-promises or resolutions. If I make a very serious commitment to myself that I will get my homework done a day early, my future self might think twice about violating that commitment—at least if I convince myself that it matters to be the sort of person who keeps resolutions! Breaking the resolution becomes a cost that might be large enough to keep my future self from putting it off.

The strategy of racking up "streaks" or "chains" of actions works in a similar manner. If I've accumulated an unbroken streak of workouts, my future self will likely consider it a cost to break that streak. Failing to run on this occasion becomes a much bigger deal, potentially emblematic of my inability to get fit by committing to a habit. Thinking of the chain in this way may be enough to push my Friday morning self out the door. (In addition, he will know that our even later selves will be angry with him for breaking the chain!)

Finally, if the long-term benefits of working out won't be enough to motivate my future self, I may be able to add additional short-term rewards that will sweeten the choice. For example, suppose I tend to be motivated by hanging out with a friend of mine. Then I may be able to motivate my future self to get out of bed by making a date to work out that morning with my friend. (This might work as a potential cost as well as a reward, if I expect my friend to fault me for not showing up!)

Section Questions

10-5 More physicians choose surgery over an alternative treatment when surgery is described as having a one-month survival rate of 90% than when it's described as having a one-month mortality rate of 10%. The reason for this is...

A

Surgery has better long-term outcomes if the patient survives the first month, but a chance of fatal complications in the first month

B

The first way of describing the surgery leads them to focus on the people who survive rather than those who die.

C

The surgery described as having a one-month survival rate of 90% has higher expected utility than the surgery described as having a one-month mortality rate of 10%

D

The surgery described as having a one-month mortality rate of 10% has higher expected utility than the surgery described as having a one-month survival rate of 90%

10-6 The text gives examples of outcome framing, new vs. old risks, possibility & certainty effects, and honoring sunk costs. Where people are making decision errors in these cases, what do they all have in common?

A

they do not know the benefits and harms of the various outcomes they are considering

B

they are failing to adequately assess the probability of the various outcomes

C

they are taking something into account in the decision other than expected utility

D

they are discounting future utility in a time-inconsistent way

10-7 Which of the following is, according to the text, likely a result of our desire not to have been "wrong" about a decision?

A

endowment effect

B

time-inconsistent discounting

C

outcome framing

D

honoring sunk costs

10-8 Jeff is digging through some things he doesn't use much anymore and deciding what to give away to a charity. He comes across a pair of sunglasses that he bought once for too much money, but never wore because they never looked right on him. Looking at them again, he can't help feeling the urge to keep them just in case he ends up changing his mind about them. He puts them back in their case and tucks them away in his closet. Which of the following pairs of pitfalls is Jeff likely exhibiting?

A

endowment effect and time-inconsistent utility

B

endowment effect and honoring sunk costs

C

time-inconsistent utility and honoring sunk costs

D

outcome framing and time-inconsistent utility

10-9 Some of the pitfalls discussed in this chapter can usefully be thought of as involving disagreements between System 1 and System 2. But which pitfall can usefully be considered a disagreement between ourselves at one time and ourselves at other times?

A

time-inconsistent discounting

B

possibility & certainty effects

C

outcome framing

D

endowment effect

10-10 Freya is gambling. You say: "If you multiply your possible gain by your chance of winning, you'll see that it's not worth it to gamble this money." She puts down another bet and says: "Maybe, but think of how great it would be if I won!" Freya is most obviously exhibiting:

A

the diminishing marginal utility of money

B

the endowment effect

C

honoring sunk costs

D

the possibility effect

Key terms

Diminishing marginal utility: when something has diminishing marginal utility, each additional unit provides less and less utility. For example, the difference between having $0 and having $1,000 is very stark, giving one access to food and possibly shelter. So, that first $1,000 has a lot of marginal utility. However, the difference between having $1,000,000 and $1,001,000 is not so stark; one's life probably wouldn't change appreciably with that extra $1,000. This demonstrates that money has diminishing marginal utility: the value of additional units of money declines the more we've already acquired.

Endowment effect: when we think of something as belonging to us—a possession—we value it more than if it's only potentially ours. For example, whereas I might have only been willing to pay $3 for a mug before acquiring it, upon acquiring it and thinking of it as mine, I value it more highly.

Expected utility: A measure of how much of what we care about is achieved by the different possible outcomes of a choice, weighted by how likely those outcomes are. To calculate expected utility, we multiply the utility of each possible outcome by the probability of that outcome if the choice is made. Then, to get the expected utility of the choice, we add up the results. (See util and utility.)

Expected monetary value: A measure of how much money we stand to gain with the different possible outcomes of a choice, weighted by how likely those outcomes are. To calculate the expected monetary value: for each outcome, multiply its monetary value in dollars by its probability. Then, add up the results.

Good decision: one that makes the best choice given what you know and value at the time of the decision. You can't reach back in time and "save" a decision if it was bad, nor can you do anything to make it bad if it was good.

Honoring sunk costs: taking unrecoverable costs into account when estimating an option's expected value. Sunk costs include any loss or effort we've endured in the past, not just monetary costs. For example, it would be a mistake to take into account the non-refundable price you paid for a concert ticket in your decision about whether to attend that concert once a blizzard hits.

New vs. old risks: we tend to reliably deviate from rational decision-making by assigning more value to the avoidance of new risks as compared to old risks. This can be explained by the fact that old risks seem to us more manageable while new risks seem scarier. For example, people tend to worry much more about physical harm from strangers (e.g., terrorists), even though most murder victims are killed by someone they knew personally.

Outcome framing: the same outcome can be described as a loss or a gain, and these different frames can reliably influence human decision-making. For example, physicians will tend to avoid a procedure with a 10% mortality rate but choose a procedure with a 90% survival rate, even though these two rates are equivalent. The first description has a loss frame, which makes it seem less desirable than the second description, which has a gain frame.

Possibility and certainty effects: we tend to reliably deviate from rational decision-making by overvaluing the mere possibility of good outcomes and the certainty of avoiding bad outcomes. For example, we tend to be willing to pay more to go from a 0% to a 1% chance of winning a free vacation than we'd pay to go from a 4% to a 5% chance of winning a free vacation.

Sunk costs: unrecoverable costs—any loss or effort we've already endured, not just monetary costs—which should not be taken into account when estimating the expected value of an option.

Time-inconsistent discounting: the tendency to value a given outcome less the further into the future it is from the present moment. We often have to make decisions that involve a trade-off between benefits now and costs later, or vice versa. And we tend to weigh the costs and benefits of near outcomes more heavily than those of distant outcomes. Even worse, the way we tend to do this means that we predictably tend to disagree with our past and future selves about the relative value of two choices.

So, for example, if I know that I will be offered a donut a month from now, I think the best choice is for me not to eat that donut, because I value eating a donut in a month much less than the later health benefits of not eating a donut. But I can predict that when my future self in a month is faced with a donut, he will value eating the donut at that time more than the later health benefits of not eating it. As the donut-time approaches, the relative expected value of those two decisions flips.

Utility & utils: a measure of how much of what we care about is achieved by an outcome. There is no absolute scale for how to measure utility, but it is important that we maintain the same proportions in our evaluations. So, if we value an outcome, A, twice as much as an outcome, B, then we should assign twice as much utility to A as to B. We can do this with utils, a "dummy" unit allowing us to calculate the relative value of decisions. The absolute value of utils used in a calculation is not important, but it's crucial that we preserve the ratio of different values. For example, if we value going to the theater twice as much as going to the park, then the value of utils we assign to going to the theater should be twice as high as the value of utils we assign to going to the park.

Expected utility: the expected utility of an outcome O is how valuable O is, given what you care about. To make calculations that allow us to compare among outcomes it can help to use a "dummy" unit of value called a 'util', and speak of the "expected utility" of an outcome. However, we can speak of the relative expected value of outcomes without mentioning utils, as in "This outcome would be twice as good as that outcome." The expected value of a choice C is the probability-weighted average of the value resulting from the possible outcomes of C.
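As an illustration of the calculation recipe in these entries, here is a small sketch (not from the text) that compares a gamble with a sure thing, first by expected monetary value and then by expected utility under an assumed diminishing-marginal-utility function (a square-root utility curve, chosen only for illustration).

    import math

    # Two hypothetical choices:
    # gamble:     50% chance of $1,000, 50% chance of $0
    # sure_thing: $450 for certain
    gamble = [(0.5, 1000), (0.5, 0)]
    sure_thing = [(1.0, 450)]

    def expected_monetary_value(choice):
        """Sum of (probability x dollar amount) over the possible outcomes."""
        return sum(p * dollars for p, dollars in choice)

    def expected_utility(choice, utility=math.sqrt):
        """Sum of (probability x utility of outcome); sqrt is an assumed utility
        function that exhibits diminishing marginal utility."""
        return sum(p * utility(dollars) for p, dollars in choice)

    print("EMV: gamble =", expected_monetary_value(gamble),
          " sure thing =", expected_monetary_value(sure_thing))
    print("EU:  gamble =", round(expected_utility(gamble), 1),
          " sure thing =", round(expected_utility(sure_thing), 1))

    # EMV favors the gamble (500 vs 450), but with diminishing marginal utility
    # the sure $450 has higher expected utility (about 21.2 vs 15.8 utils).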

Footnotes

[1] We do have to make some assumptions about the beliefs and value judgments of our decision-maker. For example, we'll assume that degrees of belief follow the rules of probability that we covered previously in this book, and that their value judgments are coherent in various ways. And we'll assume that a decision-maker can always say how much better or worse one outcome is than another—e.g., "twice as bad," "equally good," "twice as good." These assumptions allow us to use a simple version of decision theory, but there are more complex versions that give up these assumptions as well. There is a great deal of controversy about what combinations of value assignments to outcomes count as coherent or rational. See fn. 4.

[2] See this chart for a good sense of income distribution in the world (keeping in mind that the x-axis is logarithmic and not

linear.) To get a qualitative sense of how income is distributed in the world, I recommend Dollar Street.

[3] The exact numbers for capping out or "saturation" in the most comprehensive study were $95,000 for life evaluation, and $75,000 for emotional well-being: see Jebb, Tay, Diener & Oishi (2018). I say "per adult", but that papers over a complication, which is that the relevant data applies to single-person households; to obtain values for larger sizes, multiply the satiation estimate by the square root of the household size. For evidence of the log-linear relationship of income to well-being, and the satiation claim, see especially Jebb et al. 2018, which explains limitations in previous studies on the matter of satiation; but also Stevenson and Wolfers 2013; Kahneman & Deaton 2010.

[4] See Batson 2010; Batson, Ahmad, Lishner, & Tsang 2005; Goetz, Keltner, & Simon-Thomas 2010; Preston & De Waal 2002; De Waal 2008. And see De Waal 2010 for an informal overview of some of the relevant research.

[5] For an introduction to some of the relevant issues, see the entries for Preferences and Decision Theory at the Stanford Encyclopedia of Philosophy.

[6] See De Martino et al. 2006.

[7] Air pollution: for a major quantitative analysis see Caiazzo, Ashok, Waitz, Yim, & Barrett (2013). A summary can be found here. See also Hoek, Krishnan, Beelen, Peters, Ostro, Brunekreef, & Kaufman (2013), and Pope, Burnett, Thun, Calle, Krewski, Ito, & Thurston (2002). See also Larson, Timothy V., and Jane Q. Koenig. "Wood smoke: emissions and noncancer respiratory effects." Annual Review of Public Health 15, no. 1 (1994): 133-156.

[8] The World Health Organization estimates that 3.8 million people die every year "as a result of household exposure to smoke from dirty cookstoves and fuels"; the vast majority of this is due to fuels burned indoors in developing countries (coal and biomass like sticks and dung). In the US and Europe, woodsmoke constitutes a significant fraction of the particulate pollution in winter months, and is associated with increased risks for COPD, lung cancer, lung infection, heart and arterial disease, and likely dementia. See Naeher, Brauer, Lipsett, Zelikoff, Simpson, Koenig, & Smith (2007); Pope & Dockery (2006); Weichenthal, Kulka, Lavigne, Van Rijswijk, Brauer, Villeneuve, Stieb, Joseph, & Burnett (2017); Orozco-Levi, Garcia-Aymerich, Villar, Ramirez-Sarmiento, Anto, & Gea (2006); Sanhueza, Torreblanca, Diaz-Robles, Schiappacasse, Silva & Astete (2009); Bølling, Pagels, Yttri, Barregard, Sallsten, Schwarze, & Boman (2009); Oudin, Segersson, Adolfsson, & Forsberg (2018); Gorin, Collett, & Herckes (2006). For a review of older literature, see Smith (1987).

[9] See Kahneman, Knetsch, & Thaler (1990).

[10] Temporal discounting doesn't have to be time-inconsistent, though for humans it often is (see Kirby 1997; Kirby & Maraković 1995). For example, if both lines in the graph above were straight and pointed upwards at the same angle, they'd never cross. That would represent my time-slices valuing the present moment more than the future, but in a way that

preserves the rank ordering of outcomes for any given choice: it doesn't lead to a conflict between time-slices about what to do.

References

Batson, C. Daniel. "Empathy-induced altruistic motivation." In Mikulincer, Mario, and Phillip R. Shaver (Eds.), Prosocial motives, emotions, and behavior: The better angels of our nature. American Psychological Association, 2010: 15-34.

Batson, C. D., Ahmad, N., Lishner, D. A., & Tsang, J. (2005). Empathy and altruism. In C. R. Snyder & S. J. Lopez (Eds.), Handbook of positive psychology (pp. 485–498). Oxford, UK: Oxford University Press.

Bølling, A. K., Pagels, J., Yttri, K. E., Barregard, L., Sallsten, G., Schwarze, P. E., & Boman, C. (2009). Health effects of residential wood smoke particles: the importance of combustion conditions and physicochemical particle properties. Particle and fibre toxicology, 6(1), 29.

Caiazzo, F., Ashok, A., Waitz, I. A., Yim, S. H., & Barrett, S. R. (2013). Air pollution and early deaths in the United States. Part I: Quantifying the impact of major sectors in 2005. Atmospheric Environment, 79, 198-208.

De Martino, Benedetto, Dharshan Kumaran, Ben Seymour, and Raymond J. Dolan. "Frames, biases, and rational decision-making in the human brain." Science 313, no. 5787 (2006): 684-687.

De Waal, Frans. The age of empathy: Nature's lessons for a kinder society. Broadway Books, 2010.

De Waal, Frans. "Putting the altruism back into altruism: the evolution of empathy." Annu. Rev. Psychol. 59 (2008): 279-300.

Goetz, Jennifer L., Dacher Keltner, and Emiliana Simon-Thomas. "Compassion: an evolutionary analysis and empirical review." Psychological bulletin 136, no. 3 (2010): 351.

Gorin, Courtney A., Jeffrey L. Collett Jr, and Pierre Herckes. "Wood smoke contribution to winter aerosol in Fresno, CA." Journal of the Air & Waste Management Association 56, no. 11 (2006): 1584-1590.

Hoek, G., Krishnan, R. M., Beelen, R., Peters, A., Ostro, B., Brunekreef, B., & Kaufman, J. D. (2013). Long-term air pollution exposure and cardio-respiratory mortality: a review. Environmental Health, 12(1), 43.

Jebb, Andrew T., Louis Tay, Ed Diener, and Shigehiro Oishi. "Happiness, income satiation and turning points around the world." Nature Human Behaviour 2, no. 1 (2018): 33.

Kahneman, Daniel, and Angus Deaton. "High income improves evaluation of life but not emotional well-being." Proceedings of the national academy of sciences 107, no. 38 (2010): 16489-16493.

Kahneman, Daniel, Jack L. Knetsch, and Richard H. Thaler. "Experimental tests of the endowment effect and the Coase theorem." Journal of Political Economy 98, no. 6 (1990): 1325-1348.

Kirby, Kris N. "Bidding on the future: evidence against normative discounting of delayed rewards." Journal of Experimental Psychology: General 126, no. 1 (1997): 54.

Kirby, Kris N., and Nino N. Maraković. "Modeling myopic decisions: Evidence for hyperbolic delay-discounting within subjects and amounts." Organizational Behavior and Human Decision Processes 64, no. 1 (1995): 22-30.

Larson, Timothy V., and Jane Q. Koenig. "Wood smoke: emissions and noncancer respiratory effects." Annual Review of Public Health 15, no. 1 (1994): 133-156.

Naeher, Luke P., Michael Brauer, Michael Lipsett, Judith T. Zelikoff, Christopher D. Simpson, Jane Q. Koenig, and Kirk R. Smith. "Woodsmoke health effects: a review." Inhalation toxicology 19, no. 1 (2007): 67-106.

Orozco-Levi, M., J. Garcia-Aymerich, J. Villar, A. Ramirez-Sarmiento, J. M. Anto, and J. Gea. "Wood smoke exposure and risk of chronic obstructive pulmonary disease." European Respiratory Journal 27, no. 3 (2006): 542-546.

Oudin, Anna, David Segersson, Rolf Adolfsson, and Bertil Forsberg. "Association between air pollution from residential wood burning and dementia incidence in a longitudinal study in Northern Sweden." PloS one 13, no. 6 (2018): e0198283.

Pope III, C. Arden, and Douglas W. Dockery. "Health effects of fine particulate air pollution: lines that connect." Journal of the air & waste management association 56, no. 6 (2006): 709-742.

Pope III, C. A., Burnett, R. T., Thun, M. J., Calle, E. E., Krewski, D., Ito, K., & Thurston, G. D. (2002). Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. Jama, 287(9), 1132-1141.

Preston, Stephanie D., and Frans BM De Waal. "Empathy: Its ultimate and proximate bases." Behavioral and brain sciences 25, no. 1 (2002): 1-20.

Sanhueza, Pedro A., Monica A. Torreblanca, Luis A. Diaz-Robles, L. Nicolas Schiappacasse, Maria P. Silva, and Teresa D. Astete. "Particulate air pollution and health effects for cardiovascular and respiratory causes in Temuco, Chile: a wood-smoke-polluted urban area." Journal of the Air & Waste Management Association 59, no. 12 (2009): 1481-1488.

Smith, Kirk R. "Biofuels, air pollution, and health: a global review." Modern perspectives in energy (USA) (1987).

Stevenson, Betsey, and Justin Wolfers. "Subjective well-being and income: Is there any evidence of satiation?" American Economic Review 103, no. 3 (2013): 598-604.

Weichenthal, Scott, Ryan Kulka, Eric Lavigne, David Van Rijswijk, Michael Brauer, Paul J. Villeneuve, Dave Stieb, Lawrence Joseph, and Rick T. Burnett. "Biomass burning as a source of ambient fine particulate air pollution and acute myocardial infarction." Epidemiology (Cambridge, Mass.) 28, no. 3 (2017): 329.

Image Credits

Banner image of sign with arrows: image licensed under CC0, cropped. Two paths through woods: image by DennisM2 licensed under CC0. Flooding in European town (Elbe): image by LucyKaef, licensed under Pixabay license. Outdoor lawn in the dark with orange lamp: image licensed under CC0. Antique scales in front of windows: image by Michel Bertolotti, licensed under Pixabay license. Doctors in hospital hallway: image by Oles kanebckuu, licensed under Pexels license. Dark room with open ornate window: image by Bertsz, licensed under Pixabay license. Industrial area with red tint showing air pollution: image by analogicus, licensed under Pixabay license. Assortment of used men's shirts and jackets: image by mentatdgt, licensed under Pexels license. Lottery balls: image by Alejandro Garay, licensed under Pixabay license. Car and pedestrian in snow: image by Lisa Fotios, licensed under Pexels license. Water seen from beneath surface: image licensed under Pixabay

license. Four donuts: image by wonjun yoon, licensed under Pixabay license. Sailboat in front of sunset: image licensed under Pixabay license. All other images are the author's own. 

Reason Better: An Interdisciplinary Guide to Critical Thinking   © 2019 David Manley. All rights reserved. Version 1.2