Expected Experiences: The Predictive Mind in an Uncertain World
ISBN 9780367535476, 9780367540197, 9781003084082

This book brings together perspectives on predictive processing and expected experience. It features contributions from an interdisciplinary group of authors specializing in philosophy, psychology, cognitive science, and neuroscience.


English | 297 pages [314] | 2024




Routledge Studies in Contemporary Philosophy

EXPECTED EXPERIENCES
THE PREDICTIVE MIND IN AN UNCERTAIN WORLD
Edited by Tony Cheng, Ryoji Sato, and Jakob Hohwy

Expected Experiences

This book brings together perspectives on predictive processing and expected experience. It features contributions from an interdisciplinary group of authors specializing in philosophy, psychology, cognitive science, and neuroscience. Predictive processing, or predictive coding, is the theory that the brain constantly minimizes the error of its predictions based on the sensory input it receives from the world. This process of prediction error minimization has numerous implications for different forms of conscious and perceptual experience. The chapters in this volume explore these implications and various phenomena related to them. The contributors tackle issues related to precision estimation, sensory prediction, probabilistic perception, and attention, as well as the role predictive processing plays in emotion, action, psychotic experience, anosognosia, and gut complex. Expected Experiences will be of interest to scholars and advanced students in philosophy, psychology, and cognitive science working on issues related to predictive processing and coding.

Tony Cheng is the Director of the Center for Phenomenology at NCCU, Taiwan, and is also affiliated with the Department of Philosophy and the Research Center for Mind, Brain and Learning at the same university. He obtained his Ph.D. in Philosophy from University College London with the dissertation Sense, Space, and Self. His research topics include perception, the senses, attention, self-awareness, spatio-temporal representations, metacognition, cognitive development, and animal minds. He has published several theoretical papers, mostly single-authored, and several empirical papers, primarily with Patrick Haggard's Action and Body Lab at UCL and Brown Hsieh's Brain and Consciousness Lab at NTU. He recently published a book entitled John McDowell on Worldly Subjectivity: Oxford Kantianism Meets Phenomenology and Cognitive Sciences and is working on another book, Transcendental Epistemology.

Ryoji Sato is currently an Associate Professor at the University Education Center, Tokyo Metropolitan University. Before that, he taught at Nagoya University of Foreign Studies and the University of Tokyo. He earned his Ph.D. in Philosophy from Monash University. He works broadly in philosophy of mind and specifically in the predictive processing framework.

Jakob Hohwy is the Director of the Monash Centre for Consciousness and Contemplative Studies (M3CS), which conducts philosophical, neuroscientific, and psychological research in consciousness and contemplative science. He conducts interdisciplinary research in the areas of philosophy, psychology, and neuroscience. In M3CS and in his Cognition and Philosophy Lab, he and his colleagues study the science of consciousness, theoretical neurobiology, decision-making and rationality, and psychiatry and neurobiology. He collaborates with neuroscientists and psychologists from Monash University and around the world.

Routledge Studies in Contemporary Philosophy

The Politics of Recognition in the Age of Digital Spaces: Appearing Together
Benjamin JJ Carpenter

Feminist Philosophy and Emerging Technologies
Edited by Mary L. Edwards and S. Orestis Palermos

Normative Species: How Naturalized Inferentialism Explains Us
Jaroslav Peregrin

Philosophy of Mental Disorder: An Ability-Based Approach
Sanja Dembić

Historical Explanation: An Anti-Causalist Approach
Gunnar Schumann

Exploring Extended Realities: Metaphysical, Psychological, and Ethical Challenges
Edited by Andrew Kissel and Erick José Ramirez

Expected Experiences: The Predictive Mind in an Uncertain World
Edited by Tony Cheng, Ryoji Sato, and Jakob Hohwy

For more information about this series, please visit: www.routledge.com/Routledge-Studies-in-Contemporary-Philosophy/book-series/SE0720


Expected Experiences
The Predictive Mind in an Uncertain World

Edited by Tony Cheng, Ryoji Sato, and Jakob Hohwy

First published 2024 by Routledge, 605 Third Avenue, New York, NY 10158, and by Routledge, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN.

Routledge is an imprint of the Taylor & Francis Group, an informa business.

© 2024 selection and editorial matter, Tony Cheng, Ryoji Sato, and Jakob Hohwy; individual chapters, the contributors.

The right of Tony Cheng, Ryoji Sato, and Jakob Hohwy to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

ISBN: 978-0-367-53547-6 (hbk)
ISBN: 978-0-367-54019-7 (pbk)
ISBN: 978-1-003-08408-2 (ebk)

DOI: 10.4324/9781003084082

Typeset in Sabon by KnowledgeWorks Global Ltd.

Contents

List of Contributors

Preface
Tony Cheng

Introduction: Mind and World, Predictive Style
Tony Cheng, Ryoji Sato, and Jakob Hohwy

Part I: Varieties of Experiences

1. Deep Neurophenomenology: An Active Inference Account of Some Features of Conscious Experience and of Their Disturbance in Major Depressive Disorder
Maxwell J. D. Ramstead, Wanja Wiese, Mark Miller, and Karl J. Friston

2. Expectancies and the Generation of Perceptual Experience: Predictive Processing and Phenomenological Control
Peter Lush, Zoltan Dienes, and Anil Seth

3. The Synergistic Relationship between Perception and Action
Clare Press, Emily Thomas, and Daniel Yon

4. Perceptual Uncertainty, Clarity, and Attention
Jonna Vance

5. Predictive Processing and Object Recognition
Berit Brogaard and Thomas Alrik Sørensen

6. Predicting First-Person and Counterfactual Experiences of Selfhood: Insights from Anosognosia
Aikaterini Fotopoulou and Sahba Besharati

7. Predictive Processing in the "Second Brain": From Gut Complex to Meta-Awareness
Tony Cheng, Lynn Chiu, Linus Huang, Ying-Tung Lin, Hsing-Hao Lee, Yi-Chuan Chen, and Su-Ling Yeh

Part II: Related Theoretical Issues Concerning Bayesian Probability

8. Neural Implementation of (Approximate) Bayesian Inference
Michael Rescorla

9. Realism and Instrumentalism in Bayesian Cognitive Science
Danielle J. Williams and Zoe Drayson

10. Bayesian Psychiatry and the Social Focus of Delusions
Daniel Williams and Marcella Montagnese

11. Higher-Order Bayesian Statistical Decision Theory of Consciousness, Probabilistic Justification, and Predictive Processing
Tony Cheng

Index

Contributors

Sahba Besharati is a neuropsychologist and senior lecturer in cognitive neuroscience at the University of the Witwatersrand (Wits) in Johannesburg, South Africa. She is a Global Scholar for the Canadian Institute for Advanced Research (CIFAR) in the Brain, Mind and Consciousness Program. She draws on interdisciplinary methods to investigate self-consciousness, emotion, and social cognition in clinical, pediatric, and healthy populations. Her research addresses the question of how we become aware of ourselves and others in the world, and how our environment influences this construction of the self. She is actively involved in capacity building in the neurosciences for underrepresented researchers.

Berit Brogaard is a Professor of Philosophy and Senior Cooper Fellow at the University of Miami. Her areas of research include philosophy of perception, philosophical psychology, and cognitive science. She is the author of Transient Truths (2012), On Romantic Love (2015), The Superhuman Mind (Penguin, 2015), Seeing & Saying (2018), and Hatred: Understanding Our Most Dangerous Emotion (2020).

Yi-Chuan Chen is an Associate Professor in the Department of Medicine at Mackay Medical College. He obtained his Ph.D. from the Department of Experimental Psychology, University of Oxford, and held postdoctoral positions at the University of Oxford, McMaster University, and Lancaster University. His research interests include human multisensory perception, life-span development of human perception, semantics and concepts, neural plasticity, the influence of modern technology, and psycholinguistics.

Lynn Chiu is a philosopher, a biologist, and a science communicator based in Vienna, Austria. In the past decade, she has worked in psychology, immunology, and evolutionary biology labs, publishing scientific work and public writings at the intersection of science and philosophy. As a philosopher-in-practice, her work is based on the principle that the

philosophy, practice, and communication of science are all deeply intertwined components of science. Equipped with a philosopher's eye for logic and sensitivity towards nuance, her mission is to create and transfer interdisciplinary knowledge both within and beyond the scientific community.

Zoltan Dienes is a Professor of Psychology at the University of Sussex, where he has worked since 1990. He is equally interested in consciousness (on which he co-wrote Implicit Learning: Theoretical and Empirical Issues with Dianne Berry in 1993) and scientific reform (on which he wrote Understanding Psychology as a Science in 2008). He created the first online Bayes factor calculator in 2008, was on the first editorial board for the Registered Reports article type in 2013, and in 2021 co-founded Peer Community in Registered Reports, which is free for both authors and readers.

Zoe Drayson is Associate Professor of Philosophy and Director of Cognitive Science at the University of California, Davis. Her research is in the philosophy of the mind-sciences, and her recent work explores the naturalistic metaphysics of cognitive science.

Aikaterini Fotopoulou is a Professor in Psychodynamic Neuroscience at University College London. Her lab focuses on topics and disorders that lie at the borders between neurology and psychology, funded initially by a Starting Investigator Grant "Bodily Self" and more recently by a Consolidator Grant "METABODY" from the European Research Council; see www.fotopoulou.com for projects and publications. Katerina is the founder of the International Association for the Study of Affective Touch (IASAT), a fellow of the Association for Psychological Science, and the recipient of many awards, such as the prestigious Early Career Award of the International Neuropsychology Society and the Young Champions Award of the World Economic Forum.

Karl J. Friston is a theoretical neuroscientist and authority on brain imaging. He invented statistical parametric mapping (SPM), voxel-based morphometry (VBM), and dynamic causal modelling (DCM). These contributions were motivated by schizophrenia research and theoretical studies of value-learning, formulated as the dysconnection hypothesis of schizophrenia. Mathematical contributions include variational Laplacian procedures and generalized filtering for hierarchical Bayesian model inversion. Friston currently works on models of functional integration in the human brain and the principles that underlie neuronal interactions. His main contribution to theoretical neurobiology is a free-energy principle for action and perception (active inference).

Linus Huang is a philosopher of cognitive science, technology, and artificial intelligence. He received his Ph.D. in History and Philosophy of Science from the University of Sydney and held postdoctoral fellowships at Academia Sinica, Taiwan, and the University of California, San Diego. His research programme explores, from an embodied mind perspective, the implications of computational cognitive neuroscience for the nature of agency and the human mind. He also applies this embodied perspective to examine how we can promote social justice by ameliorating implicit bias and algorithmic bias. His research has been published in Synthese, Philosophical Psychology, and Philosophy & Technology.

Hsing-Hao Lee is a Ph.D. student in the Department of Psychology, New York University, focusing on visual attention, consciousness, and cognitive control. His research investigates the interaction between visual attention and human perception, conscious and unconscious mental processes, and how people inhibit irrelevant stimuli. He delves into these topics using diverse methodologies, such as fMRI, EEG, TMS, eye-tracking, and psychophysical experiments. Through his work, he aims to understand how humans process information from the external world and integrate it into the brain network.

Ying-Tung Lin is an Associate Professor at the Institute of Philosophy of Mind and Cognition, National Yang Ming Chiao Tung University in Taiwan. Her research interests include self-consciousness in mental simulation (e.g., memory, imagination, and dreaming), pain and suffering, and ethical issues in human-AI interactions. She has published several papers on perspectives and self-experience in memory and imagination, in which she explores the varieties of self-consciousness and of mental simulation. She has also co-authored papers on AI technologies and bias and is currently working on a project on the conceptual issues of pain and suffering.

Peter Lush is a Research Fellow at the University of Sussex. His research focuses on trait response to imaginative suggestion within and outside the hypnotic context (phenomenological control), demand characteristics (cues which convey experimental aims to participants), and the sense of agency.

Mark Miller is a philosopher of cognition. His research explores what recent advances in neuroscience can tell us about happiness and well-being, and what it means to live well in our increasingly technologically mediated world. Mark is currently a Senior Research Fellow at Monash University's Centre for Consciousness and Contemplative Studies and a Research Fellow in the Psychology Department at the University of Toronto.

Marcella Montagnese is a multidisciplinary researcher specializing in neuroscience, with a focus on neuroimaging and computational psychiatry. Her work includes studying hallucinations across psychiatric and neurodegenerative diseases, as well as developing early markers for dementia detection and prognosis in real-world memory clinic data. She employs state-of-the-art techniques such as graph theory, neuroimaging, and machine learning to enhance our understanding of these conditions and pave the way for new treatments.

Clare Press is head of the Action and Perception Lab in Experimental Psychology and the Wellcome Centre for Human Neuroimaging, UCL. Prior to this, she held three fellowships and faculty posts at the University of Reading and Birkbeck, University of London. The lab asks a variety of questions pertaining to how action and perception shape each other, usually considering the role of predictive and statistical learning mechanisms. She is currently funded by an ERC Consolidator Grant and the Leverhulme Trust.

Maxwell J. D. Ramstead is the Director of Research at the VERSES Research Lab, where he and his team are developing a new, standardized approach to contextual computing, knowledge graphs, and graphical inference agents. Ramstead is also an Honorary Fellow at the Wellcome Centre for Human Neuroimaging at University College London, in the United Kingdom, where he works with Professor Karl Friston within the Theoretical Neurobiology unit. Ramstead's research focuses on the free-energy principle, Bayesian mechanics, multiscale formulations of active inference, and computational phenomenology.

Michael Rescorla is a professor in the philosophy department at the University of California, Los Angeles. His main research areas are philosophy of mind, philosophy of logic, epistemology, and philosophy of language.

Anil Seth is Professor of Cognitive and Computational Neuroscience and Director of the Sussex Centre for Consciousness Science at the University of Sussex, Co-Director of the Canadian Institute for Advanced Research Program on Brain, Mind and Consciousness, a European Research Council Advanced Investigator, and Editor-in-Chief of Neuroscience of Consciousness. His research interests cover computational neurophenomenology, measures of complexity and causality, and predictive processing. He has published more than 200 research papers and is recognized by Web of Science as being in the top 0.1% of researchers internationally. His 2017 TED talk has been viewed more than 14 million times, and his 2021 book Being You: A New Science of Consciousness was an instant Sunday Times Bestseller and a Book of the Year

for The Economist, The New Statesman, Bloomberg Businessweek, The Guardian, The Financial Times, and elsewhere.

Thomas Alrik Sørensen is originally trained as an experimental psychologist and is currently an Associate Professor and head of the Centre for Cognitive Neuroscience at Aalborg University. He has a keen interest in consciousness; however, his main research focus is on the interplay between memory and perception. In recent years, he has had a particular interest in how familiarity and expertise influence perceptual processing and behaviour, publishing on topics like attention, memory, perception, synaesthesia, and clinical neuropsychology. His recent interest is in multisensory perception and the variation in perception between individual observers.

Emily Thomas is a cognitive neuroscientist. She obtained her B.Sc. in Psychology at the University of Lincoln, followed by an M.Sc. in Brain and Cognition at Erasmus University Rotterdam, the Netherlands. She completed her Ph.D. in 2021 at Birkbeck, University of London, in Dr Clare Press's lab, examining how expectations influence what we perceive and how these mechanisms are represented in the brain. Since then, Emily has joined Dr Biyu He's lab as a Research Fellow at New York University Medical Center, continuing her research on these topics as well as examining the neural mechanisms underlying conscious perception.

Jonna Vance is an Associate Professor of Philosophy at Northern Arizona University. She holds a Ph.D. from Cornell University's Sage School of Philosophy.

Wanja Wiese received his Ph.D. in Philosophy from Johannes Gutenberg University Mainz in 2015 and is currently a postdoctoral researcher and lecturer at the Institute for Philosophy II at Ruhr University Bochum. His research focuses on how theoretical and empirical findings from cognitive neuroscience can deepen our understanding of the mind and consciousness. In 2018, he published his monograph Experienced Wholeness with MIT Press. Together with Jennifer Windt (Monash) and Sascha Fink (Magdeburg), he is Editor-in-Chief of Philosophy and the Mind Sciences (https://philosophymindscience.org), a diamond open-access journal.

Daniel Williams is a Lecturer in Philosophy at the University of Sussex. He obtained his Ph.D. from the University of Cambridge in 2018, where he was supervised by Richard Holton and Huw Price. Dr Williams specializes in the fields of philosophy of mind and cognitive science, with additional interests in moral psychology, social epistemology, philosophy of science, economics, and psychiatry. His primary research

centres around investigating the social functions and origins of beliefs, especially in cases of self-deception, religious beliefs, political ideologies, and delusions.

Danielle Williams is a Mellon Postdoctoral Fellow in Modelling Interdisciplinary Inquiry at Washington University in St. Louis. She received her Ph.D. from the University of California, Davis in 2023. In 2022, she was selected as a fellow for the Summer Seminars in Neuroscience and Philosophy (SSNAP) programme at Duke University. In spring 2022, she was nominated for the Outstanding Graduate Student Teacher Award for her instruction of the course titled "Minds, Brains, and Computers." Her publications include "Markov blankets: Realism and our ontological commitments" (Behavioral and Brain Sciences, 2022) and "Philosophy of Technology" (in the forthcoming Routledge Encyclopedia of Technology and the Humanities).

Su-Ling Yeh holds the Fu Ssu-nien Memorial Chair at National Taiwan University and the distinguished position of lifetime professor in the Department of Psychology. She is a recipient of the Academic Award of the Ministry of Education and the Distinguished Research Award of the National Science and Technology Council of Taiwan. She is a fellow of the Association for Psychological Science (APS), a 2019–20 fellow at the Center for Advanced Study in the Behavioral Sciences, Stanford University, and a 2023–24 fellow at the National Humanities Center. She serves as the associate dean of the College of Science and the associate director of the Center for Artificial Intelligence and Advanced Robotics at NTU. Her research interests include cognitive neuroscience, perception, attention, consciousness, multisensory integration, ageing, and applied psychological research on display technology, eye-tracking devices, and AI/robots.

Daniel Yon is an experimental psychologist and cognitive neuroscientist. He studied Psychology at Oxford before moving to Birkbeck, University of London, for his Ph.D., which he received in 2018. Since 2021, he has been a member of faculty at Birkbeck, where he directs The Uncertainty Lab. Research in his lab combines tools from psychology and neuroscience to understand how our brains build models of ourselves and the world around us, and how these models shape perception, action, and belief.

Preface

Tony Cheng

This project was planned just after I finished my Ph.D. at UCL in 2019. Predictive processing was something I often heard of but did not have the expertise to properly understand. However, it came up so often in discussions of topics that I work on, such as attention, consciousness, and multisensory perception, that there was some pressure on me to understand it, at least to some extent. One way of enhancing my understanding was to collaborate with experts in this area, and one such collaboration is this editorial project. My co-editors Ryoji and Jakob know much more about this area than I do, so I thank them for supporting me in this regard. Along the way I have also learned so much from the contributions, although I must admit that there is still so much I fail to understand in these chapters.

This is a tricky era, in terms both of geopolitics and of the pandemic. Taiwan was relatively less affected by Covid, but it still had some lasting impacts. During the lockdowns in many parts of the world, I thought people would have more time to work in general, but I was clearly wrong. So many people were badly affected, and they needed to postpone or even withdraw their contributions. I am still quite happy with the outcome, even though it was indeed challenging to secure contributions, and the volume was delayed for two years or so. I would like to thank those who were invited in the first round and delivered, and those who were invited later after some had pulled out. Last but not least, I also wish to thank Andy Clark for providing a very different perspective on predictive processing.

Personally, I would like to view this volume as a sequel to my previous editorial project Spatial Senses: Philosophy of Perception in an Age of Science (2019), also available from Routledge, co-edited with Ophelia Deroy and Charles Spence. Although these two volumes have different themes, they are related in that varieties of experiences are the targets of analysis. I hope readers will enjoy the contents of both volumes. Also, in recent years I have been having a very hard time for various reasons. It is an encouraging fact that I managed to put things together. I believe and hope my future self will feel fortunate and grateful.


Introduction: Mind and World, Predictive Style

Tony Cheng, Ryoji Sato, and Jakob Hohwy

Predictive processing (PP) is, to put it bluntly, the first promising unified theory of cognition. PP is supposed to explain every aspect of how the brain works: perception, action, and everything in between. In PP, all of these aspects are explained by one simple idea: prediction error minimisation in the long run. Moreover, this is achieved with a small set of theoretical tools, such as hierarchical and/or active inference and precision weighting. Although inferentialist views have a long history (e.g., Helmholtz, 1867; McClelland & Rumelhart, 1981; Rao & Ballard, 1999), the current trend began around 20 years ago, as can be seen in works by both neuroscientists and philosophers (e.g., Clark, 2013, 2016; Friston, 2003, 2009; Hinton, 2007; Hohwy, 2013). There have been many excellent introductions and edited volumes on this topic (e.g., Kirchhoff & Kiverstein, 2019; Metzinger & Wiese, 2017; Wiese, 2018), and we refer the reader to these sources for a detailed overview of the PP framework. Below we briefly explain what this volume is all about, to pave the way for readings of the chapters themselves.

The objective of this volume is to bring together interdisciplinary research from philosophy, psychology, and neuroscience on PP and issues concerning conscious experience. PP has an extremely broad scope as a framework for overall brain function, according to which the brain is a sophisticated hypothesis-testing mechanism, aiming at long-term average prediction error minimisation through perception and action. It subsumes, generalises, or connects with many areas, such as reinforcement learning, theories of attention, and Bayesian approaches to cognition and decision-making. Given these rich implications and broad influences, the framework ought to have a systematic impact on our understanding of conscious experience too – at least if one assumes that conscious experience arises through brain activities. The challenge for this specific focus is then to distinguish the brain activities accompanied by consciousness from the many neuronal PP computations that are unconscious, or do not generate conscious experiences (Clark, 2016, pp. xvi, 10, 42, 300). The main theme of this volume

will therefore be experiences, or "expected experiences," echoing Hohwy (2013, Ch. 4). This comports well with developments in the field, where more work is being done on whether and how PP can be fashioned into a theory of consciousness, or otherwise make contact with the consciousness sciences (Dolega & Dewhurst, 2021; Whyte & Smith, 2021; Williford, Bennequin, Friston, & Rudrauf, 2018).

As mentioned, PP is a very ambitious theory with an extremely wide explanatory scope. It integrates existing accounts in many areas such as perception, action, emotion, and decision-making. This might lead to some initial concerns: Who would object to it? What does it change? Can it be falsified? These concerns are understandable to some extent: we see that PP has sought to explain well-known phenomena in many fields, including key topics in conscious experience, such as binocular rivalry and the rubber hand illusion. Why do we need PP, one might ask, if experts in those fields have already (by and large) explained those phenomena? One answer is that what PP aims for is a universal explanation of how the brain computes, so explaining everything concerning neuronal computations is its goal, and that is why it needs to accommodate cases that have been well studied in the past. Another significant virtue of having a theory that seeks to explain everything is the potential for new discovery: beyond unifying existing accounts, PP has produced new accounts and even created a whole new area of research. One such example is the new role of dopamine as encoding precision (Friston, 2009). Dopamine has long been known to be involved in the modulation of neural activities and implicated in a variety of mental disorders, but it was not until PP that it became clear how and why a lack or excess of dopamine leads to those disorders: from the perspective of PP, they are understood as failures in precision optimisation (for an overview, see Hohwy, 2013). Now, thanks to PP, mental disorders are studied using mathematical models in the area of computational psychiatry (e.g., Stephan & Mathys, 2014).

Relatedly, the "who-would-object-to-it" question can be answered practically by looking into collections such as The Routledge Handbook of the Computational Mind (Sprevak & Colombo, 2018): PP, or prediction error minimisation in particular, is situated in the part on "types of computing," in which other types are also discussed, including classical computational models, connectionist models, and dynamic information processing. To say this is not to say that these models have to be incompatible with one another; perhaps PP can be made to cohere with versions of connectionism, for example. But the point is that PP does have many competitors. More broadly, not everyone is a believer in the computational mind: there are so-called triviality arguments about computational implementation (Chalmers, 1995, 1996, 2012; Searle, 1992), and also controversies concerning the relation between computational explanations and neural coding (Cao, 2018). So,

not everyone agrees with PP, and it really needs defending against potential counterexamples; for example, it has been argued that PP can explain some but not all kinds of attention (Ransom, Fazelpour, & Mole, 2017).

Another point of disagreement to highlight is whether PP naturally chimes with the so-called embodied cognition approach (e.g., Varela, Thompson, & Rosch, 1991), and relatedly, how it should deal with potential sceptical threats. In broad strokes, one view is that PP is all about how the brain computes, so there is a potential sceptical worry about how our minds can ever be in contact with the world (e.g., Hohwy, 2013, Ch. 11), while an opposing view is that PP is essentially embodied and situated, so there is no pressing sceptical concern in this area (Clark, 2016). This mirrors the old divide between constructivism (Gregory, 1970) and ecological optics (Gibson, 1966) in cognitive psychology. We do not pick sides here, but the debate serves to remind us that this background ideological disagreement should also be taken into consideration when evaluating the prospects of explaining conscious experiences via the PP framework. For example, these different takes on PP could bear on what consciousness is. The former, rather "internalist" understanding of PP is naturally conducive to the project of looking for processes that give rise to conscious experiences within PP. The embodied and situated understanding of PP would, in contrast, regard the search for the locus of consciousness in the brain as misguided. There are further complications here: while Clark argues that consciousness itself is not extended, Kirchhoff and Kiverstein (2019) argue for extended consciousness.

This volume is divided into two parts: Part I is about how PP can explain varieties of experiences, including hypnotic responses, psychosis, and object recognition, while Part II touches on related theoretical issues, including singular perceptual representation and anti-Bayesian updating. The volume thus samples key issues concerning how PP can accommodate varieties of conscious experiences; it is about how the conscious predictive mind navigates in this uncertain world. This puts to the test the idea that PP "helps us understand [our] experience as our minds take possession of the world, and as the world takes possession of the mind" (Hohwy, 2013, p. 205). Although this remark is specifically about attention, it is central to how PP can explain all kinds of conscious experiences, given the close (though controversial) connections between attention and consciousness. In this way, we can also see how theoretical work like this in empirical psychology and neuroscience can be made relevant to traditional philosophical issues concerning the relation between mind and world: as John McDowell would put it, the world is open to our minds, but this intentional cum epistemic connection (second nature) needs to be grounded in our biological makeup (first nature), and a framework such as PP is a candidate for the relevant enabling (if not constitutive) conditions (McDowell, 1994, 1996).
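To make the core mechanism of prediction error minimisation concrete, here is a minimal sketch of our own, not drawn from any chapter of the volume; the quadratic generative mapping, the precision values, and all numbers are illustrative assumptions. A single hidden cause v is inferred from one noisy observation u by gradient descent on precision-weighted prediction errors:

```python
def infer(u, v_prior, pi_prior=1.0, pi_sensory=4.0, lr=0.01, steps=500):
    g = lambda v: v ** 2          # assumed generative mapping from hidden cause to data
    dg = lambda v: 2.0 * v        # derivative of that mapping
    v = v_prior                   # start the estimate at the prior mean
    for _ in range(steps):
        eps_u = pi_sensory * (u - g(v))    # precision-weighted sensory prediction error
        eps_v = pi_prior * (v - v_prior)   # precision-weighted prior prediction error
        v += lr * (eps_u * dg(v) - eps_v)  # gradient step that reduces both errors
    return v

# The estimate moves from the prior mean 1.5 toward sqrt(4.1) ~ 2.02, because
# sensory precision (4.0) outweighs the precision of the prior (1.0).
print(infer(u=4.1, v_prior=1.5))
```

Precision weighting decides the compromise: shrinking pi_sensory leaves the estimate near the prior, which is the sense in which PP can explain both successful perceptual inference and its failures as precision (mis)estimation.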

Finally, we also wish to note that due to the Covid pandemic, the production of this volume was seriously delayed. However, this year also marks the 10th anniversary of Clark's "Whatever Next?" and Hohwy's The Predictive Mind, and we take this as a happy coincidence that we can all celebrate.

References

Cao, R. (2018). Computational explanations and neural coding. In M. Sprevak & M. Colombo (Eds.), The Routledge handbook of the computational mind (pp. 283–296). New York: Routledge.
Chalmers, D. J. (1995). On implementing a computation. Minds and Machines, 4, 391–402.
Chalmers, D. J. (1996). Does a rock implement every finite-state automaton? Synthese, 108, 309–333.
Chalmers, D. J. (2012). A computational foundation for the study of cognition. Journal of Cognitive Science, 12, 323–357.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Clark, A. (2016). Surfing uncertainty: Prediction, action, and the embodied mind. New York: Oxford University Press.
Dolega, K., & Dewhurst, J. E. (2021). Fame in the predictive brain: A deflationary approach to explaining consciousness in the prediction error minimization framework. Synthese, 198(8), 7781–7806.
Friston, K. (2003). Learning and inference in the brain. Neural Networks, 16(9), 1325–1352.
Friston, K. (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13(7), 293–301.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston, MA: Houghton Mifflin.
Gregory, R. (1970). The intelligent eye. London: Weidenfeld and Nicolson.
Helmholtz, H. (1867). Handbuch der physiologischen Optik. Leipzig: Leopold Voss.
Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10), 428–434.
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Kirchhoff, M. D., & Kiverstein, J. (2019). Extended consciousness and predictive processing: A third wave view. New York: Routledge.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88(5), 375–407.
McDowell, J. (1994). The content of perceptual experience. Philosophical Quarterly, 44(175), 190–205.
McDowell, J. (1996). Mind and world. Cambridge, MA: Harvard University Press.
Metzinger, T., & Wiese, W. (Eds.) (2017). Philosophy and predictive processing. MIND Group.
Ransom, M., Fazelpour, S., & Mole, C. (2017). Attention in the predictive mind. Consciousness and Cognition, 47, 99–112.

Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Searle, J. (1992). The rediscovery of the mind. Cambridge, MA: MIT Press.
Sprevak, M., & Colombo, M. (Eds.) (2018). The Routledge handbook of the computational mind. New York: Routledge.
Stephan, K. E., & Mathys, C. (2014). Computational approaches to psychiatry. Current Opinion in Neurobiology, 25, 85–92.
Varela, F., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. Cambridge, MA: MIT Press.
Whyte, C. J., & Smith, R. (2021). The predictive global neuronal workspace: A formal active inference model of visual consciousness. Progress in Neurobiology, 199, 101918.
Wiese, W. (2018). Experienced wholeness: Integrating insights from Gestalt theory, cognitive neuroscience, and predictive processing. Cambridge, MA: MIT Press.
Williford, K., Bennequin, D., Friston, K., & Rudrauf, D. (2018). The projective consciousness model and phenomenal selfhood. Frontiers in Psychology, 9, 2571.

Part I

Varieties of Experiences

1 Deep Neurophenomenology: An Active Inference Account of Some Features of Conscious Experience and of Their Disturbance in Major Depressive Disorder

Maxwell J. D. Ramstead, Wanja Wiese, Mark Miller, and Karl J. Friston

1.1 Introduction

This chapter uses the free-energy principle and active inference to make sense of some central facets of the first-person conscious experience of human beings. The most enthusiastic proponents of the free-energy principle and active inference claim that these frameworks may provide us with a unified theory of the mechanics of mind (Clark, 2015; Hohwy, 2014). The free-energy formulation originated as a principle to account for the function, structure, and dynamics of the brain (Friston, 2005, 2010); notably, not as a theory of consciousness. Formally speaking, the mathematical apparatus of the free-energy formulation provides us with a statement of some central properties of any system that exists for an appreciable amount of time and that has a degree of independence from its embedding environment: it is a variational principle (cf. Hamilton's principle of least action) that offers a theory of thingness (Friston, 2019). Active inference is a process theory derived from the free-energy principle that allows us to model the dynamics (i.e., behavior) of systems that obey the free-energy principle.

The free-energy formulation has been used to model biological phenomena at several spatial and temporal scales (Kirchhoff, Parr, Palacios, Friston, & Kiverstein, 2018; Ramstead, Constant, Badcock, & Friston, 2019); from micro-scale phenomena, such as dendrite formation in nervous tissue (Kiebel & Friston, 2011); to meso-scale phenomena, e.g., morphogenesis—the self-organized patterning of biological systems (Friston, Levin, Sengupta, & Pezzulo, 2015; Kuchling, Friston, Georgiev, & Levin, 2020); all the way to macro-scale processes, such as speciation and evolution by natural selection (Campbell, 2016), and the group behavior of humans—e.g., the enactment of cultural practices premised on shared expectations about behavioral conformity and about the salience of stimuli

(Ramstead, Veissière, & Kirmayer, 2016; Veissière, Constant, Ramstead, Friston, & Kirmayer, 2019). The use of the free-energy principle to explain the features of conscious experience in humans has been more limited—notable attempts include Friston, Wiese, and Hobson (2020), Kirchhoff and Kiverstein (2019), Kiverstein, Miller, and Rietveld (2020), and Wiese (2017). Our purpose here will be to attempt just such an exercise. In exploring the manner in which the flow of conscious experience and meaning-making is generated by the dynamics of the embodied, encultured brain, we aim to pursue the projects of neurophenomenology (Petitot, 1999; Ramstead, 2015) and neural hermeneutics (Friston & Frith, 2015; Gallagher & Allen, 2016). These projects aim to create mutually illuminating cross-fertilization between the philosophies of conscious experience and the sciences that study what we call the mind. Phenomenological philosophy, broadly speaking, is about the development of insights into the "things themselves" through rigorous descriptions of first-person phenomenological experience (Heidegger, 2010; Husserl, 2012). Hermeneutic philosophy descends from phenomenological philosophy and concerns the phenomenon of interpretation, namely, how humans come to understand and interpret each other and their shared historical world (Gadamer, 2003). Neurophenomenology is the project of providing a naturalistic explanation of first-person conscious experience by appealing to the workings of mechanisms in the brain, and also the body and culture (Petitot, 1999; Ramstead, 2015). Neural hermeneutics, similarly, aims to provide a naturalistic explanation of the human capacity to understand other humans, to interpret their utterances and behaviors as reflective of their person, and to arrive at mutual understanding, by appealing to advances in the neurosciences and other sciences of the mind (Friston & Frith, 2015; Gallagher & Allen, 2016).

Our contribution to the ongoing discussions—regarding the use of active inference to model consciousness—will be to show that two aspects of first-person experience, which otherwise might seem quite mysterious, can be explained by appealing to a deeply structured generative model under active inference. The aspects of conscious experience that we will explore are, at least arguably, some of the most central aspects of human consciousness that make it properly human consciousness. First, our conscious experience seems to have a nested temporal structure (Husserl, 1991; Wiese, 2017). That is, our conscious experience spans several temporal scales: the flow of our conscious experience seems to integrate events that span several different timescales. Our experience of the present moment is temporally thick, with the past still "living," as if in part retained, in the experience of the present moment, and the future already preempted (Edelman, 2001). Second, the global structure of

our experience seems to be concern or care (Heidegger, 2010; Kiverstein et al., 2020). Phenomenological philosophy suggests that the global structure of conscious experience is summarized in concern or care (in German, Sorge). This means that our conscious experience is directed toward, and motivated by, events and things in our world, which have meaning and significance to us. Things do not leave us unaffected; rather, we are concerned with, and compelled by, the events and entities that our consciousness discloses. Moreover, our conscious experience seems to be deeply structured by our experience of other humans. That is, our experience is one of concern for other humans and for ourselves, and our experience is always at least implicitly filtered through the lens of other minds (Constant, Ramstead, Veissière, & Friston, 2019; Ramstead et al., 2016; Veissière et al., 2019). These two features of experience, its nested temporal structure and its deep connection to care, will be our explanatory target in this chapter. We will see that both features of conscious experience follow naturally from active inference, premised on the right kind of deeply structured generative model.

The structure of the remainder of this chapter is as follows. In the next two sections, we review the free-energy principle and the active inference formulation, with a special focus on the generative models that figure in this account. In the fourth section, we review how this approach can shed light on the temporally nested structure of human experience. In the fifth section, we consider care and concern, as well as the manner in which concern and care are realized in active inference. We use active inference to shed new light on the phenomenon of empathy and on the effects of social embedding on lived experience. In the final two sections, we consider the breakdown of the normal sense of intersubjective agency and layered conscious experience in depression, which leads to a loss of concern for things in the world and a loss of embeddedness in a shared social world.

1.2 An Introduction to the Free-Energy Principle and Active Inference

1.2.1 The Free-Energy Principle: A Theory of Thingness

The free-energy principle originated as a theory of the brain (Friston, 2010), but formally speaking, it is better understood as a mathematical framework allowing us to study and model systems that exist over some appreciable timescales (Friston, 2019; Friston et al., 2020; Ramstead, Friston, & Hipólito, 2020). For a system to exist, in this sense, means that it is able to persist as a system, maintaining its structure and internal organization over some relevant timespan and revisiting the neighborhood of the same characteristic states again and again (Ramstead, Badcock, & Friston,

2018). More technically, the free-energy principle tells us about the properties that must be true of any system that is endowed with a degree of conditional independence from its embedding environment and that exists, in the sense of having and occupying characteristic states, which make it "the kind of thing that it is" (Ramstead et al., 2018; Ramstead, Kirchhoff, & Friston, 2019). Under the free-energy principle, these conditions—the existence of a boundary between a particular system and its environment, and the presence of a set of attracting (i.e., characteristic) states—are formalized using the constructs of Markov blankets, nonequilibrium steady states, and generative models (Ramstead et al., 2019).

A Markov blanket is a partition that is introduced in a system of interest to separate the states that make that thing the particular kind of thing that it is (the so-called particular states of a system) from the "external" states that it is not (Friston, 2013, 2019). Technically, we individuate a system of interest, using the Markov blanket formalism to separate the states that are internal to the system of interest from the background of states that constitute its embedding environment (Constant, Ramstead, Veissière, Campbell, & Friston, 2018; Ramstead et al., 2018). This is accomplished by defining a third set of states that mediate the vicarious influence between internal and external states. This new set of states is known as a Markov blanket. The Markov blanket is constructed such that internal and external states are rendered conditionally independent of one another, given blanket states (Kirchhoff et al., 2018). The presence of a Markov blanket does not isolate the system from its environment; the partition merely introduces the structure of dependencies that mediate the causal effects of the environment on the organism—and vice versa (Ramstead et al., 2019). This construct implements formally the idea that a system, to exist, must be endowed with a degree of separation from its environment.

The idea—that for a system to exist just means that it revisits the states that characterize that system—is implemented using the constructs of nonequilibrium steady states that underwrite the physics of self-organization (Friston, Parr, & de Vries, 2017; Ramstead et al., 2019, 2020). Statistically, the fact that a class of systems revisits the same set of states again and again means that such systems resist a tendency toward entropic decay, which is dictated by the fluctuation theorems that generalize the second law of thermodynamics (Esposito, Harbola, & Mukamel, 2009; Parrondo, Horowitz, & Sagawa, 2015; Seifert, 2012). For living systems, to find oneself in thermodynamic equilibrium with one's environment is death—or at least decay and dissipation. Living systems exist far from equilibrium, in that their dynamics do not consume the gradients that generate them. Technically speaking, this is because their dynamics do not have something called detailed balance, which characterizes thermodynamic equilibrium

and dissolves the arrow of time. Living systems locally resist entropic decay by disorganizing their environments, so as to create energy and matter flows that sustain their own organization and structure—so long as they remain alive (England, 2015; Friston et al., 2015; Maturana & Varela, 1980). In other words, living things do not violate the second law of thermodynamics by existing, but rather conform to it exceptionally well, because they create more entropy than would otherwise exist through their activities and metabolism (Jeffery, Pollack, & Rovelli, 2019; Parr, Da Costa, & Friston, 2020). This global entropy production allows them to maintain themselves locally at a low entropy.

This basic fact about the existence of living systems can be described probabilistically using the construct of a nonequilibrium steady-state density, which plays the role of a probabilistic generative model (Friston, 2019; Friston et al., 2020; Ramstead et al., 2019; Ramstead et al., 2020). If we write down a joint probability distribution (or, for continuous state spaces, a probability density) over all the states of a system at nonequilibrium steady state, then the states that are characteristic of the system will be occupied with a high probability, and the remaining, uncharacteristic states will be frequented with a low probability. When such a joint probability density underwrites the self-organization of a system, such that its dynamics (premised on this model) allow it to remain in the states associated with high probability, we say that the system is endowed with a nonequilibrium steady-state density and refer to the set of frequented states as attracting states, i.e., the set of states that the system will find itself in, on average and over time (Friston, 2019). In dynamical systems theory, the attracting set constitutes a manifold for the flow of the system. This means that the time evolution of states will be such that the trajectory of states is constrained to evolve on the surface of the manifold, which specifies all allowable combinations of states that are compatible with the existence of the system. Alternatively, this same density can be viewed as a description of what the system will be like when sampled at random.

The key move behind the free-energy principle is to appreciate that the internal states of a system move on the same manifold as the external states and can therefore be interpreted as a statistical image of the external states. This introduces another manifold, namely, a statistical manifold where the internal states represent probability distributions over external states. If we call these probability distributions Bayesian beliefs, it will look as if internal states are engaged in Bayesian belief updating. Put simply, this means the internal states can either be regarded as flowing on the manifold of their attracting states or, crucially, as updating their Bayesian beliefs about external states under some probabilistic model of external states (Friston et al., 2020). This model is the generative model above and is just the nonequilibrium steady-state

density. In the next section, we will see how these constructs are used to derive the dynamics or behavior of agents. But first, there are a few more things to say about the free-energy principle.

1.2.2 The Free-Energy Principle as a Formal Semantics

The free-energy principle is a statement of the necessary connections between statistical physics and Bayesian belief updating. The systems just described under the free-energy formulation have a dual aspect (Friston et al., 2020; Ramstead et al., 2020). On the one hand, these systems have a physical structure, subject to thermodynamic and other energetic constraints: they are composed of physically real states, states that exist in the sense that they can be assigned a position in space-time. On the other hand, because internal and external states are coupled via the blanket states, it must also be the case that the internal states constitute Bayesian beliefs about the external influences on blanket states (e.g., sensory observations) and about blanket states themselves (e.g., action). In short, (parameters of) Bayesian beliefs about external states are encoded by the physical states that constitute the interior of the system. The free-energy principle then goes on to furnish an explanation of the intentionality of living systems, that is, the fact that their behavior seems to aim at states of affairs in the world. Under the free-energy principle, this is unpacked as the ability to act as a function of (Bayesian) beliefs about what might have caused sensory states (Ramstead et al., 2020).

In sum, when a system possesses a Markov blanket—and is implicitly at nonequilibrium steady state—the internal states of the system that are shielded by the Markov blanket will come to encode the parameters of probability densities defined over external states (Friston, 2019). Formally speaking, the free-energy principle says that if a system has a Markov blanket, then the system can be described as if it had Bayesian beliefs about the external world. In statistics, a joint probability density over the external states and particular (blanket and internal) states is known as a generative model. It is called a generative model because knowing the full joint distribution allows us to generate consequences from causes; here, blanket states from external states, or sensations from states of affairs in the environment. This means that we can interpret the internal dynamics as a gradient flow on a free energy functional of blanket states and a generative model of how those states were caused. Crucially, the generative model is also defined over fictive external states (i.e., random variables) that the system "believes" cause its sensory states. This means that the free-energy principle allows us to systematically assign mental or semantic contents to physical states of a system: it is a formal theory of semantics (Ramstead et al., 2020).
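To make the phrase "gradient flow on a free energy functional" more concrete, here is a minimal numerical sketch of our own (the two-state generative model and all numbers are illustrative assumptions, not taken from the chapter). Variational free energy is a functional of a belief q over hidden states and of the generative model; it upper-bounds surprisal, and it touches that bound exactly when q is the Bayesian posterior, so a system descending this functional looks as if it is updating Bayesian beliefs:

```python
import numpy as np

p_s = np.array([0.7, 0.3])            # prior over two hidden (external) states
p_o_given_s = np.array([[0.9, 0.2],   # likelihood p(o|s): rows index outcomes o,
                        [0.1, 0.8]])  # columns index hidden states s

def free_energy(q, o):
    """F(q, o) = E_q[ln q(s) - ln p(o, s)] >= -ln p(o), i.e. surprisal."""
    joint = p_o_given_s[o] * p_s      # p(o, s) for the observed outcome o
    return float(np.sum(q * (np.log(q) - np.log(joint))))

o = 0
evidence = np.sum(p_o_given_s[o] * p_s)        # p(o) = 0.69
posterior = p_o_given_s[o] * p_s / evidence    # exact Bayesian posterior

print(free_energy(np.array([0.5, 0.5]), o))    # ~0.94: above the bound
print(free_energy(posterior, o))               # ~0.37: the bound is reached ...
print(-np.log(evidence))                       # ... and equals the surprisal -ln p(o)
```

Minimising F over q thus does double duty: it makes the belief approximate the posterior (inference) and makes the bound on surprisal tight (self-evidencing).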

1.3 Nested Generative Models

1.3.1 The Basics of Generative Modeling: State and Precision Estimation

In this section, we analyze in some detail the generative models that underwrite active inference. The joint probability density that we associate with the system’s most frequented states can be factorized into a product of prior probabilities and likelihoods. When they are decomposed in this way, the densities in question can be written as graphical generative models, which capture the dependencies that are entailed by the factorization (Friston et al., 2017). Basically, the generative model can be factorized to highlight the dependencies between (hidden and observable) states (see Figures 1.1 and 1.2).

Figure 1.1 A generative model for (precision-weighted) perception. Minimizing free energy corresponds to maximizing the evidence for a generative model. This is also known as model inversion: namely, estimating a posterior probability density over some (external) states of interest, given some data (o), prior beliefs about states (D), and a likelihood mapping from states to data. The likelihood mapping A is itself parameterized by a precision term, γ, which quantifies the degree of reliability associated with that mapping.
Source: From Smith et al. (2020), based on a template from Hesp, Smith, Allen, Friston, and Ramstead (2019).


Figure 1.2 A deep generative model of action. Generative models can be augmented to infer the most likely plan or policy in play and thereby select context-sensitive and characteristic actions. To do so, we first equip the generative model with beliefs about state transitions (denoted by B). A policy is a sequence of beliefs about state transitions; the agent believes that the policy it is pursuing is the one associated with the least expected free energy. The ensuing selection arbitrates between two influences: first, that of expected free energy, G, itself dependent on a prior preference for certain kinds of outcomes (encoded in the C matrix); and second, that of a prior over policies, denoted E, which encodes habits or the cumulative effects of culture. Source: From Smith et al. (2020), based on a template from Hesp et al. (2019). For cultural learning and the learning of priors over policies, see Constant et al. (2019) and Veissière et al. (2019).

The simplest generative models include probabilistic beliefs about states (denoted s), data or observations (denoted o), and how they are related. Generative models for discrete states comprise a likelihood mapping (denoted A), which encodes a conditional probability distribution or density over the data expected, given hidden states; and a set of prior beliefs about the baseline rate of occurrence of states (denoted D). Here, s denotes the (external) state that the system is trying to infer, as the most probable cause of its sensory states. The (parameters of the) beliefs about external states are encoded by the internal states of the system. Equipped with such a model, a system can infer the most probable state of the world, given its sensory data and its prior beliefs about the base rates of hidden causes.
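For concreteness, here is a minimal sketch of this inference in Python/NumPy (our own toy example: the numerical values are arbitrary assumptions; only the roles of A, D, o, and s follow the text):

    import numpy as np

    # Likelihood mapping: A[o, s] = p(o | s); each column sums to one.
    A = np.array([[0.9, 0.2],
                  [0.1, 0.8]])
    # Prior beliefs about hidden states: D[s] = p(s).
    D = np.array([0.5, 0.5])

    o = 1                            # the observation actually received
    joint = A[o, :] * D              # unnormalized posterior: p(o | s) p(s)
    posterior = joint / joint.sum()  # Bayes' rule: p(s | o)
    print(posterior)                 # -> [0.111..., 0.888...]

Exact inference is feasible here only because the toy state space is tiny; in realistic models, the posterior is approximated by minimizing free energy, as described above.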

A crucial part of the story on offer is that the system is trying to infer not only the causes of its data but also the reliability of the signals that it must process. The construct of precision quantifies this belief about the reliability of signals. Mathematically, the precision of a signal is its inverse variance: the larger the variance around the mean, the less precise the signal. In the simple generative model just described, the likelihood matrix, A, is augmented by a precision term, γ. Crucially, in the generative models often used in simulating active inference, these precisions are themselves states of the system that must be inferred, based on other prior beliefs and sensory data. The majority of recent work on interoception (i.e., perception of states internal to the body) in active inference has focused on how affective states arise from inferences about states of the body and about the precision associated with these inferences (Barrett, 2017; Barrett & Simmons, 2015; Seth, 2013; Seth & Friston, 2016). Self-evidencing means that the agent can use their Bayesian beliefs as the basis for predictions about afferent signals generated internally by their body. Mismatches between the signals predicted by the generative model and the actual sensory states can be life-threatening, and so they actively drive corrections aimed at minimizing this discrepancy, in the form of autonomic reflexes and allostatic behaviors. Returning to homeostatic setpoints is nothing more than the active minimization of precision-weighted prediction errors (i.e., surprisal or free energy). A series of recent papers have proposed a new view of the role that affective life plays in active inference (Hesp et al., 2019; Kiverstein, Miller, & Rietveld, 2019; Miller & Clark, 2018; Nave, Deane, Miller, & Clark, 2020; Van de Cruys, 2017; Wilkinson, Deane, Nave, & Clark, 2019). In particular, bodily feelings play an important (yet still largely underappreciated) role in updating precision expectations on action policies. Precision, as indicated above, refers to the reliability or salience of brain-bound signals, e.g., the reliability of a prediction or a prediction error. The higher the precision, the greater the impact the associated signal (e.g., a prediction error) will have on processing within the system. Higher precision on sensory signals leads to those signals biasing the system in certain ways, while lower sensory precision means that higher level predictions—based upon prior beliefs—play a predominant role in determining the experienced outcome (Friston, Schwartenbeck, et al., 2014). Precision estimation, and the weighting of signals on the basis of inferred precision (aka precision weighting), thus allows the predictive system to make the most of the information available to it—selectively amplifying those signals that it has learned have a higher probability of leading to valuable outcomes. An important twist here is that predicting precision requires a generative model with states that cause changes in precision.
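The effect of γ can be illustrated by extending the sketch above (again a toy example of ours; parameterizing the precision-weighted likelihood as a column-wise softmax of γ · ln A is one common choice in this literature, not the only one):

    import numpy as np

    def col_softmax(x):
        e = np.exp(x - x.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)

    A = np.array([[0.9, 0.2],
                  [0.1, 0.8]])

    for gamma in (0.25, 1.0, 4.0):
        A_weighted = col_softmax(gamma * np.log(A))  # precision-weighted likelihood
        print(gamma, A_weighted.round(3))
    # gamma = 1 recovers A exactly; low gamma flattens the mapping toward
    # uniform (an unreliable signal), high gamma sharpens it (a reliable one).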

The premise here is that these precision-controlling states are quintessentially affective. In other words, they represent hypotheses that best explain the evidence for changes in the reliability of sensory impressions, especially in the interoceptive domain. Put simply, “I am anxious” is the most parsimonious explanation for—and cause of—certain patterns of interoceptive signals of physiological arousal. This suggests that only higher forms of life may have sufficiently deep or elaborated generative models to support this kind of affective or emotional inference. In short, to “feel” is to infer the precision of your Bayesian beliefs. In psychology, this is often cast in terms of sensory attention and attenuation (Clark, 2013; Limanowski, 2017; Seth & Friston, 2016).

1.3.2 Deep Generative Models and Policy Selection

The basic generative model presented in Section 1.3.1 does not allow the agent to do very much. An agent that is endowed with such a model is able to infer the most probable causes of its sensory states from moment to moment, but it cannot project its inferences into the future. Indeed, an agent so endowed does not do anything at all, since the model with which it is equipped does not infer its actions. To act upon the world, the generative model has to be able to generate the consequences of action, which immediately brings something crucial to the table: namely, a model of states in the future. Generative models can be augmented with temporal depth, which is necessary for the agent to perform counterfactually deep inference and for it to act. In machine learning, this is known as planning as inference (Attias, 2003; Botvinick & Toussaint, 2012; Kaplan & Friston, 2018; Maisto, Donnarumma, & Pezzulo, 2015). A deep generative model entertains beliefs about how states evolve over time, and how those evolving states relate to sensory outcomes. More precisely, a temporally deep generative model contains beliefs about the way that states are propagated through time independently of how they are sampled, as well as counterfactual beliefs about the observations that the agent would make, conditioned on these (beliefs about) state transitions under different policies or plans. Thus, we augment the simple generative model with beliefs about state transitions (a B matrix), which embody beliefs about which states typically follow others. When the generative model leverages beliefs about state transitions (i.e., when we have a deep generative model), it can be further augmented to enable the selection of actions. In active inference, action selection is implemented by inferring the most likely policy, cast as a sequence of state transitions; in other words, the things that would happen if “I pursued this course of action.” These policies, denoted π, entail
a Bayesian belief about a sequence of state transitions (i.e., a vector of indices for B matrices). Generally, in active inference, one specifies a target data distribution, denoted C, which encodes preferred outcomes that the system will realize through action. These prior preferences effectively encode the nonequilibrium steady-state distribution described above and underwrite a certain kind of self-evidencing that is goal-directed and quintessentially enactive. Policy selection is thus driven in real time by sensory feedback that is predicated on Bayesian beliefs about a future that has yet to be realized. At each timestep, the discrepancy between the sensory data being registered and the sensory data that was expected under the generative model is computed. This discrepancy is known as variational free energy or, equivalently under some simplifying (Gaussian) assumptions about noise, prediction error. In active inference, the policy that is selected is the one associated with the least amount of free energy expected in the future. The expected free energy under each policy is denoted by G. Expected free energy can be used to quantify the affordance of a given policy, i.e., how much the agent is compelled, in the moment, to act on the possibilities offered by that policy (Parr & Friston, 2017; Ramstead et al., 2018). This use of the term “affordance” is related (albeit a bit loosely) to its use in ecological psychology, where it is used to quantify the amount of information available to guide action that is directly readable from sensory surfaces (Gibson, 1979), or to name a relational property that holds between the embodied skills of an agent and relevant features of its ecological niche (Chemero, 2009). Thus, predictive organisms select which courses of action to pursue on the basis of the predicted sensory consequences of those actions (i.e., they select courses of action that they believe will bring them closer to their preferred sensory states, or that maximize information gain, if the affordance is predominantly epistemic). Selecting actions in this way maximizes the probability that the organism will come to occupy the sensory states it expects to occupy, and do so in an informed way. Finally, we can associate a precision with the expected free energy itself. This precision is, in a nutshell, a measure of confidence about what to do next: it tells us how well the system believes that it is navigating the world. Heuristically, if every option generates a lot of expected free energy, then there is no clear way forward and the agent loses confidence in its beliefs about what it is doing. The update term for this precision can be thought of as reflecting the difference between the expected free energy and the free energy actually encountered. It has been hypothesized that these update dynamics relate to felt emotional valence: when more free energy is generated than expected, this is evidence that the system is doing poorly, and vice versa. We pursue this in the next section.
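To show how the pieces B, C, and G fit together, here is a compact sketch (ours; toy values throughout, using the simplified risk-plus-ambiguity decomposition of expected free energy that is standard in discrete-state treatments):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    A = np.array([[0.9, 0.2],          # likelihood p(o | s)
                  [0.1, 0.8]])
    B = [np.array([[1.0, 0.0],         # transitions under action 0 ("stay")
                   [0.0, 1.0]]),
         np.array([[0.0, 1.0],         # transitions under action 1 ("switch")
                   [1.0, 0.0]])]
    C = softmax(np.array([2.0, 0.0]))  # prior preference over outcomes
    s = np.array([1.0, 0.0])           # current belief about the hidden state

    policies = [(0, 0), (0, 1), (1, 0), (1, 1)]  # all two-step plans

    def expected_free_energy(policy):
        """Accumulate risk + ambiguity along the predicted trajectory."""
        H = -(A * np.log(A)).sum(axis=0)   # outcome entropy for each state
        G, q_s = 0.0, s.copy()
        for u in policy:
            q_s = B[u] @ q_s               # predicted next state
            q_o = A @ q_s                  # predicted outcomes
            G += (q_o * (np.log(q_o) - np.log(C))).sum()  # risk: KL to C
            G += (q_s * H).sum()           # ambiguity
        return G

    G_all = np.array([expected_free_energy(p) for p in policies])
    pi = softmax(-G_all)  # lower expected free energy -> higher probability
    print(pi.round(3))

For simplicity, this sketch omits the habit prior E and the variational free energy F that also enter the full policy posterior (see Figure 1.4); adding them changes the argument of the softmax, not the logic.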

1.3.3 Hierarchical Generative Models and Affective Inference

Crucial to our purposes here is that, in addition to being endowed with temporal depth, generative models can have a hierarchical or layered structure. With the above apparatus in play, new hierarchical levels of the generative model can be defined, which endow the agent with the ability to make inferences about its own inferences. In a hierarchical model, higher levels of the model take as their input the state and precision estimations ongoing at the lower level and use them as evidence for further inference, in conjunction with the prior beliefs held at that level. More precisely, in such a scheme, a new level of state inference arises, which takes as its data the posterior state and precision estimates at the lower level. In this scheme, lower level states and precisions are linked to higher level states through a superordinate likelihood mapping, A: formally, the posterior state and precision estimates are treated as internal observations, on the basis of which inferences about those subordinate-level inferences can be made. Each additional layer of the model thus encodes successively slower regularities that span successively larger spatial scales. See Figures 1.3 and 1.4 below. In recent work, affective inference has been modeled using the above hierarchical scheme (Hesp et al., 2019). In this hierarchical model, posterior

Figure 1.3 A hierarchical model of self-evaluation. In active inference, superordinate levels of the generative model can be induced, which take state and precision estimations at the subordinate levels as data for further inference. Source: From Smith et al. (2020), based on a template from Hesp et al. (2019).


Figure 1.4 Precision and landscapes of affordance. This schematic illustrates the important role of a particular precision, γ_π; namely, the precision afforded to prior beliefs about policies based upon expected free energy: π = σ(ln E − F_π − γ_π · G_π). As the precision falls, the affordance of each policy shrinks, and the landscape of affordances is flattened. This means that there is a loss of confidence in which particular policy to pursue. This flattening of the landscape of affordances (i.e., of the profile of expected free energy G) plays an important role in what follows

state and precision estimates at a first hierarchical level are fed to a superordinate level of the model. In this work, affective states are higher order states that are inferred as the causes of lower level inference. Affective states harness an agent’s beliefs about emotional valence, i.e., how it feels about what it is doing. These states act as a domain general controller, which tracks and assigns precision estimations, relative to the performance of selected action policies (Hesp et al., 2019). This pre-reflective, second-order information reflects an agent’s perceived fitness (i.e., how well it is doing) and allows the agent to infer how apt its plans are, given its concerns, skills, and the demands of the present context. Of particular relevance for this kind of hierarchical inference is the precision of beliefs about policies based upon expected free energy, G, at the lower level, which the superordinate level takes as evidence to infer “how well I am doing.” Essential to optimizing precision expectations is a
sensitivity to the rate at which free energy is reduced over time, relative to expectations about the rate at which error is generated (Joffily & Coricelli, 2013). Affective inference captures this idea using the construct of affective charge, which is the difference between the expected free energy following some course of action and the free energy that was actually generated (Hesp et al., 2019). Heuristically, if the rate of free-energy reduction is higher than expected, then this is evidence that “I am doing well (better, in fact, than I had anticipated).” In contrast, if free energy falls at a rate lower than expected, this indicates that the system’s predictions are failing to lead to the expected outcomes, and so the precision assigned to the expected free energy is decreased. It has been suggested that these anticipative precision dynamics are registered by the organism as embodied feelings (Hesp et al., 2019; Joffily & Coricelli, 2013; Kiverstein et al., 2019; Van de Cruys, 2017; Wilkinson et al., 2019). Building on these proposals, it has been suggested that higher order affective inference—based on anticipative precision dynamics—corresponds to felt emotional valence (Hesp et al., 2019). Positively and negatively valenced bodily feelings, then, reflect better-than- and worse-than-expected free-energy reduction, respectively. Think, for example, of the frustration and agitation that commuters feel when their train is late, even by only a few minutes. These negative feelings are, on the one hand, an explanation for the loss of precise plans at a lower level of hierarchical processing. On the other hand, this can be viewed as the higher level recognizing the state of angst induced by uncertainty and a failure to resolve free energy. The ensuing angst prompts the agent to entertain alternative policies that, a priori, may not have been considered—it may provoke the agent to check the transit authority for delays or to find an alternative (more reliable) means of transport, such as a taxi. Another way of saying this is that positive and negative feelings are a reflection of the quality of the engagement between the organism and its environment; cf. Polani (2009). Valenced feelings are, then, an embodied part of the valuation process, acting as a barometer that continually informs the agent how it is faring in its predictive engagements (Barrett, 2006). Predictive systems like us evolved to make use of this embodied information about how well they are doing in reducing free energy to adjust the precision of inferred policies.
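Heuristically, and deliberately simplified (this is our own toy rendering of the idea, not the update equations of Hesp et al., 2019; the learning rate is an arbitrary assumption):

    # Affective charge: expected free energy of the pursued policy minus the
    # free energy actually generated by its outcomes (toy values).
    G_expected = 3.2
    F_observed = 2.1
    affective_charge = G_expected - F_observed  # positive: better than expected

    eta = 0.5     # assumed learning rate
    gamma = 1.0   # precision on expected free energy ("confidence in plans")
    gamma += eta * affective_charge
    # A positive charge (positively valenced) raises gamma; a negative charge
    # (negatively valenced) lowers it, flattening the landscape of affordances.

1.3.4 Nested Systems of Systems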

The scope of the free-energy principle does not stop at the boundary of the skull. The formalism that underwrites the free-energy principle applies recursively to all components of the system (Friston, 2019; Ramstead et al., 2018). After all, in most systems of interest, the components of a system
are also systems, in the sense outlined above, having a degree of separation from the superordinate system in which they are embedded. Using the formalism of nested Markov blankets, one can model the dynamics of multiscale systems. The idea is to define the system of interest as a stack of nested systems, with faster and smaller component systems integrated into larger and slower wholes as we ascend a nested hierarchy of spatial and temporal scales (Friston et al., 2015; Kuchling et al., 2020). For example, the human heart is composed of fast electrochemical interactions among human heart cells. More precisely, the interactions between heart cells are the basis from which we can define heart tissues, as a collection of slower, coordinated interactions among cells; and we can then define the heart itself as a slower, coordinated beating organ; and so on. Technically, the way this works is that a higher level pattern becomes achievable for some component parts because they all share the same generative model (Friston et al., 2015; Kuchling et al., 2020). Recall that, in this context, the generative model harnesses beliefs about the typical sensory consequences of states in the world, and especially the sensory consequences of different courses of action. If a group of agents share a generative model, then they share the same beliefs about what must be causing their observations. Agents sharing a generative model will thus tend to interpret the causes of their sensory states in shared ways (singing from the same hymn sheet), with the net effect that all partners are able to settle into a mutually reinforcing steady state at the superordinate level. A recent trend in active inference modeling is its application to social and cultural dynamics, especially in humans (Constant et al., 2019; Kirmayer & Ramstead, 2017; Ramstead et al., 2016; Vasil, Badcock, Constant, Friston, & Ramstead, 2020; Veissière et al., 2019). This line of work leverages the idea that human sociality is premised on human agents’ sharing the same, or a sufficiently similar, generative model, which allows agents to achieve a target social or cultural configuration (e.g., the enactment of a specific cultural practice) at the superordinate group level. This is especially important in humans, particularly in relation to self-evidencing, as we have discussed above. Indeed, for humans, most of the priors in our generative models are about other humans. The beliefs that we have about state transitions mostly concern the states of other humans (and our interactions with them), and we assess the situations with which we engage daily “through the minds of” other humans (Ramstead et al., 2016; Veissière et al., 2019). That is, we see the world the way that we expect others would see it. Thus, the central aspects of the generative models of humans concern how we live with others in a shared prosocial world, and in a shared prosocial predicament. We shall see below that a breakdown in this embeddedness leads to the distressing and sometimes bizarre phenomenology of, e.g.,
major depression. This completes our review of the free-energy principle and active inference. We turn now to our application of this framework to model aspects of human consciousness.

1.4 Active Inference and the Nested Temporal Structure of Consciousness

Time consciousness has many interesting and puzzling aspects (see Arstila & Lloyd, 2014; Dainton, 2018; Dorato & Wittmann, 2020; Le Poidevin, 2019; Phillips, 2017). Here, we will focus on the experience of continuity on long timescales. In order to bring out clearly what we mean by that, we will first differentiate it from other types of experienced continuity. Continuity can be experienced at different timescales (Piper, 2019; Pöppel, 1997; Wiese, 2017). Among the simplest forms are visually perceived motion and aurally perceived music. In motion perception, we do not just see an object at one place and then at another place. We perceive an object as moving from one place to another (Hoerl, 2015; Phillips, 2011). That is, if the object is moving continuously, we typically experience the object as moving continuously. Contrast this with the experience of two stimuli briefly flashed on different parts of a screen, with a separation of about a second. The first stimulus will not be experienced as moving from one place to another. It will be experienced as appearing and disappearing. The appearance of the second stimulus will be experienced as a distinct event, disconnected from the first. Still, there is a sense in which continuity can be experienced even for events that are experienced as distinct. Insofar as the events are parts of a single, continuous stream of consciousness, there may be an experienced continuity for objects that are experienced as non-simultaneous and non-identical (Dainton, 2006). This can more clearly be illustrated by the example of music perception (Phillips, 2010). When a sequence of notes unfolds continuously (say, notes played legato by a single instrument), we typically also experience each note continuously flowing into the next. On very short timescales, on the order of tens of milliseconds, the continuity is so strong that we hardly experience any temporal parts (Phillips, 2011). Conceptually, we may of course be able to distinguish the order of events, but even events that are experienced as non-simultaneous and ordered (first event A, then B) can be experienced as occurring now. Furthermore, even if notes are not experienced as “flowing into each other” (say, notes played staccato by a single instrument), there can be an experienced continuity to the extent that the notes are experienced as parts of a single temporal Gestalt (Denham & Winkler, 2015; Green, 2019; Winkler, Denham, & Nelken, 2009). One way to analyze the subtle differences between different types of continuity is to consider the
nested hierarchical structure of conscious experiences (Piper, 2019; Wiese, 2017). We experience events at different timescales, and these “elementary time experiences” (Pöppel, 1978) or “temporal windows” (Pöppel, 2009) are often related by part-whole relations (Wiese, 2017). At each time window, there can be experienced connections between objects and events, and the larger the window, the weaker the experienced connection tends to be. There is a strong experienced continuity between objects that are tracked over brief time intervals (on the order of tens of milliseconds), and this becomes most obvious when the objects are changing but still experienced as identical, e.g., in apparent motion (see Herzog & Ogmen, 2014). Events that are temporally separated by a few hundred milliseconds, and are experienced as distinct, can still be experienced as connected, to the extent that they are all experienced as occurring now, which becomes manifest, e.g., in temporal illusions involving postdictive effects (see Herzog, Drissi-Daoudi, & Doerig, 2020; Stiles, Li, Levitan, Kamitani, & Shimojo, 2018). For instance, in the flash-lag illusion, a continuously moving visual stimulus and a briefly flashed stimulus are perceived as being displaced, although both stimuli are displayed at the same location (Eagleman & Sejnowski, 2000). In more recent work, it was shown that the perceived direction in which a vertical line is tilted can depend on stimuli that are presented up to 450 ms after the display of the first tilted line (Herzog et al., 2020). This suggests that the perceived properties of a stimulus are a function of events that occur shortly after the occurrence of the stimulus. More generally, what we experience as happening right now covers not just an instant, but a temporally extended interval. This phenomenon is also called the specious present, a term popularized by James (1890), who adopted it from Robert Kelly, alias E.R. Clay (see Andersen & Grush, 2009).1 Within this interval, events are experienced as connected by being joint parts of a single time window (i.e., they are both experienced as part of the specious present). The kind of inference premised on a deep generative model—the kind that figures in active inference—can easily explain the nested structure of conscious experience (Wiese, 2017). In active inference, hierarchically superordinate levels of the generative model constrain inference ongoing at subordinate levels by providing top-down contextual information (technically, these constraints are called empirical priors). That is, inference at the upper level of the model furnishes priors (i.e., the D matrices) at the lower level. The whole system arrives at a synchronized inference about “what is going on” by instantiating a layered inference process that combines the contributions of each level, separated by temporal scale, into an inferential dynamics spanning the whole structure. In virtue of the separation of temporal scales, a moment at the higher level can place constraints on temporally extended sequences, narratives,
or trajectories at the lower level, for any hierarchical depth under consideration. For example, this fairytale places empirical priors on this narrative, which places empirical priors on this sentence, which places empirical priors on this word, which places empirical priors on this letter, and so on. At each level, the succession of “specious moments” at one level subtends the “specious moment” at a higher level (Friston et al., 2017; George & Hawkins, 2009; Rikhye, Guntupalli, Gothoskar, Lázaro-Gredilla, & George, 2019). An experienced continuity between events can also occur at longer timescales. Events that occurred in the recent past, or anticipated events that will occur in the very near future, are not experienced as present. Still, there can be an experienced connection between these events and present events; cf. Kelly’s (2005) example of the opera singer, as well as auditory completion over multi-second intervals (see McWalter & McDermott, 2019). In other words, continuity is not restricted to the specious present; rather, the specious present is also experientially connected to the recent past and near future (Noë, 2006). Conceiving of the multiple time windows as a nested hierarchy, we can account for the experienced connection between temporally distinct events: such events are experienced as distinct temporal parts of a temporal whole. Crucially, the perceptual whole is more than the sum of its parts. In apparent motion perception, we do not merely see an object at different places at different times; we “see” the motion—because “this thing is moving” is the best explanation for this sequence of sensory impressions. The fact that motion is a constructed experience (i.e., inferred) is evidenced, for instance, by the phi phenomenon, a “pure motion sensation” induced by two flickering stimuli A and B (Wertheimer, 1912). Crucially, although motion is perceived between A and B, the stimuli themselves are not perceived as moving, and they are experienced as non-identical. That is, there is a sensation of motion from A to B, without perceiving A as moving to the location of B, or vice versa—for a discussion of how this differs from ordinary apparent motion (beta movement), see Steinman, Pizlo, and Pizlo (2000), Wagemans et al. (2012), and Wiese (2018). This strongly suggests that when we perceive an object as moving, the experienced motion constitutes an additional content, over and above experiencing the object at different successive locations. Similarly, when we perceive a melody, we not only hear one note after the other, but we also hear a melody and rhythm. This is likewise a deeply constructed percept. We submit that this constructed content is explained in the active inference account as nothing more or less than belief updating at a hierarchically superordinate level of the model. Each level of the model adds its own hidden or latent states, operative at their own temporal (and indeed, spatial) scale, which contextualize inference at the lower level of the model.
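As a toy sketch of how a superordinate level contextualizes a subordinate one (ours; all values and the mapping are assumed purely for illustration), the mechanism is simply that the superordinate posterior parameterizes the empirical prior D handed down to the subordinate level:

    import numpy as np

    # Superordinate posterior over slow contextual states (e.g., which
    # narrative is currently unfolding) -- toy values.
    q_context = np.array([0.7, 0.3])

    # Assumed mapping from context to lower-level states: M[s, c] = p(s | c).
    M = np.array([[0.8, 0.3],
                  [0.2, 0.7]])

    # Empirical prior: the D vector furnished to the subordinate level.
    D_lower = M @ q_context
    print(D_lower)  # -> [0.65, 0.35]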

These scale-specific states can be modeled as level-specific state estimates, which add scale-specific detail to the ongoing perceptual experience. This analysis can be extended to larger timescales. However, it is not obvious that there is an experienced continuity between the immediate present and remembered events that occurred, say, a few days ago (or between remembered events and anticipated events that will take place in a few weeks). We will argue that there is an experienced continuity on such longer timescales as well; for instance, on the timescale of narrative and autobiography (Taylor, 2016). This becomes evident when we examine deviations from ordinary temporal experience in some cases of depression, as we will consider in the closing section. In particular, we will focus on aberrant time experience in which the future is experienced as blocked (Ratcliffe, 2012), and remembered events are experienced as locked in the past. In this way, both the future and the past are lost (Fernandez, 2014). The difference between such experiences and ordinary experience is not that there is no experience of past or future events. Rather, the difference is that there is no experienced connection. Where does the connection come from in ordinary experience? We will argue that experiences of possibilities to act usually connect events on long timescales. For instance, say you remember that you called a friend a few days ago to invite them on a hiking trip. Your friend was keen on going, and you agreed on a date in a fortnight. Now you are by yourself, studying the route that you both planned, imagining what it will be like to enjoy the landscape together. You experience these future events as possibilities, and, more specifically, as possibilities that are available to you. They are available to you not only because of things you can do right now but also because of things you did in the past (e.g., planning the route, making a date with your friend). These possible actions are parts of a more general possible action, i.e., spending time with your friend. Furthermore, this is something you experience not only as being possible right now (you could call your friend immediately), but also as having been possible, and as something that will be possible. We will argue that such possibilities for action, an experience of “I can,” can experientially connect temporally separated events at large timescales. This accounts for the difference between ordinary temporal experience on the one hand (in which the future is open, and the present arises from the past) and deviant temporal experience in some cases of depression on the other hand (in which the future is blocked, and the past is locked). Furthermore, what accounts for this difference is structurally similar to what accounts for continuity at smaller timescales. In active inference terms, what allows for long-timescale integration of conscious experience is that events are integrated through action, i.e.,
through policy selection. Recall that a policy is a belief about a specific action sequence, which is implemented as a series of beliefs about the way that states of the world evolve over time. This belief about state evolution effectively integrates disparate state transitions into a coherent whole that is articulated by the actions of the agent. Thus, the nested temporal structure of normal conscious experience can be viewed as a consequence of policy selection premised on a deep generative model, harnessing beliefs about state transitions through time, conditioned on the actions of the agent.

1.5 Care, Concern, and Affective States

Having explored the nested structure of conscious experience, we turn now to care as the deepest structure of human consciousness. A common theme in phenomenology is the existence of a background sense of reality, or a style of belonging to the world, that both underlies and makes it possible for the organism to take up any kind of relation or attitude toward the world. This background sense of reality is in play when, for instance, we take the way the world appears to us at face value. For instance, when we see a neighborhood cat, we presuppose the presence of the cat as something real, something that can be interacted with and is potentially perceivable and accessible by others. Central to our sense of the cat’s reality is our understanding of how we and others can and will engage with the cat, and what is possible or not. Our sense of reality can vary over time, not only because the contents of the world continually vary but also because how we find ourselves in the world can vary. Take, for example, the experience of walking home at night through a potentially dangerous part of town, and noticing that someone might be following you. While the fear that one feels is directed at the person, the situation itself can be said to be fearful (Heidegger, 2010, p. 180). Affective states are central to enactivist ideas about “sense-making,” that is, the capacity of living beings to enact or bring about a meaningful world through their actions (Colombetti, 2014; Thompson, 2007). Sense-making here refers to the way that organisms come to have a point of view from which things in the world matter or have significance. Colombetti describes what she calls a “primordial affectivity” present in all living beings that acts as a “source of meaning” and that “grounds (makes possible) the richer and differentiated forms of sense-making in more complex organisms” (Colombetti, 2014, p. 19). Affective states here do not merely color a preexisting thought or perception of the world with an emotional quality; rather, they are the very background upon which organisms can take up a meaningful relation to the world or adopt any attitude to the world whatsoever. We can thus think of these affective states as background feelings
of being alive, or what Ratcliffe has called “existential feelings” (2008, 2015, 2017). These existential feelings then represent a pre-reflective source of information about how well suited an agent is, given their skills, goals, and the context, to maintain their predictive grip. It is the agent’s bodily abilities (including habits and skills) that give them a sense of what they can do, of what is possible and what is not possible (Rietveld & Kiverstein, 2014). This quality operates in the background in all our engagements with the world, and it is through this bodily attunement that a person has the experience of living in a familiar world. We literally feel what is possible in any given situation. The ordinary feeling of concern and care can be accounted for under active inference by appealing to the affordance of policies. Ordinarily, when we are healthy, each policy is associated with an expected free energy that quantifies how compelling that policy is to the agent. Effectively, the affordance of policies colors our experience of the possibilities for action that the world affords. For social agents like humans, whose generative models essentially comprise information about other human agents, this means that ordinary lived experience is essentially about being with others in a shared social world.

1.6 Disturbances in the Care Structure of Consciousness in Major Depression

This feeling of what is possible is a part of our background sense of belonging to the world (Heidegger, 2010, p. 180; Merleau-Ponty, 1982). It forms the background sense of reality because it has to be in place in order for the organism to take up any other kind of explicit propositional or evaluative attitude to the world. As we have seen, this feeling is underwritten in active inference by the affordance of policies, which for humans essentially involve our dealings with other human agents. The background sense of reality does not merely consist in the acceptance of certain propositions or statements of fact, but in a more fundamental trust in reality that Husserl described as the “natural attitude” (Husserl, 2012). It is through our skilled and feeling body that we can experience, think about, and act in and on the world. Feelings provide an organism with a dynamic sense of their existential relatedness to the world in the here and now, and of their practical and caring involvement with things. They should not be cast as feelings of bodily changes taking place in the individual organism considered in isolation from its environment. They are a part of the individual’s way of being-in-the-world, reflecting how the individual finds themselves in the midst of things at a given time (Ratcliffe, 2008; cf. Fuchs, 2005). While the importance of this ongoing affective dialectic is easily
overlooked when it is functioning well, it reveals itself as essential to ordinary functioning when it goes missing. We have seen that the structure of our consciousness is concern and care, and that active inference explains the mechanics of enacting the policies that compel us most. What would it mean for this background sense of reality to be disturbed, and what would the experience be like? Everything would look and feel different. Our sense of what was possible would be transformed. Ratcliffe has described at length the various ways in which depression can transform one’s experience of what is possible (Ratcliffe, 2014). An experience of an object as enticing, valuable, or meaningful requires that we be open to experiencing things as enticing, valuable, or meaningful. In depression, there can be a shift in the style of encountering the world that is devoid of this sense of inviting possibility. People suffering from major depression often report experiencing their worlds as flat, uninviting, or empty of meaning (Badcock, Christopher, Whittle, Allen, & Friston, 2017; Fabry, 2020). Where the person once felt drawn into the world through their various cares and concerns (i.e., opportunities to succeed at work, possibilities to engage with friends, chances at new love), now no particular activity or person has the power to solicit engagement. In the active inference framework, we would describe this by saying that social policies have lost all their affordance, and no longer compel the agent to act. Importantly, it is not that the world is without alluring things; the people and places may still be available to the person physically or geographically. Rather, the very possibility of being allured by them in the first place has somehow become eroded or removed altogether (Ratcliffe, 2014). The loss of this ordinarily ubiquitous tension between agents and their world results in a profound and very strange sense of estrangement or alienation from their ordinary lives. Ratcliffe suggests that this change may arise from anticipations that are left unfulfilled and/or from an absence of anticipations and their fulfillments. As this background sense of possibility is eroded, the world (including other people) ceases to solicit our behaviors and so is experienced as flat, empty, or alien. Models have begun to emerge in computational psychiatry that make use of active inference to understand a variety of psychopathological conditions such as depression, schizophrenia, depersonalization, obsessive compulsive disorder, addiction, and functional motor and sensory symptoms such as chronic pain (see, e.g., Badcock et al., 2017; Barrett, Quigley, & Hamilton, 2016; Corlett & Fletcher, 2014; Deane, Miller, & Wilkinson, 2020; Edwards, Adams, Brown, Pareés, & Friston, 2012; Fabry, 2020; Fletcher & Frith, 2009; Friston, Stephan, Montague, & Dolan, 2014; Gerrens, 2020; Kiverstein et al., 2019; Kiverstein et al., 2020; Miller, Kiverstein,
& Rietveld, 2020; Seth & Friston, 2016; Seth, Suzuki, & Critchley, 2012). Recently, an active inference account of the symptoms associated with depression has been developed, including the loss of pleasure, the loss of phenomenological depth, and the global loss of interest in rewarding opportunities. This account focuses on breakdowns in precision weighting (Kiverstein et al., 2020). As we outlined above, emotional valence acts as a domain general controller, tracking and assigning precision estimations relative to selected policies (Hesp et al., 2019). This pre-reflective, second-order information is a reflection of the agent’s perceived fitness and so provides the agent with a feeling of what is possible given their current skills and the present context. Chronic negatively valenced affective states, caused by continual rises in prediction error (resulting, for example, from living in a pathologically uncertain or volatile environment), eventually result in a form of learned helplessness. The system ceases to posit endogenous control over the negative outcomes it encounters: “it doesn’t matter what I do, it won’t get better” (Badcock et al., 2017; Fabry, 2020; Kiverstein et al., 2020; Miller et al., 2020). In active inference terms, learned helplessness is understood as the result of a global down-regulation of the precision on the agent’s belief that its policy selection will lead to a reduction in expected free energy (activity associated with approach-reward circuitry); i.e., a lowered precision on expected free energy. Given the close association between reward circuitry and affective valence (Tye, 2018), this produces the characteristic anhedonic effects associated with major depression. Global down-regulation of precision on the G matrix means simultaneously that (i) affect is dampened; (ii) the agent is no longer solicited by affordances in the environment as it once was; and, as a result, (iii) the temporal depth of the generative model is eventually eroded, since it no longer serves any purpose. These are all primary characteristics associated with major depression. In the active inference framework, these aversive priors about the agent’s predictive control of its behavior are cashed out as an under-confidence in its own model of how to navigate the world. In other words, it appears to the agent that it lacks the understanding of the lived world needed to get itself into the sensory states that it expects to be in. Pathological under-confidence in top-down predictions has been found to impede adaptive behaviors, due primarily to a loss of appropriate sensory prediction error (Badcock et al., 2017). The result is that available policies cease to pull the agent as strongly; the affordance landscape is effectively flattened. Rewards begin to appear less rewarding and so cease to draw or solicit the organism in the way that they have in the past (Deane et al., 2020; Miller et al., 2020). Once again, this produces a powerful feedback loop, where the belief in one’s inability to reduce free energy through action leads the organism to sample the environment for evidence of this inability, which
confirms and supports the negative belief. Major depression can then be understood as a domain general inference of a loss of allostatic control, where, in extreme cases, the system ceases to posit itself as a causally efficacious agent at all. Notice here that the issue is not that the agent has lost confidence in some particular action policy, but rather that they no longer expect that any policy will lead to the outcomes they predict. That is, there is a global loss of confidence that any affordance will succeed in reducing free energy. This fits well with the phenomenological notion of existential feelings. The erosion of our global confidence in our own abilities to make the world conform to our expectations results in the experience of a world where nothing matters—a world that is lacking in significance or depth. Another way of saying this is that this is an erosion of the very affordance space that sets the stage for an agent’s conscious experiences (what we have been calling the structure of their conscious experience). The consequence of this is that nothing has the power to call them to action, except perhaps the opportunity to end their own lives (Krueger & Colombetti, 2018). In healthy individuals, if expected free energy is on the rise due to the selection of some policy, the agent will explore the situation for alternative possibilities that offer a more reliable means of bringing about the states they expect. For the person experiencing major depression, this exploratory option is eroded. Instead, they encounter a world where they expect only familiar volatility. The constant nihilistic expectation results in the characteristic bodily stress responses (i.e., hyperactivity of the hypothalamic-pituitary-adrenal axis) and pro-inflammatory immune activity that produce the sickness behaviors aimed at reducing energy expenditure (Barrett et al., 2016; Ratcliffe, 2013). At some point, the finite resources of the autonomic, endocrine, and immune systems become exhausted (Peters, McEwen, & Friston, 2017). Facing this growing energy dysregulation, the system may attempt to conserve metabolic resources by instantiating certain “sickness behaviors,” such as low mood, fatigue, and negative affect, all associated with depression (Badcock et al., 2017; Stephan et al., 2016). Unfortunately, while this enforced slowing down may help reduce energetic output, the increasingly immobile predictive agent is also thereby deprived of one of the main ways of reducing free energy, namely, the ability to actively move and change its patterns of engagement with the world in ways that better align with expectations. Being confined to bed for an extended period may help reduce the unresolvable uncertainty of social interactions. However, it may also produce other, possibly more deleterious uncertainty, for example, by interrupting our work life and so also our ability to pay the rent, or, again, social isolation conflicting with our expectation of social support from family and friends.
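The flattening at issue can be made vivid with one more toy sketch (ours; arbitrary numbers): as the precision on expected free energy falls, the distribution over policies collapses toward uniformity, so no course of action stands out as worth pursuing.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    G = np.array([1.0, 2.5, 4.0])    # expected free energy of three policies
    for gamma in (4.0, 1.0, 0.1):    # from healthy to pathologically low
        print(gamma, softmax(-gamma * G).round(3))
    # gamma = 4.0 -> [0.998 0.002 0.   ]: one policy clearly solicits action.
    # gamma = 0.1 -> [0.384 0.331 0.285]: a flattened affordance landscape.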

Negative moods accompanying depression can be beneficial, at least in the beginning stages of the disorder. Lowering of mood results in a loss of solicitation by the environment and other people, which can help an agent who is embedded in a threatening or highly volatile social environment to conserve energetic resources by limiting the complexity of their social environments (Badcock et al., 2017; Barrett et al., 2016; Clark, 2018). The major issues arise when the lower mood persists and leads to a self-perpetuating cycle in which the agent begins to sample their social world for sensory observations that confirm their expectation of suffering.

1.7 Depressed and Disembedded

An evolutionary account of depression has recently been developed that suggests that depression arises from breakdowns in social interactions, and more specifically from an inability to share a reality with others (Badcock et al., 2017). We agree there is a strong relationship between changes in our social embeddedness and many of the disruptions native to major depression. Given our previous discussion, we are now well poised to further flesh out the nature of that relationship between the discontents of the social sphere and depression. Confidence in our social world (that is, confidence in how we fit within the various extended socio-technological flows that make up our day-to-day actions) allows us to rely on those extended dynamics as a means of reducing error, and indeed, of reducing error at a far better rate than we ever could on our own. We trust that our neighbor will inform us about suspicious looking people on the block, we trust our friends to vet the people we date, and we trust the internet company to keep things running smoothly so that we can work from home. In order for us to trust in these wider dynamics, and so be capable of utilizing them seamlessly, we have to have a high confidence (i.e., precision) in our extended social policies and routines (this includes confidence that if I act in some way, you will act in some way, which I can use to act in some way). But what if we lose contact with others (e.g., solitary confinement), or we lose the ability to smoothly coordinate and predict with others (e.g., brain injury, loss of loved ones, culture shock)? We would begin to lose confidence in those shared policies; we would cease to rely on those extended social dynamics as a means of managing volatility; negatively charged affect would rise, which in turn would signal that we are losing ground to uncertainty and volatility at an alarming rate (the fact being that we can never reduce error as well as when we are part of a unit). This, in turn, would play a role in reducing our confidence in our own generative model, which further strengthens the belief that our actions are pointless because we are living in an overwhelmingly volatile environment. This
goes a long way in explaining why depression is commonly comorbid with extreme loneliness, grief, and physical incarceration (Santini et al., 2020). If I am different from you, then you are unpredictable—and perhaps that is the way things are. Social interactions are crucial for humans, in large part, because they allow us to manage a greater volatility than we could on our own: culture is essentially an evolved uncertainty-reducing technology (Constant et al., 2018; Veissière et al., 2019). Unfortunately, we now exist in an incredibly complex social world that would be impossible to navigate and manage without participating in (and having confidence in) the vast network of social-technological weaves that help us to manage day-to-day complexity and volatility. A consensual commitment to a shared narrative (and generative model) is the thing that makes the prosocial world easy to navigate. If we become estranged from that narrative, then the volatility could be overwhelming. This provides us with a more fleshed-out view of why depression is commonly associated with both a change in affectivity and a change in social dynamics. Previous accounts point to evolutionary reasons for this: being social is important to us, and so a breakdown in the social fabric leads to the sort of sickness behaviors—energy-conserving behaviors—that can help reduce the social complexity we are dealing with. We suggest here that while social breakdowns are particularly relevant for inciting depression, it is the resulting loss of affective attunement (pathologically low precision on expected free energy) that leaves the world flat, and, due to the deep interactivity between meaning production and social dynamics, it also leaves us bereft of our ordinary social nestedness. As we have already outlined above, depression is not just a negatively charged feeling but a change in the structure of consciousness, that is, in the possibility of having, and being compelled by, affective states and feelings in the first place. Moreover, affect is what situates us in this nested network. When we lose our confidence in our generative model, we simultaneously lose affective tuning and the mechanism that keeps us bound to others. This helps to highlight a previously underappreciated relationship between the loss of affectivity that commonly accompanies major depression and a disruption in our social embeddedness. When we lose the capacity to be moved (eroded precision on expected free energy) by what was once significant, we also become unglued from each other. The interpersonal nature of our experience falls flat as well: people no longer act on us as opportunities or as points of positive reorganization. At the heart of the interpersonal is a constant sensitivity to, and a sharing of, a world of meaning. We are constantly regulating one another affectively, which, given our previous discussion, is tantamount to adjusting each other’s precision
profiles. When we lose that affective tuning, then we also lose something essential to interpersonal coherence—we become estranged from a world of meaning, and from the others who are first and foremost a source of that meaning.

1.8 Disruptions of the Temporal Structure of Consciousness in Depression

Consider the following report, cited in Ratcliffe (2012):

    When I am depressed I feel like time goes slowly, yet at the same time I feel like I—or anyone else—has hardly any time to live at all. It feels as if time is running out. (Ratcliffe, 2012, p. 114)

Ratcliffe analyzes statements like this one in terms of a lack of concern for (past and future) events, resulting in a disruption of the structure of temporal experience: “Those past events that are significant to our current situation and to where we are heading are closer to us, more alive, than those that are far removed from our concerns. Without any potential for significant change or any sense of one’s future having a teleological direction, all the past is a settled past, a distant past” (Ratcliffe, 2012, p. 130). Here, we will argue that the loss of experienced continuity between past, present, and future can be accounted for in the same way as the loss of some experienced continuities at shorter timescales. Moreover, from an active inference perspective, it involves a disruption of the same mechanisms that are involved in the disturbances of mood and interpersonal coherence. In Section 1.4, we invoked the experience of music as an example of temporal experience that typically involves experienced continuities at multiple short timescales (from tens and hundreds of milliseconds to multiple seconds). Some of these experienced continuities depend on regularities in auditory signals that are tracked by perception. However, empirical evidence strongly suggests that, in addition to this, possibilities to move are involved in auditory perception as well, even during passive listening (see Froese & González-Grandón, 2020). This is to be expected from the perspective of active inference, since it is rooted in ideomotor theories (Badets, Koch, & Philipp, 2016; Herbort & Butz, 2012), according to which the neural underpinnings of action and perception overlap (Hommel, 2015; Hommel, Müsseler, Aschersleben, & Prinz, 2001; Prinz, 1990). That is, some neural structures that are involved in the generation of action are also involved in passive perception—for evidence in the auditory domain (see Aglioti & Pazzaglia, 2011; Gordon, Cobb, & Balasubramaniam, 2018; Koelsch, 2011; Lima, Krishnan, & Scott, 2016; Ross, Iversen,
& Balasubramaniam, 2016). This can become experientially manifest as an “urge to move” during music perception (Grahn & McAuley, 2009; Lima et al., 2016; Witek, Clarke, Wallentin, Kringelbach, & Vuust, 2014). Furthermore, this is also predicted by sensorimotor accounts (O’Regan, 2011; Di Paolo, Buhrmann, & Barandiaran, 2017; Noë, 2004; O’Regan & Noë, 2001), according to which perception is constituted by practical knowledge of sensorimotor contingencies. Hence, even passive listening is active in the sense that it involves possibilities to act.2 Possibilities to act remain fairly invariant over intervals on the order of multiple seconds and are thus not confined to the specious present. Moreover, actions themselves have temporal parts: reaching for a cup of coffee involves the generation of movements at several different spatial and temporal scales (e.g., from the contraction of individual muscle fibers to the movement of skeletal muscles, bones, and the whole arm). Moreover, it has been argued that it is in virtue of their ability to select policies that systems are integrated across spatial and temporal scales (Ramstead et al., 2018). Therefore, policies and policy selection are particularly well suited to connect the immediate present to the recent past and near future. We can now generalize this idea and apply it to the experience of continuity at longer timescales. As illustrated by the quotation given above, some persons suffering from depression report a radical disruption of temporal experience: the experienced present is no longer connected to past and future. In ordinary conscious experience, policies (i.e., beliefs about action sequences) connect past, present, and future events into an integrated whole: actions that are available to you right now were also available to you in the past and will be available in the future. Furthermore, some actions are available now due to things you achieved in the past, and achieving things now will open possibilities for the future. Crucially, as argued in Section 1.6, when the precision on expected free energy, G, becomes unusually or pathologically low (i.e., imprecise), affect is dampened, and possible actions can seem meaningless. That is, possible action sequences can still be experienced, but will not be pursued, because of a lack of confidence in their potential to reduce free energy in the future. In active inference terms, the precision on the expected free energy is so low that it can no longer drive action: all policies are experienced as having little to no relative affordance, and no longer solicit the agent to act. In other words, these actions are still, subjectively, things that could be done, but not things I can do. As Ratcliffe puts it, “[t]he future changes from a realm where ‘I can’ to one where ‘I cannot’” (Ratcliffe, 2012, p. 127). Notably, this is not due to the affective disturbance that ordinarily goes along with depression, but is instead likely the common cause of both negative affect and disruptions of temporal experience. More specifically, a loss of inferred allostatic control is underpinned by a flattening of the
Notably, the disruption of temporal experience is not due to the affective disturbance that ordinarily accompanies depression; rather, the loss of precision is likely the common cause of both negative affect and disrupted temporal experience. More specifically, a loss of inferred allostatic control is underpinned by a flattening of the temporal depth of the generative model, such that there is uncertainty about the consequences of actions in the distal future—because the model’s predictions are of no use to minimize free energy. As such, motivationally salient features of the world and associated affordances fall flat. On the one hand, this means the system is unable to contextualize bodily states in terms of their “meaning” for action, and so the interoceptive inference underpinning emotion is lost, analogous to an agnosia of bodily states. On the other hand, this means that the agent will fail to connect present events to future (and past) ones through possible actions, because these cease to be available to the subject: “Not only is the world one in which there is no one to become; it is a world in which there is no one to have been. Rather than a past identity reifying into who one will always have to be, who one is, has been, and will be are lost” (Fernandez, 2014, p. 609).

We have argued that some symptoms of major depressive disorder, viz., affective disturbances and a disruption of the temporal structure of experience, can be modeled as inferences about self-efficacy premised on a pathologically low precision on expected free energy G—and an ensuing flattening of the landscape of affordances. The absence of confidence in possibilities for action, which could connect the subject’s present to its past and future, leads to a diminished experience of continuity at longer timescales: remembered events are locked in the past, and future possibilities are experienced as blocked. In neurotypical temporal experience, the present is experienced as connected to both the future and the past by policies that are associated with high precision (i.e., possible actions that have been available in the past, continue to be available, and are expected to efficiently reduce uncertainty about sensory signals).

Moreover, we suggested at the beginning of this section (and in Section 1.4) that this mechanism is structurally similar to a mechanism undergirding experienced continuities at short timescales—i.e., timescales relevant to music perception. This suggests that our account should have implications for understanding the mechanisms at play in music therapy for depression (Aalbers et al., 2017; Erkkilä et al., 2011). We hypothesize that receptive music therapy (in which patients listen to music) can alleviate symptoms of depression because music evokes an ideomotor response, e.g., an “urge to move”, which facilitates action and may increase the precision on expected free energy, thereby making the associated policies more salient and compelling to the agent (Koelsch et al., 2019; Vuust, Dietz, Witek, & Kringelbach, 2018). Over time, confidence in policies at short timescales increases, and the regularities inherent in music enable a better-than-expected reduction of uncertainty. The present becomes experientially connected to the recent past and near future, and (positive) affect can arise anew. Active types of music therapy (in which patients sing or use an instrument) that involve more interaction with a therapist may, in addition, also restore confidence in the social world.
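To caricature this hypothesis numerically: if receptive music therapy gradually restores the precision γ on expected free energy, an initially flat policy distribution should re-differentiate, so that some policies again stand out as salient. The toy simulation below reuses the softmax sketch from above; the update rule for γ (a fixed multiplier per session), the session count and all numbers are our illustrative assumptions, and no claim is made about actual therapeutic dynamics:

```python
import numpy as np

def policy_distribution(G, gamma):
    """P(pi) proportional to exp(-gamma * G(pi)); gamma is the precision
    on expected free energy (see the sketch given earlier in this section)."""
    p = np.exp(-gamma * np.asarray(G, dtype=float))
    return p / p.sum()

G = [1.0, 2.0, 4.0]  # hypothetical expected free energies of three policies
gamma = 0.1          # flat, "depressed" starting point

for session in range(1, 6):
    gamma *= 1.8     # stand-in for precision gains from rewarding ideomotor engagement
    print(f"session {session}: {np.round(policy_distribution(G, gamma), 2)}")
# The distribution sharpens from ~[0.41, 0.35, 0.24] to ~[0.87, 0.13, 0.00]:
# some policies once again "stand out" and solicit the agent.
```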

Active inference accounts of depression may therefore not only lead to a deepened understanding of disturbed temporal experience in major depressive disorder; they may also suggest ways to improve non-pharmacological treatments of depression, such as music therapy.

1.9 Conclusion

In this chapter, we attempted to elucidate two central facets of the first-person conscious experience of human beings by appealing to the framework of the free-energy principle and active inference. We have seen how active inference is able to account for the temporal nestedness of conscious experience and for the concern or care that deeply structures first-person experience. We then investigated the breakdown of these features in depression and explained some of the core aspects of the phenomenology of depression by appealing to the active inference framework. Much work remains to be done to make sense of consciousness using active inference, but we hope to have taken a first step.

Acknowledgements

We are very grateful to Mahault Albarracin, Axel Constant, David Foreman, Laurence Kirmayer, Julian Kiverstein, and Michael Lifshitz for helpful comments and discussions that were of great assistance in writing this chapter. This research was supported by the Social Sciences and Humanities Research Council of Canada (MJDR), a Horizon 2020 European Union ERC Advanced Grant XSPECT (MM; Ref: DLV-692739), and a Wellcome Trust Principal Research Fellowship (KF; Ref: 088130/Z/09/Z).

Notes

1 There are two complications we will not discuss any further here. The first is that not everyone agrees that our momentary experience typically covers an extended period of time (Arstila, 2018; Le Poidevin, 2007). The second is that the specious present is an experienced present (i.e., what we experience as happening right now is an extended event), but it is sometimes assumed that the experienced structure of the specious present also puts constraints on the structure of the experience itself. For instance, one might hold that the neural activity underpinning a specious present must mirror the structure of the specious present (see Dainton, 2006, 2018; Phillips, 2010; for discussion, see Lee, 2014; Viera, 2019): if I experience a flash followed by a sound, then there must be different temporal parts of neural activity corresponding to the flash and the sound, even on small timescales.
2 There is another, more subtle sense in which it is active: music perception involves mental actions; we selectively attend to features of the auditory stream that allow us to minimize uncertainty about sensory signals (Koelsch, Vuust, & Friston, 2019).

References

Aalbers, S., Fusar-Poli, L., Freeman, R. E., Spreen, M., Johannes, C. F. K., Vink, A. C., Maratos, A., Crawford, M., Chen, X. J., & Gold, C. (2017). Music therapy for depression. Cochrane Database of Systematic Reviews. https://doi.org/10.1002/14651858.cd004517.pub3
Aglioti, S. M., & Pazzaglia, M. (2011). Sounds and scents in (social) action. Trends in Cognitive Sciences, 15(2), 47–55.
Andersen, H., & Grush, R. (2009). A brief history of time-consciousness: Historical precursors to James and Husserl. Journal of the History of Philosophy, 47(2), 277–307.
Arstila, V. (2018). Temporal experiences without the specious present. Australasian Journal of Philosophy, 96(2), 287–302.
Arstila, V., & Lloyd, D. (Eds.). (2014). Subjective time: The philosophy, psychology, and neuroscience of temporality. Cambridge, MA: MIT Press.
Attias, H. (2003). Planning by probabilistic inference. Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics.
Badcock, P. B., Davey, C. G., Whittle, S., Allen, N. B., & Friston, K. (2017). The depressed brain: An evolutionary systems theory. Trends in Cognitive Sciences, 21(3), 182–194.
Badets, A., Koch, I., & Philipp, A. M. (2016). A review of ideomotor approaches to perception, cognition, action, and language: Advancing a cultural recycling hypothesis. Psychological Research. https://doi.org/10.1007/s00426-014-0643-8
Barrett, L. F. (2006). Valence is a basic building block of emotional life. Journal of Research in Personality, 40(1), 35–55.
Barrett, L. F. (2017). The theory of constructed emotion: An active inference account of interoception and categorization. Social Cognitive and Affective Neuroscience, 12(1), 1–23.
Barrett, L. F., Quigley, K. S., & Hamilton, P. (2016). An active inference theory of allostasis and interoception in depression. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 371(1708). https://doi.org/10.1098/rstb.2016.0011
Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature Reviews Neuroscience, 16(7), 419–429.
Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends in Cognitive Sciences, 16, 485–488.
Campbell, J. O. (2016). Universal Darwinism as a process of Bayesian inference. Frontiers in Systems Neuroscience, 10, 49.
Chemero, A. (2009). Radical embodied cognitive science. Cambridge, MA: MIT Press.
Clark, A. (2013). The many faces of precision (Replies to commentaries on “Whatever next? Neural prediction, situated agents, and the future of cognitive science”). Frontiers in Psychology, 4, 270.
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford: Oxford University Press.
Clark, A. (2018). A nice surprise? Predictive processing and the active pursuit of novelty. Phenomenology and the Cognitive Sciences, 17(3), 521–534.

Constant, A., Ramstead, M. J. D., Veissière, S. P. L., Campbell, J. O., & Friston, K. (2018). A variational approach to niche construction. Journal of the Royal Society Interface, 15(141), 1–14.
Constant, A., Ramstead, M. J. D., Veissière, S. P. L., & Friston, K. (2019). Regimes of expectations: An active inference model of social conformity and human decision making. Frontiers in Psychology, 10, 679.
Corlett, P. R., & Fletcher, P. C. (2014). Computational psychiatry: A Rosetta stone linking the brain to mental illness. The Lancet Psychiatry, 1(5), 399–402.
Dainton, B. (2006). Stream of consciousness: Unity and continuity in conscious experience. New York, NY: Taylor & Francis.
Dainton, B. (2018). Temporal consciousness. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Stanford University, Metaphysics Research Lab.
Deane, G., Miller, M. D., & Wilkinson, S. (2020). Losing ourselves: Active inference, depersonalization and meditation. Frontiers in Psychology, 11, 2893.
Denham, S. L., & Winkler, I. (2015). Auditory perceptual organization. In J. Wagemans (Ed.), The Oxford handbook of perceptual organization (pp. 601–620). Oxford: Oxford University Press.
Di Paolo, E., Buhrmann, T., & Barandiaran, X. (2017). Sensorimotor life: An enactive proposal. Oxford: Oxford University Press.
Dorato, M., & Wittmann, M. (2020). The phenomenology and cognitive neuroscience of experienced temporality. Phenomenology and the Cognitive Sciences, 19(4), 747–771.
Eagleman, D. M., & Sejnowski, T. J. (2000). Motion integration and postdiction in visual awareness. Science, 287(5460), 2036–2038.
Edelman, G. (2001). Consciousness: The remembered present. Annals of the New York Academy of Sciences, 929, 111–122.
Edwards, M. J., Adams, R. A., Brown, H., Pareés, I., & Friston, K. (2012). A Bayesian account of ‘hysteria’. Brain: A Journal of Neurology, 135(11), 3495–3512.
England, J. L. (2015). Dissipative adaptation in driven self-assembly. Nature Nanotechnology, 10, 919–923.
Erkkilä, J., Punkanen, M., Fachner, J., Ala-Ruona, E., Pöntiö, I., Tervaniemi, M., Vanhala, M., & Gold, C. (2011). Individual music therapy for depression: Randomised controlled trial. British Journal of Psychiatry. https://doi.org/10.1192/bjp.bp.110.085431
Esposito, M., Harbola, U., & Mukamel, S. (2009). Nonequilibrium fluctuations, fluctuation theorems, and counting statistics in quantum systems. Reviews of Modern Physics, 81, 1665–1702.
Fabry, R. E. (2020). Into the dark room: A predictive processing account of major depressive disorder. Phenomenology and the Cognitive Sciences, 19(4), 685–704.
Fernandez, A. V. (2014). Depression as existential feeling or de-situatedness? Distinguishing structure from mode in psychopathology. Phenomenology and the Cognitive Sciences, 13(4), 595–612.
Fletcher, P. C., & Frith, C. D. (2009). Perceiving is believing: A Bayesian approach to explaining the positive symptoms of schizophrenia. Nature Reviews Neuroscience, 10(1), 48–58.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 360(1456), 815–836.

Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
Friston, K. (2013). Life as we know it. Journal of the Royal Society Interface, 10(86). https://doi.org/10.1098/rsif.2013.0475
Friston, K. (2019). A free energy principle for a particular physics. arXiv:1906.10184 [q-bio.NC]. http://arxiv.org/abs/1906.10184
Friston, K., & Frith, C. (2015). A duet for one. Consciousness and Cognition, 36, 390–405.
Friston, K., Levin, M., Sengupta, B., & Pezzulo, G. (2015). Knowing one’s place: A free-energy approach to pattern regulation. Journal of the Royal Society Interface, 12(105).
Friston, K., Parr, T., & de Vries, B. (2017). The graphical brain: Belief propagation and active inference. Network Neuroscience, 1(4), 381–414.
Friston, K., Rosch, R., Parr, T., Price, C., & Bowman, H. (2017). Deep temporal models and active inference. Neuroscience and Biobehavioral Reviews, 77, 388–402.
Friston, K., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T., & Dolan, R. J. (2014). The anatomy of choice: Dopamine and decision-making. Philosophical Transactions of the Royal Society B: Biological Sciences. https://doi.org/10.1098/rstb.2013.0481
Friston, K., Stephan, K. E., Montague, R., & Dolan, R. J. (2014). Computational psychiatry: The brain as a phantastic organ. The Lancet Psychiatry, 1(2), 148–158.
Friston, K., Wiese, W., & Hobson, J. A. (2020). Sentience and the origins of consciousness: From Cartesian duality to Markovian monism. Entropy, 22(5), 516.
Froese, T., & González-Grandón, X. (2020). How passive is passive listening? Toward a sensorimotor theory of auditory perception. Phenomenology and the Cognitive Sciences. https://doi.org/10.1007/s11097-019-09641-6
Fuchs, T. (2005). Corporealized and disembodied minds: A phenomenological view of the body in melancholia and schizophrenia. Philosophy, Psychiatry, and Psychology, 12(2), 95–107.
Gadamer, H.-G. (2003). Truth and method. Dallas, TX: Continuum International.
Gallagher, S., & Allen, M. (2016). Active inference, enactivism and the hermeneutics of social cognition. Synthese, 195, 2627–2648.
George, D., & Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology, 5, e1000532.
Gerrans, P. (2020). Pain asymbolia as depersonalization for pain experience: An interoceptive active inference account. Frontiers in Psychology, 11, 2643.
Gibson, J. J. (1979). The ecological approach to visual perception: Classic edition. London, UK: Psychology Press.
Gordon, C. L., Cobb, P. R., & Balasubramaniam, R. (2018). Recruitment of the motor system during music listening: An ALE meta-analysis of fMRI data. PloS One, 13(11), e0207213.
Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual differences in beat perception. NeuroImage. https://doi.org/10.1016/j.neuroimage.2009.04.039
Green, E. J. (2019). A theory of perceptual objects. Philosophy and Phenomenological Research, 99(3), 663–693. https://doi.org/10.1111/phpr.12521
Heidegger, M. (2010). Being and time. Albany, NY: SUNY Press.

Herbort, O., & Butz, M. V. (2012). Too good to be true? Ideomotor theory from a computational perspective. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2012.00494
Herzog, M. H., Drissi-Daoudi, L., & Doerig, A. (2020). All in good time: Long-lasting postdictive effects reveal discrete perception. Trends in Cognitive Sciences, 24(10), 826–837.
Herzog, M. H., & Ogmen, H. (2014). Apparent motion and reference frames. In The Oxford handbook of perceptual organization.
Hesp, C., Smith, R., Allen, M., Friston, K., & Ramstead, M. J. D. (2019). Deeply felt affect: The emergence of valence in deep active inference. https://psyarxiv.com/62pfd/download?format=pdf
Hoerl, C. (2015). Seeing motion and apparent motion. European Journal of Philosophy, 23(3), 676–702.
Hohwy, J. (2014). The predictive mind. Oxford: Oxford University Press.
Hommel, B. (2015). The theory of event coding (TEC) as embodied-cognition framework. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2015.01318
Hommel, B., Müsseler, J., Aschersleben, G., & Prinz, W. (2001). The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences. https://doi.org/10.1017/s0140525x01000103
Husserl, E. (1991). Analysis of the consciousness of time. In On the phenomenology of the consciousness of internal time (1893–1917).
Husserl, E. (2012). Ideas pertaining to a pure phenomenology and to a phenomenological philosophy: First book: General introduction to a pure phenomenology. Berlin, Germany: Springer Science & Business Media.
James, W. (1890). The principles of psychology (Vol. 1). New York: Henry Holt & Co. http://dx.doi.org/10.1037/11059-000
Jeffery, K., Pollack, R., & Rovelli, C. (2019). On the statistical mechanics of life: Schrödinger revisited. Entropy, 21(12), 1211. https://www.mdpi.com/1099-4300/21/12/1211
Joffily, M., & Coricelli, G. (2013). Emotional valence and the free-energy principle. PLoS Computational Biology, 9(6), e1003094.
Kaplan, R., & Friston, K. (2018). Planning and navigation as active inference. Biological Cybernetics, 112, 323–343.
Kelly, S. D. (2005). Temporal awareness. In Phenomenology and philosophy of mind. https://doi.org/10.1093/acprof:oso/9780199272457.003.0011
Kiebel, S. J., & Friston, K. (2011). Free energy and dendritic self-organization. Frontiers in Systems Neuroscience, 5, 80.
Kirchhoff, M. D., & Kiverstein, J. (2019). Extended consciousness and predictive processing: A third wave view. London: Routledge.
Kirchhoff, M. D., Parr, T., Palacios, E., Friston, K., & Kiverstein, J. (2018). The Markov blankets of life: Autonomy, active inference and the free energy principle. Journal of the Royal Society Interface, 15(138), 20170792.
Kirmayer, L. J., & Ramstead, M. J. D. (2017). Embodiment and enactment in cultural psychiatry. In C. Durt, T. Fuchs, & C. Tewes (Eds.), Embodiment, enaction, and culture: Investigating the constitution of the shared world. Cambridge, MA: MIT Press.

Kiverstein, J., Miller, M., & Rietveld, E. (2019). The feeling of grip: Novelty, error dynamics, and the predictive brain. Synthese, 196(7), 2847–2869.
Kiverstein, J., Miller, M., & Rietveld, E. (2020). How mood tunes prediction: A neurophenomenological account of mood and its disturbance in major depression. Neuroscience of Consciousness, 2020(1), niaa003.
Kiverstein, J., Rietveld, E., Slagter, H. A., & Denys, D. (2019). Obsessive compulsive disorder: A pathology of self-confidence? Trends in Cognitive Sciences, 23(5), 369–372.
Koelsch, S. (2011). Toward a neural basis of music perception – A review and updated model. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2011.00110
Koelsch, S., Vuust, P., & Friston, K. (2019). Predictive processes and the peculiar case of music. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2018.10.006
Krueger, J., & Colombetti, G. (2018). Affective affordances and psychopathology. Discipline Filosofiche, 18, 221–247.
Kuchling, F., Friston, K., Georgiev, G., & Levin, M. (2020). Morphogenesis as Bayesian inference: A variational approach to pattern formation and control in complex biological systems. Physics of Life Reviews, 33, 88–108.
Lee, G. (2014). Temporal experience and the temporal structure of experience. Philosophers’ Imprint, 14(3), 1–21.
Le Poidevin, R. (2007). The images of time: An essay on temporal representation. Oxford: Oxford University Press.
Le Poidevin, R. (2019). The experience and perception of time. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Stanford University, Metaphysics Research Lab.
Lima, C. F., Krishnan, S., & Scott, S. K. (2016). Roles of supplementary motor areas in auditory processing and auditory imagery. Trends in Neurosciences, 39(8), 527–542.
Limanowski, J. (2017). (Dis-)attending to the body. In T. K. Metzinger & W. Wiese (Eds.), Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Maisto, D., Donnarumma, F., & Pezzulo, G. (2015). Divide et impera: Subgoaling reduces the complexity of probabilistic inference and problem solving. Journal of the Royal Society Interface, 12(104), 20141335.
Maturana, H. R., & Varela, F. (1980). Autopoiesis: The organization of the living. In H. R. Maturana & F. J. Varela (Eds.), Autopoiesis and cognition. Dordrecht: Reidel.
McWalter, R., & McDermott, J. H. (2019). Illusory sound texture reveals multisecond statistical completion in auditory scene analysis. Nature Communications, 10(1), 5096.
Merleau-Ponty, M. (1982). Phenomenology of perception. London: Routledge.
Miller, M., & Clark, A. (2018). Happily entangled: Prediction, emotion, and the embodied mind. Synthese, 195(6), 2559–2575.
Miller, M., Kiverstein, J., & Rietveld, E. (2020). Embodying addiction: A predictive processing account. Brain and Cognition, 138, 105495.
Nave, K., Deane, G., Miller, M., & Clark, A. (2020). Wilding the predictive brain. Wiley Interdisciplinary Reviews: Cognitive Science, 11(6), e1542.
Noë, A. (2004). Action in perception. Cambridge, MA: MIT Press.
Noë, A. (2006). Experience of the world in time. Analysis, 66(1), 26–32.

O’Regan, J. K. (2011). Why red doesn’t sound like a bell: Understanding the feel of consciousness. Oxford: Oxford University Press.
O’Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5), 939–1031. https://doi.org/10.1017/s0140525x01000115
Parrondo, J. M. R., Horowitz, J. M., & Sagawa, T. (2015). Thermodynamics of information. Nature Physics, 11, 131–139.
Parr, T., Da Costa, L., & Friston, K. (2020). Markov blankets, information geometry and stochastic thermodynamics. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, 378(2164), 20190159.
Parr, T., & Friston, K. (2017). Working memory, attention, and salience in active inference. Scientific Reports, 7(1), 14678.
Peters, A., McEwen, B. S., & Friston, K. (2017). Uncertainty and stress: Why it causes diseases and how it is mastered by the brain. Progress in Neurobiology, 156, 164–188.
Petitot, J. (1999). Naturalizing phenomenology: Issues in contemporary phenomenology and cognitive science. Redwood, CA: Stanford University Press.
Phillips, I. (2010). Perceiving temporal properties. European Journal of Philosophy, 18(2), 176–202.
Phillips, I. (2011). Indiscriminability and experience of change. The Philosophical Quarterly, 61(245), 808–827. https://doi.org/10.1111/j.1467-9213.2011.703.x
Phillips, I. (Ed.). (2017). The Routledge handbook of philosophy of temporal experience. London: Routledge.
Piper, M. S. (2019). Neurodynamics of time consciousness: An extensionalist explanation of apparent motion and the specious present via reentrant oscillatory multiplexing. Consciousness and Cognition, 73, 102751.
Polani, D. (2009). Information: Currency of life? HFSP Journal, 3(5), 307–316.
Pöppel, E. (1978). Time perception. In R. Held, H. W. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology. Berlin: Springer.
Pöppel, E. (1997). A hierarchical model of temporal perception. Trends in Cognitive Sciences, 1(2), 56–61.
Pöppel, E. (2009). Pre-semantically defined temporal windows for cognitive processing. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1525), 1887–1896.
Prinz, W. (1990). A common coding approach to perception and action. In Relationships between perception and action. https://doi.org/10.1007/978-3-642-75348-0_7
Ramstead, M. J. D. (2015). Naturalizing what? Varieties of naturalism and transcendental phenomenology. Phenomenology and the Cognitive Sciences, 14(4), 929–971.
Ramstead, M. J. D., Badcock, P. B., & Friston, K. (2018). Answering Schrödinger’s question: A free-energy formulation. Physics of Life Reviews, 24, 1–16.
Ramstead, M. J. D., Constant, A., Badcock, P. B., & Friston, K. (2019). Variational ecology and the physics of sentient systems. Physics of Life Reviews, 31, 188–205.
Ramstead, M. J. D., Friston, K., & Hipólito, I. (2020). Is the free-energy principle a formal theory of semantics? From variational density dynamics to neural and phenotypic representations. Entropy, 22(8), 889.

Ramstead, M. J. D., Kirchhoff, M. D., Constant, A., & Friston, K. (2019). Multiscale integration: Beyond internalism and externalism. Synthese, 198, 41–70. https://doi.org/10.1007/s11229-019-02115-x
Ramstead, M. J. D., Kirchhoff, M. D., & Friston, K. (2019). A tale of two densities: Active inference is enactive inference. Adaptive Behavior, 28(4), 1059712319862774.
Ramstead, M. J. D., Veissière, S. P. L., & Kirmayer, L. J. (2016). Cultural affordances: Scaffolding local worlds through shared intentionality and regimes of attention. Frontiers in Psychology, 7, 1090.
Ratcliffe, M. (2008). Feelings of being: Phenomenology, psychiatry and the sense of reality. Oxford: Oxford University Press.
Ratcliffe, M. (2012). Varieties of temporal experience in depression. The Journal of Medicine and Philosophy, 37(2), 114–138.
Ratcliffe, M. (2013). A bad case of the flu? The comparative phenomenology of depression and somatic illness. Journal of Consciousness Studies, 20(7–8), 198–218.
Ratcliffe, M. (2014). Experiences of depression: A study in phenomenology. Oxford: Oxford University Press.
Rietveld, E., & Kiverstein, J. (2014). A rich landscape of affordances. Ecological Psychology: A Publication of the International Society for Ecological Psychology, 26(4), 325–352.
Rikhye, R. V., Guntupalli, J. S., Gothoskar, N., Lázaro-Gredilla, M., & George, D. (2019). Memorize-Generalize: An online algorithm for learning higher-order sequential structure with cloned Hidden Markov Models. bioRxiv, 764456.
Ross, J. M., Iversen, J. R., & Balasubramaniam, R. (2016). Motor simulation theories of musical beat perception. Neurocase, 22(6), 558–565. https://doi.org/10.1080/13554794.2016.1242756
Santini, Z. I., Jose, P. E., Cornwell, E. Y., Koyanagi, A., Nielsen, L., Hinrichsen, C., Meilstrup, C., Madsen, K. R., & Koushede, V. (2020). Social disconnectedness, perceived isolation, and symptoms of depression and anxiety among older Americans (NSHAP): A longitudinal mediation analysis. The Lancet Public Health, 5(1), e62–e70. https://doi.org/10.1016/s2468-2667(19)30230-0
Seifert, U. (2012). Stochastic thermodynamics, fluctuation theorems and molecular machines. Reports on Progress in Physics, 75, 126001.
Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565–573.
Seth, A. K., & Friston, K. (2016). Active interoceptive inference and the emotional brain. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 371(1708). https://doi.org/10.1098/rstb.2016.0007
Seth, A. K., Suzuki, K., & Critchley, H. D. (2012). An interoceptive predictive coding model of conscious presence. Frontiers in Psychology, 2, 395.
Smith, L. S., Hesp, C., Lutz, A., Mattout, J., Friston, K., & Ramstead, M. J. D. (2020). Towards a formal neurophenomenology of metacognition: Modelling meta-awareness, mental action, and attentional control with deep active inference. Neuroscience of Consciousness, 2, niab018.
Steinman, R. M., Pizlo, Z., & Pizlo, F. J. (2000). Phi is not beta, and why Wertheimer’s discovery launched the Gestalt revolution. Vision Research, 40(17), 2257–2264.

Stephan, K. E., Manjaly, Z. M., Mathys, C. D., Weber, L. A. E., Paliwal, S., Gard, T., & Tittgemeyer, M. (2016). Allostatic self-efficacy: A metacognitive theory of dyshomeostasis-induced fatigue and depression. Frontiers in Human Neuroscience, 10, 550.
Stiles, N. R. B., Li, M., Levitan, C. A., Kamitani, Y., & Shimojo, S. (2018). What you saw is what you will hear: Two new illusions with audiovisual postdictive effects. PloS One, 13(10), e0204217.
Taylor, C. (2016). The language animal. Cambridge, MA: Harvard University Press.
Tye, K. M. (2018). Neural circuit motifs in valence processing. Neuron, 100(2), 436–452.
Van de Cruys, S. (2017). Affective value in the predictive mind. Frankfurt am Main: MIND Group.
Vasil, J., Badcock, P. B., Constant, A., Friston, K., & Ramstead, M. J. D. (2020). A world unto itself: Human communication as active inference. Frontiers in Psychology, 11, 417.
Veissière, S. P. L., Constant, A., Ramstead, M. J. D., Friston, K., & Kirmayer, L. J. (2019). Thinking through other minds: A variational approach to cognition and culture. The Behavioral and Brain Sciences, 43, e90.
Viera, G. A. (2019). The fragmentary model of temporal experience and the mirroring constraint. Philosophical Studies, 176(1), 21–44.
Vuust, P., Dietz, M. J., Witek, M., & Kringelbach, M. L. (2018). Now you hear it: A predictive coding model for understanding rhythmic incongruity. Annals of the New York Academy of Sciences, 1423, 19–29.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R. (2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure–ground organization. Psychological Bulletin, 138(6), 1172–1217. https://doi.org/10.1037/a0029333
Wiese, W. (2017). Predictive processing and the phenomenology of time consciousness: A hierarchical extension of Rick Grush’s trajectory estimation model. In T. Metzinger & W. Wiese (Eds.), Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Wiese, W. (2018). Experienced wholeness: Integrating insights from Gestalt theory, cognitive neuroscience and predictive processing. Cambridge, MA: MIT Press.
Wilkinson, S., Deane, G., Nave, K., & Clark, A. (2019). Getting warmer: Predictive processing and the nature of emotion. In L. Candiotto (Ed.), The value of emotions for knowledge. Berlin: Springer Verlag. https://doi.org/10.1007/978-3-030-15667-1_5
Winkler, I., Denham, S. L., & Nelken, I. (2009). Modeling the auditory scene: Predictive regularity representations and perceptual objects. Trends in Cognitive Sciences, 13(12), 532–540.
Witek, M. A. G., Clarke, E. F., Wallentin, M., Kringelbach, M. L., & Vuust, P. (2014). Syncopation, body-movement and pleasure in groove music. PLoS ONE, 9(4), e94446. https://doi.org/10.1371/journal.pone.0094446

2 Expectancies and the Generation of Perceptual Experience: Predictive Processing and Phenomenological Control

Peter Lush, Zoltan Dienes, and Anil Seth

People can, to varying degrees, generate expected experiences in accordance with their goals, for example in response to imaginative suggestions. This ability is known as phenomenological control (Dienes et al., 2022; Dienes, Lush, & Palfi, 2022). Imaginative suggestions are verbal descriptions of counter-factual experiences which establish participant expectations for experience (expectancies), including hallucinations, delusions, amnesia, paralysis and apparently involuntary action (Kihlstrom, 2008; Lynn et al., 2008). Although most phenomenological control research has historically been focused on direct verbal suggestions presented within the context of hypnosis, it has long been known that response to imaginative suggestion is not limited to either this context or to direct verbal delivery. Rather, phenomenological control may occur in a variety of situations in which counter-factual expectancies arise (Dienes & Lush, 2023), a notable case being explicit and implicit cues in psychological experiments (demand characteristics; Corneille & Lush, 2023; Orne, 1962).

Despite increasing evidence for widespread effects of phenomenological control (Lush et al., 2020; Lush, Seth, Dienes, & Scott, 2022), there are many open questions about the neurocognitive mechanisms by which expectancies influence experience. One possible class of mechanistic explanation lies in the framework of predictive processing (PP). According to PP accounts, most or all aspects of perception and cognition are explainable in terms of neural inference about the causes of sensory signals. PP can be considered as a framework for process theories (mechanisms) operationalising the general concept of the ‘Bayesian brain’, in which the brain’s probabilistic beliefs (priors) are optimally combined with sensory signals (likelihoods) to furnish updated beliefs (posteriors; Clark, 2013). More specifically, PP claims that perception, action and cognition arise from the interplay of (mainly top-down) predictions and (mainly bottom-up) prediction error signals, in which continual minimisation of prediction error provides an approximation to Bayesian inference on the causes of sensory signals (Clark, 2015; Hohwy, 2013). In this view, the influence of sensory prediction errors on perceptual and cognitive inference depends on their relative precision (informally, their reliability): sensory signals with high (expected) precision have greater influence on inference than signals with low (expected) precision.
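The precision story can be illustrated with the textbook Gaussian case, a minimal sketch of our own (in Python, with arbitrary numbers) rather than an implementation drawn from the PP literature. The posterior mean is a precision-weighted average of the prior prediction and the sensory signal, or, equivalently, the prior corrected by a precision-weighted prediction error:

```python
def gaussian_update(mu_prior, pi_prior, mu_sens, pi_sens):
    """Combine a Gaussian prior and a Gaussian likelihood.
    Precision (pi) is inverse variance; the posterior mean is the
    precision-weighted average of the prior and sensory means.
    Equivalently: mu_post = mu_prior + (pi_sens / pi_post) * (mu_sens - mu_prior),
    i.e., the prior corrected by a precision-weighted prediction error."""
    pi_post = pi_prior + pi_sens
    mu_post = (pi_prior * mu_prior + pi_sens * mu_sens) / pi_post
    return mu_post, pi_post

# Same prior (mean 0.0) and sensory signal (mean 1.0), two precision-weightings.
print(gaussian_update(0.0, 1.0, 1.0, 4.0))   # (0.8, 5.0): a precise signal dominates inference
print(gaussian_update(0.0, 1.0, 1.0, 0.25))  # (0.2, 1.25): an imprecise signal barely shifts the prior
```

On a simple PP reading of phenomenological control, the natural hypothesis is then that imaginative suggestion shifts this balance: up-weighting the precision of counter-factual priors (or down-weighting sensory precision) would allow expected content to dominate the posterior, and so the resulting experience. Whether such a simple balance model is sufficient is the question pursued in the remainder of this chapter.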

This perspective has been influential in recent years, with simple mechanisms based on the relative weighting of precision for priors and sensory signals proposed to underlie a wide range of psychological phenomena, for example depression (Barrett et al., 2016), anxiety (Chekroud, 2015), schizophrenia (Adams, Stephan, Brown, Frith, & Friston, 2013), interoception (Seth, 2013) and autism (Lawson, Rees, & Friston, 2014), though empirical validation is not yet available for many PP accounts of cognitive phenomena (Litwin & Miłkowski, 2020).

Phenomenological control necessarily involves an interplay between expectancies and sensory information, and a large body of historical and contemporary research provides a rich background by which theorising can be constrained. In particular, the theory that response to imaginative suggestion arises directly from expectancies (response expectancy theory; Kirsch, 1985) can be readily redescribed in PP terms. Empirical evidence and theoretical arguments from research into hypnosis and imaginative suggestion are therefore directly relevant to simple PP accounts of response to imaginative suggestion.

Here we consider phenomenological control from a PP perspective, with the aim of assessing how far a simple theory based on the balance of sensory priors and signals might take us towards a PP account of phenomenological control. In the first section, we will review the history and key characteristics of phenomenological control to establish the empirical base for theory development. In the second section, we will consider three simple theories of phenomenological control: response expectancy theory (Kirsch & Lynn, 1999), in which experience arises directly from expectancies; the PP theory of hypnosis (Martin & Pacherie, 2019); and cold control theory (Dienes, 2012), which posits that phenomenological control centrally involves metacognition (and which appeals to processes not easily explained by current PP accounts).

2.1 Phenomenological Control as an Empirical Phenomenon

The modern history of phenomenological control begins in late 18th century France with the introduction of mesmerism, in which simple, implicit suggestive procedures resulted in startling behaviour and remarkable cures for a variety of ailments. Mesmeric effects primarily manifested as alarming convulsions or fits but, in some cases, involved a response which would later be central to hypnotism – a ‘somnambulistic state’ (Gauld, 1992; Hammond, 2013). Franz Anton Mesmer believed that his

procedures manipulated a magnetic fluid, which he believed had curative effects when redistributed in his patients’ bodies. Initially, he employed repetitive strokes along the arms and hands of his patients, but alternative procedures were soon developed, including gazing into the eyes, applying pressure to the abdomen and the use of ‘magnetized’ props (e.g., iron rods or water-filled tubs).

In 1784, with Mesmer attracting considerable attention in high society, Louis XVI commissioned a group of leading scientists (including Benjamin Franklin, Antoine Lavoisier and Joseph-Ignace Guillotin) to investigate his claims. This Royal Commission performed a series of simple experimental manipulations of demand characteristics, providing an important early example of experimental blinding and placebo control which anticipated the modern controlled clinical trial (Herr, 2005). These manipulations demonstrated that a necessary precursor to mesmeric effects was the subject’s belief that mesmeric procedures were taking place. The commission concluded that mesmeric effects were attributable to imagination (Barnier & McConkey, 1991). However, they also made an error of interpretation that may have had important implications for subsequent understanding. Although they had tested only the posited mechanism of mesmerism rather than its potential curative properties, they reported that mesmerism could not be clinically effective, and thus diverted scientific attention from the apparent efficacy (at least in some patients) of mesmeric treatments (Lynn & Lilienfeld, 2002; Perry & McConkey, 2002). Although Mesmer’s reputation never recovered, mesmerists fought back by publishing vast numbers of case reports detailing treatment of conditions including head pain, eye disease and asthma (Sheehan & Perry, 1976). Mesmeric treatments remained popular with clinicians during the following century, perhaps reaching a peak of public awareness in John Elliotson’s controversial public displays and championing of mesmerism as surgical anaesthesia at University College Hospital (Moore, 2017).

Theoretical and empirical developments, meanwhile, shifted away from Mesmer’s by now long-discredited theory. Of particular note, Abbé Faria adopted procedures which anticipated the key features of the hypnotic context – a direct verbal induction (‘sleep!’) which triggered a ‘lucid sleep’. He also attributed the effects to the subject rather than to an external force, proposing concentration to play a central role (de Faria, 1819; Perry, 1978). Faria was largely ignored in his time, but his ideas are echoed in Braid’s influential introduction of the term ‘hypnosis’ (Braid, 1843), which drew an explicit analogy with sleep and finally severed the phenomena from the (by that time implausible) action of a magnetic fluid.

The switch from mesmeric to hypnotic contexts was accompanied by a shift away from implicit suggestion, and during the 20th century, hypnosis research primarily involved explicit verbal imaginative suggestions

delivered following a ‘hypnotic induction’ procedure which was intended to bring about a hypnotic state. This state was thought to facilitate response to imaginative suggestion. However, it had been observed since at least the late 19th century (Bernheim, 1886) that hypnotic induction was not required for successful response. This observation was taken up by psychological scientists during the 20th century, who drew on experimental tests of non-hypnotic direct verbal imaginative suggestion (Barber, 1969; Hull, 1933; Weitzenhoffer, 1953; Wells, 1924; see Weitzenhoffer, 1978).

Employing a hypnotic induction before giving suggestions has historically been reported to produce a small increase in response to imaginative suggestions (Barber & Glass, 1962; Braffman & Kirsch, 1999; Hilgard & Tart, 1966; Hull, 1933; Weitzenhoffer & Sjoberg, 1961). This increase in response after an induction versus no induction is not observed when the word ‘hypnosis’ is removed from an induction script (Gandhi & Oakley, 2005). A simple explanation is that this word situates the procedure in a hypnotic context, allowing participants to draw on their pre-existing beliefs about hypnosis. Such beliefs arise from multiple sources, including popular culture (e.g., novels, films, plays and stage hypnosis) and clinical hypnosis, but show consistency across individuals (Johnson & Hauck, 1999). From this perspective, hypnotic trance can be interpreted as an imaginative suggestion effect resulting from demand characteristics, including the suggestive nature of the induction and beliefs about hypnosis (Wagstaff, 2004). Consistent with this interpretation, attempts to determine brain activity that arises from a hypnotic induction yield highly inconsistent results across studies (Landry, Lifshitz, & Raz, 2017), as might be expected if the trance state arises from the interpretation of situational cues. It seems that virtually any procedure can produce a reported ‘hypnotic trance’, provided it is presented as hypnosis, e.g., a pill labelled ‘hypnosis’ (Glass & Barber, 1961), staring at the back of the hand (Page & Handley, 1991), a high-frequency strobe light (Kroger & Schneider, 1959) or pedalling on an exercise bike (Banyai & Hilgard, 1976).

While the hypnotic context can sometimes produce a small boost in response to suggestion, it can at other times also reduce response in comparison to a non-hypnotic context. This reduction may be attributable to reactance arising from negative aspects of the hypnotic context (Lush, Scott, Seth, & Dienes, 2021); other studies report no significant increase (Meyer & Lynn, 2011; Milling, Coursen, Shores, & Waszkiewicz, 2010).

In a recent review, Lynn, Kirsch, Terhune, and Green (2020) address a number of common myths about hypnosis, including that hypnosis involves a loss of control (Coe, 1973), that response involves gullibility (Coe, Kobayashi, & Howard, 1973; Moore, 1964), that hypnotic responding involves special attentional abilities (Dienes et al., 2009), that response is mere compliance (Evans & Orne, 1971; see Oakley & Halligan, 2013 for

a review of neurophysiological evidence) or faking (Kinnunen, Zamansky, & Block, 1994) and that response is automatic (rather than involving goal-directed plans and strategies; Spanos, 1986; White, 1941). Although neither a trance state nor the induction procedures intended to bring it about are required for response to imaginative suggestion, the myth of the hypnotist taking control of an unwilling subject persists today, e.g., in the use of the term ‘suggestibility’, which may imply susceptibility to coercion (Lush et al., 2021). However, responsive participants are not passive recipients of imaginative suggestions, but (at least implicitly) active agents, creatively interpreting pre-existing beliefs and cues arising from the situation (demand characteristics) to generate relevant experience (McConkey & Sheehan, 1982). Experiencing the distortions of volition and reality called for by imaginative suggestions typically demands cognitive capacity (Kirsch, Burgess, & Braffman, 1999; Tobis & Kihlstrom, 2010), and responses are only elicited when appropriate to the subject’s goals (Coe et al., 1973; Spanos, Menary, Brett, Cross, & Ahmed, 1987). As we will discuss in the following section, the degree to which participants are able to generate relevant experience is both highly variable between individuals and highly stable within them.

2.2 Individual Differences in Phenomenological Control

That there are individual differences in response was noted by Mesmer as early as 1776. de Puységur focused his attention on highly responsive ‘somnambules’ (Gauld, 1992), and in the early 19th century, Faria reported that around 15–20% of subjects were able to produce ‘lucid sleep’ (de Faria, 1819; Perry, 1978). In the 20th century, this theme was taken up by Clark Hull (Hull, 1933), leading to the development of the Stanford scales, which are still in use today (Weitzenhoffer & Hilgard, 1959, 1962, 1963) and which are the basis for many contemporary group scales, e.g., the Harvard Group Scale of Hypnotisability (Shor & Orne, 1963), the Sussex Waterloo Scale of Hypnotisability (Lush, Moga, McLatchie, & Dienes, 2018) and others (see Laurence, Beaulieu-Prévost, & Chéné, 2008 for a historical review). Non-hypnotic imaginative suggestion scales can be adapted from hypnosis scales simply by removing reference to concepts related to hypnosis (e.g., Braffman & Kirsch, 1999; Lush et al., 2021; Oakley, Walsh, Mehta, Halligan, & Deeley, 2021).

Imaginative suggestion scales present participants with a series of imaginative suggestions for counter-factual experiences which can be categorised according to the required response: motor (e.g., ‘involuntary’ hand lowering), perceptual (e.g., auditory, gustatory or tactile hallucinations) and cognitive (e.g., age regression). In addition to direct suggestions for experience, ‘challenge suggestions’ require that a particular act cannot be performed (e.g., paralysis, the inability to see something in the visual field,

or negative visual hallucination, and amnesia; Woody & Barnier, 2008). Whole sample response rates for types of suggestion vary, which is taken to indicate suggestion difficulty. On traditional binary behavioural measures, approximately 80–90% of participants respond to motor suggestions, but response to a music hallucination suggestion occurs in around 5–10% (Bowers, 1998; Lush et al., 2018).

Trait differences in phenomenological control are stable in adulthood (though children show higher response; London & Cooper, 1969), with test-retest reliability of .60 shown over 10 years (Morgan, Johnson, & Hilgard, 1974) and .71 over 25 years (Piccione, Hilgard, & Zimbardo, 1989) for the hypnotic context. There is also evidence for heritability – monozygotic twins show a correlation of 0.52–0.63 on scale scores, and dizygotic twins 0.18 (Morgan, 1973).

Research into proposed relationships between trait phenomenological control in the hypnotic context and other measures of individual differences has historically often been confounded by demand characteristics, through context effects which occur when both measures are taken in the same session (Council, Kirsch, & Grant, 1996). When context effects are minimised (for example, by taking the measures in different sessions and/or contexts), relationships with traits hypothesised to be related to hypnosis are generally weak, for example, absorption (Council, Kirsch, & Hafner, 1986), dissociation (Kirsch & Lynn, 1998), imagery vividness (Cabbai et al., 2023; Kogon et al., 1998), interrogative suggestibility (Malinoski & Lynn, 1999) and personality (Green, 2004). A possible exception is fantasy proneness (Silva & Kirsch, 1992), though even here the relationship is not straightforward; fantasy proneness is neither necessary nor sufficient for response (Lynn & Rhue, 1986).

2.3 Implicit Suggestions for Phenomenological Control

Despite the origin of hypnosis in the implicit suggestion effects brought about by mesmeric procedures, the focus on explicit verbal suggestion which became central to the hypnotic context has been accompanied by a consequent lack of attention to implicit suggestion. By implicit suggestion, we mean that participants may realise what is required of them not because it is explicitly suggested, but because of their implicit or explicit beliefs about what is required. Mesmer’s clients knew they were meant to convulse; they did not need to be told ‘Now you will convulse!’. It has been argued that explicit and implicit sensory suggestions are driven by distinct mechanisms (Tasso & Pérez, 2008), though the argument is based on interpreting non-significant low-powered correlations as evidence for no effect. Orne (1959) demonstrated how beliefs that become part of the demand characteristics of the situation can become part of hypnotic response without being explicitly suggested: when participants

were given a talk indicating that a good hypnotic subject would manifest catalepsy of the dominant arm, those participants, when later hypnotised, manifested catalepsy of the dominant arm without an explicit hypnotic suggestion for that experience.

Overlooking how readily people can respond to implicit suggestions with phenomenological control may have prevented researchers from realising that participant expectancies arising from demand characteristics may produce suggestion effects – and therefore confound interpretation of effects in psychological science in a wide range of experimental contexts (Lush et al., 2020, 2022; Michael, Garry, & Kirsch, 2012). While Orne acknowledged that demand characteristics were not limited to faking but could involve real experience (e.g., in a study showing sensory deprivation effects for participants told that sitting in a room was a sensory deprivation experiment [Orne & Scheibe, 1964]), he seems to have disregarded the possibility that experience in response to demand characteristics might reflect trait response to imaginative suggestion outside of the hypnotic context (perhaps because of a focus on the then-dominant trance theories of hypnosis). However, generating situationally appropriate experiences by phenomenological control may be one way in which participants can comply with their understanding of the experimental hypothesis arising from demand characteristics. When participants are both hypothesis aware (their beliefs conform to those of the experimenter) and compliant (they wish to be ‘good participants’), experimenters may incorrectly attribute the effects of demand characteristics (e.g., faking, imagination or phenomenological control) to other hypothesised mechanisms (see Corneille & Lush, 2023 for a conceptual model of demand characteristic effects, including phenomenological control).

Recently, we have begun to test a prediction which arises from this proposal: if participants respond with phenomenological control according to their trait ability, then we should see relationships between measures of experience and trait phenomenological control when demand characteristics have not been adequately controlled. The prediction holds for all the reports of anomalous experience we have so far tested (Lush et al., 2020, 2022): reports of experiencing touch or pain when seeing others touched or apparently in pain (mirror touch synaesthesia and vicarious pain [Blakemore, Bristow, Bird, Frith, & Ward, 2005; Fitzgibbon, Giummarra, Georgiou-Karistianis, Enticott, & Bradshaw, 2010]), of ownership of a fake hand (the rubber hand illusion [Botvinick & Cohen, 1998]), of a tingling sensation on the scalp while watching ‘trigger videos’ (the autonomous sensory meridian response or ASMR [Barratt & Davis, 2015]) and of sound heard while watching silent video (visually evoked auditory response or vEAR [Fassnidge & Freeman, 2018; Saenz & Koch, 2008]). These effects share key characteristics – they involve suggestive procedures

(without adequate control of demand characteristics); they result in anomalous experience which is variable between individuals, both in degree and in form; and they have all been proposed to at least partly involve top-down processes (see also relationships between indirect sensory suggestibility and response to body illusions [Forster, Karimpur, & Fiehler, 2022; Marotta, Tinazzi, Cavedini, Zampini, & Fiorio, 2016; Stone, Bullock, Keizer, & Dijkerman, 2018] and ASMR [Keizer, Chang, O’Mahony, Schaap, & Stone, 2020]). By contrast, classic visual illusions, which are generally attributed to bottom-up processes (the Müller-Lyer illusion and the vertical-horizontal illusion, both of which involve incorrectly judging the lengths of lines), have shown little to no relationship between reported experience and trait phenomenological control. While participants may be hypothesis aware for these visual illusion measurement procedures, responding with phenomenological control may be limited because such illusions involve mechanisms which are largely encapsulated from top-down influence. Wherever participants are hypothesis aware, the presence or absence of relationships between experimentally induced effects and trait phenomenological control may provide an indication of the degree to which top-down and bottom-up processes are involved (Lush et al., 2022).

Phenomenological control is unlikely to be limited to experimental and hypnotic situations. Both ASMR and vEAR are internet phenomena, which appear to have emerged without the assistance of scientists. Mirror touch and pain are not considered to be restricted to laboratory studies (Ward & Banissy, 2015). Many other anomalous phenomena reported in the wild may be attributable to phenomenological control: channelling spirits, dowsing, automatic writing and glossolalia all bear surface similarity to imaginative suggestion effects, not least in the experience of involuntariness over behaviours that would not normally be considered automatic (Dienes & Lush, 2023; Dienes & Perner, 2007).

2.4 Simple Theories of Phenomenological Control: Response Expectancy Theory and Cold Control

A central explanatory target for theories of phenomenological control is how response to imaginative suggestion can bear the hallmarks of voluntary acts (e.g., be strategic and goal-directed) and yet be experienced as though it is automatic. Historically, theories of hypnosis can be distinguished according to whether the explanatory focus is on an ability or aptitude for response, or on particular attitudes towards responding (Benham, Woody, Wilson, & Nash, 2006). Dissociation accounts propose a specific mechanism underlying aptitude, for example, that the central executive monitoring function dissociates from the control structure, leading to a dissociation of intention and awareness (Hilgard, 1986; Woody &

Sadler, 2008). Sociocognitive accounts instead posit explanations rooted in established psychological concepts which relate to attitudes (e.g., beliefs, expectancies, compliance and motivation; Spanos, 1986), and attribute differences in responding to differences in attitude rather than to a particular aptitude. A prominent contemporary sociocognitive theory proposes that response arises directly from expectancies (response expectancy theory; Kirsch, 1991; Lynn, 1997). This is the simplest of contemporary theories, as it attributes response to a single determinant, with other attitudes contributing to response only through their influence on expectancies. Expecting that one will respond in a particular way (e.g., by one’s arm rising involuntarily) will tend to produce that response, with no further intervening mechanisms at the psychological level.

While it is appealingly simple, empirical evidence presents difficulties for the proposal that response to imaginative suggestion arises directly from expectancies. First, expectancies account for a relatively small amount of the variance in response compared with the amount attributable to an underlying aptitude (Benham et al., 2006), though this would be expected on any theory if the measures are not highly reliable. Second, strong expectancies do not appear to be sufficient to trigger many of the phenomena which follow from imaginative suggestions. Fully expecting a sheet of paper to be blank (when shown a series of blank sheets) does not lead to a negative visual hallucination when the paper turns out not to be blank (Wagstaff, Toner, & Cole, 2002). This accords with everyday experience – we do not, generally, hallucinate missing keys if convinced that we left them on the kitchen table (Kallio & Revonsuo, 2003); vanishing a rabbit in a hat might be a less popular magic trick if audiences hallucinated the missing rabbit.

Response expectancy theory is a central component of contemporary theories of placebo (Kirsch, 1985). Indeed, Kirsch (1999) describes imaginative suggestion as a ‘non-deceptive placebo’ – a key theory of placebo response is that the experience (e.g., of pain going away) is directly produced by expecting it. Thus, response expectancy theory unites hypnotic response and placebo as fundamentally the same. However, evidence linking placebo and hypnosis is mixed, and there is some indication of distinct mechanisms (Parris, 2016). Notably, while placebo analgesia has been reported to be blocked by opioid antagonists (Benedetti & Amanzio, 1997), this is apparently not the case for analgesia following hypnotic suggestion (Spiegel & Albert, 1983).

An extension to response expectancy theory – response set theory (Kirsch & Lynn, 1999; Lynn, 1997) – addresses the puzzle of how apparently voluntary behaviours come to be accompanied by an experience of involuntariness by arguing that all human behaviours and cognitive processes are automatic, as they arise directly from expectancies. For response

56  Peter Lush, Zoltan Dienes, and Anil Seth set theory, then, hypnotic responding is genuinely involuntary. Response set theory requires the assumption that conscious will is a retrospective illusion (Wegner, 2003), but the reappraisal of empirical work which had supported the claim that conscious will is illusory (e.g., Schurger, Sitt, & Dehaene, 2012; Sherman & Rivers, 2021) brings a central assumption of this account into question. So, while expectancies undoubtedly play an important role in phenomenological control, response expectancy theory appears to be insufficient as a complete account and requires supplementing with additional mechanisms (for a review of further issues for a strong version of response set theory, see Lynn, Green, Zahedi, & Apelian, 2023). An alternative approach to resolving the issue of reported experience of involuntariness over executive acts is to argue that the behaviours are voluntary, but the experience is involuntary. There are theories within both the dissociative (e.g., Hilgard, 1986) and sociocognitive (Spanos, 1991) traditions which take this approach. Cold control theory (Dienes, 2012; Dienes & Perner, 2007; Dienes et al., 2022) abstracts features of both traditions, supplementing the sociocognitive focus on attitude with the proposal of an aptitude for inaccurate metacognition. The theory draws on the higher order thought (HOT) theory of David Rosenthal (2012), according to which a mental state becomes conscious only when there is a higher order mental state directed at it, with the content that one is in the lower order mental state. In HOT theory, first order states are unconscious in the absence of higher order states; therefore, intentions, as first order states, are unconscious unless targeted by the right kind of HOT. It follows that intentional acts can be experienced as unintended by the strategic misattribution of (or simply by the absence of) HOTs of intending. So, imaginative suggestion effects are simply voluntary acts which are experienced as unintentional, e.g., action experienced as involuntary may be voluntary action, hallucinations and delusions may be intentional acts of imagination experienced as unintentional, and amnesia and analgesia may be the intentional application of unconscious strategies. Response using cold control always requires goal-directed strategies (which may be unconscious), and for some suggestions, there are a variety of potential strategies available. For example, being unable to raise one’s arm following a motor challenge suggestion can be achieved by tensing opposing muscles, activating the triceps while relaxing the biceps or by not trying to raise the arm (Galea, Woody, Szechtman, & Pierrynowski, 2010; Winkel, Younger, Tomcik, Borckardt, & Nash, 2006). Amnesia might be brought about through distraction, focusing on an irrelevant mental image or by not trying to remember (Spanos, 1986; Wagstaff, 2004). To be successful, a strategy needs only to be effective in meeting the participants’ interpretation of the suggestion, and for the intention to implement the strategy to remain unconscious. Different unconscious strategies may

While response expectancy theory is the simplest account, it is difficult to reconcile with empirical evidence without appealing to additional processes, and a central assumption of the theory currently has little empirical support. We therefore argue that cold control is at present the simplest plausible theory of phenomenological control.

2.5  Predictive Processing

While the term 'predictive processing' (PP) can refer to a variety of theoretical positions, here it is employed for a family of positions which form the basis for much contemporary work in cognitive psychology and cognitive neuroscience (Clark, 2013; Hohwy, 2013; Hohwy & Seth, 2020). In this view, perception, cognition and action emerge from a probabilistic hierarchical framework in which sensory 'prediction error' signals are minimised through the recursive exchange of (mainly bottom-up) prediction errors and (mainly top-down) predictions. Perception and cognition are associated with the content of top-down predictions that are calibrated through prediction error minimisation, while action corresponds to self-fulfilling top-down predictions that overwhelm sensory prediction errors – this being 'active inference' (Friston, 2010). PP has roots in computational neuroscience and signal processing (e.g., predictive coding architectures; Rao & Ballard, 1999), in psychology in the form of the concepts of the Bayesian brain and perception as inference (Helmholtz, 1881), and in philosophy going back at least as far as Kant. Our exploration of PP accounts of phenomenological control will focus on a few key features of the approximate Bayesian processes by which this framework is proposed to operate.

In PP architectures, sensory signals and top-down predictions are considered probability distributions which are characterised by a mean value and a precision (inverse variance), and it is the balance of precision that determines the relative influence of sensory signals and top-down predictions on the final perceptual inference (the posterior distribution, or Bayesian 'best guess'). When played out in hierarchical terms, the posterior at one level forms the prior prediction for the level below, and the bottom-up signal for the level above. Within this hierarchy, fine-grained (spatially and temporally) sensory predictions and signals are processed at the lower levels, with increasingly abstracted predictions and signals at the higher levels, ultimately corresponding to beliefs. Critically, in PP, precisions themselves are not given but are estimated through the application of 'precision expectations' – i.e., expecting a signal to be precise increases the 'precision-weighting' of that signal and, therefore, its influence on perceptual inference. The process of precision-weighting is typically associated with 'paying attention' (Friston, 2010), though it can also occur through action itself (e.g., by moving one's gaze).

Action, as mentioned, corresponds to self-fulfilling predictions in PP and can be considered a direct descendant of William James' ideomotor theory. Specifically, execution of a particular action corresponds to a self-fulfilling proprioceptive prediction in which top-down expectations about joint position and kinematics overwhelm proprioceptive prediction errors and are implemented through engaging reflex arcs. For this 'active inference' to happen, precision-weighting must tilt towards the top-down predictions, implying a form of 'dis-attention' to proprioceptive sensory information. In other words, shifting one's attention away from the current position of one's arm is what allows it to move.

2.6  Predictive Processing and Response Expectancy Theory

A PP account of response expectancy theory emerges from the reinterpretation of expectancies as sensory priors distributed over this hierarchy (Lynn et al., 2023). In this view, imaginative suggestions correspond to high level priors (beliefs) which propagate down the hierarchy. When precision on sensory priors is sufficient to outweigh the sensory signal, expected experience and behaviour would result. A suggested hallucination would arise directly from the weighting of sensory predictions for the hallucinated percept over contradictory sensory evidence, and an apparently involuntary movement from the precision on sensory priors for the proprioceptive consequences of movement over the sensory signal from the static limb.

Can the challenges for response expectancy theory be easily resolved within this framework? First, consider the question of why we do not generally hallucinate when we are confident our keys are where we left them and therefore expect to see them. According to PP, for a prior to outweigh a sensory signal, the signal must have relatively reduced precision. Leaving aside movement (if we assume we are looking directly at where the keys are expected to be), there are two ways by which the precision of signals may be reduced. First, noise in the signal will reduce the (estimated) precision of that signal. If we expect to see the missing keys during a power cut, perhaps the sensory signal will be noisy enough for priors to dominate. This seems intuitively plausible – we may mistake shadows in a darkened room for an expected object. Some imaginative suggestion effects involve inherently low precision signals. For example, proprioception or touch felt on the back of the hand might be relatively noisy signals. Priors arising from imaginative suggestions for movement, or for an insect tickling the hand, could also conceivably be more precise than the relevant sensory evidence.
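
As a minimal illustration of this precision-weighting scheme – a sketch assuming simple one-dimensional Gaussian distributions, with purely illustrative numbers rather than any fitted model from the literature – the posterior combines prior and sensory signal in proportion to their precisions:

```python
def fuse(prior_mean, prior_sd, sensory_mean, sensory_sd):
    # Precision is inverse variance; the posterior of two Gaussians has
    # summed precision and a precision-weighted average of the two means.
    pi_p, pi_s = 1 / prior_sd ** 2, 1 / sensory_sd ** 2
    post_mean = (pi_p * prior_mean + pi_s * sensory_mean) / (pi_p + pi_s)
    post_sd = (pi_p + pi_s) ** -0.5
    return post_mean, post_sd

# Hypothetical arm-elevation example (in degrees): a suggestion induces a
# sharp prior that the arm is rising (10), while the static limb signals 0.
print(fuse(prior_mean=10, prior_sd=1, sensory_mean=0, sensory_sd=4))
# noisy or unattended signal -> (~9.4, ~0.97): the prior dominates
print(fuse(prior_mean=10, prior_sd=1, sensory_mean=0, sensory_sd=0.5))
# precise, attended signal -> (~2.0, ~0.45): the sensory evidence dominates
```

On this toy reading, 'paying attention' amounts to shrinking sensory_sd, and a suggestion-congruent but non-veridical posterior requires either an unusually sharp prior or a degraded or disattended signal – which is precisely where the difficulties discussed next arise.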

However, the simple over-weighting of sensory priors cannot easily account for phenomenological control in general. People can respond to imaginative suggestion when signals are not noisy, for example, hearing music or a buzzing fly in a quiet room or being unable to see a ball placed in front of them in broad daylight. The second way in which the precision of signals can be reduced is through attention. Could it be that these responses involve attending away from sensory input? Someone who is able to not see a ball while staring directly at it may simply be very good at reducing attention to the relevant part of the visual field (Barber, 1961). While this may seem intuitively plausible, there does not appear to be a reliable link between attention or executive control abilities and trait response (Dienes et al., 2009; Varga, Németh, & Szekely, 2011). As a possible response to this problem, Lynn et al. (2023) propose a 'weak' response expectancy theory of hypnosis, which, in addition to expectancies, appeals to other sociocognitive attitudes and abilities (e.g., motivation and rapport) and also to aptitudes (e.g., attention and the metacognitive abilities proposed by cold control theory). They propose that priors which are not updated by prediction error ('stubborn predictions') may account for non-veridical experience even when sensory precision is high. This suggestion, though, lacks an explanation for why priors are stubborn only in the particular situations in which phenomenological control occurs.

2.7  The Predictive Coding Model of Hypnosis

Martin and Pacherie (2019) propose a PP account of response to direct imaginative suggestion – the predictive coding model – which draws on the role of attention in precision-weighting while taking into account the difficulties posed for attentional accounts by the lack of evidence for a link between trait hypnotisability and attentional abilities. Their model is illustrated by examining the wording of a direct motor suggestion for arm movement, identifying three key components: suggestions that the arm is moving, instructions to pay attention to sensation in the limb and, finally, suggestions that the movement can be attributed to an external force (e.g., an imaginary magnetic force or heavy ball). The first two components are in conflict; active inference requires that attention is not directed to proprioceptive sensation in a limb, because otherwise the prior will not outweigh the sensory signal and action will not occur. The predictive coding model resolves this conflict by proposing that priors for movement-related proprioceptive signals are particularly precise (due to the suggestion for movement). Therefore, even though sensory precision is high for proprioceptive signals from the static limb, the prior is yet more precise, and movement occurs. An unstable balance thus results from the high precision of both signals, a balance which repeatedly shifts between the two distributions as the suggestion proceeds. When precision of the prior is higher, movement occurs, and movement ceases when the precision of sensory information is higher. The authors claim that movement in response to motor suggestion is typically dysfluent and argue that this directly results from the shifting balance of precision. The point of attending to proprioception is that having precise sensory information is not typical of voluntary movement, such that the attribution of involuntariness is facilitated by high sensory precision – so it is made just as high as is consistent with actual movement. Finally, the third aspect of the suggestion, that the movement is attributable to an imagined external force, completes the process of creating the experience of involuntariness, because it leads to a precise prior for 'no agency' which outweighs prediction errors for signals associated with intentional action. In short, the predictive coding model proposes that motor suggestions optimise both proprioceptive predictions and actual proprioceptive evidence through attentional modulation, yielding highly precise prediction errors that require explanation. The motor suggestion also supplies such an explanation by providing a prior of non-agency, accounting for involuntariness.

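The claimed oscillation can be caricatured in a few lines. The sketch below is purely illustrative – the update rules and numbers are invented for exposition and stand in for whatever mechanism actually re-balances precision – but it shows how alternating dominance of prior and sensory precision would yield the stop-start, dysfluent movement the model predicts:

```python
def simulate_dysfluency(steps=10):
    # Invented toy dynamics: movement proceeds while the prior's precision
    # exceeds the sensory precision; attended proprioception then sharpens,
    # stalling movement until the balance tilts back towards the prior.
    pos, prior_pi, sens_pi, trace = 0.0, 4.0, 3.0, []
    for _ in range(steps):
        if prior_pi > sens_pi:
            pos += 1.0        # prior dominates: the arm moves
            sens_pi += 1.5    # the moving limb yields sharper attended evidence
        else:
            sens_pi -= 2.0    # stalled: the balance tilts back
        trace.append(pos)
    return trace

print(simulate_dysfluency())
# [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0] - halting, stop-start motion
```
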
The predictive coding model account of motor suggestions is a strong proposal which takes into account key empirical constraints for theories of phenomenological control, and which is constructed in such a way as to be falsifiable. For example, the observation that dysfluency is a characteristic of motor suggestion response is open to challenge; it may, for example, reflect demand characteristics of producing an action that is not produced normally. Formal observation of response may reveal participants who do not respond dysfluently (in our research, we have informally observed a variety of responses, including smooth movement). Further, the attribution of movement to an external force, which plays a central role in the updating of priors in the predictive coding model, is a 'goal-directed fantasy' (Spanos & McPeake, 1977). Although goal-directed fantasies are a strategy employed in responding, they are not the only available strategy and are not required; removing goal-directed fantasies can increase response (Comey & Kirsch, 1999). Further, hypnotic response occurs for suggestions containing counter-imagery, e.g., that one's arm is not moving (Zamansky, 1977; Zamansky & Clark, 1986; Zamansky & Ruehle, 1995). Still, the core notion that the sense of involuntariness arises because an unusually high sensory precision is produced is novel and compelling – and this too could be experimentally testable.

While the predictive coding model is primarily focused on motor suggestion, Martin and Pacherie also provide brief accounts of other suggestion types. Perceptual hallucinations are attributed to mental imagery which, due to the suggestion, is assigned high precision. This highly precise 'imagined' sensory data accords with the prior for the hallucinated percept, and so this prior is not updated. If the precision of imagery-related signals explains hallucination experience, we might expect strong relationships between trait imagery vividness and trait phenomenological control. However, when context effects are avoided, relationships are weak (e.g., Phenomenological Control Scale score on a six-point scale predicts Vividness of Visual Imagery score on a four-point scale by just 0.2 points per scale point; Cabbai et al., 2023). This is not necessarily a fatal challenge. Perhaps attention to mental imagery is an ability sufficiently unrelated to trait visual imagery that it is not reflected in the corresponding measures. Alternatively, it may be that utilising scales consisting only of visual hallucination items will reveal a stronger relationship between trait visual imagery and phenomenological control. These proposals have yet to be tested. A second issue is that the sensory signals relating to imagery will conflict with incoming visual signals. Because of the problems facing attentional theories of phenomenological control, the predictive coding model attributes differences in suggestion difficulty to the precision assigned to sensory evidence (so suggestions will be easier in noisier environments). The theory then becomes identical to response expectancy theory. Although hallucinations are among the more difficult of suggestion effects, it is difficult to explain how precision afforded to mental imagery could be such that it dominates over clear visual signals, unless we once again appeal to unusual attentional abilities which can reduce the precision of the visual signal, or unless we propose the use of strategies (e.g., unfocusing or looking away).

For other suggestion effects, the predictive coding model draws on sociocognitive accounts and appeals to the use of strategies. For example, amnesia is attributed to reduced precision afforded to the sensory evidence associated with particular memories, and paralysis following a motor challenge suggestion to precise priors for being unable to move. Note that the strategy for reducing the precision afforded to sensory evidence in amnesia, and the strategy reflected by the prior, can vary between individuals. These therefore appeal to processes beyond a simple balance of precision on priors and sensory evidence and will require fleshing out with PP theories of goal-directed strategy (Friston et al., 2016).

While both response expectancy theory and the predictive coding model of motor suggestion can be plausibly applied to response to particular suggestions, they each require supplementation by appealing to additional mechanisms to account for the full range of phenomenological control effects. Simple PP accounts which appeal only to priors, sensory information and attention may be insufficient to account for all aspects of phenomenological control. So the question arises: how could goal-directed strategies give rise to phenomenological control? We turn, then, to the next simplest theory after response expectancy theory – that response to imaginative suggestion involves voluntary acts which are experienced as unintentional due to inaccurate higher order thoughts.

2.8  Predictive Processing and Cold Control

Although cold control is a conceptually simple theory, it may not be straightforward to reinterpret in PP terms. According to cold control, phenomenological control consists of voluntary acts and inaccurate HOTs of intending those acts. A simple PP view of this situation could assert that voluntary acts arise from self-fulfilling proprioceptive predictions at mid-levels of a hierarchy, while HOTs are implemented at higher levels and are perceptual inferences about the contents of mid-level predictions. However, fully fleshing out these correspondences will require both a PP account of HOT theory and an account of all the mental and physical acts that are involved in each case. Work towards the first requirement – a PP account of HOT – is at a preliminary stage (but see Nikolova, Waade, Friston, & Allen, 2022 for a sketch of some directions such a theory might take, and Fleming, 2020 for a proposal regarding metacognition). As for the second requirement, responses can involve, in theory, any act of which an individual is capable, as well as the goal-directed strategies and plans involved. This requirement is therefore indistinguishable from the development of a PP account of general cognition. While generalisations of PP – notably the free energy principle (Friston, 2010) – have been claimed to provide adequate resources for full explanations of embodied perception, cognition and action, specific application to the particular mental and physical plans and acts typically involved in phenomenological control remains a challenge.

The concept of thought is central to cold control: for example, in imagining the required sensory percept for a hallucination suggestion, in constructing a counterfactual world for a delusion, or in planning goal-directed strategies for response. But what would correspond to thought in PP? To rehearse, in 'vanilla' PP, low level sensory priors correspond to temporally and spatially fine-grained signals, with higher levels relating to ever greater temporal and spatial scale. In this way, high level cognitive processes (e.g., thoughts and beliefs) are proposed to emerge (Clark, 2013; Hohwy, 2013). Evidently, more is needed to account for specific varieties of high level cognition than merely increased abstraction. In particular, there are significant challenges associated with formulating PP views of cognition that respect compositionality – the flexible combination of concepts which is characteristic of conceptual thought – and generality – the ability to think about phenomena at any level of temporal or spatial abstraction (Vance, 2015; Williams, 2020). Other challenges have been raised for PP accounts of other aspects of cognition directly relevant to a cold control account of phenomenological control: see, for example, Klein (2018) for desires, Williams (2018) for delusion and Jurjako (2022) for self-deception. Although these and other issues are currently under debate (Clark, 2020; Rappe, 2022; Wilkinson, Deane, Nave, & Clark, 2019), PP remains most directly applicable to perception, action and selected aspects of cognition such as belief formation. The incorporation of strategic, goal-directed responses into PP has a somewhat stronger track record.

Here it is worth considering in more detail the role of intentions in both cold control and PP. While response expectancy theory does not appeal to intentions (conscious intention is illusory on this theory), cold control theory requires the existence of first order intentions which can be misrepresented in higher order thoughts. To illustrate the importance of this difference, consider the predictions each theory supports regarding relationships between response to imaginative suggestion and response to placebo. In response expectancy theory, placebo and phenomenological control effects are attributable to a single mechanism. Cold control, however, posits a strategic, goal-directed response in phenomenological control. A direct imaginative suggestion invites a strategic response to generate the required experience. Believing an expected effect will be caused by a medicine when a deceptive placebo is administered may or may not invite such a strategic response; a placebo effect may arise directly from expectancies (e.g., through conditioning), or from phenomenological control.

In PP (or, more generally, in active inference), an intention can be considered a prior over future (sensory) states: technically, a policy. Goals, by contrast, can be considered priors over policies (Pezzulo, Rigoli, & Friston, 2018). Augmenting the resources of PP to include goals and policies – beyond a (mere) hierarchy of increasingly abstract priors and prediction errors – enables us to address the strategic, goal-directed nature of responses under phenomenological control (though see Roskies & Wood, 2017). Note that this allows a simple adaptation of Martin and Pacherie's predictive coding model – we have high-precision proprioceptive sensory evidence and high-precision priors, but now the latter operate over policies, not proprioceptive predictions per se. The direct imaginative suggestion which invites a strategic response to generate a required experience now corresponds to a strong prior over a policy (setting the goal state), which in turn elicits that policy (the intention; the prior over future states); notice that a policy can be with respect to sensory states that are proprioceptive (for motor suggestions) and/or exteroceptive/interoceptive (for perceptual suggestions).
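
The policy-goal distinction can also be given a toy numerical form. The sketch below is a deliberate simplification with invented labels and numbers – full active inference treatments score policies by expected free energy (Friston et al., 2016) rather than the bare 'expected surprise' penalty used here – but it illustrates how a suggestion, modelled as a sharpened prior over policies, can keep a policy in play even when that policy predicts surprising sensory input from the static limb:

```python
import numpy as np

def select_policy(log_prior, expected_surprise):
    # A goal acts as a prior over policies; each policy is penalised by the
    # surprise it expects about future sensory states. Softmax normalises.
    log_post = log_prior - expected_surprise
    p = np.exp(log_post - log_post.max())  # numerically stable softmax
    return p / p.sum()

# Hypothetical policies: raise the arm, hold it still, merely imagine raising.
policies = ["raise", "hold-still", "imagine"]
suggestion_prior = np.log([0.80, 0.15, 0.05])  # suggestion sharpens the goal
surprise = np.array([2.0, 0.5, 1.0])           # raising conflicts with the limb

for name, p in zip(policies, select_policy(suggestion_prior, surprise)):
    print(f"{name}: {p:.2f}")
# raise: 0.50, hold-still: 0.42, imagine: 0.08 - the sharpened prior keeps
# the raising policy most probable despite its expected surprise.
```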

But what still remains to be explained is the phenomenology of involuntariness. One option is provided by noticing that policy selection in active inference has both automatic, context-insensitive and context-sensitive components (Parr, Holmes, Friston, & Pezzulo, 2023). Perhaps the phenomenology of involuntariness corresponds to the suggestion preferentially recruiting (by some mechanism) the automatic component of policy selection. This casts in PP terms the central claim of dissociated control theory of hypnosis (Woody & Bowers, 1994), in which a contention scheduling system (which drives habitual behaviour; Norman & Shallice, 1986) triggers suggested actions when, due to a hypnotic state, the executive system is weakened. Accounts which propose that response to imaginative suggestion is automatic are difficult to reconcile with the wealth of evidence that response can involve executive control (Dienes & Perner, 2007; Raz, Shapiro, Fan, & Posner, 2002; Spanos, Radtke, & Dubreuil, 1982; Terhune, Cleeremans, Raz, & Lynn, 2017) and were later abandoned by their original authors (Woody & Sadler, 2008).

The challenge therefore remains the development of a full-fledged PP account of the first order thoughts that people intend and the higher order thoughts about those intentions, such that the higher order thoughts can be inaccurate. Consider a suggestion that one is right now Elvis Presley (requiring a thought about smallish time scales – right now), and the HOT that one did not intend to pretend that. A fit-for-purpose version of PP would be able to fluently account for this situation and others like it. In general, key to future PP theories of hypnotic suggestion will be the extent to which such theories make distinctive predictions about behaviour, subjective reports and/or physiological and neurophysiological responses – and in doing so go beyond being mere redescriptions of observed phenomena or other theories.

2.9 Conclusion

Some key characteristics of phenomenological control appear, on surface examination, to invite simple PP or active inference theories based on the relative weighting of priors and sensory signals. From this starting point, an extensive theoretical and empirical base regarding the posited mechanisms which support hypnotic responding, including expectancies and attention, constrains the range of viable PP accounts of phenomenological control. It seems unlikely that, at this point in time, a simple theory based on the relative weighting of priors and sensory signals can provide a satisfactory account of phenomenological control. However, there are promising avenues for further development of active inference theories based on attentional modulation and on the incorporation of intention and goal-directed planning. Further conceptual and empirical work fleshing out active inference accounts of aspects of cognition central to cold control and other theories of hypnotic response seems likely to pay dividends. Conversely, research into phenomenological control in the hypnotic context since the earliest days of psychological science provides a rich source of data and theory regarding the role of expectancies in experience and associated neurophysiological processes, which may be fruitfully employed to inform and test PP accounts of cognition.

Acknowledgements

AKS and PL are grateful to the Dr. Mortimer and Theresa Sackler Foundation for support. AKS is also supported by the European Research Council Advanced Investigator Grant CONSCIOUS (grant number 101019254). We are grateful to Karl Friston and Giovanni Pezzulo for helpful discussions.

References

Adams, R., Stephan, K., Brown, H., Frith, C., & Friston, K. (2013). The computational anatomy of psychosis. Frontiers in Psychiatry, 4. https://www.frontiersin.org/articles/10.3389/fpsyt.2013.00047
Banyai, E. I., & Hilgard, E. R. (1976). A comparison of active-alert hypnotic induction with traditional relaxation induction. Journal of Abnormal Psychology, 85, 218–224. https://doi.org/10.1037/0021-843X.85.2.218
Barber, T. X. (1961). Physiological effects of hypnosis. Psychological Bulletin, 58(5), 390–419. https://doi.org/10.1037/h0042731
Barber, T. X. (1969). Hypnosis: A scientific approach. New York: Van Nostrand Reinhold.
Barber, T. X., & Glass, L. B. (1962). Significant factors in hypnotic behavior. The Journal of Abnormal and Social Psychology, 64(3), 222–228. https://doi.org/10.1037/h0041347
Barnier, A. J., & McConkey, K. M. (1991). The Benjamin Franklin report on animal magnetism: A summary comment. Australian Journal of Clinical and Experimental Hypnosis, 19(2), 77–86.
Barratt, E. L., & Davis, N. J. (2015). Autonomous sensory meridian response (ASMR): A flow-like mental state. PeerJ, 3, e851. https://doi.org/10.7717/peerj.851
Barrett, L. F., Quigley, K. S., & Hamilton, P. (2016). An active inference theory of allostasis and interoception in depression. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1708), 20160011. https://doi.org/10.1098/rstb.2016.0011
Benedetti, F., & Amanzio, M. (1997). The neurobiology of placebo analgesia: From endogenous opioids to cholecystokinin. Progress in Neurobiology, 52(2), 109–125. https://doi.org/10.1016/S0301-0082(97)00006-3
Benham, G., Woody, E. Z., Wilson, K. S., & Nash, M. R. (2006). Expect the unexpected: Ability, attitude, and responsiveness to hypnosis. Journal of Personality and Social Psychology, 91(2), 342–350. https://doi.org/10.1037/0022-3514.91.2.342

66  Peter Lush, Zoltan Dienes, and Anil Seth Blakemore, S.-J., Bristow, D., Bird, G., Frith, C., & Ward, J. (2005). Somatosensory activations during the observation of touch and a case of vision–touch synaesthesia. Brain, 128(7), 1571–1583. https://doi.org/10.1093/brain/awh500 Botvinick, M., & Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391(6669), Article 6669. https://doi.org/10.1038/35784 Bowers, K. S. (1998). Waterloo-Stanford group scale of hypnotic susceptibility, form c: Manual and response booklet. International Journal of Clinical and Experimental Hypnosis, 46(3), 250–268. https://doi.org/10.1080/00207149808410006 Braffman, W., & Kirsch, I. (1999). Imaginative suggestibility and hypnotizability: An empirical analysis. Journal of Personality and Social Psychology, 77(3), 578–587. https://doi.org/10.1037/0022-3514.77.3.578 Braid, J. (1843). Neurypnology, or the rationale of nervous sleep, considered in relation with animal magnetism. London: J. Churchill. Cabbai, G., Dance, C., Dienes, Z., Simner, J., Forster, S., & Lush, P. (2023). Investigating relationships between trait visual imagery and phenomenological control: The role of context effects. PsyArXiv. https://doi.org/10.31234/osf.io/ 7qmfj Chekroud, A. M. (2015). Unifying treatments for depression: An application of the free energy principle. Frontiers in Psychology, 6. https://www.frontiersin.org/ articles/10.3389/fpsyg.2015.00153 Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. https://doi. org/10.1017/S0140525X12000477 Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford: Oxford University Press. Clark, A. (2020). Beyond desire? Agency, choice, and the predictive mind. Australasian Journal of Philosophy, 98(1), 1–15. https://doi.org/10.1080/00048402. 2019.1602661 Coe, W. C. (1973). A further evaluation of responses to an uncompleted posthypnotic suggestion. American Journal of Clinical Hypnosis, 15, 223–228. Coe, W. C., Kobayashi, K., & Howard, M. L. (1973). Experimental and ethical problems of evaluating the influence of hypnosis in antisocial conduct. Journal of Abnormal Psychology, 82, 476–482. https://doi.org/10.1037/h0035365 Comey, G., & Kirsch, I. (1999). Intentional and spontaneous imagery in hypnosis: The phenomenology of hypnotic responding. International Journal of Clinical and Experimental Hypnosis, 47(1), 65–85. https://doi.org/10.1080/002071 49908410023 Corneille, O., & Lush, P. (2023). Sixty years after Orne’s American psychologist article: A conceptual framework for subjective experiences elicited by demand characteristics. Personality and Social Psychology Review, 27(1), 83–101. https:// doi.org/10.1177/10888683221104368 Council, J. R., Kirsch, I., & Grant, D. L. (1996). Imagination, expectancy, and hypnotic responding. In Hypnosis and imagination (pp. 41–65). Baywood Publishing Co. https://doi.org/10.1017/9781108580298.043 Council, J. R., Kirsch, I., & Hafner, L. P. (1986). Expectancy versus absorption in the prediction of hypnotic responding. Journal of Personality and Social Psychology, 50, 182–189. https://doi.org/10.1037/0022-3514.50.1.182

Expectancies and the Generation of Perceptual Experience 67 de Faria, J. C. (1819). De la cause du sommeil lucide ou étude de la nature de l’homme. París: Mme Horiac. Dienes, Z. (2012). Is hypnotic responding the strategic relinquishment of metacognition? In Foundations of metacognition (pp. 267–278). Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199646739.003.0017 Dienes, Z., Brown, E., Hutton, S., Kirsch, I., Mazzoni, G., & Wright, D. B. (2009). Hypnotic suggestibility, cognitive inhibition, and dissociation. Consciousness and Cognition, 18(4), 837–847. https://doi.org/10.1016/j.concog.2009.07.009 Dienes, Z., & Lush, P. (2023). The role of phenomenological control in experience. Current Directions in Psychological Science, 32(2), 145–151. https://doi. org/10.1177/09637214221150521 Dienes, Z., Lush, P., & Palfi, B. (2022). Controlling phenomenology by being unaware of intentions. In J. Weisberg (Ed.), Qualitative consciousness: Themes from the philosophy of David Rosenthal (pp. 229–242). Cambridge University Press. https://doi.org/10.1017/9781108768085.017 Dienes, Z., Lush, P., Palfi, B., Roseboom, W., Scott, R., Parris, B., … Lovell, M. (2022). Phenomenological control as cold control. Psychology of Consciousness: Theory, Research, and Practice, 9, 101–116. https://doi.org/10.1037/ cns0000230 Dienes, Z., & Perner, J. (2007). Executive control without conscious awareness: The cold control theory of hypnosis. In Hypnosis and conscious states: The cognitive neuroscience perspective (pp. 293–314). Oxford: Oxford University Press. Evans, F. J., & Orne, M. T. (1971). The disappearing hypnotist: The use of simulating subjects to evaluate how subjects perceive experimental procedures. International Journal of Clinical and Experimental Hypnosis, 19(4), 277–296. https:// doi.org/10.1080/00207147108407173 Fassnidge, C. J., & Freeman, E. D. (2018). Sounds from seeing silent motion: Who hears them, and what looks loudest? Cortex, 103, 130–141. https://doi. org/10.1016/j.cortex.2018.02.019 Fitzgibbon, B. M., Giummarra, M. J., Georgiou-Karistianis, N., Enticott, P. G., & Bradshaw, J. L. (2010). Shared pain: From empathy to synaesthesia. Neuroscience & Biobehavioral Reviews, 34(4), 500–512. https://doi.org/10.1016/ j.neubiorev.2009.10.007 Fleming, S. M. (2020). Awareness as inference in a higher-order state space. Neuroscience of Consciousness, 2020(1), niz020. https://doi.org/10.1093/nc/niz020 Forster, P.-P., Karimpur, H., & Fiehler, K. (2022). Why we should rethink our approach to embodiment and presence. Frontiers in Virtual Reality, 3. https:// www.frontiersin.org/articles/10.3389/frvir.2022.838369 Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), Article 2. https://doi.org/10.1038/nrn2787 Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O’Doherty, J., & Pezzulo, G. (2016). Active inference and learning. Neuroscience & Biobehavioral Reviews, 68, 862–879. https://doi.org/10.1016/j.neubiorev.2016.06.022 Galea, V., Woody, E. Z., Szechtman, H., & Pierrynowski, M. R. (2010). Motion in response to the hypnotic suggestion of arm rigidity: A window on underlying mechanisms. International Journal of Clinical and Experimental Hypnosis, 58(3), 251–268. https://doi.org/10.1080/00207141003760561

68  Peter Lush, Zoltan Dienes, and Anil Seth Gandhi, B., & Oakley, D. A. (2005). Does ‘hypnosis’ by any other name smell as sweet? The efficacy of ‘hypnotic’ inductions depends on the label ‘hypnosis’. Consciousness and Cognition, 14(2), 304–315. https://doi.org/10.1016/j.concog.2004.12.004 Gauld, A. (1992). A history of hypnotism. Cambridge: Cambridge University Press. Glass, L. B., & Barber, T. X. (1961). A note on hypnotic behavior, the definition of the situation and the placebo effect. The Journal of Nervous and Mental Disease, 132(6), 539. Green, J. P. (2004). The five factor model of personality and hypnotizability: Little variance in common. Contemporary Hypnosis, 21(4), 161–168. https://doi. org/10.1002/ch.303 Hammond, D. C. (2013). A review of the history of hypnosis through the late 19th century. American Journal of Clinical Hypnosis, 56(2), 174–191. https://doi.org/ 10.1080/00029157.2013.826172 Helmholtz, H. (1881). On the relation of optics to painting. In E. Atkinson (Ed.), Popular lectures on scientific subjects, second series (pp. 73–138). New York: D Appleton & Company. https://doi.org/10.1037/12825-003 Herr, H. W. (2005). Franklin, Lavoisier, and Mesmer: Origin of the controlled clinical trial. Urologic Oncology: Seminars and Original Investigations, 23(5), 346–351. https://doi.org/10.1016/j.urolonc.2005.02.003 Hilgard, E. R. (1986). Divided consciousness: Multiple controls in human thought and action. Hoboken, NJ: John Wiley & Sons Inc. Hilgard, E. R., & Tart, C. T. (1966). Responsiveness to suggestions following waking and imagination instructions and following induction of hypnosis. Journal of Abnormal Psychology, 71, 196–208. https://doi.org/10.1037/h0023323 Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press. Hohwy, J., & Seth, A. (2020). Predictive processing as a systematic basis for identifying the neural correlates of consciousness. Philosophy and the Mind Sciences, 1(II), Article II. https://doi.org/10.33735/phimisci.2020.II.64 Hull, C. L. (1933). Hypnosis and suggestibility. New York: Appleton-Century. Johnson, M. E., & Hauck, C. (1999). Beliefs and opinions about hypnosis held by the general public: A systematic evaluation. American Journal of Clinical Hypnosis, 42(1), 10–20. https://doi.org/10.1080/00029157.1999.10404241 Jurjako, M. (2022). Can predictive processing explain self-deception? Synthese, 200(4), 303. https://doi.org/10.1007/s11229-022-03797-6 Kallio, S., & Revonsuo, A. (2003). Hypnotic phenomena and altered states of consciousness: A multilevel framework of description and explanation. Contemporary Hypnosis, 20(3), 111–164. https://doi.org/10.1002/ch.273 Keizer, A., Chang, T. H. (Rebecca), O’Mahony, C. J., Schaap, N. S., & Stone, K. D. (2020). Individuals who experience autonomous sensory meridian response have higher levels of sensory suggestibility. Perception, 49(1), 113–116. https:// doi.org/10.1177/0301006619891913 Kihlstrom, J. F. (2008). The domain of hypnosis, revisited. In The Oxford handbook of hypnosis: Theory, research, and practice (pp. 21–52). Oxford: Oxford University Press. Kinnunen, T., Zamansky, H. S., & Block, M. L. (1994). Is the hypnotized subject lying? Journal of Abnormal Psychology, 103, 184–191. https://doi.org/ 10.1037/0021-843X.103.2.184

Kirsch, I. (1985). Response expectancy as a determinant of experience and behavior. American Psychologist, 40(11), 1189–1202. https://doi.org/10.1037/0003-066X.40.11.1189
Kirsch, I. (1991). The social learning theory of hypnosis. In Theories of hypnosis: Current models and perspectives (pp. 439–465). New York: Guilford Press.
Kirsch, I. (1999). Clinical hypnosis as a nondeceptive placebo. In Clinical hypnosis and self-regulation: Cognitive-behavioral perspectives (pp. 211–225). American Psychological Association. https://doi.org/10.1037/10282-008
Kirsch, I., Burgess, C. A., & Braffman, W. (1999). Attentional resources in hypnotic responding. International Journal of Clinical and Experimental Hypnosis, 47(3), 175–191. https://doi.org/10.1080/00207149908410031
Kirsch, I., & Lynn, S. J. (1998). Dissociation theories of hypnosis. Psychological Bulletin, 123(1), 100–115. https://doi.org/10.1037/0033-2909.123.1.100
Kirsch, I., & Lynn, S. J. (1999). Hypnotic involuntariness and the automaticity of everyday life (pp. 72). American Psychological Association. https://doi.org/10.1037/10282-002
Klein, C. (2018). What do predictive coders want? Synthese, 195(6), 2541–2557. https://doi.org/10.1007/s11229-016-1250-6
Kogon, M. M., Jasiukaitis, P., Berardi, A., Gupta, M., Kosslyn, S. M., & Spiegel, D. (1998). Imagery and hypnotizability revisited. International Journal of Clinical and Experimental Hypnosis, 46(4), 363–370. https://doi.org/10.1080/00207149808410015
Kroger, W. S., & Schneider, S. A. (1959). An electronic aid for hypnotic induction: A preliminary report. International Journal of Clinical and Experimental Hypnosis, 7(2), 93–98. https://doi.org/10.1080/00207145908415812
Landry, M., Lifshitz, M., & Raz, A. (2017). Brain correlates of hypnosis: A systematic review and meta-analytic exploration. Neuroscience & Biobehavioral Reviews, 81, 75–98. https://doi.org/10.1016/j.neubiorev.2017.02.020
Laurence, J.-R., Beaulieu-Prévost, D., & Chéné, T. du. (2008). Measuring and understanding individual differences in hypnotizability. In The Oxford handbook of hypnosis: Theory, research, and practice (pp. 225–253). Oxford: Oxford University Press.
Lawson, R. P., Rees, G., & Friston, K. J. (2014). An aberrant precision account of autism. Frontiers in Human Neuroscience, 8. https://www.frontiersin.org/articles/10.3389/fnhum.2014.00302
Litwin, P., & Miłkowski, M. (2020). Unification by fiat: Arrested development of predictive processing. Cognitive Science, 44(7), e12867. https://doi.org/10.1111/cogs.12867
London, P., & Cooper, L. M. (1969). Norms of hypnotic susceptibility in children. Developmental Psychology, 1, 113–124. https://doi.org/10.1037/h0027002
Lush, P., Botan, V., Scott, R. B., Seth, A. K., Ward, J., & Dienes, Z. (2020). Trait phenomenological control predicts experience of mirror synaesthesia and the rubber hand illusion. Nature Communications, 11(1), Article 1. https://doi.org/10.1038/s41467-020-18591-6
Lush, P., Moga, G., McLatchie, N., & Dienes, Z. (2018). The Sussex-Waterloo Scale of Hypnotizability (SWASH): Measuring capacity for altering conscious experience. Neuroscience of Consciousness, 2018(1). https://doi.org/10.1093/nc/niy006

70  Peter Lush, Zoltan Dienes, and Anil Seth Lush, P., Scott, R. B., Seth, A. K., & Dienes, Z. (2021). The phenomenological control scale: Measuring the capacity for creating illusory nonvolition, hallucination and delusion. Collabra: Psychology, 7(1), 29542. https://doi.org/10.1525/ collabra.29542 Lush, P., Seth, A., Dienes, Z., & Scott, R. B. (2022). Trait Phenomenological Control in Top-Down and Bottom-up Effects: ASMR, Visually Evoked Auditory Response and the Müller-Lyer Illusion. PsyArXiv. https://doi.org/10.31234/osf. io/hw4y9 Lynn, S. J. (1997). Automaticity and hypnosis: A sociocognitive account. International Journal of Clinical and Experimental Hypnosis, 45(3), 239–250. https:// doi.org/10.1080/00207149708416126 Lynn, S. J., Green, J. P., Zahedi, A., & Apelian, C. (2023). The response set theory of hypnosis reconsidered: Toward an integrative model. American Journal of Clinical Hypnosis, 65(3), 186–210. https://doi.org/10.1080/00029157.2022. 2117680 Lynn, S. J., Kirsch, I., & Hallquist, M. N. (2008). Social cognitive theories of hypnosis. In The Oxford handbook of hypnosis: Theory, research, and practice (pp. 111–139). Oxford: Oxford University Press. Lynn, S. J., Kirsch, I., Terhune, D. B., & Green, J. P. (2020). Myths and misconceptions about hypnosis and suggestion: Separating fact and fiction. Applied Cognitive Psychology, 34(6), 1253–1264. https://doi.org/10.1002/acp.3730 Lynn, S. J., & Lilienfeld, S. (2002). A critique of the franklin commission report: Hypnosis, belief and suggestion. International Journal of Clinical and Experimental Hypnosis, 50(4), 369–386. https://doi.org/10.1080/00207140208410111 Lynn, S. J., & Rhue, J. W. (1986). The fantasy-prone person: Hypnosis, imagination, and creativity. Journal of Personality and Social Psychology, 51, 404–408. https://doi.org/10.1037/0022-3514.51.2.404 Malinoski, P. T., & Lynn, S. J. (1999). The plasticity of early memory reports: Social pressure, hypnotizability, compliance and interrogative suggestibility. International Journal of Clinical and Experimental Hypnosis, 47(4), 320–345. https://doi.org/10.1080/00207149908410040 Marotta, A., Tinazzi, M., Cavedini, C., Zampini, M., & Fiorio, M. (2016). Individual differences in the rubber hand illusion are related to sensory suggestibility. PLoS One, 11(12), e0168489. https://doi.org/10.1371/journal.pone.0168489 Martin, J.-R., & Pacherie, E. (2019). Alterations of agency in hypnosis: A new predictive coding model. Psychological Review, 126(1), 133–152. http://dx.doi. org/10.1037/rev0000134 McConkey, K. M., & Sheehan, P. W. (1982). Effort and experience on the creative imagination scale. International Journal of Clinical and Experimental Hypnosis, 30(3), 280–288. https://doi.org/10.1080/00207148208407265 Meyer, E. C., & Lynn, S. J. (2011). Responding to hypnotic and nonhypnotic suggestions: Performance standards, imaginative suggestibility, and response expectancies. International Journal of Clinical and Experimental Hypnosis, 59(3), 327–349. https://doi.org/10.1080/00207144.2011.570660 Michael, R. B., Garry, M., & Kirsch, I. (2012). Suggestion, cognition, and behavior. Current Directions in Psychological Science, 21(3), 151–156. https://doi.org/ 10.1177/0963721412446369

Expectancies and the Generation of Perceptual Experience 71 Milling, L. S., Coursen, E. L., Shores, J. S., & Waszkiewicz, J. A. (2010). The predictive utility of hypnotizability: The change in suggestibility produced by hypnosis. Journal of Consulting and Clinical Psychology, 78(1), 126–130. http://dx. doi.org/10.1037/a0017388 Moore, R. K. (1964). Susceptibility to hypnosis and susceptibility to social influence. The Journal of Abnormal and Social Psychology, 68(3), 282–294. http:// dx.doi.org/10.1037/h0048401 Moore, W. (2017). John Elliotson, Thomas Wakley, and the mesmerism feud. The Lancet, 389(10083), 1975–1976. https://doi.org/10.1016/S0140-6736(17) 31296-5 Morgan, A. H. (1973). The heritability of hypnotic susceptibility in twins. Journal of Abnormal Psychology, 82, 55–61. https://doi.org/10.1037/h0034854 Morgan, A. H., Johnson, D. L., & Hilgard, E. R. (1974). The stability of hypnotic susceptibility: A longitudinal study. International Journal of Clinical and Experimental Hypnosis, 22(3), 249–257. https://doi.org/10.1080/00207147408413004 Nikolova, N., Waade, P. T., Friston, K. J., & Allen, M. (2022). What might interoceptive inference reveal about consciousness? Review of Philosophy and Psychology, 13(4), 879–906. https://doi.org/10.1007/s13164-021-00580-3 Norman, D. A., & Shallice, T. (1986). Attention to Action. In R. J. Davidson, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and Self-Regulation: Advances in Research and Theory. Volume 4 (pp. 1–18). Springer. https://doi.org/ 10.1007/978-1-4757-0629-1 Oakley, D. A., & Halligan, P. W. (2013). Hypnotic suggestion: Opportunities for cognitive neuroscience. Nature Reviews Neuroscience, 14(8), Article 8. https:// doi.org/10.1038/nrn3538 Oakley, D. A., Walsh, E., Mehta, M. A., Halligan, P. W., & Deeley, Q. (2021). Direct verbal suggestibility: Measurement and significance. Consciousness and Cognition, 89, 103036. https://doi.org/10.1016/j.concog.2020.103036 Orne, M. T. (1959). The nature of hypnosis: Artifact and essence. The Journal of Abnormal and Social Psychology, 58(3), 277–299. http://dx.doi.org/10.1037/ h0046128 Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17(11), 776–783. http://dx.doi.org/10.1037/h0043424 Orne, M. T., & Scheibe, K. E. (1964). The contribution of nondeprivation factors in the production of sensory deprivation effects: The psychology of the ‘panic button.’ The Journal of Abnormal and Social Psychology, 68, 3–12. https://doi. org/10.1037/h0048803 Page, R. A., & Handley, G. W. (1991). A comparison of the effects of standardized Chiasson and eye-closure inductions on susceptibility scores. American Journal of Clinical Hypnosis, 34(1), 46–50. https://doi.org/10.1080/00029157.1991. 10402959 Parr, T., Holmes, E., Friston, K. J., & Pezzulo, G. (2023). Cognitive effort and active inference. Neuropsychologia, 184, 108562. https://doi.org/10.1016/j. neuropsychologia.2023.108562 Parris, B. A. (2016). The prefrontal cortex and suggestion: Hypnosis vs. placebo effects. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.00415

Perry, C. (1978). The Abbé Faria: A neglected figure in the history of hypnosis. In F. H. Frankel & H. S. Zamansky (Eds.), Hypnosis at its bicentennial: Selected papers (pp. 37–45). Springer US. https://doi.org/10.1007/978-1-4613-2859-9_3
Perry, C., & McConkey, K. M. (2002). The Franklin commission report in light of past and present understandings of hypnosis. International Journal of Clinical and Experimental Hypnosis, 50(4), 387–396. https://doi.org/10.1080/00207140208410112
Pezzulo, G., Rigoli, F., & Friston, K. J. (2018). Hierarchical active inference: A theory of motivated control. Trends in Cognitive Sciences, 22(4), 294–306. https://doi.org/10.1016/j.tics.2018.01.009
Piccione, C., Hilgard, E. R., & Zimbardo, P. G. (1989). On the degree of stability of measured hypnotizability over a 25-year period. Journal of Personality and Social Psychology, 56(2), 289–295. https://doi.org/10.1037/0022-3514.56.2.289
Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–89. https://doi.org/10.1038/4580
Rappe, S. (2022). Predictive minds can think: Addressing generality and surface compositionality of thought. Synthese, 200(1), 13. https://doi.org/10.1007/s11229-022-03502-7
Raz, A., Shapiro, T., Fan, J., & Posner, M. I. (2002). Hypnotic suggestion and the modulation of Stroop interference. Archives of General Psychiatry, 59(12), 1155–1161. https://doi.org/10.1001/archpsyc.59.12.1155
Rosenthal, D. (2012). Higher-order awareness, misrepresentation and function. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1594), 1424–1438. https://doi.org/10.1098/rstb.2011.0353
Roskies, A., & Wood, C. (2017). Catching the prediction wave in brain science. Analysis, 77(4), 848–857. https://doi.org/10.1093/analys/anx083
Saenz, M., & Koch, C. (2008). The sound of change: Visually-induced auditory synesthesia. Current Biology, 18(15), R650–R651. https://doi.org/10.1016/j.cub.2008.06.014
Schurger, A., Sitt, J. D., & Dehaene, S. (2012). An accumulator model for spontaneous neural activity prior to self-initiated movement. Proceedings of the National Academy of Sciences, 109(42), E2904–E2913. https://doi.org/10.1073/pnas.1210467109
Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565–573. https://doi.org/10.1016/j.tics.2013.09.007
Sheehan, P. W., & Perry, C. W. (1976). Methodologies of hypnosis: A critical appraisal of contemporary paradigms of hypnosis. Mahwah, NJ: Lawrence Erlbaum.
Sherman, J. W., & Rivers, A. M. (2021). There's nothing social about social priming: Derailing the "train wreck". Psychological Inquiry, 32(1), 1–11. https://doi.org/10.1080/1047840X.2021.1889312
Shor, R. E., & Orne, E. C. (1963). Norms on the Harvard group scale of hypnotic susceptibility, form A. International Journal of Clinical and Experimental Hypnosis, 11(1), 39–47. https://doi.org/10.1080/00207146308409226


3

The Synergistic Relationship between Perception and Action

Clare Press, Emily Thomas, and Daniel Yon

It is widely believed that organisms have been shaped to perceive – both phylogenetically and ontogenetically – purely to serve action. If we do not move, we do not survive, because we must source food and avoid predators. Perception allows us to determine the location of our food, where our predators live, and when our actions to approach and avoid these entities need correcting. Psychologists assume that we select our actions based upon representations of their anticipated outcomes (Hommel, Müsseler, Aschersleben, & Prinz, 2001; James, 1890) and continuously compare sensory input against predictions to determine and correct errors (Desmurget & Grafton, 2000).

Actions also shape perception (Press & Cook, 2015). For instance, some elements of our perceptual environment become relevant for the task at hand and are therefore more readily sampled. We saccade towards objects to be grasped and towards planned landing locations, improving perception of events at these locations (Johansson, Westling, Bäckström, & Flanagan, 2001). When we move an effector, we attenuate processing of all tactile events presented to that effector (Seki & Fetz, 2012), and when we move our eyes, we are largely blind to visual input during those saccades (Matin, 1974).

One of the ubiquitous ways in which action and perception interact – and the focus of this chapter – is via predictive mechanisms (Press, Kok, & Yon, 2020a, 2020b). A lifetime of producing actions and perceiving contingent outcomes causes predictive associations to form between perceptual and motoric representations (Cook, Bird, Catmur, Press, & Heyes, 2014). Actions thus furnish predictions about what is likely to be perceived: I can predict that if I send motor commands to grasp the cup in front of me, I am likely to see my hand approach the cup, I will feel cutaneous stimulation on my fingertips, and I will hear a certain sound as I place the cup back on the table.

The present chapter unpacks how these predictive mechanisms work – examining how action predictions shape our perceptual experience.

It will consider theoretical accounts that have prevailed across the last few decades and contrast these with accounts postulated in the broader sensory cognition literature. We will consider how these accounts conflict and how these conflicts may be resolved. We will conclude by outlining the likely impact of these resolutions on predictive processing theories of action and perception, as well as accounts that explain how we determine our agency over the external world.

3.1 How Action Predictions Shape Perception: Conflicting Theoretical Work

For some time, researchers have claimed that action predictions fundamentally alter our perceptual experiences. Theorising along these lines started in the 1970s, when the fact that we cannot tickle ourselves was first interrogated by scientists (Blakemore, Wolpert, & Frith, 1998; Weiskrantz, Elliott, & Darlington, 1971; Figure 3.1a). The theories that resulted from this extensive line of work – which was subsequently extended with studies presenting a wide range of tactile events (e.g., Bays, Wolpert, & Flanagan, 2005, 2006) – propose that we perceptually cancel the anticipated consequences of our actions. Specifically, cancellation theories postulate that we predict the resulting sensory outcomes of actions and "subtract" these from the input (Blakemore et al., 1998). Such a mechanism is thought to result in attenuated perception of the anticipated consequences of action, manifesting in less intense perception (Bays et al., 2005; Kilteni, Houborg, & Ehrsson, 2019; Figure 3.1c) or – relatedly – a lower likelihood of these sensory events being detected (Cardoso-Leite, Mamassian, Schütz-Bosbach, & Waszak, 2010). Proponents of these theories suggest that this cancellation process is adaptive because downweighting perception of signals we predict renders us better able to perceive unexpected events, and it is the unexpected that is more informative to us. The unexpected highlights inconsistencies between the world and our current beliefs about it and, therefore, highlights when we must act differently or update our models.

However, these theories are in direct opposition to accounts developed in other sensory disciplines across the last few decades. The theories that have prevailed outside of action – e.g., within disciplines asking how we perceive simple visual features of our environment, like edges, objects, and scenes – propose that we use probabilistic information to upweight, rather than downweight, perception of what we expect (Bar, 2004; de Lange, Heilbron, & Kok, 2018). These theories are typically couched within Bayesian frameworks and state that we determine our perceptual experience ("posterior") by combining our prediction ("prior") with the sensory input ("likelihood"), resulting in expected events being more likely to be perceived, and perceived with greater intensity (Figure 3.1c).


Figure 3.1 (a) If we cancel what we expect from perception, this will render us more able to perceive the unexpected, yielding more informative experiences for the organism. These cancellation accounts have prevailed across the years in sensorimotor disciplines – examining how we perceive the sensory consequences of our actions. They are thought to explain why we cannot tickle ourselves and, more generally, why action consequences may not be readily perceived. (b) Sensory input is often noisy, especially when acquired at speed. Therefore, combining the input with an expectation of what should be there will tend to render perceptual experiences more accurate than if I rely solely on the input. Under this Bayesian account, upweighting perception of what we expect will yield more veridical experiences. Bayesian accounts have been popular across a range of perceptual disciplines for the last few decades. (c) A cartoonised depiction of Bayesian and cancellation logic relative to baseline, assuming expectation of sight of a right hand.

This process is proposed to be adaptive because expected events are more likely to be there – assuming a certain stability to our environment – and biasing our experiences in line with these likelihoods will, on average, make them more veridical. For instance, consider that I am walking home at dusk and take a quick glance up the path in front. The input may look something like the cartoonised panel in Figure 3.1b. It has been obtained quickly and is inherently noisy due to poor lighting, such that it does not provide a particularly faithful representation of what is in fact out there. However, I also have the expectation that I will see some boys in front. If I combine the noisy input with this expectation, this combination provides a more accurate representation of what is out there than if I relied solely upon the noisy input.
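To make the arithmetic behind this upweighting concrete, here is a minimal Python sketch of precision-weighted combination for the dusk example, assuming a Gaussian prior and likelihood; all numbers are illustrative assumptions rather than values from any reported study.

# Illustrative position estimate (metres along the path); numbers are made up.
prior_mean, prior_sd = 2.0, 0.5   # expectation formed before the glance
input_mean, input_sd = 3.0, 1.5   # noisy, quickly sampled visual input

# For Gaussians, the posterior precision is the sum of the two precisions,
# and the posterior mean is a precision-weighted average of the two means.
prior_prec = 1 / prior_sd ** 2
input_prec = 1 / input_sd ** 2
post_prec = prior_prec + input_prec
post_mean = (prior_prec * prior_mean + input_prec * input_mean) / post_prec

print(f"posterior mean = {post_mean:.2f}, sd = {post_prec ** -0.5:.2f}")
# posterior mean = 2.10, sd = 0.47: the estimate is pulled towards the precise
# expectation and is itself more precise than the noisy input alone.

Because the expectation here is far more precise than the input, the combined estimate sits closer to what was expected – exactly the upweighting of the expected that Bayesian accounts describe.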

Despite this theoretical conflict, we must consider whether action predictions really do shape perception in a wholly different fashion from other predictions. Although the adaptive arguments for Bayesian upweighting and cancellation downweighting are different, they do not appear to apply differentially according to whether we predict based upon action. Perceptual downweighting will yield more informative experiences in limited-capacity systems, highlighting those elements of the environment that the organism truly must process. However, we need our experiences to be informative regardless of domain. For example, if I check on my sleeping child and they are on the floor rather than in bed, I will be surprised. Devoting resources to this surprising information will allow me to return them to bed. However, the error signal is generated based upon the visual context (bed positioning relative to child) rather than any action I performed (I walked into the room in the same way I do every night). Conversely, perceptually upweighting what we expect will instead render our experiences more veridical in the face of sensory noise and rapid sampling, and it is crucial that our experiences are veridical during action too. For example, we need an approximately accurate representation of the location of our hand across time when planning to pick up a cup.

Optimising perception to serve one function will therefore suboptimise it for another. The cancellation mechanisms proposed by action researchers to yield informative experiences would directly render our experiences less veridical. The upweighting Bayesian mechanisms proposed elsewhere for generating veridical experiences would render our experiences less informative. Therefore, at the theoretical level, we have a paradox.

3.2 How Action Predictions Shape Perception: Empirical Work

If the theoretical level suggests a paradox, we must ask whether there is also an empirical paradox. If conflicting accounts have prevailed across decades in different disciplines, it would be logical to assume that they are based upon hundreds of conflicting studies. However, until recently, it has been difficult to compare the domains directly due to incomparable empirical approaches. Some recent work has allowed for more direct comparison, and we outline this work in the following section.

3.2.1 Psychophysics

Signal detection measures of perception, which involve presenting difficult-to-detect stimuli and determining how expectation influences detectability, are popular in the broader sensory cognition literature (Meijs, Mostert, Slagter, de Lange, & van Gaal, 2019; Stein & Peelen, 2015; Wyart, Nobre, & Summerfield, 2012).

For example, Wyart et al. (2012) presented gratings embedded in spatial noise that were expected or unexpected on the basis of a colour cue. Participants were biased to detect expected gratings, i.e., if they expected a clockwise-tilted grating based upon the colour cue, they were more likely to report seeing one. Reverse correlation analyses – which determine the relationship between the particular level of stimulus information and responses – demonstrated that this effect was more likely due to perceptual than response biasing.

Despite the range of perceptual insights to be garnered through such signal detection measures, they have rarely been employed in action (but see Cardoso-Leite et al., 2010; Schwarz, Pfister, Kluge, Weller, & Kunde, 2018). We therefore recently implemented such measures to assess comparability to effects outside of action (Yon, Zainzinger, de Lange, Eimer, & Press, 2021). In this series of experiments, participants performed index or middle finger abductions and observed avatar-hand action outcomes that were congruent (expected) or incongruent (unexpected; Figure 3.2a) with these actions. Performing an action biased perceptual decisions towards congruent outcomes, such that an index finger abduction increased the probability of reporting a visually presented index finger abduction.

Figure 3.2 Psychophysical methods for studying the influence of action predictions on perception. (a) In Yon, Zainzinger, de Lange, Eimer, and Press (2021), participants performed an action in response to a numerical imperative cue. This action sometimes generated presentation of a briefly presented, backward-masked avatar finger movement, and sometimes presentation of only the masks. Participants were required to report at the end of the trial whether fingers had moved and were biased to report movement when it was congruent with their action. (b) In Thomas, Yon, de Lange, and Press (2022), participants performed an action that generated a punctate tactile force on fingers that remained stationary. These events were expected or unexpected on the basis of the particular action. Tactile events were rated as more forceful when they were expected on the basis of action. The findings in these visual and tactile experiments provide support for Bayesian accounts of how actions influence perception.

Drift diffusion modelling (Ratcliff & McKoon, 2008) indicated that this bias was best explained by a difference in drift biasing according to expectation, such that evidence was more readily accumulated in line with expectations. These findings importantly mirror those of Wyart et al. (2012), where coloured cues predicted gratings – indicating that when signal detection paradigms are employed within action, evidence in support of Bayesian upweighting accounts can also be found.

On the other hand, action-perception studies predominantly examine perceived signal intensity rather than signal detection. For example, typical tactile studies record the level of perceived tickliness or force. Examining intensity rather than detection should not alter the operation of the outlined mechanisms per se, because both intensity and detection judgements are thought to be modulated via sensory gain. For instance, processes that increase sensory gain on expected channels will make expected events more likely to be detected and also increase their perceived intensity. Similarly, decreasing gain will make them less detectable and less intense. The detection threshold is therefore treated as the lower bound of perceptible intensities in formal models (Brown, Adams, Parees, Edwards, & Friston, 2013; Yon & Press, 2017).

However, in stark contrast to the evidence supporting upweighting in detection studies, studies examining perceived intensity have widely reported support for cancellation accounts. In a typical study, participants are presented with punctate tactile events and asked to rate their force (Bays et al., 2005, 2006; Kilteni et al., 2019; Shergill et al., 2013). The most common comparison is between an "active" condition, where participants perform an action that generates this force, and a "passive" condition, where the same force is judged in the absence of action. Such studies commonly report that actively generated forces are perceived less intensely than passive forces and attribute this reduction in intensity to the operation of mechanisms that attenuate perception of predicted action outcomes.

However, action can influence perception via a range of mechanisms (Press & Cook, 2015; Press et al., 2020a, 2020b). For example, as outlined earlier, spinal mechanisms are thought to reduce processing of any events – regardless of their expectedness – presented to a moving effector (Seki & Fetz, 2012). It is therefore crucial to isolate the particular predictive contribution of actions to understand the nature of the underlying mechanisms. Such isolation requires action to be performed in all trials, and comparisons between events that are expected and unexpected on the basis of that action (Press et al., 2020). Outside of action contexts, the influence of predictive mechanisms is measured by training participants in statistical mappings between cues and outcomes (e.g., Kok, Jehee, & de Lange, 2012). We employed a similar manipulation in a series of action studies examining tactile perception (Thomas et al., 2022).

Participants were trained in mappings between actions and tactile outcomes, e.g., every time they moved a right-hand finger upwards, they would be stimulated on their left-hand index finger (Figure 3.2b). We degraded this contingency at test such that some of the tactile events were consistent with this learnt mapping, and therefore "expected", and others were unexpected. We found that tactile forces were perceived as more intense when they were expected relative to unexpected – consistent with Bayesian accounts of perception and inconsistent with cancellation accounts. Drift diffusion modelling demonstrated that biased evidence accumulation contributed to the perceptual biasing effects – similarly to the visual studies described above (Yon, Bunce, & Press, 2020). In line with these findings in touch, similar manipulations in vision demonstrate that expected action outcomes are perceived with greater luminance (Yon & Press, 2017, 2018).
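The drift diffusion logic invoked in these studies can be illustrated with a small simulation. The sketch below is not a fitted model from any of the experiments described: the drift and bias values are hypothetical, and it shows only how an additive drift bias towards the action-congruent outcome produces more congruent reports.

import numpy as np

rng = np.random.default_rng(0)

def ddm_trial(drift, boundary=1.0, dt=0.005, noise=1.0):
    # Accumulate noisy evidence until a boundary is crossed:
    # +1 = report that the index finger moved, -1 = report the alternative.
    x = 0.0
    while abs(x) < boundary:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
    return 1 if x > 0 else -1

def p_congruent_report(drift, n_trials=500):
    return np.mean([ddm_trial(drift) == 1 for _ in range(n_trials)])

stimulus_drift = 0.4    # evidence carried by the stimulus itself
prediction_bias = 0.3   # assumed extra drift contributed by the action prediction

print("congruent  :", p_congruent_report(stimulus_drift + prediction_bias))
print("incongruent:", p_congruent_report(stimulus_drift - prediction_bias))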

3.2.2 Neuroimaging

The nature of the underlying mechanisms can also be informed via neuroimaging studies. One of the largest differences between neural studies in the action domain and elsewhere is the analysis approach. Cancellation accounts in action are typically supported by functional magnetic resonance imaging (fMRI) studies demonstrating a lower blood-oxygen-level-dependent (BOLD) signal in sensory brain regions for expected action outcomes (Blakemore et al., 1998; Kilteni & Ehrsson, 2020; Shergill et al., 2013; Stanley & Miall, 2007; Uhlmann et al., 2020). These findings are attributed to a cancellation of expected neural responses. However, Bayesian accounts have also been demonstrated to be consistent with findings of lower signal in sensory brain regions. For example, Kok et al. (2012) presented participants with gratings whose orientation was expected or unexpected on the basis of a preceding tone. They found that expected trials were associated with a lower signal in primary visual cortex, but, intriguingly, that pattern classifiers (trained and tested on grating orientations) performed with superior accuracy relative to unexpected trials. These patterns can be understood by considering that expectation "sharpens" representation in sensory brain regions – consistent with Bayesian accounts of perception.

Understanding this logic requires considering that visual cortices contain neural populations tuned towards different visual features (Figure 3.3a). For example, if trials present a middle finger movement, the populations tuned towards middle fingers will be more active than those tuned towards other fingers. Cancellation accounts assume that the lower BOLD signal on expected trials is generated by a signal reduction in those populations tuned towards the presented event (left panel).


Figure 3.3 Neural mechanisms. (a) Demonstration of cancellation (left) and sharpening (right) mechanisms generating a lower sensory signal for expected events – in this instance a middle finger abduction. (b) Yon, Gilbert, de Lange, and Press (2018) found superior pattern classification of observed avatar finger movements from visual cortical activation when the movements were congruent relative to incongruent with executed actions. (c) This superior decoding for congruent events was associated with a reduced signal in voxels tuned away from the presented event. These findings are consistent with sharpening accounts of how action predictions modulate perceptual processing – which are consistent with Bayesian perceptual accounts.

However, a lower signal could also be generated via a sharpening process (right panel), whereby expecting a middle finger movement activates populations tuned towards middle fingers and lateral inhibition suppresses processing in other populations. The finding of superior pattern classification for expected events is more consistent with this sharpening account than with cancellation theories. It could also generate behavioural biases towards perceiving what we expect. Kok et al. (2012) further found support for the sharpening idea via univariate analyses demonstrating that presentation of expected (e.g., clockwise) gratings attenuates the signal in voxels that preferentially respond to unexpected (counter-clockwise) rather than expected events.
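The difference between the two mechanisms can be made concrete with a toy population code. In this sketch (the gain values are illustrative, not fitted), both schemes lower the summed response to an expected middle finger movement, but only sharpening increases how selectively the tuned units respond – paralleling the superior pattern classification described above.

import numpy as np

fingers = np.arange(5)                   # units tuned to five finger identities
def tuning(preferred, shown, width=1.0):
    return np.exp(-0.5 * ((preferred - shown) / width) ** 2)

shown = 2                                # a middle finger movement is presented
baseline = tuning(fingers, shown)        # responses with no expectation

# Cancellation: reduce gain in units tuned towards the expected event.
cancelled = baseline * (1 - 0.5 * tuning(fingers, shown))

# Sharpening: boost tuned units; lateral inhibition suppresses the rest.
sharpened = baseline * (0.5 + 0.5 * tuning(fingers, shown))

for name, r in [("baseline", baseline), ("cancellation", cancelled), ("sharpening", sharpened)]:
    total = r.sum()                      # crude proxy for overall BOLD amplitude
    selectivity = r[shown] / r.mean()    # crude proxy for decodability
    print(f"{name:12s} total = {total:.2f}  selectivity = {selectivity:.2f}")

Run on these numbers, both manipulated profiles yield a lower total signal than baseline, but selectivity rises only under sharpening – illustrating why a lower BOLD signal alone cannot discriminate the two accounts.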

We have recently applied similar logic and analyses to action processing. In one study (Yon et al., 2018), participants produced manual finger abductions and observed congruent or incongruent actions of an onscreen avatar. Across a range of visual regions, we found superior classification of observed visual avatar movements when they were congruent relative to incongruent with executed actions (Figure 3.3b). In line with the sharpening account, the signal was attenuated only in voxels tuned away from the presented stimulus, with no signal reduction in voxels tuned towards it (Figure 3.3c). A second study replicated this pattern when expectations were established via statistical relationships presented in a training session (Press et al., 2020). Therefore, when employing comparable analyses, comparable sharpening effects are found in action domains and in other areas of sensory cognition.

3.2.3 Remaining Evidence for Cancellation

One may interpret these findings as suggesting that Bayesian theories should be applied across perceptual disciplines and cancellation accounts simply discarded: we perceptually upweight what we expect regardless of domain. For example, the above findings have demonstrated that a lower neural signal in sensory brain regions is not necessarily reflective of a cancellation-type process. However, this conclusion would be premature. First, as already outlined in Section 3.1, the adaptive arguments provided for why one needs informative experience are persuasive, and a mechanism that solely renders our experiences less informative – via perceptual upweighting of expected events – would appear maladaptive. Second, there remains some empirical evidence for the operation of cancellation mechanisms that is more convincing.

Some of this empirical evidence comes from the domain of action. For example, Roussel, Hughes, and Waszak (2013, 2014) trained participants in action-letter mappings and found that contrast discrimination of expected letters was poorer than that of unexpected letters. In a contrast judgement study, we also found that expected visual action outcomes were perceived with greater contrast than unexpected outcomes at 50 ms after presentation, but by 200 ms this bias had reversed – such that unexpected outcomes were perceived with greater intensity (Yon & Press, 2017). This later effect is more consistent with cancellation than Bayesian accounts. There is also some behavioural evidence of cancellation outside of action. For example, Denison, Sheynin, and Silver (2016) trained participants in arbitrary sequences of natural scene images and found that images consistent with such trained mappings were less likely to break into perceptual awareness in a binocular rivalry paradigm (but see Alilović, Slagter, & van Gaal, 2020; Denison, Piazza, & Silver, 2011; Meijs, Slagter, de Lange, & van Gaal, 2018).

There is also neural evidence of cancellation-like mechanisms.

For example, an electroencephalography study required participants to raise or lower their index finger and observe congruent or incongruent outcomes (Press, Gherri, Heyes, & Eimer, 2010). Congruent outcomes were associated with smaller-magnitude N1 and N2 components relative to incongruent outcomes. A number of studies from outside of action have also demonstrated inferior neural representation of expected sensory events. For example, Blank and Davis (2016) employed representational similarity analysis to demonstrate that there was less information associated with a particular heard word in posterior superior temporal sulcus when it was expected on the basis of preceding congruent visual text (see also Blank, Spangenberg, & Davis, 2018). Similar effects have been observed in magnetoencephalography studies (Sohoglu & Davis, 2020). Objects expected on the basis of learnt statistical regularities have also been associated with less informative sensory representations in a range of visual areas – such as primary visual cortex and lateral occipital cortex – compared to unexpected objects (Richter & de Lange, 2019; Richter, Ekman, & de Lange, 2018).

3.3 A Theoretical Resolution

To accommodate this range of findings across domains and provide an account of how our experiences could be rendered both veridical and informative, some of us have provided a novel account that incorporates reasoning from both theories (Press et al., 2020b; see Figure 3.4). It preserves the adaptive function of determining informative experiences, but via a different mechanism from that proposed in current cancellation accounts.

Figure 3.4 A working theory of how expectations could render our experiences both veridical and informative, with application to how actions and other predictive cues influence perception. Source: Taken from Press et al. (2020b).

We propose that perception is pre-emptively biased towards what is expected, to generate largely veridical experiences rapidly in the face of sensory noise. This primary mechanism would operate as outlined in Bayesian accounts, by "sharpening" neural processing – upweighting representations of expected events and relatively inhibiting other sensory representations (Blom, Feuerriegel, Johnson, Bode, & Hogendoorn, 2020; Feuerriegel, Blom, & Hogendoorn, 2021; Kok, Mostert, & de Lange, 2017). However, if events are presented that are especially inconsistent with current models, reactive processes are elicited to generate a precise perceptual representation of these events – to enable accurate model updating. That is, the initial error signal broadcasts that this event is inconsistent with one's existing models, but the precise nature of the event is uncertain. Processing resources are therefore devoted to the unexpected event to resolve this uncertainty and enable accurate model updating.

A key part of this theory is that not all "unexpected" events are informative for model updating. For example, it is common to present hundreds of trials in which events that follow cues with 75% probability are "expected" and those that follow cues with 25% probability are "unexpected". The unexpected events in such settings may be less likely than the expected events, but the organism could hold a model that these events will be shown with a particular likelihood. Additionally, many unexpected events are presented with low clarity, like those presented within signal detection tasks. Low-clarity unexpected stimuli will not deviate dramatically from expectations because the sensory evidence is imprecise. Intuitively, we cannot be surprised by an event that we do not detect. In these settings, observers may encounter events that are less probable from the omniscient experimenter's point of view, but such events have a very different functional significance in comparison to clear signals from the environment that contradict existing predictive models.

We hypothesise that the processes highlighting truly unexpected inputs are reactive, because the only way one could pre-emptively highlight them is to downweight the expected. However, such a mechanism would render our experiences less veridical, and the vast majority of unexpected inputs are uninformative for model updating. Furthermore, processes have been identified within the learning and inference literature that could be responsible for such reactive highlighting. Specifically, the presentation of highly surprising events is associated with phasic catecholamine release, which is thought to facilitate model updating (Dayan & Yu, 2006; Marshall et al., 2016; Redgrave & Gurney, 2006; Yu & Dayan, 2005). Such facilitation may act by increasing the sensory gain associated with current input (Lawson, Rees, & Friston, 2014, 2017). This hypothesis is based upon findings that noradrenergic modulation alters the signal-to-noise ratio within sensory cortical and thalamic regions (Hasselmo, Linster, Patil, Ma, & Cekic, 1997; Hirata, Aguilar, & Castro-Alamancos, 2006), but it is yet to be determined whether such phasic catecholaminergic effects mediate learning via a modulation of perception. We hypothesise that such a relationship will ultimately be demonstrated.

One clear prediction made by this account is that the influence of expectations on perceptual processing – and hence perception – will often change across time. The processes upweighting perception of the expected operate in anticipation of its presentation. In contrast, processes that upweight the unexpected operate reactively, due to the phasic release of catecholamines. Such phasic release likely occurs ~70–100 ms after presentation of the stimulus (Redgrave & Gurney, 2006; Yu & Dayan, 2005) and would therefore, in most cases, only be expected to influence perception after that time. This change across time can be seen in the psychophysics studies outlined above, which demonstrate expected action outcomes to be perceived with greater intensity 50 ms after presentation, but unexpected events to be more intense by 200 ms (Yon & Press, 2017). This account also makes the prediction that cancellation-type effects (i.e., reduced perception of the expected, generated via enhanced perception of the unexpected) can never be observed in early time windows, because they operate according to reactive processes. It is therefore also consistent with the fact that cancellation neural effects are typically observed on later sensory processing components (>100 ms after stimulus presentation; e.g., Press et al., 2010). Finally, it makes the prediction that the reactive process only operates if unexpected events signal the need for model updating and would thus be mediated by manipulations identified in the learning literature that determine learning rate. However, this account is a theoretical solution and therefore still requires empirical interrogation and expansion.
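The predicted time course can be caricatured in a few lines of code. In the sketch below the gain values and the ~100 ms latency are illustrative assumptions rather than measured parameters; it simply shows how a proactive expectation gain plus a delayed, surprise-triggered reactive gain yield an early advantage for expected events and a later reversal.

# Proactive gain favours expected input from stimulus onset; reactive gain
# boosts unexpected input only after the assumed ~100 ms catecholamine latency.
def perceived_intensity(t_ms, expected):
    signal = 1.0                                  # physical intensity (arbitrary units)
    proactive_gain = 0.2 if expected else 0.0     # hypothetical expectation gain
    reactive_gain = 0.5 if (not expected and t_ms >= 100) else 0.0
    return signal * (1 + proactive_gain + reactive_gain)

for t in (50, 200):
    e = perceived_intensity(t, expected=True)
    u = perceived_intensity(t, expected=False)
    print(f"t = {t:3d} ms: expected = {e:.2f}, unexpected = {u:.2f}")
# t =  50 ms: expected = 1.20, unexpected = 1.00  (expected upweighted early)
# t = 200 ms: expected = 1.20, unexpected = 1.50  (reversal after the reactive boost)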

3.4 Relationship to Other Theoretical Accounts of Action

Popular predictive processing frameworks, outlined elsewhere in this book, are accounts of cortical processing whereby – during perceptual inference – we infer the most likely state of the outside world by minimising prediction errors about its contents. More specifically, "higher" neural areas predict the activity of "lower" areas, and lower areas pass prediction error signals back up the hierarchy. Predictions are constantly updated based on these incoming error signals, and this iterative message-passing process generates a largely veridical model of the world. Under this account, there may only be a paradox at the level of the unitary perceptual experience, because hypothesis and error units may reflect the postulated operation of Bayesian and cancellation mechanisms, respectively. Of course, at this stage, we do not know that the brain operates in such a fashion or that it contains such distinct unit types, but if so, neural data can only inform these accounts of perception via tight examination of how processing relates to perceptual phenomenology. It has been proposed that there is no reason to think error processing would constitute our perceptual experience (Brown et al., 2013) and that our experience is more likely reflected in the contents of the hypothesis units (Teufel, Dakin, & Fletcher, 2018). If true, using neural data to inform theories about our experience must consider whether the neural population is reflective of hypothesis or error unit processing. For example, it has been proposed that hypothesis units are more likely to reside in deep cortical layers and error units in superficial layers (Aitken et al., 2020; Bastos et al., 2012, 2015). Under this assumption, processing in deep layers may be of greater direct relevance to these examinations than that in superficial layers. However, the precise way in which different units interact to determine our experiences needs close examination.

There is one important way in which predictive processing theories in their current instantiations are less consistent with our outlined theories of perception. Notably, the account of cortical message-passing outlined above is one of "perceptual inference", whereby we minimise prediction error by changing our models to fit the world. Accounts of "active inference" also postulate how we can minimise prediction errors by changing the world to fit our models (Friston, FitzGerald, Rigoli, Schwartenbeck, & Pezzulo, 2017; Yon, de Lange, & Press, 2019) – that is, we can move. Under these accounts, the hypothesis units represent how we would like the world to be rather than how we believe it is. They represent our desires rather than our beliefs. There is no qualitatively distinct representation of beliefs and desires, but we ensure that we change the world to fit our models, rather than vice versa, by increasing the precision on the prior. In a precision-weighted framework, the likelihood will therefore not exhibit high influence on the posterior. These accounts are proposed to be consistent with findings of generalised reduced perception during action (Brown et al., 2013) – noting that the account does not particularly specify that predicted and unpredicted events should be influenced differently. This theory could in principle allow for accurate perception during action, as demonstrated in the types of experiment presented earlier, if one assumed phases of perceptual inference interspersed with phases of active inference. However, there is evidence of continuous sensory monitoring during action that renders it unlikely that we shift between active and perceptual inference repeatedly during action (Desmurget & Grafton, 2000). Additionally, one would need to hold the representations separate at some levels in the hierarchy concerning the ultimate goal of an action and the present sensory state (Yon, Heyes, & Press, 2020). We therefore propose that these theories need modifying to allow for such a distinction, and some recent proposals have been made for how to achieve this (Smith, Ramstead, & Kiefer, 2022).

Interestingly, the assumption that actions shape perception in a distinct fashion relative to other predictions has also been used to justify the use of cancellation measures to reflect one's sense of agency (Haggard, 2017). If action predictions do not shape perception in such a distinct way, then some conclusions about agency mechanisms likely need re-examination. Agency researchers have been keen to move away from subjective measures of agency that are heavily influenced by response biases – like Likert scale responses concerning the extent to which one feels in control – replacing them instead with more objective measures like sensory cancellation (Haggard, 2017). However, we may be able to find measures that allow for dissociation of response biases from one's real beliefs, while still asking participants directly about their sense of agency. For example, we recently developed a signal detection paradigm where participants move their hands continuously and observe motion that either is or is not yoked to their movement (Yon et al., 2020). They report whether the observed motion is generated by them. By employing signal detection analyses, we could dissociate one's sensitivity to the signal from biases in reporting it, and computational modelling allowed dissociation of response from perceptual biasing. We propose that such measures could be effectively employed to advance theorising about the interacting mechanisms determining our sense of agency.

Considerations about mechanistic links between prediction and agency may be especially important for our understanding of psychiatric illnesses like schizophrenia. Some schizophrenic patients develop "delusions of control" – feeling as though their bodies are animated by an external force. One tantalising possibility offered by earlier work on action prediction was that these distressing symptoms might be explained by abnormal processes of action prediction (Shergill, Samson, Bays, Frith, & Wolpert, 2005). An influential piece of evidence taken to support this idea came from a series of studies by Blakemore and colleagues, which found that delusional schizophrenic patients do not show the same kinds of sensory attenuation during action as healthy controls – e.g., they are able to tickle themselves (Frith, Blakemore, & Wolpert, 2000). The modal way of interpreting this result has been to suggest that these patients have problems predicting the consequences of their actions. This idea persists in more recent accounts of psychosis, which suppose that failures of "predictive" attenuation cause agents to experience the consequences of their actions in an unusually salient way – leading to the sense that these actions must have been generated by an extrinsic source (Corlett et al., 2019).

Though this idea has been influential, the framework we have offered in this chapter presents some challenges to this mechanistic picture. In our way of thinking, sensory attenuation effects (e.g., failures to tickle oneself) are not signs of prediction proper, and so failures of attenuation do not imply a failure of prediction per se.

However, the mechanistic sketch we offer could provide more nuanced thinking about how atypicalities in the processing of prediction and prediction error relate to unusual experiences of action in schizophrenia. Indeed, the way that our tentative model delineates proactive predictive processes and reactive prediction error processes could provide clinical scientists with the resources they need to explain why schizophrenia is associated with an exaggerated sense of agency over expected action outcomes (Voss et al., 2010) and attenuated sensitivity to prediction errors (Synofzik, Thier, Leube, Schlotterbeck, & Lindner, 2010).

3.5 Conclusion

In conclusion, action and perception synergistically shape each other, with each furnishing the other with predictive information. It has been classically assumed that the underlying mechanisms are specific to action-perception interactions, but we propose here how they may in fact be governed by domain-general predictive mechanisms. Future research must further unpack and interrogate these ideas.

Acknowledgement

Clare Press is funded by a European Research Council (ERC) consolidator grant (101001592) under the European Union's Horizon 2020 research and innovation programme.

References

Aitken, F., Menelaou, G., Warrington, O., Koolschijn, R. S., Corbin, N., Callaghan, M. F., & Kok, P. (2020). Prior expectations evoke stimulus-specific activity in the deep layers of the primary visual cortex. PLoS Biology, 18(12), e3001023. https://doi.org/10.1371/journal.pbio.3001023
Alilović, J., Slagter, H. A., & van Gaal, S. (2020). Subjective visibility report is facilitated by conscious predictions only. BioRxiv. https://doi.org/10.1101/2020.07.08.193037
Bar, M. (2004). Visual objects in context. Nature Reviews Neuroscience, 5(8), 617–629. https://doi.org/10.1038/nrn1476
Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). Canonical microcircuits for predictive coding. Neuron, 76(4), 695–711. https://doi.org/10.1016/j.neuron.2012.10.038
Bastos, A. M., Vezoli, J., Bosman, C. A., Schoffelen, J.-M., Oostenveld, R., Dowdall, J., … Fries, P. (2015). Visual areas exert feedforward and feedback influences through distinct frequency channels. Neuron, 85(2), 390–401. https://doi.org/10.1016/j.neuron.2014.12.018
Bays, P. M., Flanagan, J. R., & Wolpert, D. M. (2006). Attenuation of self-generated tactile sensations is predictive, not postdictive. PLoS Biology, 4(2), e28. https://doi.org/10.1371/journal.pbio.0040028

Bays, P. M., Wolpert, D. M., & Flanagan, J. R. (2005). Perception of the consequences of self-action is temporally tuned and event driven. Current Biology, 15(12), 1125–1128. https://doi.org/10.1016/j.cub.2005.05.023
Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1998). Central cancellation of self-produced tickle sensation. Nature Neuroscience, 1(7), 635–640. https://doi.org/10.1038/2870
Blank, H., & Davis, M. H. (2016). Prediction errors but not sharpened signals simulate multivoxel fMRI patterns during speech perception. PLoS Biology, 14(11), e1002577. https://doi.org/10.1371/journal.pbio.1002577
Blank, H., Spangenberg, M., & Davis, M. H. (2018). Neural prediction errors distinguish perception and misperception of speech. Journal of Neuroscience, 38(27), 6076–6089. https://doi.org/10.1523/JNEUROSCI.3258-17.2018
Blom, T., Feuerriegel, D., Johnson, P., Bode, S., & Hogendoorn, H. (2020). Predictions drive neural representations of visual events ahead of incoming sensory information. Proceedings of the National Academy of Sciences, 117(13), 7510–7515. https://doi.org/10.1073/pnas.1917777117
Brown, H., Adams, R. A., Parees, I., Edwards, M., & Friston, K. (2013). Active inference, sensory attenuation and illusions. Cognitive Processing, 14(4), 411–427. https://doi.org/10.1007/s10339-013-0571-3
Cardoso-Leite, P., Mamassian, P., Schütz-Bosbach, S., & Waszak, F. (2010). A new look at sensory attenuation. Action-effect anticipation affects sensitivity, not response bias. Psychological Science, 21(12), 1740–1745. https://doi.org/10.1177/0956797610389187
Cook, R., Bird, G., Catmur, C., Press, C., & Heyes, C. (2014). Mirror neurons: From origin to function. The Behavioral and Brain Sciences, 37(2), 177–192. https://doi.org/10.1017/S0140525X13000903
Corlett, P. R., Horga, G., Fletcher, P. C., Alderson-Day, B., Schmack, K., & Powers, A. R. (2019). Hallucinations and strong priors. Trends in Cognitive Sciences, 23(2), 114–127. https://doi.org/10.1016/j.tics.2018.12.001
Dayan, P., & Yu, A. J. (2006). Phasic norepinephrine: A neural interrupt signal for unexpected events. Network: Computation in Neural Systems, 17(4), 335–350. https://doi.org/10.1080/09548980601004024
de Lange, F. P., Heilbron, M., & Kok, P. (2018). How do expectations shape perception? Trends in Cognitive Sciences, 22(9), 764–779. https://doi.org/10.1016/j.tics.2018.06.002
Denison, R. N., Piazza, E. A., & Silver, M. A. (2011). Predictive context influences perceptual selection during binocular rivalry. Frontiers in Human Neuroscience, 5. https://doi.org/10.3389/fnhum.2011.00166
Denison, R. N., Sheynin, J., & Silver, M. A. (2016). Perceptual suppression of predicted natural images. Journal of Vision, 16(13), 6. https://doi.org/10.1167/16.13.6
Desmurget, M., & Grafton, S. (2000). Forward modeling allows feedback control for fast reaching movements. Trends in Cognitive Sciences, 4(11), 423–431. https://doi.org/10.1016/S1364-6613(00)01537-0
Feuerriegel, D., Blom, T., & Hogendoorn, H. (2021). Predictive activation of sensory representations as a source of evidence in perceptual decision-making. Cortex, 136, 140–146. https://doi.org/10.1016/j.cortex.2020.12.008

Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: A process theory. Neural Computation, 29(1), 1–49. https://doi.org/10.1162/NECO_a_00912
Frith, C. D., Blakemore, S., & Wolpert, D. M. (2000). Explaining the symptoms of schizophrenia: Abnormalities in the awareness of action. Brain Research Reviews, 31(2–3), 357–363.
Haggard, P. (2017). Sense of agency in the human brain. Nature Reviews Neuroscience, 18(4), 196–207. https://doi.org/10.1038/nrn.2017.14
Hasselmo, M. E., Linster, C., Patil, M., Ma, D., & Cekic, M. (1997). Noradrenergic suppression of synaptic transmission may influence cortical signal-to-noise ratio. Journal of Neurophysiology, 77(6), 3326–3339. https://doi.org/10.1152/jn.1997.77.6.3326
Hirata, A., Aguilar, J., & Castro-Alamancos, M. A. (2006). Noradrenergic activation amplifies bottom-up and top-down signal-to-noise ratios in sensory thalamus. Journal of Neuroscience, 26(16), 4426–4436. https://doi.org/10.1523/JNEUROSCI.5298-05.2006
Hommel, B., Müsseler, J., Aschersleben, G., & Prinz, W. (2001). The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences, 24(5), 849–878. https://doi.org/10.1017/S0140525X01000103
James, W. (1890). The principles of psychology (Vol. 1). Mineola, NY: Dover Publications.
Johansson, R. S., Westling, G., Bäckström, A., & Flanagan, J. R. (2001). Eye–hand coordination in object manipulation. Journal of Neuroscience, 21(17), 6917–6932. https://doi.org/10.1523/JNEUROSCI.21-17-06917.2001
Kilteni, K., & Ehrsson, H. H. (2020). Functional connectivity between the cerebellum and somatosensory areas implements the attenuation of self-generated touch. Journal of Neuroscience, 40(4), 894–906. https://doi.org/10.1523/JNEUROSCI.1732-19.2019
Kilteni, K., Houborg, C., & Ehrsson, H. H. (2019). Rapid learning and unlearning of predicted sensory delays in self-generated touch. ELife, 8, e42888. https://doi.org/10.7554/eLife.42888
Kok, P., Jehee, J. F. M., & de Lange, F. P. (2012). Less is more: Expectation sharpens representations in the primary visual cortex. Neuron, 75(2), 265–270. https://doi.org/10.1016/j.neuron.2012.04.034
Kok, P., Mostert, P., & de Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences, 114(39), 10473–10478. https://doi.org/10.1073/pnas.1705652114
Lawson, R. P., Mathys, C., & Rees, G. (2017). Adults with autism overestimate the volatility of the sensory environment. Nature Neuroscience, 20(9), 14.
Lawson, R. P., Rees, G., & Friston, K. J. (2014). An aberrant precision account of autism. Frontiers in Human Neuroscience, 8. https://doi.org/10.3389/fnhum.2014.00302
Marshall, L., Mathys, C., Ruge, D., de Berker, A. O., Dayan, P., Stephan, K. E., & Bestmann, S. (2016). Pharmacological fingerprints of contextual uncertainty. PLOS Biology, 14(11), e1002575. https://doi.org/10.1371/journal.pbio.1002575

Matin, E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin, 81(12), 899–917. https://doi.org/10.1037/h0037368
Meijs, E. L., Mostert, P., Slagter, H. A., de Lange, F. P., & van Gaal, S. (2019). Exploring the role of expectations and stimulus relevance on stimulus-specific neural representations and conscious report. Neuroscience of Consciousness, 2019(1), niz011. https://doi.org/10.1093/nc/niz011
Meijs, E. L., Slagter, H. A., de Lange, F. P., & van Gaal, S. (2018). Dynamic interactions between top–down expectations and conscious awareness. Journal of Neuroscience, 38(9), 2318–2327. https://doi.org/10.1523/JNEUROSCI.1952-17.2017
Press, C., & Cook, R. (2015). Beyond action-specific simulation: Domain-general motor contributions to perception. Trends in Cognitive Sciences, 19(4), 176–178. https://doi.org/10.1016/j.tics.2015.01.006
Press, C., Gherri, E., Heyes, C., & Eimer, M. (2010). Action preparation helps and hinders perception of action. Journal of Cognitive Neuroscience, 22(10), 2198–2211. https://doi.org/10.1162/jocn.2009.21409
Press, C., Kok, P., & Yon, D. (2020a). Learning to perceive and perceiving to learn. Trends in Cognitive Sciences, 24(4), 260–261. https://doi.org/10.1016/j.tics.2020.01.002
Press, C., Kok, P., & Yon, D. (2020b). The perceptual prediction paradox. Trends in Cognitive Sciences, 24(1), 13–24. https://doi.org/10.1016/j.tics.2019.11.003
Press, C., Thomas, E., Gilbert, S., de Lange, F., Kok, P., & Yon, D. (2020). Neurocomputational mechanisms of action-outcome prediction in V1. Journal of Vision, 20(11), 712. https://doi.org/10.1167/jov.20.11.712
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922. https://doi.org/10.1162/neco.2008.12-06-420
Redgrave, P., & Gurney, K. (2006). The short-latency dopamine signal: A role in discovering novel actions? Nature Reviews Neuroscience, 7(12), 967–975. https://doi.org/10.1038/nrn2022
Richter, D., & de Lange, F. P. (2019). Statistical learning attenuates visual activity only for attended stimuli. ELife, 8, e47869. https://doi.org/10.7554/eLife.47869
Richter, D., Ekman, M., & de Lange, F. P. (2018). Suppressed sensory response to predictable object stimuli throughout the ventral visual stream. Journal of Neuroscience, 38(34), 7452–7461. https://doi.org/10.1523/JNEUROSCI.3421-17.2018
Roussel, C., Hughes, G., & Waszak, F. (2013). A preactivation account of sensory attenuation. Neuropsychologia, 51(5), 922–929. https://doi.org/10.1016/j.neuropsychologia.2013.02.005
Roussel, C., Hughes, G., & Waszak, F. (2014). Action prediction modulates both neurophysiological and psychophysical indices of sensory attenuation. Frontiers in Human Neuroscience, 8. https://doi.org/10.3389/fnhum.2014.00115
Schwarz, K. A., Pfister, R., Kluge, M., Weller, L., & Kunde, W. (2018). Do we see it or not? Sensory attenuation in the visual domain. Journal of Experimental Psychology: General, 147(3), 418–430. https://doi.org/10.1037/xge0000353

Seki, K., & Fetz, E. E. (2012). Gating of sensory input at spinal and cortical levels during preparation and execution of voluntary movement. Journal of Neuroscience, 32(3), 890–902. https://doi.org/10.1523/JNEUROSCI.4958-11.2012
Shergill, S. S., Samson, G., Bays, P. M., Frith, C. D., & Wolpert, D. M. (2005). Evidence for sensory prediction deficits in schizophrenia. The American Journal of Psychiatry, 162(12), 2384–2386. https://doi.org/10.1176/appi.ajp.162.12.2384
Shergill, S. S., White, T. P., Joyce, D. W., Bays, P. M., Wolpert, D. M., & Frith, C. D. (2013). Modulation of somatosensory processing by action. NeuroImage, 70, 356–362. https://doi.org/10.1016/j.neuroimage.2012.12.043
Smith, R., Ramstead, M. J. D., & Kiefer, A. (2022). Active inference models do not contradict folk psychology. Synthese, 200(2), 81. https://doi.org/10.1007/s11229-022-03480-w
Sohoglu, E., & Davis, M. H. (2020). Rapid computations of spectrotemporal prediction error support perception of degraded speech. BioRxiv. https://doi.org/10.1101/2020.04.22.054726
Stanley, J., & Miall, R. C. (2007). Functional activation in parieto-premotor and visual areas dependent on congruency between hand movement and visual stimuli during motor-visual priming. NeuroImage, 34(1), 290–299. https://doi.org/10.1016/j.neuroimage.2006.08.043
Stein, T., & Peelen, M. V. (2015). Content-specific expectations enhance stimulus detectability by increasing perceptual sensitivity. Journal of Experimental Psychology: General, 144(6), 1089–1104. https://doi.org/10.1037/xge0000109
Synofzik, M., Thier, P., Leube, D. T., Schlotterbeck, P., & Lindner, A. (2010). Misattributions of agency in schizophrenia are based on imprecise predictions about the sensory consequences of one's actions. Brain, 133(1), 262–271. https://doi.org/10.1093/brain/awp291
Teufel, C., Dakin, S. C., & Fletcher, P. C. (2018). Prior object-knowledge sharpens properties of early visual feature-detectors. Scientific Reports, 8(1), 10853. https://doi.org/10.1038/s41598-018-28845-5
Thomas, E. R., Yon, D., de Lange, F. P., & Press, C. (2022). Action enhances predicted touch. Psychological Science, 33(1), 48–59. https://doi.org/10.1177/09567976211017505
Uhlmann, L., Pazen, M., van Kemenade, B. M., Steinsträter, O., Harris, L. R., Kircher, T., & Straube, B. (2020). Seeing your own or someone else's hand moving in accordance with your action: The neural interaction of agency and hand identity. Human Brain Mapping, 41(9), 2474–2489. https://doi.org/10.1002/hbm.24958
Voss, M., Moore, J., Hauser, M., Gallinat, J., Heinz, A., & Haggard, P. (2010). Altered awareness of action in schizophrenia: A specific deficit in predicting action consequences. Brain, 133(10), 3104–3112. https://doi.org/10.1093/brain/awq152
Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230(5296), 598–599. https://doi.org/10.1038/230598a0
Wyart, V., Nobre, A. C., & Summerfield, C. (2012). Dissociable prior influences of signal probability and relevance on visual contrast sensitivity. Proceedings of the National Academy of Sciences of the United States of America, 109(9), 3593–3598. https://doi.org/10.1073/pnas.1120118109

Yon, D., Bunce, C., & Press, C. (2020). Illusions of control without delusions of grandeur. Cognition, 205, 104429. https://doi.org/10.1016/j.cognition.2020.104429
Yon, D., de Lange, F. P., & Press, C. (2019). The predictive brain as a stubborn scientist. Trends in Cognitive Sciences, 23(1), 6–8. https://doi.org/10.1016/j.tics.2018.10.003
Yon, D., Gilbert, S. J., de Lange, F. P., & Press, C. (2018). Action sharpens sensory representations of expected outcomes. Nature Communications, 9(1), 4288. https://doi.org/10.1038/s41467-018-06752-7
Yon, D., Heyes, C., & Press, C. (2020). Beliefs and desires in the predictive brain. Nature Communications, 11(1), 4404. https://doi.org/10.1038/s41467-020-18332-9
Yon, D., & Press, C. (2017). Predicted action consequences are perceptually facilitated before cancellation. Journal of Experimental Psychology: Human Perception and Performance, 43(6), 1073–1083. https://doi.org/10.1037/xhp0000385
Yon, D., & Press, C. (2018). Sensory predictions during action support perception of imitative reactions across suprasecond delays. Cognition, 173, 21–27. https://doi.org/10.1016/j.cognition.2017.12.008
Yon, D., Zainzinger, V., de Lange, F. P., Eimer, M., & Press, C. (2021). Action biases perceptual decisions toward expected outcomes. Journal of Experimental Psychology: General, 150(6), 1225. https://doi.org/10.1037/xge0000826
Yu, A. J., & Dayan, P. (2005). Uncertainty, neuromodulation, and attention. Neuron, 46(4), 681–692. https://doi.org/10.1016/j.neuron.2005.04.026

4

Perceptual Uncertainty, Clarity, and Attention

Jonna Vance

4.1 Perceptual Uncertainty

Perception occurs under conditions of uncertainty. The same stimulus can give rise to different internal responses in an organism due to environmental variations and neural noise. And different stimuli can cause the same internal response due to the position from which they're perceived. For several decades, Bayesian decision theory has provided an attractive framework in which to model perceptual processing under conditions of uncertainty. Bayes' Theorem states

p(H|I) = p(I|H) × p(H) / p(I)

Here H indicates hypotheses over possible states of the distal environment, such as possible shapes, colors, and distances. I is an input received by (some part of) the perceptual system on an occasion. p(H) is the prior probability the system assigns to the different hypotheses H, before the system receives input I.1 The prior serves as the perceptual system's expectation about which distal features (and combinations thereof) are probable. p(I|H) is the likelihood: the probability assigned to receiving input I under hypotheses H. It too can be described as a kind of expectation. But there the expectation concerns the likely relationships between stimulus inputs and distal hypotheses. p(H|I) is the posterior: the probability of each hypothesis H given I. p(I) is a normalizing constant. Dividing by p(I) ensures that the probabilities in the posterior sum (or integrate) to one. Bayesian approaches model perceptual processing as an inference combining priors and likelihoods to output posterior probability distributions over distal features.
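To see the theorem at work, consider a minimal numerical sketch; the two hypotheses and all probability values below are made up purely for illustration.

# Two distal hypotheses with an illustrative prior and likelihood.
prior = {"convex": 0.7, "concave": 0.3}          # p(H)
likelihood = {"convex": 0.2, "concave": 0.6}     # p(I|H) for the received input I

p_I = sum(prior[h] * likelihood[h] for h in prior)              # normalizing constant p(I)
posterior = {h: prior[h] * likelihood[h] / p_I for h in prior}  # p(H|I)

print(posterior)  # {'convex': 0.4375, 'concave': 0.5625} - sums to one

Notice that although the prior favored the convex hypothesis, input that is more likely under the concave hypothesis shifts the posterior the other way.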

Priors, likelihoods, and posteriors can be represented by two-dimensional curves.2 A distribution curve's mode is its highest point (or points, if there's more than one mode). A distribution curve's mean (if it exists) is its center of mass, such that half the area under the curve is to the left of the mean and the other half is to the right. The curves of probability distributions can have various shapes. A distribution's precision indicates how spread out over the hypotheses the curve is. The curves of distributions with low precision are very spread out. Curves of distributions with high precision are narrower. Precision is a measure of uncertainty.

As I'll use the term, perceptual uncertainty occurs if a perceptual system weights its distal representations according to a measure of uncertainty or an estimate thereof.3 We can distinguish probabilistic from non-probabilistic measures of uncertainty. Perceptual uncertainty could be probabilistic or non-probabilistic. Probabilistic measures of uncertainty are defined in terms of the uncertainty represented in or by a probability distribution. These include vertical measures of the height of a distribution (such as the probability assigned to estimates) and horizontal measures of the spread of a distribution (such as precision, variance, and standard deviation).4 Non-probabilistic measures of uncertainty are those that are not defined in terms of the uncertainty represented in or by a probability distribution. For example, actively suspending judgment is a non-probabilistic measure of uncertainty. In evidence accumulation models of perceptual decision-making, the perceptual system waits to make an estimate of a target feature until it has accumulated evidence above some threshold. During the active suspension of judgment, the perceptual system is arguably in a state of uncertainty about the target feature, but such processes need not be defined in terms of probability distributions.5

Measures of uncertainty (both probabilistic and non-probabilistic) can be distinguished from mere heuristic proxies for uncertainty. For example, in perception, stimulus contrast can serve as a heuristic proxy for uncertainty: as contrast decreases, measurement noise in the perceptual system increases, which can be used as a proxy for increased perceptual uncertainty. But stimulus contrast is not itself a form of uncertainty.

We can also distinguish conscious from unconscious instances of perceptual uncertainty. Conscious perceptual uncertainty, as I'll use the term, is uncertainty intrinsic to perceptual experience. Much of the information in perceptual processing never makes it into experience. As we'll see below, there is considerable evidence that perceptual systems weight their distal estimates by (an estimate or expectation of) their own uncertainty. The claim that uncertainty is used at various points during perceptual processing does not entail that uncertainty is ever intrinsic to experience.

Bayesian perception science has been credited with many successes in modeling phenomena as diverse as motion perception, cue combination, and attention, but it is also controversial. In particular, it is a matter of ongoing debate whether perceptual systems unconsciously represent probability distributions over distal features.6

It's arguably even more controversial whether perceptual systems operate on probabilistic representations to approximate optimal Bayesian inference.7 And it's arguably more controversial still whether perceptual experiences assign probabilities to distal contents.8 However, even among those who are skeptical that perceptual systems represent probability distributions over distal features, there is considerably more agreement that perceptual systems weight their distal estimates using some measure of their own uncertainty, at least unconsciously, in ways that influence perceivers' behavior.9

In this chapter, I lean on this last point of emerging agreement: perceptual systems weight their distal representations using some measure of their own uncertainty, at least unconsciously. A central aim of this chapter is to leverage that point of agreement about uncertainty in perceptual processing to argue for a more controversial claim about perceptual experience. I argue that some of the uncertainty in perceptual processing fixes an aspect of experience that I call clarity.

Throughout this chapter, I'll draw on results from Bayesian perception science. I'll report these results in the terms familiar from Bayesian modeling frameworks, including reference to the precision of perceptual estimates. However, as noted above, perceptual uncertainty could be non-probabilistic. Thus, it is important to note that my core claims about the role of perceptual uncertainty in fixing perceptual clarity in experience do not require commitment to probabilistic perceptual representations, nor to Bayesian perceptual inference.

Extensive evidence for uncertainty in perceptual processing comes from the large literature on cue combination. For example, take the ventriloquist effect. As one watches the ventriloquist's performance, sometimes one seems to both see and hear a voice emanating from the dummy's mouth. In fact, the ventriloquist is speaking without moving her lips. Why does the voice seem to come from the dummy's mouth in that case? According to a large body of modeling in perception science, one's perceptual system combines the location information it receives from vision and audition and weights each location estimate according to the degree of uncertainty the perceptual system has about that estimate. The models typically use the precision of a probability distribution as the measure of uncertainty. In the classic ventriloquist effect, the perceptual system combines the two precision-weighted estimates to arrive at an integrated estimate across vision and audition, where the precision-weights reflect the (estimated) precisions of the distal representations in vision and audition, respectively, before the representations are combined. The perceptual system weights vision more heavily in the classic ventriloquist case, because it has learned that vision is more precise for location estimates of this kind than audition is. The visual estimate favors the dummy's mouth, since it can be seen moving, while the ventriloquist's mouth does not move. So, the voice seems to come from the dummy's mouth.
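The arithmetic of precision-weighted combination is simple. Below is a minimal sketch; the means and standard deviations are illustrative assumptions, not values from the modeling literature:

```python
import numpy as np

def combine(mu_a, sd_a, mu_b, sd_b):
    """Precision-weighted fusion of two independent Gaussian estimates."""
    prec_a, prec_b = 1 / sd_a**2, 1 / sd_b**2   # precision = inverse variance
    mu = (prec_a * mu_a + prec_b * mu_b) / (prec_a + prec_b)
    sd = np.sqrt(1 / (prec_a + prec_b))          # fused estimate is more precise
    return mu, sd

# Illustrative ventriloquist setup, with location in degrees of azimuth.
# Vision favors the dummy's mouth (10 deg) and is precise; audition favors
# a location nearer the ventriloquist (0 deg) but is much less precise.
mu, sd = combine(mu_a=10.0, sd_a=1.0, mu_b=0.0, sd_b=5.0)
print(f"combined location: {mu:.2f} deg, sd: {sd:.2f}")  # ~9.6 deg: vision wins
```

Because the visual precision dominates, the fused estimate sits almost at the dummy's mouth; swap the standard deviations (high visual noise) and audition dominates instead, as in the variation reported by Cheng et al. (2007).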

Precision-weighting models account for this classic version of the ventriloquist effect by accurately predicting human perceptual results (Alais & Burr, 2004). Precision-weighting models also accurately predict results for variations on the classic ventriloquist effect. If uncertainty for the pre-integration visual estimate is high compared to the pre-integration auditory estimate, such models correctly predict that audition will dominate (Cheng, Shettleworth, Huttenlocher, & Rieser, 2007). And, when visual and auditory stimuli are sufficiently discrepant, multi-stage precision-weighting models correctly predict separate location estimates across vision and audition (Körding et al., 2007). Finally, precision-weighting models accurately predict human perceptual decisions in a large number of other cue combination conditions.10

Another line of evidence for uncertainty in perceptual processing comes from neural decoding studies. For example, Walker, Cotton, Ma, and Tolias (2020) presented macaques with motion gratings of various orientations. The gratings were chosen from one of two probability distributions with the same mean orientation, but with different precision. The monkeys learned during training that the next grating was equally likely to come from either distribution. The monkeys' task was to classify each grating as coming from one of the distributions based on the grating's orientation. The optimal classification strategy for this task takes account of perceptual uncertainty. Perceptual measurements are noisy; the measurement of the same stimulus varies with some randomness across trials. The optimal classification strategy depends on the amount of noise (here, manipulated through stimulus contrast). As measurement noise increases, the monkeys should classify a wider range of orientations around zero degrees (the mean of both distributions) as coming from the high-precision distribution. And that's exactly what they did. Moreover, Walker et al. recorded the activity of orientation-tuned neural populations in the monkeys' primary visual cortex. Then they decoded probability distributions over orientation from that activity (for details on the decoding methods, see the original study). The precision of the decoded distributions predicted the monkeys' classifications in ways that could not be predicted from a point estimate of orientation alone, nor from mere heuristic proxies of uncertainty, such as contrast. These results indicate that the monkeys' perceptual systems used an estimate of their own perceptual uncertainty.11
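The noise-dependence of the optimal strategy is easy to verify. The sketch below uses illustrative parameters, not those of Walker et al.; it computes where an ideal observer with equal priors should switch from the narrow to the wide distribution as measurement noise grows:

```python
import numpy as np

# Two zero-mean orientation distributions: narrow (high precision) and wide.
sd_narrow, sd_wide = 2.0, 8.0   # degrees; illustrative values

def boundary(noise_sd):
    """Orientation below which 'narrow' is the more likely source.

    A noisy measurement m of a grating from class c is distributed as
    N(0, sd_c^2 + noise_sd^2). With equal priors, the optimal observer
    answers 'narrow' iff |m| lies below the point where the two
    measurement densities cross.
    """
    v1 = sd_narrow**2 + noise_sd**2
    v2 = sd_wide**2 + noise_sd**2
    return np.sqrt(np.log(v2 / v1) * v1 * v2 / (v2 - v1))

for noise in [0.0, 4.0, 8.0]:
    print(f"noise sd {noise}: answer 'narrow' within ±{boundary(noise):.1f} deg")
```

The cutoff widens as noise increases (roughly 3.4, 6.1, and 9.6 degrees for these parameters), which is the qualitative pattern the monkeys exhibited.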

The above results provide evidence for uncertainty in perceptual processing (or expectations or estimates thereof). As noted above, it's a further step to argue that (an expectation or estimate of) uncertainty fixes any aspects of perceptual experience. Below I argue for that further claim about experience. Before doing so, however, I need to delineate the target aspect of perceptual experience.

4.2  Perceptual Clarity

Perceptual experiences vary in how clearly or unclearly they represent features of the environment. As I'll use the term, perceptual clarity is an aspect of a perceptual experience's phenomenal character (i.e., what it's like to have that experience); it is the aspect of how clearly the experience represents each of its contents. Clarity and unclarity are two sides of the same coin. For example, representing a color with a low degree of clarity is the same as representing it with a high degree of unclarity. Additionally, to say that an experience is clear with respect to its representation of an object's color is equivalent to saying that it represents the object's color with a high degree of clarity. And to say that an experience is unclear with respect to its representation of an object's color is equivalent to saying that it represents the color with a low degree of clarity (or with a high degree of unclarity).

To begin to isolate the target phenomenon, we can consider how clarity can change while other things remain the same. Other things being equal, the degree of (un)clarity with which you experience some object's feature F depends on the following factors.

Focus: ceteris paribus, some features viewed in focus tend to be visually experienced more clearly than when those features are viewed out of focus. For example, suppose you view a crisp red square in focus (e.g., with corrective lenses on). You see the locations of its edges quite clearly. Then suppose you view the same square out of focus (e.g., with corrective lenses off). You'll then tend to see its edge locations less clearly.12

Eccentricity: ceteris paribus, some features viewed at greater eccentricity (farther from the center of one's visual field) tend to be visually experienced less clearly. For example, suppose you view a dot in the center of your visual field, and you see its size quite clearly. Then suppose the dot moves out toward the periphery. As it does, you'll tend to see its size less and less clearly.

Lighting: ceteris paribus, some features viewed in bright lighting tend to be visually experienced more clearly than features viewed in dim lighting. For example, suppose that you view a surface in dim light. You see the surface's color with very low clarity. Now suppose that the brightness of the lighting is gradually increased. Correspondingly, you'll tend to see the surface's color with increasing clarity, until the lighting is good and you see the color very clearly.

Fog: ceteris paribus, some features viewed in foggy conditions tend to be visually experienced less clearly than features viewed in sunny, unfoggy conditions. For example, suppose you see an animal 50 meters away in a fog while hiking in the woods. You'll tend to experience the animal's shape less clearly than if you saw the same animal at the same distance without the fog.


Figure 4.1 While fixating on the cross at the top left, one tends to experience the shape of the "K" at the top right more clearly (without distractors) than one experiences the shape of the "K" at the bottom right (amid crowding distractors) when one fixates on the cross at the bottom left.

Distractors: ceteris paribus, peripherally, some of an object's features tend to be visually experienced more clearly when there are no distracting items within a critical distance from the target object (e.g., other objects within the same receptive field) than when there are such distractors. One can demonstrate this crowding effect using Figure 4.1. When other distractors crowd the target stimulus, the shape of the target stimulus is seen less clearly.

As illustrated above, clarity can vary over time, across experiences, within a single experience across the perceptual field, and with respect to different features at the same location(s). Additionally, clarity comes in degrees. It's not all or nothing. For example, as an object moves toward the periphery, one sees its size less clearly by degrees. As the lights brighten, one sees a surface's color more clearly by degrees. In what follows, I'll often refer to a feature as being experienced with "high" clarity or "low" clarity. These descriptions refer to relative degrees of clarity, not well-defined categories.

Finally, it is worth reiterating that perceptual clarity is always with respect to some content. Clarity is content-specific. For example, when you blurrily experience a red square out of focus, you see the location of its edges with low clarity. But you can still see the square's color clearly. That's because the degree of clarity with which you experience the square's edge locations is distinct from—and can vary independently of—the degree of clarity with which you experience its color. Generally, you can experience some of an object's features very clearly while experiencing its other features unclearly.

4.3  Uncertainty Explains Unclarity

Earlier, we saw evidence that perceptual systems weight their representations of distal features according to (estimates or expectations of) their own uncertainty. However, the results we discussed from cue combination and neural decoding were not explicitly about the phenomenal character of perceptual experience.

This section extends the evidence of uncertainty in perceptual processing in an effort to account for clarity in perceptual experience. Here is the account for which I'll argue:

Uncertainty Account of Unclarity: Experiences carry information about the uncertainty of the representations from which each distal content of experience was selected, and this uncertainty fixes the degree of (un)clarity with which that content is represented in experience.

My defense of the Uncertainty Account starts from the claim that the contents of experience are selected from representations generated during perceptual processing and that these representations are weighted by, or include, a measure of uncertainty. If, for example, the contents of perceptual experience are selected from a probability distribution, then the relevant uncertainty could be the precision of that distribution (Vance, 2020). However, if perceptual uncertainty turns out to be non-probabilistic, the uncertainty could come in some other form.

The Uncertainty Account of Unclarity is supported by a range of empirical results from perception science. Here I survey several such results. First, consider the difference in how clearly one experiences the location of an object's edge when it's viewed in a fog versus when it's viewed in bright sunshine. The location of the object's edges appears less clearly in a fog than in bright sun. Foggy conditions are noisier than sunny conditions. Light reflecting from surfaces in a fog is refracted more randomly on its way to the perceiver's sensory receptors than light in clear, sunny conditions. In addition, objects in a fog are at low contrast with the surrounding environment, and reducing contrast adds noise and uncertainty to the sensory measurement, which in turn causes the distal shape estimate to be more uncertain. We can put the point in terms of Bayesian models, where precision is the relevant measure of uncertainty—with the caveat that the core point about uncertainty can be put using a different framework and a different measure of uncertainty. When perceiving an object in fog, the likelihood is less precise than it would be for perceiving the same object in the sun, while the priors over the object's features will remain the same (Sotiropoulos, Seitz, & Seriès, 2014; Stocker & Simoncelli, 2006; Weiss, Simoncelli, & Adelson, 2002). As a result, on a Bayesian approach, the posterior for an estimate of the location of an object's edge will be less precise in foggy conditions compared with sunny conditions. Thus, the Uncertainty Account of Unclarity correctly predicts that the location of an object's edge will be experienced less clearly in fog than in sunny conditions.
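For Gaussian priors and likelihoods, the fog prediction falls out of one line of algebra: posterior precision is the sum of prior and likelihood precision. A minimal sketch follows; the numbers are illustrative assumptions, not estimates from the cited studies:

```python
# Conjugate Gaussian update: with a prior of precision prec_prior and a
# likelihood whose sensory noise has precision prec_like, the posterior
# precision is simply their sum.
def posterior_precision(prec_prior, prec_like):
    return prec_prior + prec_like

prec_prior = 0.5   # the same prior over edge locations in both conditions

# Fog lowers contrast and adds sensory noise, so likelihood precision drops.
for condition, prec_like in [("sunny", 4.0), ("foggy", 0.2)]:
    print(f"{condition}: posterior precision = "
          f"{posterior_precision(prec_prior, prec_like):.1f}")
```

On the Uncertainty Account, the lower posterior precision in fog (0.7 versus 4.5 here) is what an experience of the edge with lower clarity carries information about.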

Next, consider the difference in clarity with respect to an object's size when it's viewed peripherally versus when it's viewed in the center of one's visual field. The object's size appears less clearly when experienced peripherally than when experienced centrally. Likewise, in general, the precision of distal estimates tends to be highest for features in the center of one's visual field, and uncertainty tends to increase with eccentricity. Van den Berg, Roerdink, and Cornelissen (2010) provide a neurobiologically plausible, probabilistic model of the general effects of eccentricity. The cells' properties in their model-based simulations closely resemble the properties of simple cells in primary visual cortex (p. 2). Their model is probabilistic, but the core point can be made without commitment to probabilistic representation in perceptual processing. Given that uncertainty tends to increase with greater eccentricity, the Uncertainty Account of Unclarity correctly predicts that, other things equal, a peripheral experience of an object's features will tend to be less clear than an experience of that object's features in the center of one's visual field.

Additionally, consider the unclarity of an experience with respect to a surface's color when the surface is viewed in dim light compared to its unclarity when viewed in brighter light. Other things equal, decreasing light intensity increases the noise in photoreceptor responses that are crucial for color vision (Kelber, Yovanovich, & Olsson, 2017, p. 2). The increase in noise causes a corresponding increase in uncertainty with respect to color estimates. In a Bayesian model of color vision, this means that the likelihood for the estimate of the surface's color will have lower precision, while the prior over possible colors remains the same (Brainard, 2009). As a result, the posterior distribution over color estimates will be less precise for dim conditions compared with bright conditions. Thus, the Uncertainty Account of Unclarity again correctly predicts that, other things equal, one will have a clearer experience with respect to the surface's color in bright conditions compared with dim conditions.

Finally, recall the effect of crowding on unclarity. Peripherally, one experiences a letter's shape less clearly when there are multiple distractors within a critical distance from the target letter than when there are no distractors. The precision with which letters' shapes are represented decreases as more letters are added, and there's evidence that crowded items can be given different weights according to the uncertainty with which they are represented (Alvarez, 2011). As a result, the Uncertainty Account of Unclarity correctly predicts that, other things equal, one will have a clearer experience of a letter's shape when there are no crowded distractors than when there are.13

4.4  Clarity, Salience, and Attention

We can further home in on perceptual clarity by distinguishing it from another aspect of perceptual experience: salience.

This section articulates the relevant notion of salience, distinguishes it from clarity, and then provides further support for the Uncertainty Account of Unclarity by showing that the account can explain several key facts about the relationships between salience and clarity.

Perceptible features can also be experienced more or less saliently. For example, during one experience, an apple's redness might be salient to you, while in another experience, the apple's shape might be salient to you. Or suppose you're sitting on a packed subway car at rush hour. A person enters carrying several heavy bags. With no seats available, they stand uncomfortably. You consciously attend to features related to the person's posture and facial expression. The features are salient in your experience.14

The target notion of salience is relational, phenomenal salience. Relational salience is relative to a perceiver and a time (Beck & Schneider, 2017, p. 483). For example, if you're currently attending to an apple's redness, then its redness is relationally salient for you now. Relational salience contrasts with uses of "salience" to refer to something's disposition to attract normal perceivers' attention. According to Wu, "Phenomenal salience is the way [a location] object or property figures to a subject when she consciously attends to it in perception, a way that constitutes what it is like to attend to that [location] object or property" (Wu, 2011, pp. 93–94). In other words, phenomenal salience picks out the distinctive phenomenal character of consciously attending to a location, object, or feature. A caveat is in order here: as we'll see below, attention may have more than one typical effect on the way things figure in experience. Phenomenal salience contrasts with uses of "salience" to refer to a location, object, or feature's being unconsciously prioritized for enhanced perceptual processing. For ease of presentation, in what follows, I use "salience" to refer to relational, phenomenal salience.

Salience and clarity have much in common. First, they are both aspects of the phenomenal character of perceptual experiences. They are part of what it's like to have a perceptual experience. Second, they both come in degrees; neither is all or nothing. A feature can be experienced more or less saliently by degrees. Likewise, a feature can be more or less clearly experienced by degrees. Third, like clarity, salience can vary over time, across experiences, and within a single experience across the perceptual field. Fourth, both clarity and salience are content-specific. As noted above, you can experience one of an object's features very clearly while experiencing another of its features unclearly. Likewise, one of an object's features can be salient to you, while others are not. For example, if you attend to an apple's redness but not its shape, its redness is salient to you while its shape is not.

4.4.1  Attention Tends to Increase Salience and Clarity

In addition to the above characteristics that salience and clarity have in common, it is worth highlighting one further commonality for special consideration.

It serves as the first of three key facts to be explained about the relationships among attention, salience, and clarity.

Fact 1: A person's consciously attending to a feature tends to increase both the salience and the clarity with which that person experiences that feature.

Given the definition of phenomenal salience, it's straightforward why consciously attending to a feature increases the feature's salience. The phenomenal salience of a feature is defined as the way that feature figures to a subject when she consciously attends to it in perception. To see that attending to a feature also tends to increase the clarity with which it is experienced, consider two examples. First, at a given time, you experience the shape of most of the letters on this page unclearly. But as you attend to the words as you read them, you experience their shape clearly. Second, as you scan a crowd of people looking for your friend, the features of each face become clear as you attend to them. The features of less attended faces are seen much less clearly. Examples such as these could be easily multiplied.

The Uncertainty Account of Unclarity explains why attending to a feature tends to increase the clarity with which it's experienced, because attending to a feature tends to increase the precision—and, more generally, reduce the uncertainty—with which the perceptual system represents that feature. Theorists and modelers disagree about the specific mechanisms that underwrite uncertainty-reduction in attention, and they account for the connection between attention and uncertainty using a range of modeling frameworks, including signal-detection theory (Wyart, Nobre, & Summerfield, 2012) and probabilistic population codes (Cohen & Maunsell, 2009). Relatedly, theorists and modelers disagree about whether attention to something causes, constitutes, or is caused by a decrease in uncertainty with respect to it. We can illustrate this disagreement in terms of how theorists model the relationship between changes in attention and in precision-weighting in sensory processing. According to the predictive processing account of attention, attending to x occurs by increasing the relative precision-weighting on error units associated with x (Feldman & Friston, 2010; Friston & Stephan, 2007; Hohwy, 2012). The increased precision-weighting alters the post-synaptic gain on the selected error units, allowing them to have greater influence upward in the perceptual hierarchy (Friston, Bastos, Pinotsis, & Litvak, 2015, p. 1). Here changes to precision-weightings cause shifts in attention. By contrast, others argue that changes to precision-weightings are caused by shifts in attention. For example, Bowman and colleagues write:

[W]e could ask the reader to find the nearest word printed in bold. Attention will typically shift to one of the headers, and indeed momentarily increase precision there, improving reading. But this makes precision-weighting a consequence of attending. At least as interesting is the mechanism enabling stimulus selection in the first place. The brain has to first deploy attention before a precision advantage can be realised for that deployment. (Bowman, Filetti, Wyble, & Olivers, 2013, p. 207, emphasis original)

For present purposes, we need not settle the debate about whether increased attention to F tends to cause, be caused by, or merely correlate with increases in the precision with which F is represented. Regardless, increased attention to F correlates with representing F with increased precision (or more generally: reduced uncertainty). The Uncertainty Account of Unclarity then provides the link between reduced uncertainty and increased clarity. According to the account, an experience's representing F with increased precision (reduced uncertainty) increases the clarity with which the experience represents F. Thus, the Uncertainty Account explains Fact 1.
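Whichever direction the causal arrow points, the correlation itself can be captured in a few lines. Here is a minimal sketch; the gain factor and precisions are illustrative assumptions, and modeling attention as a multiplicative gain on likelihood precision is one simple choice among the frameworks mentioned above, not a claim about the true mechanism:

```python
def clarity_proxy(prec_prior, prec_like, attended, gain=4.0):
    """Posterior precision of a feature estimate, with attention modeled
    as a multiplicative gain on the precision of the sensory likelihood."""
    if attended:
        prec_like = gain * prec_like
    return prec_prior + prec_like   # conjugate Gaussian posterior precision

prec_prior = 0.5
prec_like = 2.0    # viewing conditions fix the baseline sensory precision

print("unattended:", clarity_proxy(prec_prior, prec_like, attended=False))
print("attended:  ", clarity_proxy(prec_prior, prec_like, attended=True))
# Attention raises the posterior precision (2.5 -> 8.5); on the Uncertainty
# Account, higher precision means greater clarity, which is Fact 1.
```

The same arithmetic matters for the two further facts below: the baseline precision set by the viewing conditions can swamp any attentional gain, in either direction.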

4.4.2  High Salience with Low Clarity

Despite their commonalities, salience and clarity are distinct phenomena. Here is the second key fact about them that we need to explain:

Fact 2: At a time, a feature can be experienced with high salience and low clarity.

To illustrate Fact 2, consider the following example. You're hiking in the forest in a dense fog. You see what looks like an animal moving in the distance. You want to know what kind of animal it is, so you attend to its shape. Its shape is highly salient to you. But because it's foggy, the viewing conditions are noisy, and you experience the animal's shape with very low clarity.

The Uncertainty Account of Unclarity explains why a feature can be experienced with high salience and low clarity. As we saw in the discussion of Fact 1 in the previous section, consciously attending to a feature tends to increase the precision (reduce the uncertainty) with which that feature is represented in experience. However, in noisy conditions like the fog, the perceptual system will be starting from a very imprecise (very uncertain) estimate. So, even though there's an increase in the precision with which the animal's shape is represented, it's an increase from a very low degree of precision to a slightly higher—but still very low—degree of precision.

In terms of perceptual uncertainty: even though there's a reduction in the uncertainty with which the animal's shape is represented in experience, that shape is still represented in the experience with a high degree of uncertainty in such noisy conditions. According to the Uncertainty Account of Unclarity, the experience thus represents the animal's shape with low clarity.

4.4.3  Low Salience with High Clarity

Here is the third key fact to be explained regarding salience and clarity:

Fact 3: At a time, a feature can be experienced with low salience and high clarity.

To illustrate, suppose that you're a participant in a visual experiment. You're stationed at a computer. The screen is white. Letters of various colors appear and disappear rapidly. Your task is to identify the red vowel. Because you must identify a red vowel, you're directing feature-based attention to the letters' shapes and colors. The letters' shapes and colors are salient for you. By contrast, because you're not directing much attention to the white background, the background's color is not highly salient to you. Yet you experience the background's color with a high degree of clarity.

The Uncertainty Account of Unclarity explains why a feature can be experienced with low salience and high clarity. Failing to direct much attention to a feature (here, the whiteness of the computer screen background) means that you experience it with a low degree of salience. There's little or no attention-based increase to the precision with which your experience represents the background color. However, in the excellent viewing conditions of the experiment, with the bright computer screen background in clear view, there's relatively little noise with respect to the screen's background color. In this case, the precision with which your experience represents the background color could be quite high. In terms of perceptual uncertainty: the uncertainty with which a feature is represented in experience can be quite low, independent of any attentional boost. According to the Uncertainty Account of Unclarity, the low uncertainty with which your experience represents the screen's color explains why you experience the color clearly in this case, even though the screen's color has low salience for you.

4.5  Conclusion

In this chapter, I articulated an aspect of perceptual experience called perceptual clarity. I then offered an account of perceptual clarity in terms of perceptual uncertainty. According to the Uncertainty Account, perceptual experiences carry information about the uncertainty of the representations from which each distal content of experience is selected, and this uncertainty fixes the degree of clarity with which that content is represented in that experience. I then offered two lines of support for the account. The first line of support showed how the account correctly predicts the degree of clarity across a range of experiences—organized by the range of dimensions along which the experience of various distal features can vary. The account correctly predicts that various features tend to be visually experienced less clearly (i) at greater eccentricity from the center of one's visual field, (ii) in dimmer light, (iii) in foggier conditions, (iv) when out of focus, and (v) when flanked by crowding stimuli. The second line of support showed how the account can explain three key facts about the relationships among clarity, salience, and attention. According to the first fact: when a person consciously attends to a feature, it tends to increase both the salience and the clarity with which that person experiences that feature. According to the second fact: at a time, a feature can be experienced with high salience and low clarity. And according to the third fact: at a time, a feature can be experienced with low salience and high clarity. In addition, the second and third facts showed that salience and clarity are distinct phenomena. Distinguishing clarity from salience can help further delineate the nature of perceptual clarity.

Notes

1 For discrete variables, each estimate is assigned a probability. For continuous variables, the relevant notion is a probability density. For simplicity in the main text, I use "probability" to cover both notions.
2 Such curves can be higher dimensional, but we can ignore that complexity here.
3 This is a sufficient condition. It leaves open whether perceptual uncertainty occurs in other ways.
4 Precision, variance, and standard deviation are all inter-defined, related to the spread of a distribution, and probabilistic measures of uncertainty. Precision is inverse variance. Variance is the average squared distance from the mean. And standard deviation is the square root of variance. Perceptual uncertainty characterized in terms of precision could be reformulated in terms of variance or standard deviation.
5 See Denison, Block, and Samaha (2020) for further discussion of evidence accumulation models.
6 Proponents include Clark (2016), Hohwy (2013), Rescorla (2015, 2020), and many others. Opponents include Block (2018), Bowers and Davis (2012), and many others. For helpful discussion, see Gross (2020).
7 Proponents again include Clark (2016), Hohwy (2013), Rescorla (2015, 2020), and many others. Opponents include Block (2018), Bowers and Davis (2012), Mandelbaum (2019), and many others. For an excellent overview of suboptimality in perception, see Rahnev and Denison (2018).
8 Proponents include Morrison (2016, 2017) and Munton (2016). Opponents include Beck (2020), Byrne (2021), Cheng (2022), Denison (2017), Raleigh and Vindrola (2021), and Siegel (2020).

9 For example, see Rahnev et al. (2021) and Siegel (2020).
10 For reviews, see Ma (2010); Trommershäuser, Körding, and Landy (2011); and Alais and Burr (2019).
11 For more evidence for perceptual uncertainty from decoding studies, see Fetsch et al. (2012), van Bergen et al. (2015), and van Bergen and Jehee (2019).
12 The quantifier "some" is important here: not all features seen out of focus vary in clarity to the same degree, or even at all. The color of the red square seen out of focus is arguably experienced with (at least approximately) the same high clarity as when it's seen in focus. As noted below, clarity is content-specific.
13 For additional support for this account, see Vance (2020, forthcoming).
14 This example is adapted from Blum (1991). See Vance and Werner (2022) for further discussion of attention and salience in moral perception.

References

Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257–262.
Alais, D., & Burr, D. (2019). Cue combination within a Bayesian framework. In A. K. C. Lee et al. (Eds.), Multisensory processes. Springer Handbooks.
Alvarez, G. A. (2011). Representing multiple objects as an ensemble enhances visual cognition. Trends in Cognitive Sciences, 15(3), 122–131.
Beck, J. (2020). On perceptual confidence and "completely trusting your experience." Analytic Philosophy, 61(2), 174–188.
Beck, J., & Schneider, K. A. (2017). Attention and mental primer. Mind & Language, 32(4), 463–494.
Block, N. (2018). If perception is probabilistic, why does it not seem probabilistic? Philosophical Transactions of the Royal Society of London B: Biological Sciences, 373(1755), 20170341.
Blum, L. (1991). Moral perception and particularity. Ethics, 101(4), 701–725.
Bowers, J. S., & Davis, C. J. (2012). Bayesian just-so stories in psychology and neuroscience. Psychological Bulletin, 138(3), 389–414.
Bowman, H., Filetti, M., Wyble, B., & Olivers, C. (2013). Attention is more than prediction precision [commentary on target article]. Behavioral and Brain Sciences, 36(3), 206–208.
Brainard, D. H. (2009). Bayesian approaches to color vision. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 395–408). Cambridge, MA: The MIT Press.
Byrne, A. (2021). Perception and probability. Philosophy and Phenomenological Research. https://doi.org/10.1111/phpr.12768
Cheng, K., Shettleworth, S. J., Huttenlocher, J., & Rieser, J. J. (2007). Bayesian integration of spatial information. Psychological Bulletin, 133(4), 625–637.
Cheng, T. (2022). Post-perceptual confidence and supervaluative matching profile. Inquiry, 65(3), 249–277.
Clark, A. (2016). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford: Oxford University Press.
Cohen, M. R., & Maunsell, J. H. (2009). Attention improves performance primarily by reducing interneuronal correlations. Nature Neuroscience, 12(12), 1594.

Denison, R. N. (2017). Precision, not confidence, describes the uncertainty of perceptual experience: Comment on John Morrison's "Perceptual confidence." Analytic Philosophy, 58(1), 58–70.
Denison, R. N., Block, N., & Samaha, J. (2020). What do models of visual perception tell us about visual phenomenology? In F. de Brigard & W. Sinnott-Armstrong (Eds.), Neuroscience and philosophy. Cambridge, MA: MIT Press.
Feldman, H., & Friston, K. (2010). Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4, 215.
Fetsch, C. R., Pouget, A., DeAngelis, G. C., & Angelaki, D. E. (2012). Neural correlates of reliability-based cue weighting during multisensory integration. Nature Neuroscience, 15(1), 146–154.
Friston, K. J., & Stephan, K. E. (2007). Free-energy and the brain. Synthese, 159(3), 417–458.
Friston, K. J., Bastos, A. M., Pinotsis, D., & Litvak, V. (2015). LFP and oscillations—What do they tell us? Current Opinion in Neurobiology, 31, 1–6.
Gross, S. (2020). Probabilistic representations in perception: Are there any, and what would they be? Mind & Language, 35(3), 377–389.
Hohwy, J. (2012). Attention and conscious perception in the hypothesis testing brain. Frontiers in Psychology, 3, 96.
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Kelber, A., Yovanovich, C., & Olsson, P. (2017). Thresholds and noise limitations of colour vision in dim light. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1717), 20160065.
Körding, K., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS One, 2(9), e943.
Ma, W. J. (2010). Signal detection theory, uncertainty, and Poisson-like population codes. Vision Research, 50(22), 2308–2319.
Mandelbaum, E. (2019). Troubles with Bayesianism: An introduction to the psychological immune system. Mind & Language, 34(2), 141–157.
Morrison, J. (2016). Perceptual confidence. Analytic Philosophy, 57(1), 15–48.
Morrison, J. (2017). Perceptual confidence and categorization. Analytic Philosophy, 58(1), 71–85.
Munton, J. (2016). Visual confidences and direct perceptual justification. Philosophical Topics, 44(2), 301–326.
Rahnev, D., Block, N., Denison, R. N., & Jehee, J. (2021). Is perception probabilistic? Clarifying the definitions. https://psyarxiv.com/f8v5r/download?format=pdf
Rahnev, D., & Denison, R. N. (2018). Suboptimality in perceptual decision making. Behavioral and Brain Sciences, 41, e223.
Raleigh, T., & Vindrola, F. (2021). Perceptual experience and degrees of belief. The Philosophical Quarterly, 71(2), 378–406.
Rescorla, M. (2015). Bayesian perceptual psychology. In M. Matthen (Ed.), The Oxford handbook of philosophy of perception (pp. 694–716). Oxford: Oxford University Press.
Rescorla, M. (2020). A realist perspective on Bayesian cognitive science. In A. Nes & T. Chan (Eds.), Inference and consciousness. New York: Routledge.
Siegel, S. (2020). How can perceptual experiences explain uncertainty? Mind & Language, 37(2), 134–158.

Sotiropoulos, G., Seitz, A., & Seriès, P. (2014). Contrast dependency and prior expectations in human speed perception. Vision Research, 97, 16–23.
Stocker, A., & Simoncelli, E. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9(4), 578–585.
Trommershäuser, J., Körding, K., & Landy, M. S. (Eds.). (2011). Sensory cue integration. Oxford: Oxford University Press.
van Bergen, R. S., & Jehee, J. F. M. (2019). Probabilistic representation in human visual cortex reflects uncertainty in serial decisions. The Journal of Neuroscience, 39(41), 8164–8176.
van Bergen, R. S., Ma, W. J., Pratte, M. S., & Jehee, J. F. M. (2015). Sensory uncertainty decoded from visual cortex predicts behavior. Nature Neuroscience, 18(12), 1728–1730.
Van den Berg, R., Roerdink, J. B., & Cornelissen, F. W. (2010). A neurophysiologically plausible population code model for feature integration explains visual crowding. PLoS Computational Biology, 6(1), e1000646.
Vance, J. (2020). Precision and perceptual clarity. Australasian Journal of Philosophy, 99(2), 379–395.
Vance, J. (forthcoming). Vagueness in visual experience. In R. French & B. Brogaard (Eds.), The roles of representations in visual perception. Berlin: Springer.
Vance, J., & Werner, P. J. (2022). Attentional moral perception. Journal of Moral Philosophy, 19(5), 501–525.
Walker, E. Y., Cotton, R. J., Ma, W. J., & Tolias, A. S. (2020). A neural basis of probabilistic computation in visual cortex. Nature Neuroscience, 23(1), 122–129.
Weiss, Y., Simoncelli, E., & Adelson, E. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5, 598–604.
Wu, W. (2011). What is conscious attention? Philosophy and Phenomenological Research, 82(1), 93–120.
Wyart, V., Nobre, A. C., & Summerfield, C. (2012). Dissociable prior influences of signal probability and relevance on visual contrast sensitivity. Proceedings of the National Academy of Sciences, 109(9), 3593–3598.

5

Predictive Processing and Object Recognition

Berit Brogaard and Thomas Alrik Sørensen

5.1 Introduction

There has been a lot of recent interest in predictive processing (PP) theories of cognition (Bar, 2003; Clark, 2013, 2016, 2020; Friston, 2003, 2009, 2010; Hohwy, 2012, 2013, 2020). While initial PP models focused primarily on visual perception (Friston, 2003, 2005), recent advocates have suggested that the predictive framework can account for all mental processes and indeed all of the brain's operations (Clark, 2013, 2016, 2020; Hohwy, 2013, 2020).

PP theories take issue with classic models of visual perception. On the classical approach, lower cortical areas (V1, V2, V4, etc.) in the ventral stream process sensory information that has been filtered through the thalamus (LGN) and then project this information to higher regions [e.g., the inferior temporal (IT) cortex]. Here, the information is further processed in light of feedback from object templates stored in long-term memory. Once the last visual area in the ventral stream has processed the information from earlier areas, a visual perception of the distal object is generated. The traditional model thus focuses primarily on bottom-up processing and to a lesser extent on top-down modulation.

On the predictive view, this picture is reversed (Clark, 2013, 2015; Feldman & Friston, 2010; Hohwy, 2012, 2013). The predictive framework posits that the brain deploys internal models, which contain information extracted from past experience, to generate predictions, or hypotheses, about its surroundings. These predictions are then matched to the incoming visual information. Mismatches between predictions and incoming signals – so-called prediction errors – are then projected bottom-up to higher brain areas, where they are used to update the predictions. This process, which occurs hierarchically, continues until the prediction errors are minimized to the greatest extent possible, and the winning prediction determines the visual content. In contrast to traditional models, predictive approaches thus hold that all bottom-up processes are signals conveying prediction errors to higher regions.

By only processing prediction errors rather than all visual stimuli, the brain saves energy (Friston, 2003, 2004, 2009).

Here, we argue that the predictive approach falls short of providing a complete account of visual perception. Specifically, we take issue with the predictive approach's core idea that all bottom-up signals are prediction error signals and that prediction error minimization is "all that the brain ever does," as Jakob Hohwy puts it (2013, p. 7). Although our point is a general one, we will focus on the case of object recognition. As we will see, there is a substantial body of evidence suggesting that there are three stages to object recognition: (i) scene gist processing, (ii) attentional object selection, and (iii) hypothesis testing. We argue that PP theories lack the resources to accommodate the first two stages of object recognition. Ransom, Fazelpour, and Mole (2017) and Ransom et al. (2020) have previously argued that the predictive account of attention is unable to account for voluntary object attention and affect-biased attention. These conclusions challenge one of the predictive account's key claims, viz. that it offers a unified theory of the mind. We will argue that attention during the earliest stages of object recognition presents a further problem for the predictive account of attention.

The chapter is structured as follows. In Section 5.2, we outline the details of the predictive approach, as presented by Karl Friston, Andy Clark, Jakob Hohwy, and others. In the two subsequent sections, we review the empirical evidence for the claim that object recognition begins with gist processing and argue that the PP framework is unable to accommodate gist processing. In Section 5.5, we offer a brief overview of the previous studies showing that predictive models of attention are unable to account for selective attention and affect-biased attention and then argue that attention at the earliest stage of object processing presents a problem for the predictive approach. Finally, in the concluding section, we discuss some ways in which the predictive account may be augmented to provide a unified theory of the mind.

5.2  The Brain as a Hypothesis-Testing Mechanism

The PP approach is often cast as a solution to the problem of how the brain determines the distal cause of an incoming visual signal. This problem arises because any visual input has an infinite number of possible distal causes, which raises the question of how the brain reliably determines which is most probable. Consider this analogy from Hohwy:

You are like the brain, the house is the skull, and the sound is auditory sensory input. As you are wondering about the cause of the input, you begin to list the possible causes of the input. It could be a woodpecker pecking at the wall, a branch tapping at the wall in the wind, a burglar tampering with a lock, heavy roadworks further down the street, a neighbour's loud music, or those kids throwing stones; or it could be something internal such as loose water pipes banging against each other. Let your imagination rip: it could be that your house has been launched into space over night and the sound is produced by a shower of meteorites. There is no end to the possible causes. Call each of these possibilities a hypothesis. The problem of perception is how the right hypothesis about the world is shaped and selected. (2013, pp. 15–16)

In Hohwy's analogy, you wonder about the cause of the auditory input and begin to list possible causes of the input. It might be caused by a woodpecker pecking at the wall, a branch tapping at the wall in the wind, a burglar tampering with a lock, heavy roadworks further down the street, a neighbor's loud music, those kids throwing stones, loose water pipes banging against each other, a shower of meteorites, and so on ad infinitum. Of course, the brain could not possibly test infinitely many hypotheses. So, it needs to somehow narrow down the infinite set to a more manageable one. PP's popularity is partly due to its advertisement as a solution to this problem (Hohwy, 2013).

PP holds that the brain generates predictions, or hypotheses, and then uses Bayes' principle to determine the most probable hypothesis.1 The competing, or alternative, hypotheses are generated by models that group together patterns, or statistical regularities, derived from past sensory inputs (Friston, 2009). One key concept in Bayes' principle is likelihood: how probable it is that the hypothesis accurately predicts the distal cause of the sensory input. The more probable it is that the hypothesis accurately predicts the distal cause of the sensory input, the greater its likelihood. Since mosquitos do not make pecking sounds, the likelihood that the pecking sound is caused by a mosquito buzzing around the ceiling lamp is low. But as far as you know, there are countless other hypotheses with a high likelihood. A second concept in Bayes' principle is a hypothesis' independent, or prior, probability. According to PP, the prior is also determined by information about the environment, extracted from past experience. In our example, the prior probability of the hypothesis that the pecking sound is produced by a shower of meteorites is infinitesimal. But if there are a lot of woodpeckers, burglars, and stone-throwing kids in the area, then the prior probabilities of the woodpecker, burglar, and stone-throwing kids hypotheses are high. In the Bayesian framework, the hypothesis with the greatest posterior probability determines what you perceive. According to Bayes' principle, the posterior probability is proportional to the product of a hypothesis' prior and the likelihood that the hypothesis accurately predicts a distal cause of the sensory signal. If the woodpecker hypothesis has the highest posterior probability, owing to its higher prior or its higher likelihood (or both), then you perceive the auditory signal as the sound of a woodpecker.
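As a toy rendering of this machinery, the posterior over a few of Hohwy's candidate causes might be computed as below. The priors and likelihoods are invented for illustration, not drawn from Hohwy:

```python
# Invented priors and likelihoods for some candidate causes of the sound.
# prior: how common the cause is in this environment;
# likelihood: how probable a pecking-like input is, given that cause.
hypotheses = {
    #               prior   p(input | hypothesis)
    "woodpecker":  (0.050,  0.90),
    "branch":      (0.100,  0.40),
    "burglar":     (0.001,  0.30),
    "meteorites":  (1e-9,   0.80),
}

# Posterior is proportional to prior * likelihood; normalizing by the total
# plays the role of p(input), restricted to the hypotheses considered.
unnormalized = {h: p * l for h, (p, l) in hypotheses.items()}
total = sum(unnormalized.values())
for h, score in sorted(unnormalized.items(), key=lambda kv: -kv[1]):
    print(f"{h:>11}: posterior {score / total:.3f}")
```

The meteorite hypothesis has a respectable likelihood but a vanishing prior, so it contributes almost nothing to the posterior; priors of this sort are how the framework is supposed to tame the unbounded hypothesis space.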

Of course, Hohwy's analogy is just that: an analogy. Here are three key differences between this analogy and the predictive account of perception. First, in the brain, the Bayesian inferences that go into determining the hypothesis with the highest posterior probability occur at the subpersonal level; these inferences are unconscious (at least for the case of perception).2 So, you do not first hear a pecking sound and then the sound of a woodpecker. You just hear the sound of a woodpecker. Second, in the brain, there is a hierarchy of generative models that produce hypotheses (or predictions). In the brain, these hypotheses are more akin to "This neural activation in the V4/V8 color region is caused by a red object" than "This auditory signal is caused by a woodpecker." Third, at every level in the hierarchy, hypotheses generated at one level are matched to inputs at the level below. If there is a mismatch, or prediction error, between the hypothesis and the input, this prediction error is used to update the hypothesis. Updating a hypothesis effectively means that the hypothesis is revised in light of the information that did not accurately depict the distal cause of the incoming sensory signal. This process then continues until the brain has arrived at the hypothesis with the highest posterior probability.

Determining the hypothesis with the highest posterior probability at each level amounts to minimizing the prediction error between the hypothesis and the sensory input. A perception arises as the prediction error is sufficiently minimized. So, prediction error minimization is a key concept in PP (Friston, 2010; Hohwy, 2013). One complication within the PP framework, which we will turn to below, is that prediction error minimization is subject to expectations of noise. One of PP's boldest conjectures is that only prediction error signals, that is, signals that encode information about the prediction error, are propagated up through the system in a bottom-up fashion. In the following sections, we take issue with this claim. We argue that empirical studies of visual object recognition run counter to this conjecture that all bottom-up processing is prediction error signaling. We begin by looking closer at visual object recognition.
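To give a feel for how such updating might run at a single level, here is a minimal sketch in the spirit of standard predictive coding tutorials. The generative model, learning rate, and precisions are our illustrative assumptions, not commitments of the PP theorists discussed:

```python
# One-level predictive coding sketch: infer a distal cause v from a sensory
# sample u, where the generative model predicts u = g(v) = v**2 plus noise.
def g(v):            # top-down prediction of the input, given the hypothesis
    return v ** 2

def g_prime(v):      # derivative of the prediction, used to route the error
    return 2 * v

v_prior, prec_prior = 3.0, 1.0   # prior belief about the cause and its precision
u, prec_u = 4.0, 1.0             # observed input and sensory precision
v = v_prior                       # start inference from the prior expectation

for _ in range(200):
    eps_u = prec_u * (u - g(v))               # precision-weighted sensory error
    eps_p = prec_prior * (v - v_prior)        # precision-weighted prior error
    v += 0.01 * (eps_u * g_prime(v) - eps_p)  # descend on total prediction error

print(f"posterior estimate of the cause: {v:.2f}")  # settles near 2.06
```

On the PP story, only the error terms (eps_u here) would be passed up a hierarchy of such levels; the complaint developed below is that gist processing does not fit this bottom-up-error-only picture.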

5.3  Object Recognition in Natural Visual Scenes

In ordinary life, objects tend to occur as parts of larger scenes, together with other items that are likely to occur in the same scenes (Trapp & Bar, 2015). While it is often difficult to find an object hidden in a crowded scene, context can facilitate the visual recognition of objects that are congruent with it (e.g., a frying pan in a kitchen) (Fiser & Aslin, 2001, 2005; Kondo, van Loon, Kawahara, & Moore, 2017; Oliva & Torralba, 2007). However, scene context presents an obstacle to the discrimination of objects that are incongruent with it (e.g., a frying pan in a movie theater) (Auckland et al., 2007; Gordon, 2004; Hollingworth & Henderson, 1998; Oliva & Torralba, 2007; Palmer, 1975). Thus, when viewing a kitchen containing a frying pan and a bicycle helmet, the pan is detected faster and with greater ease than the helmet. However, if an object is located in an unusual place (e.g., a microwave hanging from the ceiling), detecting the object is slower in a congruent scene (e.g., kitchen) than an incongruent scene (e.g., living room) (Bar, 2004; Hoffman, 1996; Meyers & Rhoades, 1978). These effects are also known as "scene consistency-inconsistency effects" (Oliva & Torralba, 2007).

Various other contextual factors besides object-scene relationships provide cues that can be exploited by the visual system for the identification of objects, including co-variation of objects, spatial and temporal proximity of objects, spatial configuration of objects relative to each other, typical positions of objects in scenes, familiar relative size of objects, and pose of objects in scenes (Bar, 2004; Biederman, Mezzanotte, & Rabinowitz, 1982; Green & Hummel, 2006; Hock et al., 1974; Oliva & Torralba, 2007). For example, chairs and tables are expected to co-occur, whereas a frying pan and an elephant are not; fire hydrants are expected to be on top of the sidewalk rather than floating in the air; dinner plates are expected to be on top of tables, in stacks on shelves, or in the sink or dishwasher, but not on the floor; chairs are expected to be oriented toward tables rather than away from them; cars are expected to be oriented along the driving directions of a street rather than in the direction of the sky; and pedestrians are expected to be in an upright position rather than lying down.

The effects of scene context can be so strong that altering the background scene while leaving the target object intact can change the perceived identity of the object. In Figure 5.1, for example, the orange Toyota Supra is recognized as a real car in the nature scene but as a toy car in the street scene and the indoor scene. Biederman et al.'s (1982) prediction that relative familiar size (i.e., the scale of an object relative to other objects) influences object recognition is borne out here. In the street scene, relative size trumps statistical co-variation of objects, whereas both relative size and statistical co-variation of objects contribute to the identification of the car in the indoor scene.

People are sometimes capable of recognizing objects embedded in congruent scenes, even when they completely lack perceptible structures or features that can guide object recognition independently of scene context. In the street scene in Figure 5.2, for example, the blob on the right is identical to the blob on the left after 90 degrees rotation (Oliva & Torralba, 2007). So, the blob's intrinsic features do not reveal its identity. Nevertheless, the scene is immediately recognized as a street scene.


Figure 5.1 The car is immediately recognized as a real car in the nature scene (a) but is recognized as a toy car in both the street scene (b) and the indoor scene (c).

Figure 5.2 The gist of a street scene. The gist of a scene is the scene's low spatial frequency information, such as the global scene configuration and the gross contour of objects. In this image, the "pedestrian" on the right is identical to the "car" on the left after 90 degrees rotation. As cars and pedestrians typically are oriented differently in a street scene, observers recognize one blob as a car and the other as a pedestrian.
Source: From Oliva and Torralba (2007)
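The way the scene template disambiguates the two blobs can be put in simple Bayesian terms, echoing the posterior-probability story from section 5.2: the template supplies priors over object identities conditioned on scene regularities (here, orientation), and the posterior combines those priors with the likelihood of the ambiguous LSF signal. The following sketch is purely illustrative; the numbers and the hypothesis space are invented for the example and are not drawn from the studies cited above.

```python
# Toy Bayesian illustration of how a scene template's priors can disambiguate
# a blob whose intrinsic features are uninformative. All numbers are invented.

def posterior(priors, likelihoods):
    """Normalize prior * likelihood over the hypothesis space (Bayes' rule)."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnorm.values())
    return {h: round(p / z, 3) for h, p in unnorm.items()}

# The blob's intrinsic features fit both hypotheses equally well.
likelihoods = {"car": 0.5, "pedestrian": 0.5}

# A street-scene template assigns orientation-conditioned priors.
priors_horizontal = {"car": 0.9, "pedestrian": 0.1}  # elongated along the road
priors_vertical = {"car": 0.1, "pedestrian": 0.9}    # upright blob

print(posterior(priors_horizontal, likelihoods))  # {'car': 0.9, 'pedestrian': 0.1}
print(posterior(priors_vertical, likelihoods))    # {'car': 0.1, 'pedestrian': 0.9}
```

With equal likelihoods, the posterior simply mirrors the scene-conditioned prior, which is why one and the same shape can be read as a car in one orientation and a pedestrian in the other.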

In dynamic scenes, scene recognition can also facilitate object tracking and trajectory prediction and expectations. In a street scene where a moving bus passes a grocery store on the opposite side of the street and thereby occludes the storefront relative to the vantage point of an observer, she still expects the store to be present once the bus has passed it. However, expectations regarding the trajectory of a pedestrian occluded by the bus are not nearly as strong, as the pedestrian might have gone into the store in the meantime. While scene context facilitates and sometimes is essential to object recognition, people usually recognize a good exemplar (or prototype member) of an object category presented without any scene context in a laboratory setting within 75 ms (see Dall, Wang, Cai, Chan, & Sørensen, 2021; Shibuya & Bundesen, 1988; Sørensen, Vangkilde, & Bundesen, 2015). Even under optimal conditions, however, recognizing an object without scene context is considerably slower than recognizing a familiar scene (36 ms) (Larson, Freeman, Ringer, & Loschky, 2014) (Figure 5.3).

Figure 5.3 Demonstration of scene gist recognition. From KSU, Vision Cognition Laboratory (left, online only).3 Scene information presented briefly between two blank screens can be extracted rapidly (right)

Studies have shown that the ability to rapidly recognize objects and scenes is due to the collaboration of two distinct visual pathways: a fast pathway that projects the gist of the object directly from the primary visual cortex (V1) to the orbitofrontal cortex (OFC) in the prefrontal cortex, which then generates predictions, or hypotheses, and a slower pathway that processes detailed information in a standard bottom-up fashion (V1, V4, V5/MT, LOC, IT) (Bar et al., 2006; Torralba, Oliva, Castelhano, & Henderson, 2006). The gist of an object or scene takes the form of low spatial frequency (LSF) information extracted from the sensory signal originating in the object or scene. LSF information encodes gross outlines and object contours, whereas high spatial frequency (HSF) information encodes sharp edges and fine details.

To demonstrate that the recognition of isolated objects takes place via dual visual pathways, Bar et al. (2006) combined functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), and a behavioral task. In the fMRI study, participants were shown images that were either unmanipulated or manipulated to contain only LSFs or HSFs. The results showed that the LSF image of an object elicited greater activity in OFC than the HSF image of that object, although unmanipulated images resulted in the greatest increase in OFC activity. Greater OFC activation was also observed when an object's LSF image had multiple interpretations compared to just a few, suggesting that the more ambiguous an object's LSF signal is, the greater the workload for OFC (see also Dall et al., 2021). In their 2007 study, Kveraga, Boshyan, and Bar found that the fast pathway is a magnocellular pathway, which projects information from V1 to OFC via either subcortical projections or the dorsal "vision for action" pathway. All the brain's magnocellular (M) pathways project information much faster than its parvocellular (P) pathways, but at the expense of detailed information. The brain's visual M pathways process global spatial structure, object contours, depth, and motion.

In the absence of scene context, the inferior part of OFC matches the object gist to object templates in long-term memory to find the best match or best matches. For example, when the object gist with the mushroom contour in Figure 5.4A is matched to object templates in long-term memory in the absence of scene context, there is no single best match but rather several best matches, such as the object templates for a mushroom, a lamp, and an umbrella. Although the LSFs projected via the M pathway generate predictions, or hypotheses, about the identity of the objects, the HSFs extracted from the object and processed bottom-up via the P pathway are typically required for the visual system to be able to determine the identity of the object. The hypotheses arrive back in the IT cortex temporally prior to the arrival of the finely detailed bottom-up information, and the fast-arriving hypotheses are then compared to the slowly arriving fine-grained information. If there is a mismatch between a hypothesis and the fine-grained information, then the fast M pathway sends a prediction error signal to OFC, telling it to update the hypothesis.


Figure 5.4 Three pictures (256 pixels) of familiar objects (A: lamp, B: flower, and C: vase) filtered to include only the low frequency spatial components (0–4 cycles/picture)
Source: From Bar (2003)
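To make the division of labor between the two pathways concrete, here is a schematic sketch of the coarse-to-fine matching just described: the fast M-pathway gist shortlists candidate object templates, and the slower P-pathway detail then selects among them. The template store and the "feature" strings are invented stand-ins for illustration, not a model of the actual neural code.

```python
# Schematic of coarse-to-fine object recognition: a fast low-spatial-frequency
# gist shortlists templates; slow high-spatial-frequency detail disambiguates.
# Templates and signal descriptions are invented placeholders.

TEMPLATES = {
    "lamp":     {"lsf": "mushroom contour", "hsf": "fabric shade, power cord"},
    "mushroom": {"lsf": "mushroom contour", "hsf": "gills, matte cap"},
    "umbrella": {"lsf": "mushroom contour", "hsf": "ribs, canopy folds"},
    "vase":     {"lsf": "goblet contour",   "hsf": "glazed ceramic surface"},
}

def fast_pathway(lsf_signal):
    """M-pathway analogue: match the gist; may return several candidates."""
    return [name for name, t in TEMPLATES.items() if t["lsf"] == lsf_signal]

def slow_pathway(candidates, hsf_signal):
    """P-pathway analogue: use fine detail to decide among the candidates."""
    return [name for name in candidates if TEMPLATES[name]["hsf"] == hsf_signal]

candidates = fast_pathway("mushroom contour")  # ['lamp', 'mushroom', 'umbrella']
print(slow_pathway(candidates, "ribs, canopy folds"))  # ['umbrella']
```

A scene template could be added to this sketch as a further filter on the candidate list, which is exactly the narrowing role the living room gist plays in the example discussed next.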

Object recognition in real-world scenes proceeds in a similar way. The scene gist (i.e., the LSF information extracted from the scene) is projected directly from V1 to OFC (Bar et al., 2006) (Figure 5.4). But different parts of OFC show selectivity for the gists of objects and scenes: while the inferior part of OFC shows selectivity for LSFs extracted from images of isolated objects, the medial part of OFC responds preferentially to LSFs extracted from images of scenes (Aminoff, Kveraga, & Bar, 2013; Bar et al., 2006). The increased activation of the medial areas of OFC recruits a scene template, or what is also sometimes called a context frame, a schema, a script, or a frame (Bar, 2004; Friedman, 1979; Palmer, 1975; Schyns & Oliva, 1994). Scene templates are structures in long-term memory, which store statistical scene regularities and derive from past exposure to similar scenes. Once a scene template has been recruited (e.g., a living room scene template), associated object templates are rapidly activated, a process that provides the platform for predictions of which objects are most likely to be found in the scene (e.g., a sofa, a sofa table, a lamp, a television) (Bar, 2009). Activated scene templates constrain expectations with respect to the presence and typical characteristics and location of objects in the scene and provide the ability to direct attention in order to shift gaze to relevant regions of the scene. The scene template serves as a coarse-grained prediction about the distal cause of the scene gist. The scene information is then projected to the IT cortex, where it awaits the later-arriving HSF signal that has been processed bottom-up. Suppose the task is to determine the distal cause of the mushroom contour in Figure 5.4A in the context of a living room. Although multiple object templates match the object gist with the mushroom contour when it is processed without a scene gist, one can imagine that only the lamp is a

suitable match when the object gist with the mushroom contour is processed together with the gist of a living room. In the envisaged case, the gist of a living room recruits a living room scene template in long-term memory, which in turn helps narrow down the range of hypotheses about the distal cause of the object gist with the mushroom contour to a single one (viz., the hypothesis that the distal cause of the gist with the mushroom contour is a lamp). Even so, one cannot equate the information projected back to IT with perceptual content, as this signal only consists of LSF information from the mushroom contour and semantic information about a prototypical living room table lamp. But there is obviously more to perceptual content than LSF and semantic prototype information. Perceptual content also contains HSF information, such as information about texture, sharp edges, and colors.

5.4  The Predictive Account and Gist Processing

Let us now turn to one of the problems object recognition presents for the predictive approach, viz., that of accounting for gist processing. Recall that on the predictive account, the only information that gets relayed up through the hierarchy is the prediction error signal, which encodes information about the mismatch between the sensory input and a hypothesis about what caused the signal. A prediction error can also be thought of as information that has not yet been successfully predicted by the hypothesis. Prediction error signals carry information bottom-up in the visual system, eliciting an update of the hypothesis, which is then compared to the lower-level sensory signal. This hypothesis-testing process continues until the prediction error is minimized as much as possible. But, as we have just seen, the first step in object recognition is not the generation of a hypothesis but rather the projection of low spatial frequency information – the kind of information that encodes holistic layouts and contours of objects – from V1 to OFC. This gist of a scene or an object recruits a scene or object template encoded in long-term memory. Scene templates and object templates serve as hypotheses about the distal causes of sensory information about scenes and objects, respectively, where the sensory information at this stage is the gist of the object or scene. If a single hypothesis wins out, then this hypothesis is matched against the slower-arriving HSF signal, which has been processed bottom-up. Hypothesis revision continues until the best match has been found. The problem this poses for PP is that the projection of the gist of a scene or object from V1 to OFC cannot be construed as a prediction error signal, as the brain needs to be apprised of its surroundings, at least in broad strokes, before it can generate detailed predictions about them. If the brain were to start with a randomly chosen hypothesis – a random guess – our vision would fail us far

more often than it actually does, as correcting the potentially gross error of a random guess would be rather time-consuming in most cases. Andy Clark uses the following analogy to shed light on the idea of a prediction error signal:

[S]uppose you and I play a game in which I (the "higher, predicting level") try to describe to you (the "lower level") the scene in front of your eyes. I can't see the scene directly, but you can. I do, however, believe that you are in some specific room (the living room in my house, say) that I have seen in the past. Recalling that room as best I can, I say to you "there's a vase of yellow flowers on a table in front of you". The game then continues like this. If you are silent, I take that as your agreeing to my description. But if I get anything that matters wrong, you must tell me what I got wrong. You might say "the flowers are not yellow". You thus provide an error signal that invites me to try again in a rather specific fashion—that is, to try again with respect to the colour of the flowers in the vase. The next most probable colour, I conjecture, is red. I now describe the scene in the same way but with red flowers. Silence. We have settled into a mutually agreeable description.
(Clark, 2015, p. 5)

The problem with this analogy is that Clark assumes that he (the "higher, predicting level") already believes that he is in a living room, a specific living room indeed. He thus skips right over the first (and the second) stage of object recognition, that is, he ignores that the sensory signal somehow must generate an initial hypothesis, or prediction, about the distal cause of the sensory signal. Otherwise, the initial hypothesis is just a random guess. If we assume that the sensory signal does not initially help shape the formation of a hypothesis, then the game you (the "lower level") play with Clark (the "higher, predicting level") might well run as follows:

Clark: I don't really have any idea where you are. But let me just give it a shot. You are in a living room in Edinburgh.
You: You are wrong about "You are in a living room in Edinburgh."
Clark: How wrong? Is it a room in a house?
You: You are wrong about "You are in a living room in Edinburgh."
Clark: Okay, let me try something more general. You are outside.
You: [Silence]
Clark: Silence means we have settled to a mutually agreeable description. Okay, then. You are outside. You are in your yard.
You: You are wrong about "You are in your yard."
Clark: You are walking your dog.
You: You are wrong about "You are walking your dog."

Clark: You are on a beach.
You: You are wrong about "You are on a beach."
Clark: Bloody hell. I give up …
You: Really! Alright then. I am in the outback of Australia, on a mission to extract venom from the deadly eastern brown snake.
Clark: You're what?

Granted, people often do know at the top predictive level where they are. We bet that you know where you are right now, even without having to open your eyes. This much is true. The problem with this rejoinder is that object recognition does not require knowing where one is. If you are shown slides of different familiar scenes one by one, you can immediately identify the scenes and the objects in them without holding any prior beliefs about what the slides might present. The same goes for object recognition without a scene context. It is possible to recognize familiar objects in isolation from scene context in less than 80 ms without having the slightest hint ahead of time as to what the object on the next slide may be (e.g., Davenport & Potter, 2004). This example merely serves to drive home the point that you need a bottom-up signal to present at least a general sketch of the scene or object to the prefrontal cortex, so the decision-making part of the brain can generate probable predictions rather than being forced to rely on random guesses.

But PP encounters further trouble once we consider how it handles noise, or imprecision, in the sensory input (see also Vance, 2021). Noise can be understood as a meaningless discrepancy between the sensory signal and its distal cause. Externally generated noise in a visual signal may be due to poor viewing conditions, such as morning fog, which makes sensory signals less reliable. Internally generated noise, by contrast, may be due to random deviations in neural firing. Within the PP framework, updating a hypothesis is supposed to generate a more accurate prediction about the distal cause. Updating a hypothesis on the basis of a noisy, or imprecise, signal, however, is much less likely to give rise to a more accurate prediction. So, PP maintains that noisy, or imprecise, signals have much less influence on the updating of the brain's predictions. To accommodate this idea, advocates of PP assume that in addition to making hypotheses, or predictions, about the distal cause of a sensory signal, the brain also makes predictions about how precise the sensory signal is. The greater precision the brain expects, the greater the gain on the prediction error signal, and the more weight is given to the prediction error in updating the hypothesis. Conversely, if the brain expects a noisy sensory input, then it attenuates the prediction error signal, thus inhibiting its influence on the update of the hypothesis. But this causes trouble for PP with respect to the gists of scenes and objects.
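The precision-weighting idea just described has a standard minimal formalization: the revised hypothesis is the old one plus the prediction error scaled by a gain that grows with the expected precision of the signal relative to that of the prior, in effect a Kalman-gain scheme. The sketch below is one common way of rendering this idea; it is not a model proposed in this chapter, and the numbers are invented.

```python
# Minimal precision-weighted updating, in the spirit of the scheme described
# above: the gain on the prediction error reflects how precise the brain
# expects the sensory signal to be. A standard toy formalization only.

def update(prior_mean, prior_precision, signal, signal_precision):
    error = signal - prior_mean  # prediction error
    gain = signal_precision / (signal_precision + prior_precision)
    return prior_mean + gain * error  # precision-weighted revision

prior = 0.0  # current hypothesis about some feature value

# A signal expected to be precise strongly revises the hypothesis...
print(update(prior, prior_precision=1.0, signal=10.0, signal_precision=9.0))  # 9.0
# ...whereas an expectedly noisy signal is attenuated, barely moving it.
print(update(prior, prior_precision=1.0, signal=10.0, signal_precision=0.1))  # ~0.91
```

On this arithmetic, the worry developed below is easy to state: if gists are expected to be low-precision signals, their gain should be near zero, and they should barely move the system's hypotheses at all.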

As we have seen, a substantial body of research shows that in real scenes, the brain first samples LSF information about a scene and propagates it to the prefrontal cortex, where it activates a scene template in long-term memory. The scene template then generates a hypothesis about what sorts of objects are likely to be present in the scene. When dealing with a fixed object, the gist of the object is projected from V1 to the prefrontal cortex, which then activates compatible object templates in long-term memory. Object gists and scene gists are prime examples of noisy incoming sensory signals. After all, they are encoded in the form of LSF information – information which, by definition, lacks fine details about the object or scene. But PP suggests that the brain attenuates noisy, or imprecise, sensory signals. If, however, the brain attenuated the gists of objects and scenes, then the sensory input would not be able to activate object or scene templates in long-term memory. Accordingly, the brain would not be able to generate an informed prediction about the distal cause of the sensory signal. If, however, sensory signals had not been able to shape predictions about their distal causes through gist signaling, then the brain would have been forced to rely on pure guesswork, and humans and animals would have been unable to perceive objects. The case of object perception thus unveils a flaw in PP's way of dealing with noisy sensory signals.

At this point, advocates of PP may deny that the gists of scenes and objects are noisy sensory signals because they facilitate object recognition. This, however, would be an ad hoc maneuver. The gists of scenes and objects are paradigm examples of noisy signals. For example, when pictures of familiar objects are filtered to include only the LSF components (0–4 cycles/picture), the objects cannot be recognized with high certainty (Bar, 2003) (Figure 5.4). When the distal cause is viewed in the absence of a scene context, the brain ought to expect the object gist with the mushroom contour in Figure 5.4A to lack precision. That is, the brain ought to predict that gists are low-precision signals. After all, in the absence of scene information, the brain has no basis upon which to give priority to the hypothesis that the object gist with the mushroom shape in Figure 5.4A was caused by a living room table lamp rather than the hypothesis that it was caused by a mushroom or an umbrella. So, the brain ought to predict that the object gist in Figure 5.4A is a low-precision signal. But PP holds that low-precision signals are attenuated. So, PP wrongly predicts that gist signals are attenuated.

5.5  Object Recognition Depends on Attention

Given that the gist of a scene can be extracted over the course of a single eye fixation, during which all components of the retinal image have fixed locations, it may be tempting to think that the scene gist is processed

homogeneously across the visual field prior to any involvement of attention. Recent studies, however, have shown that attention plays a pivotal role in scene gist recognition (Larson et al., 2014; see also Berry & Schwartz, 2011). Although scene gist acquisition occurs within a single fixation, covert attention aids in extracting the gist of the scene. Evidence shows that masking central vision during the first 50 ms of eye fixation interferes with visual tasks, such as reading, visual search, scene memory (Võ & Henderson, 2009), and scene gist recognition, whereas masking peripheral vision only interferes with visual tasks when it occurs about 70–100 ms into fixation (Glaholt, Rayner, & Reingold, 2012; Larson et al., 2014). This points to the hypothesis that the type of attention that is operative during a single eye fixation is zoom-out attention, that is, attention that is initially focused in the center of the visual field but then expands diffusely outward into the visual periphery within the first 100 ms of viewing (Figure 5.5). Zoom-out attention is thus a form of (covert) spatial attention. Once a scene template has been activated, other types of attention determine where we allocate our resources to object recognition. In a visual search task, voluntary attention guides the movement of our eye fixation. However, zoom-out attention is operative during each eye fixation. In the absence of a perceptual task (e.g., visual search), our attentional resources are preferentially allocated to the identification of objects within our peripersonal space – that is, the region immediately surrounding the

Figure 5.5 During a single eye fixation, attention is initially focused in the center of the visual field but then expands diffusely outward into the visual periphery
Source: From Larson et al. (2014)

perceiver, typically within an arm's reach (for a review, see Castelhano & Krzyś, 2020). As a result, objects within peripersonal space are identified more accurately (Fernandes & Castelhano, 2019; Josephs & Konkle, 2019; Man, Krzys, & Castelhano, 2019). This prioritization of information closer to the perceiver is also known as the "foreground bias." As we will explore below, other types of attention can be central to object recognition at the stage of hypothesis testing, including attentional capture, affect-biased attention, and cued attention.

However, the idea that attention facilitates object recognition presents a further problem for the predictive account. Before spelling out the gist of this problem, let us first have a closer look at how the predictive account handles attention. The original proposal, due to Friston, is that attention is the optimization of the expected precision of incoming signals (Feldman & Friston, 2010; Friston, 2009; see also Hohwy, 2013, p. 195):

The Predictive Account of Attention
To attend to a stimulus just is to turn up the gain on expected high precision signals while turning down the gain on other signals.

According to PP, to turn up the gain on an expected high precision signal is to enhance the precision of a signal already expected to be highly precise. A prediction error signal with an enhanced gain is given greater weight in the revision of hypotheses about the visual scene. So, PP holds that attention allows a prediction error signal to play a weightier role in hypothesis revision. However, thus formulated, PP's account of attention seems to face problems similar to those that fueled the classical debate between early (Broadbent, 1958) and late selection (Deutsch & Deutsch, 1963) in attention, especially with respect to unexpected, externally driven salient events like the eruption of a sudden fire or a loud noise in a quiet café. Here, a chicken-and-egg problem arises: how can the system reliably expect a high precision signal before it knows which object it is processing? One theory that provides a highly effective solution to this problem is the theory of visual attention (Bundesen, 1990). This theory proposes that we change the way we think of the relationship between memory subsystems: rather than positing that information flows from short-term memory into long-term memory, it takes information from the environment to be matched to mental categories (or templates) in long-term memory (Bundesen & Habekost, 2008). The best match (which could be guided by PP principles)4 then competes in a stochastic race for active representation in a limited-capacity short-term memory, or working memory, store. That is, incoming sensory information is compared to long-term memory and then attention prioritizes the most relevant categorizations for encoding in short-term memory. This avoids the problem of how selection can occur before objects have been identified (Dall, Watanabe, & Sørensen, 2016; Brogaard & Sørensen, 2023).
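The "stochastic race" can be illustrated with a toy simulation: each competing categorization receives a processing rate set by its sensory evidence and attentional weight, and the first categorizations to finish are encoded into the limited-capacity store. This is a loose illustration of the idea only, not Bundesen's (1990) formal model, and the rates, categories, and capacity are invented for the example.

```python
# Toy stochastic race in the spirit of TVA: categorizations race for a
# limited-capacity short-term store, with rates set by evidence and weight.
# A loose illustration only; rates and capacity are invented.
import random

random.seed(1)

def race(rates, capacity=2):
    """Sample exponential finishing times; the fastest fill the store."""
    times = {cat: random.expovariate(rate) for cat, rate in rates.items()}
    return sorted(times, key=times.get)[:capacity]

# Evidence-weighted rates for competing categorizations of the visual field.
rates = {"red dot": 3.0, "black dot #1": 1.0, "black dot #2": 1.0}
print(race(rates))  # the strongly weighted 'red dot' usually wins a slot
```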

The details of the predictive account differ for different types of attention, specifically for attentional capture, a form of exogenous attention, and cued spatial and voluntary spatial attention, forms of endogenous attention. To a first approximation, we can say that attention is exogenous when it is automatically drawn toward a target (e.g., a spatial region, object, or attribute), whereas attention is endogenous when attention is directed toward a target by an internal state (e.g., volition, expectation, memory, or emotion), sometimes as a result of processing a cue that aids the perceiver in directing her attention to the target. In attentional capture, the best-known form of exogenous attention, attention is grabbed by a stimulus that stands out in some way from its surroundings, such as a scream in a relatively quiet coffee shop, or a red dot in an array of black dots.5 Attentional capture is thought to be an early visual phenomenon, as perceptual features must be processed early enough in the visual system for them to attract attention and lead to segregation (Beck, 1966; Treisman, 1982). A stimulus captures attention when the associated signal is strong compared to other incoming signals. But now, advocates of PP argue, a signal can reasonably be assumed to be strong due to a higher signal-to-noise ratio compared to weaker incoming signals. Granting this assumption, strong signals are more precise than weaker incoming signals. So, according to PP, the visual system increases the gain on attention-capturing signals. As PP holds that high precision signals have a substantially greater influence on the revision of hypotheses than noisy signals, signals associated with attentional capture thus lead to a significant revision of the brain's existing hypothesis about its surroundings.

Next, let's look at cued spatial attention, one of the most studied forms of endogenous attention. A classic paradigm for studying the effect of cued spatial attention on the speed of detecting a target object is the Posner paradigm (Posner, 1980). Here, volunteers fixate on a central fixation cross. Then a cue appears that points in the direction of the target in 80 percent of trials (valid cue) and in the opposite direction of the target in the remaining 20 percent of trials (invalid cue) (Figure 5.6). The expected finding in this paradigm is that target stimuli are detected faster when a valid cue directs our attention to the target's spatial location. On the predictive account, the appearance of a cue generates a hypothesis that a target will appear in the cued spatial region, which increases the gain for a high precision signal associated with the target in that region, facilitating detection. If the task involves spotting a particular target (e.g., the emotional valence of a smiley, the letter identity, or a Gabor pattern), then cued spatial attention facilitates spotting just those targets.


Figure 5.6 Posner paradigm. Valid cues (left) direct attention to the target's spatial location, which allows for faster detection of the target stimulus
Source: Adapted from Posner (1980)
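On the PP reading of the Posner result, the valid cue raises the expected precision of target-related signals at the cued location, so the gain on prediction errors from that region goes up and the detection threshold is reached sooner. The toy accumulator below illustrates the arithmetic; the gain values, evidence rate, and threshold are invented for the example and do not correspond to measured quantities.

```python
# Toy rendering of cued spatial attention as precision gain: evidence from the
# cued location is amplified, so a detection threshold is crossed sooner.
# Gains, evidence rate, and threshold are invented illustration values.

def detection_time(evidence_per_step, gain, threshold=10.0):
    accumulated, steps = 0.0, 0
    while accumulated < threshold:
        accumulated += gain * evidence_per_step  # gain-modulated evidence
        steps += 1
    return steps

CUED_GAIN, UNCUED_GAIN = 2.0, 1.0
print(detection_time(1.0, CUED_GAIN))    # 5 steps: validly cued location
print(detection_time(1.0, UNCUED_GAIN))  # 10 steps: uncued location
```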

Voluntary spatial attention, another form of endogenous attention, is not directed by environmental cues (e.g., a pointing finger, a gaze, or an arrow) but rather by the perceiver's internal states (i.e., a desire/belief pair, an intention, or a volition). Hohwy proposes to treat voluntary spatial attention as a kind of action used to test a perceptual hypothesis (Hohwy, 2013, pp. 77–78). The general idea here is that we often try to figure out what the world is like by actively engaging with it, for instance, by walking closer to a target object or inspecting it from different perspectives. Say you expect a wooden construction in front of you to be a real barn but want to rule out that it's a realistic barn facade used in a movie set. You can test your "real barn" hypothesis by making a prediction about what the sensory signal would be like, if your hypothesis were true. For example, unlike a barn facade, a real barn will keep looking like a real barn if you walk around it. So, to test your hypothesis, you can walk around the construction. By doing that, you are sampling more data, thus increasing the "power" of your study. If your "real barn" hypothesis predicts the stimulus well, your walk around the barn will bring about the predicted sensory signal, where the predicted sensory signal here is that the wooden construction still looks like a real barn. In exploratory perception, then, the prediction error, or mismatch between the hypothesis and your initial sensory signal, is not minimized by revising the hypothesis but rather by acting to bring yourself into a situation where your hypothesis will

match the sensory signal well, if the hypothesis predicts the stimulus well. Hohwy applies this idea of acting to sample additional data to voluntary spatial attention. Voluntary spatial attention, he argues, involves acting for the purpose of testing your hypothesis (Hohwy, 2013, pp. 197–198). But in voluntary attention, the action is not intentional bodily movement, but a mental act, a preparedness, to increase the sensory gain on a high precision signal if one appears in the sampled region of space. If, for example, your hypothesis is that there is a mouse on the kitchen floor, your attention to the kitchen floor increases the sensory gain for a high precision "mouse" signal in that region, resulting in faster detection if a mouse shows up on the floor.

Ransom et al. (2017) have recently argued that the predictive approach fails to offer a complete account of voluntary attention. Their test case comes from Ulric Neisser and Robert Becklen's 1975 classical study of "selective looking." Neisser and Becklen used a system of half-silvered mirrors to present two overlapping films of equal quality to participants, appearing in the same segment of their visual field. One depicted actors playing a hand clapping game, filmed from up close, whereas the other depicted actors playing a ball game, filmed from a distance (Figure 5.7). Neisser and Becklen found that the volunteers were perfectly capable of attending to either film, while ignoring the other, and switching their attention from one to the other, provided that both films were presented to both eyes. Attending to both films at once, however, proved to be "demanding" or "impossible." They conclude on the basis of their findings that it is not the distance or clarity of a visual stimulus that enables us to selectively attend to it, nor a "filter" or "gate" created on the spot, but rather the stimulus' intrinsic properties and structure. The type of attention exemplified in the study is voluntary attention, as evidenced by the fact that the volunteers

Figure 5.7 Neisser & Becklen's (1975) "Selective Looking" Experiment. Two overlapping films were presented to volunteers, as shown in pane C. One film depicted a hand clapping game, as shown in pane A, and the other depicted a ball game, as shown in pane B
Source: From Ransom et al. (2017)

could choose to attend to either one of the overlapping films and were able to easily switch from one to the other. According to Ransom et al. (2017), the kind of voluntary attention exemplified in Neisser and Becklen's study presents a problem for the predictive model. The kind of voluntary attention that determines whether a subject is attending to the hand clapping game or the ball game is voluntary object attention, where the object is the depicted game. But advocates of the predictive account do not say how we are to understand voluntary object attention. Hohwy (2013), for example, only explains how PP would deal with voluntary spatial attention for enhanced target detection. Ransom et al. (2017) thus consider and ultimately reject several ways that advocates of PP could accommodate voluntary object attention. The gist of their argument, however, is this: the participants in Neisser and Becklen's experiment can voluntarily attend to either one of the two overlapping stimuli. But the stimuli differ only in intrinsic features; the signals associated with the stimuli do not differ in context-dependent precision. For example, it is not the case that the hand clapping game is presented in foggy conditions and that the associated sensory signal therefore is less precise; and there is no reason to think the volunteers expect otherwise.

In response to Ransom et al. (2017), it may be argued that rather than being a perceiver-independent matter, the participants' expectations regarding the precision of the sensory signals depend on which stimulus they choose to attend to. This objection, however, is misguided. If the expected precision of the sensory signal depends on what the volunteers voluntarily attend to, then – on pain of circularity – PP cannot account for voluntary attention in terms of expected precision. Clark (2017) nonetheless insists on something like this response to Ransom et al. (2017). According to Clark, the participants' voluntary switch in attention should be cashed out in terms of their desires, which in turn are to be understood in terms of their predictions about what they will do. As he puts it,

[D]esires are simply beliefs/predictions that thus guide inference and action (see Friston et al., 2011, p. 157). My desire to drink a glass of water now is cast as a prediction that I am drinking a glass of water now – a prediction that will yield streams of error signals that may be resolved by bringing the drinking about, thus making the world conform to my prediction. Desires are here re-cast as predictions apt to be made true by action. Thus consider the prediction (based on some standing or newly emerging belief) that I will now experience, say, the hand-clapping film. This would enslave action, including the 'mental action' of altering the precision-weighting on hand-clappy stuff. In this way

desires and motivations are revealed as beliefs that enslave action. The apparently non-indicative nature of a thought such as 'let's have a look at the hand-clap film' is now no barrier. For the real content of the thought, as far as the PEM mechanism is concerned, is indicative – it is something like "I am looking at the hand-clap film now."
(Clark, 2017, p. 117)

However, Clark's fix simply re-introduces the worry of circularity. Clark's suggestion boils down to this: a participant S attends to a stimulus (the ball game, say) just in case S expects the associated sensory signal to be a high precision signal, but the associated sensory signal is a high precision signal just in case S predicts that S attends to the stimulus. So, S attends to the ball game stimulus just in case S predicts that S attends to the ball game stimulus. The predictive account of attention thus presupposes an account of attention.

In a more recent paper, Ransom et al. (2020) argue that PP also fails to account for affect-biased attention, that is, attention to stimuli that are emotionally, or affectively, salient as a result of their associations with reward or punishment. Their main example of affect-biased attention runs as follows:

Suppose you walk your dog uneventfully every day past a house on the corner of your block. One morning, however, a large Doberman rushes to the fence, barking and snapping. You jump backwards and for a moment you fear for your life. From this day forward, you give this house a bit of extra attention when you walk past, your eyes always searching the fence for signs of the Doberman, though it is seldom in fact in the yard.
(Ransom et al., 2020, p. 1)

Your increased attention to the yard subsequent to your initial encounter with the Doberman is not obviously stimulus-driven, or exogenous. The yard presumably triggers a flashback, an affect-laden memory, which then causes you to pay closer attention to the yard. Affect-biased attention is thus a kind of endogenous attention. Ransom et al. (2020) argue that affect-biased attention cannot be understood in terms of expectations of a high precision signal, as PP suggests. In the envisaged case, you eventually learn that the Doberman rarely is in the yard. So, your attention to the yard cannot be explained by your expectation that the Doberman will be causing a high precision sensory signal. Rather, it seems that your affect-laden memory of the aggressive dog plays a part in explaining your attention to the yard. Ransom et al. (2020) suggest that it's your desire to shun the yard-associated punishment that drives your attention to the yard, where the yard-associated punishment is the

Doberman lunging at you. The predictive account thus yields the wrong result here, viz. that you attend to the yard because you expect a high precision "Doberman" signal there. Ransom et al.'s (2020) objection points to a more general problem with PP: suppose you step out of the car and jump to the side because it looks like there is a snake under the car. Upon further scrutiny, however, the snake-shaped object is just a stick. Here, the mistaken categorization occurs because if the stick had been a snake, this would have been a potentially dangerous situation. This is so in spite of the fact that you have encountered far more sticks than snakes and therefore should have a higher prior for categorizing the stimulus as a stick.

The problems voluntary object attention and affect-biased attention present for the predictive account are augmented by the role that these types of attention may play in object recognition. While a scene hypothesis is activated within the duration of a single eye fixation, which information is processed bottom-up depends on which object is attended. In a visual search of a complex scene, voluntary attention partially guides eye movements, whereas zoom-out attention covertly scans a spatial region from the visual center to the periphery. The visual search results in the selection of an object. If the scene contains a visually salient object (e.g., a colored object in a black-and-white scene), attentional capture results in a selection of the object. Cued and affect-biased attention could also be what drives the selection of an object. Unless an object is selected, however, the object will appear as a diffusely attended blob with features that are insufficient for confident identification. Attention is thus a precondition for object recognition. Yet, as we have already seen, the predictive account fails to provide a satisfactory account of at least two forms of attention that may assist in the selection of an object, viz. voluntary object attention and affect-biased attention (Ransom et al., 2017, 2020).

The predictive account also lacks the resources to account for the zoom-out attention that occurs during scene gist recognition. Recall that over the course of an eye fixation, which takes around 100 ms, attention is initially highly focused at the center of the visual field and then covertly diffuses from the center to the periphery (Glaholt et al., 2012; Larson et al., 2014). This zoom-out attention also operates during each eye fixation in saccades and visual searches of complex scenes at later stages in the process of object recognition. The predictive account construes attention as turning up the gain on an expected high precision signal, but although expectations modulate attention (Sørensen et al., 2015; Vangkilde, Coull, & Bundesen, 2012), scene gist extraction seems to occur during a single eye fixation and depends on zoom-out attention (Larson et al., 2014). In zoom-out attention, covert attention is diffusely distributed around the visual center for about 50–75 ms of eye fixation, and it then diffusely expands outward

during the subsequent 25–50 ms of fixation. During initial zoom-out attention, the visual system expects a low-precision LSF signal extracted from the scene, not a high precision signal. As the system does not expect a high precision signal, it should not turn up the gain on the signal, and so, PP is unable to accommodate zoom-out attention; PP thus lacks the resources to account for object recognition on multiple levels.

5.6  The Dark Room Problem

The main problem with PP, as we have seen, lies in its excessive emphasis on top-down predictions. However, you can have too much of a good thing. So, perhaps an easy fix is to tone down the emphasis on top-down predictions. One option is a proposal that replaces prediction error minimization with the ratio of top-down processes to bottom-up processes as the overarching unifying principle (TD:BU ratio) (Herz et al., 2020). Top-down predictions guide the bottom-up signals by enhancing expected signals and inhibiting unexpected signals, whereas bottom-up processes are free of top-down guidance (or disturbances). The ultimate perceptual output is shaped by the TD:BU ratio. As argued by Herz et al. (2020), different states of mind lead to different TD:BU ratios, including different moods, attentional scopes, and thinking styles. A broader thinking style, for example, entails a lower TD:BU ratio. Reduced guidance and inhibition by top-down processes enables increased associative activation and non-linear thought processes. Narrower thinking, by contrast, is linked to a higher TD:BU ratio. The increased top-down processing helps prevent distraction by competing thoughts, inhibits free associative activation, and thus results in a narrower and more ruminative style of thinking (e.g., Smith & Alloy, 2009).6

The state of mind framework may help address the so-called dark room problem for PP (Klein, 2018; Mumford, 1992; Sun & Firestone, 2020). The gist of the dark room objection is that if PP is right that the principle of prediction error minimization lies at the heart of all of our mental processes, then we should be biologically driven to stay inside a dark room. The dark room that drives the objection is not just pitch-dark, but also quiet, non-smelly, unfelt, and so on.7 So, inside the dark room, no sensory inputs from the environment reach us. With no sensory inputs entering our brain, there cannot be a mismatch, or prediction error, between sensory inputs and our predictions about our environment. As long as we stay inside the dark room, no evidence could ever cause us to update our predictions. No matter how tame or wild we predict that the dark room really is, our predictions go unchallenged. So, staying in the dark room seems to be a much more effective way of minimizing prediction errors than entering the outside chaotic world (however, see also Van de Cruys, Friston, & Clark,

2020). But, as a matter of fact, we do not stay in a dark room; in fact, most of us seem motivated to gather in "noisy" cities. So, taking the prediction error minimization principle to be fundamental to our brain and mind, as PP does, seems misguided. This is not the place to consider PP's replies to the dark room problem.8 It should, however, be clear that if we take the TD:BU ratio to explain the mind, then the dark room problem no longer rears its shady head. Rather, if the mind operates in different modes, involving different TD:BU ratios, then staying in a dark room should be appealing to us only when we are in a hyper-narrow state of mind entailing a sky-high TD:BU ratio – the TD:BU ratio characteristic of people in depressive states marked by complete apathy. The states of mind proposal can be seen as an augmentation of PP, but it should be emphasized that accepting the states of mind proposal entails denying some of the central claims made by PP, for instance, that prediction error signals are the only signals processed bottom-up and that prediction error minimization is the overarching principle explaining all our mental processes.

Notes

1 Predictive processing is far from the only Bayesian approach to the brain and cognition. For a review of Bayesian approaches, see, e.g., Talbott (2016) and Spratling (2017).
2 While perceptual states are the product of unconscious, subpersonal inferences, mental states themselves, including judgment and desire, are personal-level states (Clark, 2020; Wiese & Metzinger, 2017).
3 For the original demonstration, see KSU, Vision Cognition Laboratory, https://www.k-state.edu/psych/vcl/images/beach%20loop.gif, retrieved Oct 31, 2018.
4 TVA assumes that sensory evidence matched with templates in long-term memory provides the initial basis for stimulus encoding that is modulated by additional top-down mechanisms of pertinence and bias. However, the specific mechanism in this template matching procedure is not entirely clear, and we propose that PP could in fact be that exact mechanism. Thus, PP may be a key mechanism in perception, but not an exclusive unified mechanism in perceptual processing.
5 It may be argued that attentional capture is the only form of exogenous attention. However, here we leave room for other forms of exogenous attention, for example, stimulus-driven diffuse attention and what Azenet Lopez (2020, ch. 4) calls "spillover attention." According to Lopez, spillover attention is attentional allocation to a vicarious or secondary target, such as the bearer of a feature in the case of feature attention (for the most recent version of her view, see Lopez, 2022). The most intuitive cases of spillover attention are instances of endogenous attention. But given that attentional capture entails attentional selection, presumably spillover attention could be exogenous as well.
6 For a review of the relationship between rumination and depression, see, e.g., Thomsen (2006).

7 On Mumford's (1992) variation on the dark room problem, you are to envisage a place, "like the oriental Nirvana … when nothing surprises you and new stimuli cause the merest ripple in your consciousness" (1992, p. 247, fn. 5). Here, there are sensory inputs, but they don't move you the least bit. This version of the dark room is more akin to a depressive state characterized by complete apathy.
8 Sun and Firestone (2020) argue that various intuitive responses to the dark room problem ultimately fail. For example, it's highly predictable that we will get hungry in the dark room, but as Klein (2018) notes, "predicting hunger is not the same as being motivated by it." Sun and Firestone acknowledge that Friston's (2013) reply succeeds in solving the problem but only by introducing a new one. Friston argues that the dark room problem rests on the mistaken assumption that the dark room is not surprising. As he puts it, "the state of a room being dark is surprising, because we do not expect to occupy dark rooms." The problem with this reply, Sun and Firestone argue, is that it makes PP trivially true, as no behavior can count as evidence against the view: "Why do we dance? Because we predict we won't stay still. Why do we donate to charity? Because we predict we will do good deeds. Why do we seek others? Because the brain has a prior which says 'brains don't like to be alone'" (pp. 347–348). See also Kwisthout et al. (2017), who present an interesting variation on the dark room problem based on the idea that coarse-grained predictions are more likely to minimize prediction error than fine-grained predictions.

References

Aminoff, E. M., Kveraga, K., & Bar, M. (2013). The role of the parahippocampal cortex in cognition. Trends in Cognitive Sciences, 17, 379–390.
Auckland, M. E., Cave, K. R., & Donnelly, N. (2007). Non-target objects can influence perceptual processes during object recognition. Psychonomic Bulletin & Review, 14, 332–337.
Bar, M. (2003). A cortical mechanism for triggering top-down facilitation in visual object recognition. Journal of Cognitive Neuroscience, 15, 600–609.
Bar, M. (2004). Visual objects in context. Nature Reviews Neuroscience, 5, 617–629.
Bar, M. (2009). The proactive brain: Memory for predictions. Philosophical Transactions of the Royal Society B, 364, 1235–1243.
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmidt, A. M., Dale, A. M., Hamalainen, M. S., Marinkovic, K., Schacter, D. L., Rosen, B. R., & Halgren, E. (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences U.S.A., 103, 449–454.
Beck, J. (1966). Effect of orientation and of shape similarity on perceptual grouping. Perception & Psychophysics, 1, 300–302.
Berry, M. J., & Schwartz, G. (2011). The retina as embodying predictions about the visual world. In M. Bar (ed.), Predictions in the brain: Using our past to generate a future (pp. 295–308). Oxford: Oxford University Press.
Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14(2), 143–177.

Broadbent, D. (1958). Perception and communication. London: Pergamon Press.
Brogaard, B., & Sørensen, T. A. (2023). Perceptual variation in object perception: A defence of perceptual pluralism. In A. Mroczko-Wąsowicz & R. Grush (eds.), Sensory individuals: Unimodal and multimodal perspectives (pp. 113–129). Oxford: Oxford University Press.
Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97(4), 523–547.
Bundesen, C., & Habekost, T. (2008). Principles of visual attention: Linking mind and brain. Oxford: Oxford University Press.
Castelhano, M. S., & Krzyś, K. (2020). Rethinking space: A review of perception, attention, and memory in scene processing. Annual Review of Vision Science, 6(1), 563–586.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Clark, A. (2015). Embodied prediction. https://uberty.org/wp-content/uploads/2017/06/Embodied-Prediction.pdf
Clark, A. (2016). Surfing uncertainty: Prediction, action, and the embodied mind. New York: Oxford University Press.
Clark, A. (2017). Predictions, precision, and agentive attention. Consciousness and Cognition, 56, 115–119.
Clark, A. (2020). Beyond desire? Agency, choice, and the predictive mind. Australasian Journal of Philosophy, 98(1), 1–15.
Dall, J. O., Wang, Y., Cai, X., Chan, R. C., & Sørensen, T. A. (2021). Visual short-term memory and attention: An investigation of familiarity and stroke count in Chinese characters. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(2), 282–294. https://doi.org/10.1037/xlm0000950
Dall, J. O., Watanabe, K., & Sørensen, T. A. (2016, February). Category specific knowledge modulates capacity limitations of visual short-term memory. In 2016 8th international conference on knowledge and smart technology (KST) (pp. 275–280). IEEE.
Davenport, J. L., & Potter, M. C. (2004). Scene consistency in object and background perception. Psychological Science, 15(8), 559–564.
Deutsch, J. A., & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70(1), 80–90.
Feldman, H., & Friston, K. (2010). Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4, 215.
Fernandes, S., & Castelhano, M. S. (2019). The foreground bias: Initial scene representations across the depth plane. PsyArXiv. https://doi.org/10.31234/OSF.IO/S32WZ
Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science, 12, 499–504.
Fiser, J., & Aslin, R. N. (2005). Encoding multi-element scenes: Statistical learning of visual feature hierarchies. Journal of Experimental Psychology: General, 134, 521–537.
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory of gist. Journal of Experimental Psychology: General, 108, 316–355.
Friston, K. J. (2003). Learning and inference in the brain. Neural Networks, 16(9), 1325–1352.

Friston, K. J. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 360, 815–836.
Friston, K. J. (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13(7), 293–301.
Friston, K. J. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
Friston, K. J. (2013). Active inference and free energy. Behavioral and Brain Sciences, 36, 212–213.
Glaholt, M. G., Rayner, K., & Reingold, E. M. (2012). The mask-onset delay paradigm and the availability of central and peripheral visual information during scene viewing. Journal of Vision, 12(1), 9.
Gordon, R. D. (2004). Attentional allocation during the perception of scenes. Journal of Experimental Psychology: Human Perception and Performance, 30, 760–777.
Green, C., & Hummel, J. E. (2006). Familiar interacting object pairs are perceptually grouped. Journal of Experimental Psychology: Human Perception and Performance, 32, 1107–1119.
Herz, N., Baror, S., & Bar, M. (2020). Overarching states of mind. Trends in Cognitive Sciences, 24, 184–199.
Hock, H. S., Gordon, G. P., & Whitehurst, R. (1974). Contextual relations: The influence of familiarity, physical plausibility, and belongingness. Perception & Psychophysics, 16, 4–8.
Hoffman, J. (1996). Visual object recognition. In W. Prinz & B. Bridgeman (eds.), Handbook of perception and action (Vol. 1, pp. 297–344). New York: Academic Press.
Hohwy, J. (2012). Attention and conscious perception in the hypothesis testing brain. Frontiers in Psychology, 3(96). https://doi.org/10.3389/fpsyg.2012.00096
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Hohwy, J. (2020). New directions in predictive processing. Mind & Language, 35(2), 209–223.
Hollingworth, A., & Henderson, J. M. (1998). Does consistent scene context facilitate object detection? Journal of Experimental Psychology: General, 127, 398–415.
Josephs, E. L., & Konkle, T. (2019). Perceptual dissociations among views of objects, scenes, and reachable spaces. Journal of Experimental Psychology: Human Perception and Performance, 45(6), 715–728.
Klein, C. (2018). What do predictive coders want? Synthese, 195, 2541–2557.
Kondo, H. M., van Loon, A. M., Kawahara, J.-I., & Moore, B. C. J. (2017). Auditory and visual scene analysis: An overview. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1714), 20160099.
Kveraga, K., Boshyan, J., & Bar, M. (2007). Magnocellular projections as the trigger of top-down facilitation in recognition. Journal of Neuroscience, 27, 13232–13240.
Kwisthout, J., Bekkering, H., & van Rooij, I. (2017). To be precise, the details don't matter: On predictive processing, precision, and level of detail of predictions. Brain and Cognition, 112, 84–91.
Larson, A. M., Freeman, T. E., Ringer, R. V., & Loschky, L. C. (2014). The spatiotemporal dynamics of scene gist recognition. Journal of Experimental Psychology: Human Perception and Performance, 40(2), 471–487.

Lopez, A. (2020). Information gating and the structure of consciousness. Doctoral dissertation, University of Miami.
Lopez, A. (2022). Vicarious attention, degrees of enhancement and the contents of consciousness. Philosophy and the Mind Sciences, 3(1). https://doi.org/10.33735/phimisci.2022.9194
Man, L., Krzys, K., & Castelhano, M. (2019). The foreground bias: Differing impacts across depth on visual search in scenes. PsyArXiv. https://doi.org/10.31234/OSF.IO/W6J4A
Meyers, L. S., & Rhoades, R. W. (1978). Visual search of common scenes. Quarterly Journal of Experimental Psychology, 30, 297–310.
Mumford, D. (1992). On the computational architecture of the neocortex. Biological Cybernetics, 66(3), 241–251.
Neisser, U., & Becklen, R. (1975). Selective looking: Attending to visually specified events. Cognitive Psychology, 7(4), 480–494.
Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cognitive Sciences, 11(12), 520–527.
Palmer, S. E. (1975). The effects of contextual scenes on the identification of objects. Memory & Cognition, 3, 519–526.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.
Ransom, M., Fazelpour, S., & Mole, C. (2017). Attention in the predictive mind. Consciousness and Cognition, 47, 99–112.
Ransom, M., Fazelpour, S., Markovic, J., Kryklywy, J., Thompson, E. T., & Todd, R. M. (2020). Affect-biased attention and predictive processing. Cognition, 203. https://doi.org/10.1016/j.cognition.2020.104370
Schyns, P. G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition. Psychological Science, 5, 195–200.
Shibuya, H., & Bundesen, C. (1988). Visual selection from multielement displays: Measuring and modeling effects of exposure duration. Journal of Experimental Psychology: Human Perception and Performance, 14(4), 591–600.
Smith, J. M., & Alloy, L. B. (2009). A roadmap to rumination: A review of the definition, assessment, and conceptualization of this multifaceted construct. Clinical Psychology Review, 29(2), 116–128.
Spratling, M. W. (2017). A review of predictive coding algorithms. Brain and Cognition, 112, 92–97.
Sun, Z., & Firestone, C. (2020). The dark room problem. Trends in Cognitive Sciences, 24(5), 346–348.
Sørensen, T. A., Vangkilde, S., & Bundesen, C. (2015). Components of attention modulated by temporal expectation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(1), 178–192.
Talbott, W. (2016). Bayesian epistemology. In E. N. Zalta (ed.), The Stanford encyclopedia of philosophy (Winter 2016 ed.). https://plato.stanford.edu/archives/win2016/entries/epistemology-bayesian/
Thomsen, D. K. (2006). The association between rumination and negative affect: A review. Cognition and Emotion, 20(8), 1216–1235.
Torralba, A., Oliva, A., Castelhano, M., & Henderson, J. (2006). Contextual guidance of attention in natural scenes: The role of global features on object search. Psychological Review, 113, 766–786.
Trapp, S., & Bar, M. (2015). Prediction, context and competition in visual recognition. Annals of the New York Academy of Sciences, 1339, 190–198.

Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8(2), 194–214.
Van de Cruys, S., Friston, K., & Clark, A. (2020). Controlled optimism: Reply to Sun and Firestone on the dark room problem. Trends in Cognitive Sciences, 24(9), 1–2.
Vance, J. (2021). Precision and perceptual clarity. Australasian Journal of Philosophy, 99(2), 379–395.
Vangkilde, S., Coull, J. T., & Bundesen, C. (2012). Great expectations: Temporal expectation modulates perceptual processing speed. Journal of Experimental Psychology: Human Perception and Performance, 38(5), 1183–1191.
Võ, M. L. H., & Henderson, J. M. (2009). Does gravity matter? Effects of semantic and syntactic inconsistencies on the allocation of attention during scene perception. Journal of Vision, 9(3), 24.
Wiese, W., & Metzinger, T. (2017). Vanilla PP for philosophers: A primer on predictive processing. In T. Metzinger & W. Wiese (Eds.), Philosophy and predictive processing. Frankfurt am Main: MIND Group.

6

Predicting First-Person and Counterfactual Experiences of Selfhood: Insights from Anosognosia

Aikaterini Fotopoulou and Sahba Besharati

6.1 Introduction

In this chapter, we build upon empirical findings on the neurological syndrome of anosognosia for hemiplegia (lack of awareness into one's paralysis; AHP) and recent neuroscientific theories of self-awareness (Fotopoulou, 2014, 2015; Friston, 2018) to propose that the experience of one's self entails at least two normally integrated levels of inference, namely, inferences about the here-and-now of experience in the first person and counterfactual inferences about the self beyond first-person experience. The relation between the two is understood as dynamic, in the sense that the contextual salience of different signals determines the degree to which prediction errors in first-person experiences will be explained away by more objectified predictions about the self, or will instead allow the updating of the latter. To unpack and support these claims, we will present and combine clinical experiences of anosognosia with empirical findings from our own and other labs.

Our main thesis is that anosognosia is best explained as a disconnection between how one expects the body to feel in a first-person perspective (emotionally mine and under my control) and how one perceives the body counterfactually. Patients' difficulty in integrating current sensations and emotions from the body with their beliefs about the body corresponds to difficulties in the most abstract, metacognitive (allocentric and prospective) aspects of body awareness (Besharati et al., 2016, 2022; Kirsch et al., 2020). Thus, self-awareness in anosognosia is subject to the influence of non-updated premorbid beliefs and emotions about the self (Fotopoulou, 2012a–2012c, 2014, 2015). Normally, these facets of self-awareness are integrated and experienced by individuals as unified. For example, people do not habitually doubt in front of the mirror that the potential stomach aches they experience correspond to the same person who is located in front of the mirror and reflected there, and who is thus also visible to others and existing for others.

The dissociation observed in anosognosia allows us a rare glimpse of the normally unconscious processes of integration and inference that underlie self-experience in everyday life. Furthermore, while we do not make general claims in the present chapter about the issue of whether delusions are beliefs or not, we do regard certain anosognosic delusions as beliefs. Such anosognosic beliefs are plausible (e.g. I can walk); many patients act upon them (i.e. they try to walk and fall repeatedly) and ground them in other beliefs about themselves (e.g. don't you think that if I could not move my own arm, I would know about it?). These three criteria (groundlessness, implausibility and failure to act upon) have been put forward as key differences between ordinary beliefs and delusions (Hamilton, 2006) and may well apply to other psychiatric or neurological delusions, but they do not apply to anosognosic beliefs (see also Fotopoulou, 2010 for discussion of the relation of delusions to neurological false memories, i.e. confabulations).

6.1.1 The 100-Year-Old Question of Anosognosia

In 1914, Joseph Babinski introduced the neologism anosognosia (from the Greek, α = without, νόσος = disease, γνώσις = knowledge) to describe how some patients with left hemiplegia following a stroke were "unaware of or seem to be unaware of the existence of the paralysis which affects them" (Babinski, 1914; see translation by Langer & Levine, 2014, pp. 5–8). In the years since Babinski, anosognosia has been the focus of much medical and scientific research (see Jenkinson & Fotopoulou, 2014 for an edited collection marking the 100th anniversary of the syndrome's description). The term itself is now applied to other instances of unawareness of one's illness or deficits (e.g., in dementia or schizophrenia), but in this chapter, we restrict ourselves to its classic use in the context of right hemisphere damage and left-sided paralysis.

Hundreds of anosognosic patients have now been described who indeed seem to have lost the ability to update their beliefs about their bodily state, and particularly their paralysis. Their beliefs about their body (e.g., I have no weakness since the stroke) no longer correspond to what the rest of their environment observes about the patients' bodies (e.g., the left side of the body is completely paralysed). Patients expect their paralysed arm and leg to be able to grasp their water glass, hug a friend and walk out of hospital, and thus to return to their habitual lives. Furthermore, what is most striking is that although they get many opportunities to test the veridicality of their beliefs against experience and contrary social feedback, they still do not update their beliefs. The left arm cannot pick up the glass, the hug is only half a hug and one cannot stand, let alone walk. Frequently, everyday tasks cannot be completed; objects fall on the floor and break. Sadly, anosognosic patients also have a higher incidence of falls than other patients in stroke wards.

Patients appear to forget these incidents, to minimise their importance and relevance, or to misattribute their causes to other people and events. In a subset of these patients, there are also concomitant body delusions (somatoparaphrenias; Gerstmann, 1942) affecting the sense of body ownership (the subjective feeling that our body is separate from the world and other bodies). Such patients may deny the ownership of a limb (asomatognosia), misattribute it to others or vice versa (somatoparaphrenia proper), claim they have three or more limbs (supernumerary limbs) or treat the limb as though it was a separate person (personification; Critchley, 1955).

Behind this counterintuitive adherence to delusional beliefs about the motor abilities or the ownership of one's body lies a central question that scholars of the syndrome have asked for more than 100 years: Do these patients fail to learn from errors because they cannot, or will not, let go of their beliefs (e.g., related motor intentions, predictions, motivations, wishes and hopes), or because they cannot observe that they have made an error (e.g., they have lost the ability to perceive or appreciate sensory or other feedback)? This question has been framed and debated in a number of binary ways in the past decades, for example as "psychogenic versus neurogenic," "defense versus deficit," "motivation versus cognition," "top-down versus bottom-up," "feedforward versus feedback," "prediction versus prediction error" and "belief conservatism versus observational adequacy." We outline two examples of this kind of binary explanatory thinking below, including also clinical examples of the anosognosic experience of one's self. We then go on to propose a more dynamic way of conceptualising belief-updating about one's self.

6.2 Denial versus Deficit

A classic example of how this central question has been framed is the debate over the role of motivation and psychogenesis versus cognition and neurogenesis in anosognosia. Babinski and his contemporaries portrayed a profound dualism in their thinking. For example, Babinski wondered whether anosognosia is motivated by self-esteem, or whether it is real. In the same volume, M. Henry Meige wonders: "Is it resignation, a wish to hide from himself or others a defect that afflicts him? It is possible, in certain cases; but in others one is faced with a true psychopathological problem" (see Langer & Levine, 2014, pp. 5–8). This absolute contrast, between a psychological wish to be healthy and to deceive the self and others accordingly, and a neurological condition that deprives the person of knowledge of their abilities and is somehow regarded as more real than mental causation, is one that many scholars and clinicians adhered to for most of the 20th century (Prigatano & Schacter, 1991).

In fact, it is still common among some clinicians to insist that a distinction between psychogenic denial and neurobiological anosognosia proper may be useful (Mograbi & Morris, 2018; Prigatano, 2014), even if they would not adhere to dualism more generally. However, in the past 15 years, a number of integrative perspectives have also emerged. These accounts stress the necessary combination of bottom-up and top-down factors, as well as cognitive and emotional factors (Davies, Davies, & Coltheart, 2005; Levine, 1990; Levine, Calvanio, & Rinn, 1991; Marcel, Tegnér, & Nimmo-Smith, 2004; Ramachandran, 1995; Vuilleumier, 2004). For example, considering anosognosia in the more general context of delusional beliefs, Davies et al. (2005) proposed that anosognosic beliefs may be explained by a two-factor account used to explain other delusions: abnormal beliefs arise due to a first impairment in perception that prompts the abnormal belief, and a second impairment that interferes with higher order, monitoring processes, thus allowing the abnormal perceptions to become abnormal beliefs. These combination theories have clearly improved understanding of this syndrome. However, these theories have been for the most part additive. Like more general so-called biopsychosocial models, these views do not tell us much more than the fact that different factors need to be added together for a more comprehensive understanding of a phenomenon. Moreover, reflecting the modular epistemology of cognitive neuropsychology (see Fotopoulou, 2014 for a critical review), these models treat the complexity of AHP as caused by simultaneous damage to functionally independent lesion sites. For example, Vocat, Staub, Stroppini, and Vuilleumier (2010) suggested that a combination of lesions to two or more brain areas within the insular, premotor, parietal and temporal cortex, or the white matter connections that link one or more of these areas with subcortical regions, may lead to different combinations of deficits in functions, such as proprioception, spatial neglect and error monitoring, which in turn lead to anosognosia in different patients. While such "combinations" of lesion sites and deficits are consistent with the multifaceted nature of the syndrome, what these accounts lack is a more precise account of the dynamic and hierarchical relation between the relevant affected and unaffected areas and their functional role in body awareness.

One clinical feature is particularly relevant to a dynamic understanding of anosognosia. Specifically, anosognosic patients can show implicit or tacit awareness of their motor deficits (Fotopoulou, Pernigo, Maeda, Rudd, & Kopelman, 2010; Nardone, Ward, Fotopoulou, & Turnbull, 2007), which in the cognitive literature is defined as "knowledge that is expressed in task performance unintentionally and with little or no phenomenal awareness" (Schacter, 1990, p. 157). Thus, while patients may explicitly deny their paralysis, they may be unconsciously processing some components of their deficits, including the emotional aspects.

Indeed, in both of the examples below, one is clinically (at face value) unsure of the degree to which the patient knows about the disability but nevertheless deceives the examiner and maybe even themselves. Decades of clinical and, more recently, experimental work have established that such mixed messages conveyed by patients (such as the patient who moved the left arm with the help of the right arm and then winked at the examiner, or the patient who claimed that the examiner was not paying attention; see the examples below) are actually not the result of deliberate deception by the patient, but rather indications that our self-awareness is not unitary; these patients may paradoxically have unconscious knowledge of their deficits, despite their explicit denial (Fotopoulou et al., 2010; Turnbull, Fotopoulou, & Solms, 2014). Below we present two examples, quite different from each other, of what we consider indications of implicit awareness of one's deficits. Both patients were female, the first in the subacute stage (three weeks) and the second in the chronic stage (two years) after right hemisphere stroke. More generally, all patients presented in this chapter are adult, right-handed patients who suffered a stroke in the region of the right middle cerebral artery and were recruited to various research studies over the past 15 years. The interviews and observations were conducted for research purposes.

6.2.1 Case Example 1: Implicit Awareness of Paralysis

The patient underwent a physiotherapy session observed, during a ward round, by junior trainees who occasionally asked questions about her efforts and challenges. At the end of the session, the senior psychotherapist asked the patient if she now wanted to ask the trainees any questions. The patient, of very high and preserved intelligence, previously the director of a youth charity, asked the trainees several questions about their career plans and a pleasant 5-minute discussion unfolded. After the trainees said their goodbyes and were preparing to leave the room, the patient, who had not acknowledged her paralysis at all during the whole session, called out to them and said: "Perhaps it would be useful to you to come and see me at a time when I'll be really ill and unable to move."

6.2.2 Case Example 2: Misplaced Loss

KF: Oh, I am sorry to see you upset about this memory [having a miscarriage while pregnant with twins about 30 years prior to this interview].
Patient: It is funny, you know. My sister was here the other day and she was very surprised to see me upset about this. She said that back then everyone thought I did not mind, I just got on with things, maybe I was even relieved a little, given that they were twins and all and my relationship with David [the father] was a mess. But now, when I think about it, oh the tears… the sorrow, I mean how can this happen to me? What did I do to deserve it? And yet back then… [patient falls silent].

Indeed, it is frequently this facet of AHP that has been commented upon by clinicians since the time of Babinski, as eliciting suspicion of deceit, or empathy for self-deceit, in listeners and readers of such indications (see also Kaplan-Solms & Solms, 2000). One can say that even though what these patients express is not in accordance with the facts (I am not ill now, but I may be in the future) and appears out of its correct cognitive context (my sorrow is not for my present loss, it is for my old loss), it is emotionally on target (Kinsbourne, 2000). In recent decades, several case and group studies have demonstrated that anosognosic patients may have implicit (unconscious) knowledge of their deficits that they cannot appreciate at an explicit, conscious level. For example, when patients are asked to perform an irrelevant cognitive task that includes neutral, emotionally negative and disability-related content, their performance on the cognitive task is particularly affected by the disability-related content, even though the content is unrelated to the task and patients themselves do not see its relevance to the self (Besharati, 2015; Fotopoulou et al., 2010; Nardone et al., 2007). This is typically interpreted as an unconscious interference effect of content on performance, even though there is no explicit awareness of the self-relevance of the content (see also Cocchini, Beschin, Fotopoulou, & Della Sala, 2010; Moro, Pernigo, Zapparoli, Cordioli, & Aglioti, 2011). By contrast, other patients may be explicitly aware of their symptoms per se but deny their emotional consequences (usually referred to as anosodiaphoria; Babinski, 1914), or their practical significance (also referred to as activities of daily living in the clinical literature) and future significance (how the symptoms will affect life in the near future), or they may attribute their causes to other people (we will later call these faulty inferences about the causes of beliefs and experiences). Below, we present two further examples, again quite different from each other but both alluding to the same difficulty in accepting the negative emotions associated with one's disability as either important or self-related. The first patient is male and in the acute phase (one week) and the second in the chronic stage (two years) after right hemisphere stroke.

6.2.3 Case Example 3: Anosodiaphoria and Lack of Prospective Awareness

KF: Why are you in the hospital?
Patient: I had a stroke, some days ago.
KF: What kind of symptoms have you noticed since you came to the hospital?
Patient: I have pain, here [points to left shoulder]. Ah, like that [touches it with the right hand], it hurts.

KF: I am sorry to hear that. Is there anything else that bothers you since the stroke?
Patient: Nothing that worries me.
KF: Nothing that worries you. Do you feel weakness anywhere?
Patient: No, I mean the left arm is sometimes weak but it does not worry me.
KF: The left arm. Can you move your left arm?
Patient: Yes, sometimes if the right hurts, I use the left to assist it.
KF: You use this arm [points to the left; the patient follows her pointing], the left arm, to assist this arm [points to and touches the patient's right arm], or have I got it wrong?
Patient: No, that's it. The left is not what it used to be but I can use it. It really helps when the right arm is tired, or something. [This is a reversal and minimisation of the true situation, in which the left arm is completely paralysed.]
KF: I see. So, you are not worried about losing the ability to move your left arm, even if you see some weakness?
Patient: Oh, I know it will be alright. And it really does not bother me.
….

KF: Can you please try and raise your left arm for me now?
Patient: [Silence. The patient tries, lifts the left shoulder, shows pain by expression.]
KF: Did this hurt?
Patient: Yes. Here [points to shoulder].
KF: OK, let's not try this again then, but did you manage to raise the arm, do you think?
Patient: No, not this time. Getting tired, now. But it is nothing that bothers me.
KF: I understand….

6.2.4 Case Example 4: Misplaced Anger

KF: It sounds like you are a bit upset today. Is that right?
Patient: Well, I am angry, or maybe disappointed. I mean how can they be so useless? How hard is it? My glasses were right here, on the left of the bed, and the card from my friend was next to it. Now, it is gone. Gone, forever, never to return, never to be fixed. They do not know what happened, they say. They just moved things to clean. They need to clean they say, well fine, but I need my letter. I mean, how am I supposed to function without my letters? And whose fault is it? Is it mine fault? No, what do I know? I was asleep. They came in, moved things again and now the letter is gone.
KF: I see, yes, you do sound very upset about it. It sounds like an important letter?
Patient: Well, it is my friend. Of course, she writes frequently. I suppose it is important.
KF: Well, it sounds like it feels important to you.
Patient: Oh, yes it does and I am so angry at them. How can they take it away?
KF: It does sound like it feels important to you. I wonder if you feel the same about the stroke?
Patient: Well, you know, the stroke does not make things easy for me, sometimes I feel so desperate, but what really gets to me, is that they could help and they do not. The whole fear about me not being able to move is down to them, you know. That initial physiotherapist, I mean, she was helping me and it was better and then her friend came, and they then decided I cannot walk. Why? They are afraid that, if I fall, I will sue and ask for compensation…

These apparent dissociations between how the body is experienced emotionally and cognitively, as well as over who is the agent of the body's predicament, can take even more extreme forms of delusional reduplication (splitting of the self, other people or places, leading to corresponding beliefs in two independently existing entities), as illustrated in the examples below. The first example is an interview with a male patient in the acute stage (two weeks), the second with a chronic female patient (six months).

6.2.5 Case Example 5: Two Versions of Events

SB: What happened to bring you here?
Patient: I have two versions. One, I was in the bathroom and I hit my head (all my troubles started in that bathroom). Two, I had a bit of a stroke.
SB: What kind of symptoms have you noticed since you came to the hospital?
Patient: I feel okay. I mean, I know there are things I can't do. I have an age problem.
SB: Do you have any weakness anywhere?
Patient: Yes, my arm and leg, on the left side. They seem okay, but they might be dead.
SB: Are they causing you any trouble?
Patient: No, no trouble and no pain.
SB: Does it feel normal?

Patient: Not really, but it doesn't cause any trouble, as long as everything is near and I can reach it.
SB: Can you use it as well as you used to?
Patient: I can't, no.
SB: Are you fearful about losing the ability in your arm?
Patient: Yes, a little, but I feel it will come back.
SB: The doctors tell me there is some paralysis in your arm and leg. Do you agree?
Patient: I have never talked to a doctor about that. But he did say it will come alright.

6.2.6 Case Example 6: Two Versions of My Body

KF: Can you move your legs equally well since the stroke?
Patient: Yes, I can, I have no problems there. [The patient then spontaneously tries lifting her legs. She slightly lifts her left leg with her hands and comments:] Oh, yes, this one is a bit weaker.
KF: So, you cannot really move both legs well since the stroke?
Patient: Well, I cannot move this one so easily because of the fall. They let me fall one day and I've hurt this side, otherwise it is ok. [Staff had indeed reported a fall in the preceding days.]
KF: I see. And what about your hands, are they equally strong?
Patient: Well, this one [points to the left] is fine because I take it out at night to do things, but the other one [points to the right] gives me pain from time to time.
KF: I see. So this arm [examiner lifts the left arm of the patient slightly] is ok, but…
Patient [interrupts]: This one [pointing to the left arm] is paralysed. I found it one morning in the bed and it has been stuck to me ever since. Useless, but it not mine. That is Peter's.
KF: Who is Peter?
Patient: My cousin. He came yesterday to visit. Didn't you see him?
KF: Oh, I am sorry, I was not here yesterday. But you say this arm [points to the patient's left arm] is his, and it is paralysed. But your own left arm is ok and you can move it?
Patient: Yes, I can. I do. Every day.

The above brief examples illustrate the richness of the clinical variations in the presentation of anosognosia, even at the acute stage following stroke. Full descriptions of the clinical variability of this syndrome can be found elsewhere and fall beyond the scope of this chapter (see Besharati & Fotopoulou, 2021; Besharati, Crucianelli, & Fotopoulou, 2014; Marcel et al., 2004).

However, the examples included here aim to illustrate the need for a theory that envisions awareness as the dynamic interaction of different converging processes, rather than as the function of a unitary module that can be either damaged (in so-called anosognosia proper) or not (in so-called denial).

6.3 Feedback versus Feedforward

Beyond these denial versus deficit debates, there is also a more recent debate regarding the explanation of anosognosia based on feedback versus feedforward deficits. Older explanations of the syndrome proposed that patients with anosognosia may not be able to "discover" their paralysis in their own, first-person experience because their brain lesions had prevented them from registering their motor failures (e.g., Levine, 1991). For example, right hemisphere damage may impair sensation in one's left arm, as well as one's ability to process sensory information from the left side of space and/or the left part of one's body (a symptom called neglect that is also frequent after right hemisphere damage). Specifically, neglect designates a consistent, exaggerated spatial asymmetry in processing information in bodily and/or extrabodily space that can include both omission errors (the inability to detect, attend or respond to stimuli in the neglected space) and commission errors (i.e., productive phenomena such as perseveration; Cubelli, 2017). Thus, it was considered plausible that anosognosic patients may not be learning about their disability because they do not get the sensory feedback they need, including from vision or proprioception, from the affected side. However, decades of work in neuropsychology have found that such deficits are not enough to cause anosognosia, in the sense that many patients have one or more of these deficits but are not anosognosic, and vice versa (see Marcel et al., 2004 for an excellent study). Thus, the idea of faulty feedback was progressively replaced with the idea of deficits in the comparison between feedforward and feedback signals. Specifically, inspired by the idea, popular in science and engineering in the 1990s (Wolpert, 1997), that a system monitors and corrects its own actions by comparing feedforward (anticipatory) and feedback signals, researchers argued that patients may fail to register the discrepancy between predicted and actual sensory feedback because of visuospatial neglect or other sensory deficits (Frith, Blakemore, & Wolpert, 2000), or because of a deficit in the comparison mechanism itself (Berti et al., 2005). In other terms, patients may be confusing what they intended to do with what actually happened, basing their awareness on the former despite large error signals in the latter.
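The comparator logic can be illustrated with a toy sketch (our illustration, not a model drawn from the studies cited here; the function name, parameters and values are invented for exposition): when the gain on sensory feedback is near zero, the registered mismatch between predicted and actual feedback stays below a detection threshold, and awareness defaults to the intended movement.

def perceived_movement(intended, executed, feedback_gain, threshold=0.5):
    """Toy comparator: awareness follows the intention unless the
    gain-weighted mismatch between predicted and actual feedback
    exceeds a detection threshold."""
    predicted_feedback = intended  # forward model: expect the intended outcome
    error = feedback_gain * abs(predicted_feedback - executed)
    # If the weighted error fails to register, perception defaults to
    # the predicted (intended) movement rather than the actual one.
    return intended if error < threshold else executed

# Intact feedback: the failed movement (0.0) is correctly perceived.
print(perceived_movement(intended=1.0, executed=0.0, feedback_gain=1.0))  # 0.0

# Suppressed or neglected feedback: the intended movement is "perceived"
# despite complete paralysis, as in the illusory movements described below.
print(perceived_movement(intended=1.0, executed=0.0, feedback_gain=0.1))  # 1.0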

150  Aikaterini Fotopoulou and Sahba Besharati this effect could not be explained by visual neglect. When patients were intending to make a movement, but not when they were anticipating other people to move their own arms, their ability to imagine the anticipated action was confused with reality (having actually executed the movement) and any feedback to the contrary was actively ignored rather than simply neglected (Fotopoulou et al., 2008). We present examples below of an initial assessment conducted by the authors (SB & KF) for research purposes with two patients diagnosed with anosognosia to illustrate typical instances of first person, illusions of movement by the bedside, within one week from stroke. Both patients were female, in their 70s. 6.3.1  Case Example 7: Illusory Movements

SB: Why are you in the hospital?
Patient: They say I had a stroke, but I don't remember anything about it.
SB: The doctors tell me you had a stroke, do you agree with them?
Patient: I don't know anything about strokes.
SB: What kind of symptoms have you noticed since you came to the hospital?
Patient: I haven't noticed anything really.
SB: Do you have any weakness anywhere?
Patient: Not really, I'm sure I can make a fist if I wanted to.
SB: Is your left arm causing you any trouble?
Patient: Not at all, no.
SB: [The examiner lifts the patient's left arm and moves it to the right hemispace.] There seems to be some weakness in your left arm, do you agree?
Patient: No, it's fine.
SB: Can you try and move your left arm for me?
Patient: Yes, I moved it.
SB: But I didn't see your left arm move.
Patient: That's because you weren't paying attention, I just moved it now!

6.3.2 Case Example 8: Illusory Movements despite the Lack of Both Visual and Auditory Feedback

SB: What symptoms have you noticed since the stroke? How does your body feel?
Patient: It feels alright.
SB: Do you have any weakness anywhere in your body?
Patient: No, no weakness.
SB: Is your left arm causing you any trouble?
Patient: No, of course not.
SB: Can you raise your left leg?
Patient: Yeah, sure I can.
SB: Can you please try and raise your left arm for me?
Patient: [Silence. The patient does not move.]
SB: Can you try and do it for me now?
Patient: [Patient uses right arm to move left arm.]
SB: Did you do it?
Patient: Yes, you saw it move.
SB: Yes, but did it move on its own?
Patient: Well, with the help of this [right] one [arm].
SB: Can you do it without the help of your right hand?
Patient: Yeah.
SB: Do you think you can clap your hands?
Patient: Yes, sure.
SB: Can you please show me?
Patient: [Uses right hand to lift left hand, then slaps the top of the left hand to "clap".]
SB: Did you manage to clap your hands?
Patient: Yes, I did it. [The patient then winks at SB.]

In both of these cases, the patients insist that they have executed a movement when in fact they are paralysed and unable even to raise their arms. Our lab and other labs have now verified the occurrence of these illusory experiences in carefully controlled experiments (e.g., Fotopoulou et al., 2008). According to influential feedforward accounts (Berti et al., 2007), these illusions are sufficient to explain the syndrome, in the sense that the difficulty in monitoring motor errors against one's own sensorimotor predictions leads to a faulty (non-veridical) consciousness of movement, and this consciousness is responsible for all the other delusional beliefs and emotional reactions observed in right hemisphere patients. As we will see below in greater detail, we have provided evidence to the contrary, arguing that such sensorimotor deficits are only a part of the syndrome's explanation.

6.4 Beyond the "Here-and-Now" of Experience: From Sensorimotor Predictions to Counterfactual Expectations

In psychiatry and neuroscience, difficulties with self-awareness are typically described in the context of impaired clinical insight, i.e., the inability to acknowledge one's illness or its consequences.

Moreover, increasingly in these fields, clinical insight is regarded not as an all-or-nothing ability to accept one's illness, but rather as a set of processes through which one forms beliefs about the self (cognitive insight), which themselves rely on the higher order cognitive abilities of other- and self-directed mentalisation (David, Bedford, Wiffen, & Gilleen, 2012; Lysaker et al., 2005), also known as social cognition (being able to see and judge oneself from the perspective of other people) and metacognition (the ability to evaluate our own cognition; Flavell, 1979; Fleming & Dolan, 2012), respectively. For example, one needs to infer the mental states of other people in an audience (other-directed mentalisation) to interpret their reaction to one's own speech, as a way to evaluate the success of the speech beyond one's own, first-person experience of it. In several psychopathologies and neuropathologies, social cognition deficits seem to be associated with impaired insight or anosognosia (Mograbi & Morris, 2013; Chapman et al., 2019; Cosentino et al., 2016). In the case of self-directed mentalisation or metacognition, one may have low confidence (defined as the degree of subjective uncertainty) in what one thinks one saw in a dark room, or in how well one performed a task without any explicit feedback. Metacognition characterises the relation between this subjective uncertainty and accuracy, and in this sense it affords a measure of insight into one's perception or beliefs. Some have claimed that the two abilities (self- and other-directed metacognition) actually rely on a common core ability of social inference (self-metacognition is simply the ability to see the self as another), but there is an ongoing debate regarding the relation between these two higher order abilities. Importantly, it has been shown that prospective metacognition (forecasting; knowing how well one will do in the future) is an even higher order ability than retrospective metacognition (knowing how accurate one was on a given judgement), and similar dissociations between anticipatory and emergent awareness have been shown in anosognosia, with the former being more affected and harder to treat than the latter (Moro, Scandola, Bulgarelli, Avesani, & Fotopoulou, 2014).
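The notion of metacognition as a relation between subjective uncertainty and accuracy can be made operational. One common family of measures asks how well trial-by-trial confidence ratings discriminate correct from incorrect responses. The sketch below is our illustration, not the specific measure used in the studies cited here, and the trial data are invented; it computes the probability that a randomly chosen correct trial attracted higher confidence than a randomly chosen incorrect one (a type-2 area-under-the-curve statistic).

from itertools import product

def type2_auc(correct, confidence):
    """P(confidence on a correct trial > confidence on an incorrect trial),
    counting ties as 0.5. Returns 0.5 when confidence carries no information
    about accuracy, and 1.0 for perfect metacognitive sensitivity."""
    hits = [c for ok, c in zip(correct, confidence) if ok]
    misses = [c for ok, c in zip(correct, confidence) if not ok]
    if not hits or not misses:
        raise ValueError("Need both correct and incorrect trials.")
    score = sum(1.0 if h > m else 0.5 if h == m else 0.0
                for h, m in product(hits, misses))
    return score / (len(hits) * len(misses))

# Hypothetical trials: 1 = correct response; confidence rated 1 (low) to 4 (high).
accuracy = [1, 1, 0, 1, 0, 1, 0, 1]
confidence = [4, 3, 2, 4, 1, 3, 2, 2]
print(round(type2_auc(accuracy, confidence), 2))  # 0.93: confidence tracks accuracy well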

These observations raise the following critical question: Is patients' inability to update their anosognosic beliefs modality-specific (based on the local sensorimotor monitoring deficits outlined above)? Or are domain-general processes of metacognition (as opposed to domain-specific metacognition; see Fleming, Ryu, Golfinos, & Blackmon, 2014) and mentalisation also necessary to account for AHP (global monitoring deficits; Davies et al., 2005; Fotopoulou, 2014, 2015; Vocat & Vuilleumier, 2010)? Patients with AHP have reality-monitoring deficits, confusing, for example, merely imagined with actually executed actions (Jenkinson, Edelstyn, Drakeford, & Ellis, 2009; Saj, Vocat, & Vuilleumier, 2014), and belief-updating deficits, being, for instance, overconfident and inflexible in verbal information-gathering (Vocat, Saj, & Vuilleumier, 2013). Finally, anosognosic errors are associated with disruptions of allocentric mentalisation related to inferior parietal lobule lesions (Besharati et al., 2016, 2022), suggesting that patients' self-awareness is not facilitated by the ability to see themselves as others regard them, similar to findings on impaired insight (the psychiatric equivalent of anosognosia) in other pathologies (David et al., 2012; Mograbi & Morris, 2013; Cosentino et al., 2016; Chapman et al., 2019). Moreover, sensorimotor theories cannot account for the non-unitary loss of motor awareness (i.e., why do some patients show implicit awareness of their deficits; see the cases above). Sensorimotor theories are valuable in explaining the illusion of moving (Fotopoulou et al., 2008; see the cases above), but patients do not simply claim that they have the phenomenal experience of moving, as other non-anosognosic patients may claim (e.g., "I have the impression that I am moving but I know it cannot be true because I know I am paralysed"). On the contrary, patients with anosognosia ignore the wealth of contrary evidence and medical signs indicating that they are paralysed (e.g., their medical results, disabilities, occasional accidents and others' feedback; see the cases above). Moreover, while older lesion mapping studies have attributed AHP to discrete cortical lesions in areas such as the lateral premotor cortex or the insula (Berti et al., 2005; Karnath, Baier, & Nägele, 2005), more recent studies suggest a more complex aetiology (Fotopoulou et al., 2010; Monai et al., 2020; Moro et al., 2011, 2016; Pacella et al., 2019). Using advanced neuroimaging methods and the largest sample (N = 174) to date, we found that damage to at least three functional networks is necessary for AHP, including not only sensorimotor networks, such as lesions in a premotor-striatal loop, but also damage to areas associated with much wider cognitive functions, such as the posterior parts of the limbic network (i.e., cingulum connections among the amygdala, the hippocampus and the cingulate gyrus) and the ventral attentional network [i.e., superior longitudinal fasciculus (SLF) III connections between the temporo-parietal junction and the ventral frontal cortex; Pacella et al., 2019]. Taken together, clinical indications and accumulated behavioural and anatomical evidence suggest that anosognosic beliefs can be explained by both impaired local sensorimotor monitoring and impairments in more global, metacognitive monitoring. However, previous multifactorial models of AHP have considered the relation between such factors as merely cumulative, with damage to at least two independent modules considered necessary for AHP to occur (see above).

Using a unifying theoretical framework (the Bayesian brain hypothesis; Dayan, Hinton, Neal, & Zemel, 1995; Friston, 2005), we have proposed instead that AHP can be explained as a disconnection between several of the normally convergent sensorimotor, metacognitive and mentalisation functions that support counterfactual self-awareness (Fotopoulou, 2014, 2015), as explained below.

6.5 Updating Counterfactual Expectations about One's Motor Abilities

According to the Bayesian brain hypothesis, the brain uses its prior learning to construct generative models of the embodied self that encode predictions not only about the hidden causes of current, noisy sensory inputs but also about the inferred causes of "counterfactual" sensory inputs. The latter depend on predicted but not-yet-executed actions (e.g. what will it feel like when I grab that cup of hot coffee?), potential spatial positions one may occupy (e.g. how would I grab that cup of coffee if I were sitting at the other side of the table?), and emotional and social conditions one may encounter (e.g. how embarrassed would I be if my friend saw me drop that coffee cup?; D'Imperio, Bulgarelli, Bertagnoli, Avesani, & Moro, 2017; Fotopoulou, 2015). In that sense, self-awareness involves inferential processes with counterfactual depth (Fotopoulou, 2015; Palmer, Seth, & Hohwy, 2015). Accordingly, the inability of patients to update their anosognosic beliefs may be understood as the inability to draw new inferences not only about their motor abilities in the here-and-now of experience (e.g. did I just move as I intended to?), but also about counterfactual, prospective motor abilities (e.g. could I do this same action tomorrow, or at home?). To our knowledge, however, this kind of prospective awareness has only recently been examined in AHP (Kirsch et al., 2020).

Furthermore, according to the Bayesian brain hypothesis, belief-updating depends on the relative uncertainty (or, mathematically, its inverse, precision; Bastos et al., 2012; Friston, 2010; see Morrison, 2016 for a discussion of the relationship between uncertainty and precision, not presented here) ascribed to prior beliefs relative to sensory information, which determines how prediction errors are weighted in the formation of posterior beliefs. In computational psychiatry, precision abnormalities have provided an explanation for psychopathological symptoms, including delusions (Adams, Stephan, Brown, Frith, & Friston, 2013; Corlett, Taylor, Wang, Fletcher, & Krystal, 2010; Friston, 2017; Lawson, Mathys, & Rees, 2017).
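The role of precision can be made explicit with the standard update rule for Gaussian beliefs (a generic textbook formulation, not the specific model used in the studies discussed here). Given a prior belief with mean $\mu_p$ and precision $\pi_p$, and sensory evidence $x$ carrying precision $\pi_s$, the posterior mean is the prior mean shifted by a precision-weighted prediction error:

$$\mu_{\text{post}} = \mu_p + \frac{\pi_s}{\pi_s + \pi_p}\,(x - \mu_p)$$

When sensory precision is low relative to prior precision, as hypothesised for evidence arising in the neglected hemispace, the weight on the prediction error $(x - \mu_p)$ shrinks towards zero and the posterior remains close to the premorbid prior, however discrepant the evidence.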

Accordingly, using a Bayesian learning framework (see Mathys & Weber, 2020; Mathys et al., 2014), we formalised and empirically investigated the hypothesis (Fotopoulou, 2015) that failures to update anosognosic beliefs about counterfactual motor abilities would be explained by abnormalities in the precision ascribed to prior beliefs relative to sensory information. Specifically, we designed a new motor belief-updating task (Kirsch et al., 2020) that manipulated both the temporal (prospective and retrospective) and spatial (affected versus unaffected hemispace) conditions in which beliefs had to be updated. The task allowed us to measure how prospective estimates about bimanual motor abilities are updated on the basis of retrospective estimates about corresponding action attempts in the contralesional (most affected by neglect) and the ipsilesional (less affected by neglect) hemispace. Although patients with AHP typically also suffer from hemispatial neglect, neglect is considered neither a necessary nor a sufficient deficit for AHP, given the long-observed double dissociations between the two symptoms (Marcel et al., 2004). However, such dissociations do not exclude the possibility that visuospatial neglect contributes to AHP in functional convergence with other deficits (Fotopoulou, 2014; Vocat et al., 2010). In the framework used here, this functional convergence can be understood as related to precision. Specifically, in predictive coding, the precision afforded by various beliefs – or sensory evidence – can be taken as the computational homologue of attention (Fotopoulou, 2014; Vocat et al., 2010). For example, attending to a particular source of information corresponds to increasing the precision of the associated (sensory) prediction errors. Thus, a formal account of visuospatial neglect – in terms of aberrant precision – may be particularly apt for explaining its contribution to anosognosia, as it has been in a related phenomenon of altered motor awareness, namely functional motor disorders (Edwards, Adams, Brown, Pareés, & Friston, 2012). In such pathologies, precision optimisation is regarded as a domain-general ability depending broadly on the functional convergence of various neuromodulatory functions (Friston, 2008). Yet in the case of AHP, the observed lesions and structural disconnections of the ventral attentional system (Besharati et al., 2016, 2022; Pacella et al., 2019), which have been linked with difficulties in reorienting attention in the contralesional hemispace based on salience and behavioural relevance (Corbetta & Shulman, 2002; Mesulam, 1999), may play a similar role, particularly when there are concomitant lesions to the basal ganglia and the limbic system (Besharati et al., 2014; Fotopoulou et al., 2010; Moro et al., 2011; Pacella et al., 2019). Accordingly, using a spatial manipulation and standardised measurements of each patient's attentional deficits (as proxies for precision), we could generate an approximate measure of each patient's ability to attend to prediction errors in the affected, contralesional versus the unaffected, ipsilesional hemispace. We found that in the contralesional hemispace, neglect does not affect sensorimotor error monitoring, but it does seem to affect the weighting of retrospective beliefs about performance, so that sensory evidence from the neglected hemispace is not used to update more general beliefs about counterfactual motor abilities. Moreover, this effect is less the result of how much confidence patients have in their retrospective versus their prior beliefs contralesionally, and is better reflected in the neglect they exhibit in this hemispace.
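To illustrate how such aberrant precision could freeze prospective beliefs, the Gaussian update above can be iterated over repeated action attempts in a toy simulation (our sketch, not the task of Kirsch et al. (2020) or the hierarchical model of Mathys et al. (2014); all parameter values are invented), with sensory precision set low for the neglected hemispace and normal for the intact one:

def update_belief(prior_mean, prior_precision, evidence, sensory_precision):
    """One step of conjugate Gaussian belief-updating: the prediction
    error is weighted by the relative precision of the sensory evidence."""
    learning_rate = sensory_precision / (sensory_precision + prior_precision)
    posterior_mean = prior_mean + learning_rate * (evidence - prior_mean)
    posterior_precision = prior_precision + sensory_precision
    return posterior_mean, posterior_precision

# Premorbid prior: "my motor ability is intact" (1.0), held with high precision.
# Evidence on every attempt: complete failure (0.0).
for hemispace, sensory_precision in [("ipsilesional", 4.0), ("contralesional", 0.1)]:
    mean, precision = 1.0, 10.0
    for _ in range(10):
        mean, precision = update_belief(mean, precision, 0.0, sensory_precision)
    print(f"{hemispace}: belief after 10 failed attempts = {mean:.2f}")

# Prints 0.20 for ipsilesional and 0.91 for contralesional: well-weighted
# evidence drags the belief towards the observed failures, whereas
# low-precision (neglected) evidence leaves it near the unrealistic prior.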

This finding suggests that in the contralesional hemispace, most AHP patients cannot transfer their retrospective insight about observed motor failures into realistic prospective beliefs about motor ability. Instead, these posterior beliefs remain close to their unrealistic prior beliefs in the same hemispace. These results point towards a counterintuitive yet crucial finding: the visuospatial hemifield in which errors occur may affect prospective belief-updating (Can I put on gloves?) without affecting retrospective sensorimotor monitoring (How well did I put on gloves in this attempt?). In other terms, in the contralesional hemifield anosognosic patients can acknowledge their performance errors (complete failure due to the hemiplegia) to a degree, but they cannot use such observations to update their more general, prospective beliefs about their motor abilities. Exploratory (given our sample size) lesion analyses revealed that anosognosic difficulties in belief-updating were associated with disruptions in tracts of the ventral attentional network (i.e., SLF connections between the temporo-parietal junction and the ventral frontal cortex, including in this case lesions to the insula, in line with previous studies; Pacella et al., 2019). Interestingly, we have also found lesions to these temporo-parietal junction areas, as well as to the inferior and middle frontal gyri, to be associated with allocentric mentalisation deficits in AHP (Besharati et al., 2016, 2022). This would suggest that anosognosic patients do not correct their unrealistic self-beliefs because they may be unable to take an allocentric stance on themselves, i.e., to integrate their first-person experience of the body with third-person views to form a more objectified, counterfactual view of the self (Fotopoulou, 2015). Indeed, similar lesions, as well as disconnections of these temporo-parietal areas from their ventral frontal cortex connections via the SLF, also led to failures to update counterfactual beliefs beyond the here-and-now of sensorimotor experience in the aforementioned study (Kirsch et al., 2020). Social or spatial perspective taking was not tested in that study, but patients were asked to use their bimanual motor performance as it occurred in a particular time and hemispace (did you achieve this task here and now?) to infer their corresponding motor abilities in a prospective manner, which entails consideration of many possible (counterfactual) times and spaces (how well will you be able to achieve this task at home, or at work, tomorrow or next week?). Thus, in this sense, our findings show that the aforementioned lesions and disconnections affect patients' ability to use sensory error information from the contralesional hemispace to draw more abstract conclusions about self-related counterfactuals.

While it is known that such ventral lesions may lead to a kind of motivational neglect, or difficulty in reorienting attention in the contralesional hemispace based on salience and behavioural relevance (Corbetta & Shulman, 2002; Mesulam, 1999), our findings about anosognosic beliefs (rather than just misperceptions) suggest that self-beliefs are also subject to similar salience modulation. This finding is also reminiscent of rare observations made by Mesulam (1999, p. 1329) regarding the relationship between neglect and motivational expectations: "Patients with unilateral neglect devalue the left side of the world and behave not only as if nothing is actually happening in the left but also as if nothing of any importance could be expected to emanate from that side." Our prior findings and theoretical perspective extend this observation to the level of belief formation, so that even when patients are able to observe what has happened in the left hemispace (their motor errors), they do not experience such errors as "important enough" beyond the given context to update their more abstract beliefs about their self. Or, as one of our patients said, "I know I can put on gloves by myself, I just could not do it now. If we were at home, this would be no problem." Indeed, we have proposed that the delusional aspects of anosognosia are best explained as the failure to evaluate the salience or relevance of context-dependent sensorimotor errors (they occur in a specific time and space) to more abstract, context-independent beliefs about the self (they can refer to any time and space; Fotopoulou, 2015; Kirsch et al., 2020). Typically, errors occurring in the neglected hemispace, and disconnections in the right salience network, seem to result in patients being unable to assimilate the information from that space appropriately. Ultimately, they fail to integrate their sensorimotor errors from that space with other beliefs about their counterfactual self. This interpretation is also consistent with prior findings regarding the disruption and disintegration of several phenomenological and cognitive aspects of self-processing following damage to the temporo-parietal region, including self-reduplication and out-of-body experiences (e.g., Blanke & Arzy, 2005). The exact relationship between the counterfactual belief-updating impairment we examined in the present study and similar deficits in allocentric mentalisation (Besharati et al., 2016; see also above), prospective metacognition and weak central coherence (Frith, 1989; Happé & Frith, 2006) that have been associated with similar multimodal integration networks needs to be determined in future studies. Some indications can, however, be found in our existing studies. Indeed, we have recently shown that anosognosic patients are not impaired in third-person visuospatial or verbal perspective taking itself (e.g. Besharati et al., 2015; Fotopoulou et al., 2011; Fotopoulou, Rudd, Holmes, & Kopelman, 2009), but they have a selective impairment in mentalising (reading the mental states of others) in the allocentric mode (Besharati et al., 2016).

Specifically, while patients were able to read other people's mental states when they had to take the perspective of other people in brief stories that also involved themselves (mentalisation from an egocentric stance), they were unable to read other people's mental states when the stories read out to them did not involve them at all, i.e., when the stories referred to the relationship between two other people (mentalisation from an allocentric stance; Besharati et al., 2016). Indeed, in the field of autism, other researchers have noted that perspective taking may simply involve the transposition of the egocentric stance to a different location (a third-person perspective can remain egocentric and self-referent), whereas an allocentric stance acknowledges a relation between other minds that is completely independent of one's own egocentric perspective (Vogeley & Fink, 2003; Vogeley et al., 2004; Frith & de Vignemont, 2005). In previous writings, borrowing insights from Merleau-Ponty (1945/1962), one of us (KF) has called this aspect of self-experience the impersonalised body (Fotopoulou, 2015), to emphasise not only that it goes beyond our egocentric, first-person experience of the body but also that it does not relate to any particular social, third-person perspective on the body. Rather, it relates to the integration of all possible spaces, times and social perspectives on the body, so that the body can be represented as a whole-in-itself, irrespective of current embodied or social experiences of it. Interestingly, in our study (Besharati et al., 2016), only performance in the latter, allocentric task, and none of the other mentalisation and perspective-taking conditions, correlated with severity of unawareness in our sample. Moreover, in that mode, patients made mostly egocentric errors, i.e., they showed an inability to inhibit the egocentric perspective while engaged in an allocentric task. Thus, there was a specific association between anosognosia and the inability to inhibit the first-person perspective during allocentric mentalisation: the worse patients performed in the allocentric task, the more severe was their anosognosia. Based on these findings, as well as the aforementioned study on belief-updating, we propose that patients with anosognosia may not be able to explain away their first-person experiences based on allocentric, counterfactual expectations about their bodies. In the absence of the ability to integrate these various experiences into a unified belief about the self, some of their embodied experiences seem to be explained as though they refer to two separate entities (see the reduplication examples above).

In summary, we have seen how the complex syndrome of anosognosia can be caused by patients' expectations about their first-person sense of (motor) awareness and agency, which cannot be updated because brain damage has affected how new information is processed in both sensorimotor and salience-monitoring domains. In addition, patients cannot use third-person beliefs or feedback to update their beliefs either, as it appears that the higher order processes that typically allow the integration of subjective, first-person and objective, third-person perspectives on the self are also affected by disconnection between the relevant brain areas (see Figure 6.1 for a schematic representation of this argument). However, as we will argue below, it is not only motor expectations that patients cannot update.


Figure 6.1 A schematic representation of the different levels of self (expected) experience as revealed by anosognosia for hemiplegia. PE = patients’ expectations

It appears that at least some anosognosic patients also cannot update their feelings about their body and its abilities based on interoceptive feedback.

6.6 Updating Counterfactual Expectations about Interoceptive Feelings

As we outlined above, while exteroceptive (sensations responsible for external perception, such as vision and audition), proprioceptive (sensations about the position of the body in space) and motor deficits may be important contributors to how the body is experienced from a first-person perspective in anosognosia, they are unlikely to be its primary or sufficient causes. By contrast, another facet of how individuals experience their own body subjectively, in the first person, may have a central role in anosognosia.

As mentioned above, lesion mapping studies have indicated that grey matter areas such as the insula and limbic structures may be selectively associated with AHP (Fotopoulou et al., 2010; Karnath et al., 2005; Moro et al., 2011; Vocat et al., 2010), and we have conducted a recent large study (174 patients) using advanced lesion mapping techniques that revealed a disconnection of these areas from frontoparietal sensorimotor attention and control networks (Pacella et al., 2019). Previous functional neuroimaging studies have found that the functional role of these areas and their connections concerns not only salience detection for exteroceptive signals but also the processing of interoceptive signals (Craig, 2009; Critchley, Wiens, Rotshtein, Öhman, & Dolan, 2004). Interoception refers to the perception of the physiological condition of the body, involving modalities such as temperature, itch, pain, cardiac signals, respiration, hunger, thirst, pleasure from sensual touch and other bodily feelings relating to homoeostasis (Craig, 2010; Critchley et al., 2004). It is distinct from the exteroceptive system, which refers to the classical sensory modalities for perceiving the external environment (e.g. vision, audition), as well as the proprioceptive, vestibular and kinesthetic inputs informing us about the movement and location of the body in space (Blanke & Metzinger, 2009; Craig, 2010; Critchley et al., 2004), in that it is mediated by a separate, specialised neuroanatomical system linked to homoeostasis and the subjective, emotional core of the self (Craig, 2009; Critchley et al., 2004; Damasio, 1994; Seth, 2013). Interoception informs the mind about how the body itself is doing in relation to certain inherited, homeostatic needs (e.g. one may be dehydrated, or stung by an insect), while exteroception informs the organism about environmental changes in relation to such needs (there is a river ahead) but independently of the physiological state of the body itself. Thus, interoception is considered the basis of the sentient self, of how we feel in the here-and-now of experience. From the point of view of modern neuroscientific theories, subjective feeling states arise from predictive inferences on the causes of interoceptive signals (Barrett & Simmons, 2015; Pezzulo, Rigoli, & Friston, 2015; Seth, 2013; Seth, Suzuki, & Critchley, 2012; see also Allen & Tsakiris, 2019 and Corcoran & Hohwy, 2019 for further research on interoception and predictive coding). Accordingly, we have proposed that anosognosic and asomatognosic patients struggle to affectively personalise new sensorimotor information and related beliefs about the affected body parts because interoceptive and emotional signals about the current state of the body are weak, suppressed, or unable to update existing predictions and expectations of how the affected body parts should feel (Fotopoulou, 2015; Martinaud, Besharati, Jenkinson, & Fotopoulou, 2017). In support of this hypothesis, a recent study (Romano, Gandola, Bottini, & Maravita, 2014) showed that right hemisphere patients who hold somatoparaphrenic beliefs about their affected body parts also show reduced physiological reactions to threat to the same body parts, as measured by skin conductance responses.

lesions and their relevance to updating body beliefs in anosognosia). To use the words of one anosognosic patient who also denied the ownership of his paralysed limbs: “But my eyes and my feelings don’t agree, and I must believe my feelings. I know they [left arm and leg] look like mine, but I can feel they are not, and I can’t believe my eyes” (C.W. Olsen, 1937, cited in Feinberg, 1997). Thus, it appears that just as motor intentions may be confused with actually executed actions due to damage to frontoparietal networks of motor control (see above), so too can interoceptive priors (how I expect my body to feel) be confused with actual feelings of the body in the here-and-now (how my body actually feels) due to damage to brain areas processing and integrating (the salience of) interoceptive feelings about the body. Thus, patients may have illusory feelings of an intact body that they struggle to integrate with the sad reality of their paralysis. Support for this hypothesis was provided in a recent study of patients with right perisylvian lesions (N = 31), with and without delusions of somatic ownership and anosognosia (Martinaud et al., 2017). Specifically, we found that almost all of our sample experienced feelings of ownership over a realistic rubber hand presented in front of them for 15 seconds and without any tactile stimulation (a phenomenon termed visual capture of ownership, VOC). Paradoxically, the subset of these patients that had delusions regarding the ownership of their arm denied its ownership, even when they accepted (mistakenly) the ownership of a rubber hand that was placed at the same position in the left hemispace. Thus, such delusions are not merely a matter of being unable to attribute a seen left arm positioned in the left hemispace to the self; rather, it is possible that some unexpected sensation or feeling about the arm leads patients to infer that this arm cannot possibly be their own. We have proposed (Martinaud et al., 2017) that such unexpected sensations in the arm may be described as feelings of deafference, ultimately leading these patients to feel that the arm they see does not feel as they expect it to feel, and hence to infer that the arm is not their own. Unfortunately, in practice, it can be difficult to reliably assess the presence of such spontaneous deafference sensations and their precise role in arm ownership. However, a possible alternative way to test these ideas is to experimentally reduce such sensations using affective touch stimulation (which, as mentioned above, can attenuate experimentally induced feelings of deafference; Panagiotopoulou, Filippetti, Tsakiris, & Fotopoulou, 2017) and observe the effect on body-part ownership. Two single case studies from the same group provide indirect evidence for this proposed effect. First, van Stralen, van Zandvoort, and Dijkerman (2011) reported that gentle touch increased feelings of arm belonging and related emotional attitudes in a right-hemisphere stroke patient with DSO. Second, Smit, Van Stralen, Van den Munckhof, Snijders, and Dijkerman (2019) reported a patient with full body disownership

following a tumour resection in the temporo-parietal cortex, who experienced an increased sense of ownership towards a rubber hand during slow (CT-optimal) touch. Importantly, in a recent group study, we were able to demonstrate that applying affective touch to right-hemisphere stroke patients, in a way that was most likely to activate C-Tactile fibres (a special class of tactile afferents thought to process interoceptive signals regarding touch), can indeed reduce their body delusions (Jenkinson et al., 2020). Taken together, these results suggest that delusions regarding the body following right hemisphere damage may be at least partly explained as interoceptive experiences about the affected arm, which cannot be explained by existing top-down expectations of selfhood, thus giving rise to interoceptive prediction errors. The resulting delusions may be attempts to infer the causes of such interoceptive prediction errors. In summary, we have seen how the complex syndrome of anosognosia can be caused by patients’ expectations about their first-person sense of (motor) agency and their first-person feelings of body ownership that cannot be updated because brain damage has affected how new information is processed in both exteroceptive and interoceptive domains. This damage gives rise to sensorimotor and interoceptive prediction errors that require an explanation of their causes based on premorbid models of the self. Importantly, patients cannot use higher-order processes of salience-based monitoring, or third-person feedback, to update their beliefs either, as it appears that the higher-order processes that typically allow the integration of subjective, first-person perspectives on the self with more objectified, counterfactual knowledge about the self are also affected by disconnection between the relevant brain areas. In other work, we have argued that the latter higher-order abilities are not mere cognitive acquisitions in development. Instead, they are also socio-affective processes that depend on the quality of care infants received in early childhood (see also Ciaunica & Fotopoulou, 2017; Fotopoulou & Tsakiris, 2017), but this developmental perspective is beyond the scope of the current chapter (see Besharati & Fotopoulou, 2021 for a description of the relevance of this view to anosognosia).

6.7  Conclusion

In this chapter, we have built upon empirical findings on the neurological syndrome of anosognosia for hemiplegia (lack of awareness of one’s paralysis) and recent neuroscientific theories of self-awareness (Fotopoulou, 2014, 2015; Friston, 2018) to propose that the experience of one’s self entails at least two normally integrated levels of inference, namely, inferences about the here-and-now of experience in the first person and counterfactual inferences about the self beyond first-person experience. The relation between the two is understood as dynamic, in the sense that the contextual salience

of different signals determines the degree to which prediction errors in first-person experiences will be explained away by more objectified predictions about the self or will allow the updating of the latter. Anosognosia seems best explained as a disconnection between how one expects the body to feel in a first-person perspective (emotionally mine and under my control) and how one experiences the body when counterfactual hypotheses are taken into account. This difficulty in integrating current sensations and emotions from the body with one’s beliefs about the body corresponds to difficulties in the most abstract, metacognitive (allocentric and prospective) aspects of body awareness (Besharati et al., 2016, 2022; Kirsch et al., 2020). According to at least some formalisations of the Bayesian brain hypothesis, such integration is understood as precision optimisation (Kirsch et al., 2020) that can inhibit and explain away prediction errors or allow them to update counterfactual beliefs. Thus, self-awareness in anosognosia is subject to the influence of non-updated premorbid beliefs and emotions about the counterfactual self (Fotopoulou, 2014, 2015). Normally, first-person and counterfactual facets of self-awareness are integrated and experienced by individuals as unified. The dissociation observed in anosognosia allows us a rare glimpse of the normally unconscious processes of integration and inference that underlie self-experience in everyday life.

Disclaimer

The quoted examples, as well as a different version of the main thesis of this chapter, focusing on the integration of psychodynamic and neuroscientific theories (Besharati & Fotopoulou, 2021), will also appear in a different volume, and the two papers will have some overlap.

Acknowledgements

We would like to thank the patients and their families for their participation. KF’s time was funded by a European Research Council (ERC) Consolidator Award for the project ‘METABODY’, and the collaboration with SB was funded by a Strategic Partner Fund between the University of the Witwatersrand (Wits) and University College London (UCL).

References

Adams, R. A., Stephan, K. E., Brown, H. R., Frith, C. D., & Friston, K. J. (2013). The computational anatomy of psychosis. Frontiers in Psychiatry, 4, 1–26.
Allen, M., & Tsakiris, M. (2019). The body as first prior: Interoceptive predictive processing and the primacy of self-models. In M. Tsakiris & H. De Preester (Eds.), The interoceptive mind: From homeostasis to awareness (pp. 27–42). Oxford: Oxford University Press.

Babinski, J. (1914). Contribution à l’étude des troubles mentaux dans l’hémiplégie organique cérébrale (anosognosie) [Contribution to the study of mental disorders in hemiplegia (anosognosia)]. Revue Neurologique, 27, 845–848.
Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature Reviews Neuroscience, 16, 419–429.
Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). Canonical microcircuits for predictive coding. Neuron, 76(4), 695–711.
Berti, A., Bottini, G., Gandola, M., Pia, L., Smania, N., Stracciari, A., & Paulesu, E. (2005). Shared cortical anatomy for motor awareness and motor control. Science, 309, 488–491.
Berti, A., Spinazzola, L., Pia, L., Rabuffetti, M., Haggard, P., & Rossetti, Y. (2007). Motor awareness and motor intention in anosognosia for hemiplegia. In P. Haggard (Ed.), Sensorimotor foundations of higher cognition (pp. 163–181). Oxford: Oxford University Press.
Besharati, S., Jenkinson, P. M., Kopelman, M., Solms, M., Moro, V., & Fotopoulou, A. (2022). Awareness is in the eye of the observer: Preserved third-person awareness of deficit in anosognosia for hemiplegia. Neuropsychologia, 170, 108227.
Besharati, S., & Fotopoulou, A. (2021). The social reality of the self: Right perisylvian damage revisited. In C. E. Salas, O. H. Turnbull, & M. Solms (Eds.), Clinical studies in neuropsychoanalysis revisited. Oxford: Routledge.
Besharati, S., Crucianelli, L., & Fotopoulou, A. (2014). Restoring awareness: A review of rehabilitation in anosognosia for hemiplegia. Revista Chilena de Neuropsicología, 9, 31–37.
Besharati, S., Forkel, S. J., Kopelman, M., Solms, M., Jenkinson, P. M., & Fotopoulou, A. (2016). Mentalizing the body: Spatial and social cognition in anosognosia for hemiplegia. Brain, 139, 971–985.
Besharati, S. (2015). Cognitive, social and emotional processes in unawareness of illness following stroke (Doctoral dissertation). http://hdl.handle.net/11427/15491
Besharati, S., Kopelman, M., Avesani, R., Moro, V., & Fotopoulou, A. (2015). Another perspective on anosognosia: Self-observation in video replay improves motor awareness. Neuropsychological Rehabilitation, 2, 319–352.
Besharati, S., Forkel, S. J., Kopelman, M., Solms, M., Jenkinson, P. M., & Fotopoulou, A. (2014). The affective modulation of motor awareness in anosognosia for hemiplegia: Behavioural and lesion evidence. Cortex, 61, 127–140.
Blanke, O., & Arzy, S. (2005). The out-of-body experience: Disturbed self-processing at the temporo-parietal junction. The Neuroscientist, 11(1), 16–24.
Blanke, O., & Metzinger, T. (2009). Full-body illusions and minimal phenomenal selfhood. Trends in Cognitive Sciences, 13, 7–13.
Chapman, S., Beschin, N., Cosentino, S., Elkind, M. S., Della Sala, S., & Cocchini, G. (2019). Anosognosia for prospective and retrospective memory deficits: Assessment and theoretical considerations. Neuropsychology, 33(7), 1020–1031.
Ciaunica, A., & Fotopoulou, A. (2017). The touched self: Psychological and philosophical perspectives on proximal intersubjectivity and the self. In C. Durt, T. Fuchs, & C. Tewes (Eds.), Embodiment, enaction, and culture: Investigating the constitution of the shared world (pp. 173–192). Cambridge, MA: MIT Press.

Cocchini, G., Beschin, N., Fotopoulou, A., & Della Sala, S. (2010). Explicit and implicit anosognosia or upper limb motor impairment. Neuropsychologia, 48, 1489–1494.
Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201–215.
Corcoran, A. W., & Hohwy, J. (2019). Allostasis, interoception and the free energy principle: Feeling our way forward. In M. Tsakiris & H. De Preester (Eds.), The interoceptive mind: From homeostasis to awareness (pp. 272–292). Oxford: Oxford University Press.
Corlett, P. R., Taylor, J. R., Wang, X. J., Fletcher, P. C., & Krystal, J. H. (2010). Toward a neurobiology of delusions. Progress in Neurobiology, 92(3), 345–369.
Cosentino, S., Zhu, C., Bertrand, E., Metcalfe, J., Janicki, S., & Cines, S. (2016). Examination of the metacognitive errors that contribute to anosognosia in Alzheimer’s disease. Cortex, 84, 101–110.
Craig, A. D. (2010). The sentient self. Brain Structure and Function, 214, 563–577.
Craig, A. D. B. (2009). How do you feel – now? The anterior insula and human awareness. Nature Reviews Neuroscience, 10, 59–70.
Critchley, H. D., Wiens, S., Rotshtein, P., Öhman, A., & Dolan, R. J. (2004). Neural systems supporting interoceptive awareness. Nature Neuroscience, 7, 189–195.
Critchley, M. (1955). Personification of paralysed limbs in hemiplegics. British Medical Journal, 2, 2284–2286.
Cubelli, R. (2017). Definition: Spatial neglect. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, 92, 320–321.
Damasio, A. (1994). Descartes’ error: Emotion, reason, and the human brain. New York: G.P. Putnam’s Sons.
David, A. S., Bedford, N., Wiffen, B., & Gilleen, J. (2012). Failures of metacognition and lack of insight in neuropsychiatric disorders. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 367(1594), 1379–1390.
Davies, M., Davies, A. A., & Coltheart, M. (2005). Anosognosia and the two-factor theory of delusions. Mind and Language, 20, 209–236.
Dayan, P., Hinton, G. E., Neal, R. M., & Zemel, R. S. (1995). The Helmholtz machine. Neural Computation, 7(5), 889–904.
D’Imperio, D., Bulgarelli, C., Bertagnoli, S., Avesani, R., & Moro, V. (2017). Modulating anosognosia for hemiplegia: The role of dangerous actions in emergent awareness. Cortex, 92, 187–203.
Edwards, M. J., Adams, R. A., Brown, H., Pareés, I., & Friston, K. J. (2012). A Bayesian account of ‘hysteria’. Brain: A Journal of Neurology, 135, 3495–3512.
Feinberg, T. E. (1997). Anosognosia and confabulation. In T. E. Feinberg & M. J. Farah (Eds.), Behavioral neurology and neuropsychology (pp. 369–390). New York: McGraw Hill.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry. American Psychologist, 34, 906.
Fleming, S. M., Ryu, J., Golfinos, J. G., & Blackmon, K. E. (2014). Domain-specific impairment in metacognitive accuracy following anterior prefrontal lesions. Brain, 137, 2811–2822.

Fleming, S. M., & Dolan, R. J. (2012). The neural basis of metacognitive ability. Philosophical Transactions of the Royal Society B, 367, 1338–1349.
Fotopoulou, A. (2012a). Illusions and delusions in anosognosia for hemiplegia: From motor predictions to prior beliefs. Brain, 135, 1344–1346.
Fotopoulou, A. (2012b). The history and progress of neuropsychoanalysis. In A. Fotopoulou, M. Conway, & D. Pfaff (Eds.), From the couch to the lab: Trends in psychodynamic neuroscience. Oxford: Oxford University Press.
Fotopoulou, A. (2012c). Towards psychodynamic neuroscience. In A. Fotopoulou, M. Conway, & D. Pfaff (Eds.), From the couch to the lab: Trends in psychodynamic neuroscience (pp. 25–47). Oxford: Oxford University Press.
Fotopoulou, A. (2014). Time to get rid of the ‘Modular’ in neuropsychology: A unified theory of anosognosia as aberrant predictive coding. Journal of Neuropsychology, 8, 1–19.
Fotopoulou, A. (2015). The virtual bodily self: Mentalisation of the body as revealed in anosognosia for hemiplegia. Consciousness and Cognition, 33, 500–510.
Fotopoulou, A., & Tsakiris, M. (2017). Mentalizing homeostasis: The social origins of interoceptive inference. Neuropsychoanalysis, 19, 3–28.
Fotopoulou, A., Jenkinson, P. M., Tsakiris, M., Haggard, P., Rudd, A., & Kopelman, M. D. (2011). Mirror-view reverses somatoparaphrenia: Dissociation between first- and third-person perspectives on body ownership. Neuropsychologia, 49, 3946–3955.
Fotopoulou, A., Pernigo, S., Maeda, R., Rudd, A., & Kopelman, M. D. (2010). Implicit awareness in anosognosia for hemiplegia: Unconscious interference without conscious re-representation. Brain, 133, 3564–3577.
Fotopoulou, A. (2010). The affective neuropsychology of confabulation and delusion. Cognitive Neuropsychiatry, 15(1–3), 38–63.
Fotopoulou, A., Rudd, A., Holmes, P., & Kopelman, M. (2009). Self-observation reinstates motor awareness in anosognosia for hemiplegia. Neuropsychologia, 47, 1256–1260.
Fotopoulou, A., Tsakiris, M., Haggard, P., Vagopoulou, A., Rudd, A., & Kopelman, M. (2008). The role of motor intention in motor awareness: An experimental study on anosognosia for hemiplegia. Brain, 131, 3432–3442.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836.
Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology, 4(11), e1000211.
Friston, K. J. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11, 127–138.
Friston, K. J. (2013). Consciousness and hierarchical inference. Neuropsychoanalysis, 15(1), 38–42.
Friston, K. J. (2017). Precision psychiatry. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2(8), 640–643.
Friston, K. J. (2018). Am I self-conscious? (Or does self-organization entail self-consciousness?). Frontiers in Psychology, 9, 579.
Frith, C. D., Blakemore, S. J., & Wolpert, D. M. (2000). Abnormalities in the awareness and control of action. Philosophical Transactions of the Royal Society of London, Series B. Biological Sciences, 355, 1771–1788.

Frith, U. (1989). Autism and “theory of mind”. In Diagnosis and treatment of autism (pp. 33–52). Boston, MA: Springer.
Frith, U., & de Vignemont, F. (2005). Egocentrism, allocentrism, and Asperger syndrome. Consciousness and Cognition, 14, 719–738.
Gerstmann, J. (1942). Problem of imperception of disease and impaired body territories with organic lesions. Archives of Neurology and Psychiatry, 48, 890–913.
Hamilton, A. (2006). Against the belief model of delusion. In M. Chung, W. Fulford, & G. Graham (Eds.), Reconceiving schizophrenia (pp. 217–234). Oxford: Oxford University Press.
Happé, F., & Frith, U. (2006). The weak coherence account: Detail-focused cognitive style in autism spectrum disorders. Journal of Autism and Developmental Disorders, 36(1), 5–25.
Jenkinson, P. M., & Fotopoulou, A. (2014). Understanding Babinski’s anosognosia: 100 years later. Cortex, 61, 1–4.
Jenkinson, P. M., Edelstyn, N. M. J., Drakeford, J. L., & Ellis, S. J. (2009). Reality monitoring in anosognosia for hemiplegia. Consciousness and Cognition, 18, 458–570.
Jenkinson, P. M., Papadaki, C., Besharati, S., Moro, V., Gobbetto, V., Crucianelli, L., Kirsch, L. P., Avesani, R., Ward, N. S., & Fotopoulou, A. (2020). Welcoming back my arm: Affective touch increases body ownership following right hemisphere stroke. Brain Communications, 2(1), fcaa034.
Kaplan-Solms, K. L., & Solms, M. (2000). Clinical studies in neuropsychoanalysis. London: Karnac Books.
Karnath, H. O., Baier, B., & Nägele, T. (2005). Awareness of the functioning of one’s own limbs mediated by the insular cortex? The Journal of Neuroscience, 25, 7134–7138.
Kinsbourne, M. (2000). How is consciousness expressed in the cerebral activation manifold? Brain and Mind, 1, 265–274.
Kirsch, L. P., Besharati, S., Papadaki, C., Crucianelli, L., Bertagnoli, S., Ward, N., Moro, V., Jenkinson, P. M., & Fotopoulou, A. (2020). Damage to the right insula disrupts the perception of affective touch. eLife, 9, e47895.
Langer, K. G., & Levine, D. N. (2014). Translated from the original: Contribution à l’étude des troubles mentaux dans l’hémiplégie organique cérébrale (anosognosie). Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, 27, 845–848.
Lawson, R. P., Mathys, C., & Rees, G. (2017). Adults with autism overestimate the volatility of the sensory environment. Nature Neuroscience, 20(9), 1293–1299.
Levine, D. N. (1990). Unawareness of visual and sensorimotor defects: A hypothesis. Brain and Cognition, 13, 233–281.
Levine, D. N., Calvanio, R., & Rinn, W. E. (1991). The pathogenesis of anosognosia for hemiplegia. Neurology, 41, 1770–1770.
Lysaker, P. H., Carcione, A., Dimaggio, G., Johannesen, J. K., Nicolò, G., Procacci, M., & Semerari, A. (2005). Metacognition amidst narratives of self and illness in schizophrenia: Associations with neurocognition, symptoms, insight and quality of life. Acta Psychiatrica Scandinavica, 112(1), 64–71.
Marcel, A. J., Tegnér, R., & Nimmo-Smith, I. (2004). Anosognosia for plegia: Specificity, extension, partiality and disunity of bodily unawareness. Cortex, 40, 19–40.

Martinaud, O., Besharati, S., Jenkinson, P. M., & Fotopoulou, A. (2017). Ownership illusions in patients with body delusions: Different neural profiles of visual capture and disownership. Cortex, 87, 174–185.
Mathys, C., & Weber, L. (2020). Hierarchical Gaussian filtering of sufficient statistic time series for active inference. Communications in Computer and Information Science, 52–58.
Mathys, C. D., Lomakina, E. I., Daunizeau, J., Iglesias, S., Brodersen, K. H., Friston, K. J., & Stephan, K. E. (2014). Uncertainty in perception and the hierarchical Gaussian filter. Frontiers in Human Neuroscience, 8, 825.
Merleau-Ponty, M. (1945). Phénoménologie de la perception. Paris: Éditions Gallimard. English translation: C. Smith (Trans.), Phenomenology of perception. London: Routledge and Kegan Paul, 1962.
Mesulam, M. M. (1999). Spatial attention and neglect: Parietal, frontal and cingulate contributions to the mental representation and attentional targeting of salient extrapersonal events. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 354(1387), 1325–1346.
Mograbi, D. C., & Morris, R. G. (2013). Implicit awareness in anosognosia: Clinical observations, experimental evidence, and theoretical implications. Cognitive Neuroscience, 4(3–4), 181–197.
Mograbi, D. C., & Morris, R. G. (2018). Anosognosia. Cortex, 103, 385–386.
Monai, E., Bernocchi, F., Bisio, M., Bisogno, A. L., Salvalaggio, A., & Corbetta, M. (2020). Multiple network disconnection in anosognosia for hemiplegia. Frontiers in Systems Neuroscience, 14, 21.
Morrison, J. (2016). Perceptual confidence. Analytic Philosophy, 57(1), 15–48.
Moro, V., Pernigo, S., Tsakiris, M., Avesani, R., Edelstyn, N. M., Jenkinson, P. M., & Fotopoulou, A. (2016). Motor versus body awareness: Voxel-based lesion analysis in anosognosia for hemiplegia and somatoparaphrenia following right hemisphere stroke. Cortex, 83, 62–77.
Moro, V., Pernigo, S., Zapparoli, P., Cordioli, Z., & Aglioti, S. M. (2011). Phenomenology and neural correlates of implicit and emergent motor awareness in patients with anosognosia for hemiplegia. Behavioural Brain Research, 225, 259–269.
Moro, V., Scandola, M., Bulgarelli, C., Avesani, R., & Fotopoulou, A. (2014). Error-based training and emergent awareness in anosognosia for hemiplegia. Neuropsychological Rehabilitation, 25(4), 593–616.
Nardone, I. B., Ward, R., Fotopoulou, A., & Turnbull, O. H. (2007). Attention and emotion in anosognosia: Evidence of implicit awareness and repression? Neurocase, 13, 438–445.
Pacella, V., Foulon, C., Jenkinson, P. M., Bertagnoli, S., Avesani, R., Fotopoulou, A., Moro, V., & Thiebaut De Schotten, M. (2019). Anosognosia for hemiplegia is a disconnection syndrome. eLife, 8, e46075.
Palmer, C. J., Seth, A. K., & Hohwy, J. (2015). The felt presence of other minds: Predictive processing, counterfactual predictions, and mentalising in autism. Consciousness and Cognition, 36, 376–389.
Panagiotopoulou, E., Filippetti, M. L., Tsakiris, M., & Fotopoulou, A. (2017). Affective touch enhances self-face recognition during multisensory integration. Scientific Reports, 7(1), 10.
Pezzulo, G., Rigoli, F., & Friston, K. (2015). Active inference, homeostatic regulation and adaptive behavioural control. Progress in Neurobiology, 134, 17–35.

Prigatano, G. P. (2014). Anosognosia and patterns of impaired self-awareness observed in clinical practice. Cortex, 61, 81–92.
Prigatano, G. P., & Schacter, D. L. (1991). Awareness of deficit after brain injury: Clinical and theoretical issues. New York: Oxford University Press.
Ramachandran, V. S. (1995). Anosognosia in parietal lobe syndrome. Consciousness and Cognition, 4, 22–51.
Romano, D., Gandola, M., Bottini, G., & Maravita, A. (2014). Arousal responses to noxious stimuli in somatoparaphrenia and anosognosia: Clues to body awareness. Brain, 137, 1213–1223.
Saj, A., Vocat, R., & Vuilleumier, P. (2014). Action-monitoring impairment in anosognosia for hemiplegia. Cortex, 61, 93–106.
Schacter, D. L. (1990). Toward a cognitive neuropsychology of awareness: Implicit knowledge and anosognosia. Journal of Clinical and Experimental Neuropsychology, 12, 155–178.
Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17, 565–573.
Seth, A. K., Suzuki, K., & Critchley, H. D. (2012). An interoceptive predictive coding model of conscious presence. Frontiers in Psychology, 2, 1–16.
Smit, M., Van Stralen, H. E., Van den Munckhof, B., Snijders, T. J., & Dijkerman, H. C. (2019). The man who lost his body: Suboptimal multisensory integration yields body awareness problems after a right temporoparietal brain tumour. Journal of Neuropsychology, 13(3), 603–612.
Turnbull, O. H., Fotopoulou, A., & Solms, M. (2014). Anosognosia as motivated unawareness: The “defence” hypothesis revisited. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, 61, 18–29.
van Stralen, H. E., van Zandvoort, M. J. E. E., & Dijkerman, H. C. (2011). The role of self-touch in somatosensory and body representation disorders after stroke. Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1581), 3142–3152.
Vocat, R., & Vuilleumier, P. (2010). Neuroanatomy of impaired body awareness in anosognosia and hysteria: A multi-component account. The Study of Anosognosia, 21, 359–403.
Vocat, R., Saj, A., & Vuilleumier, P. (2013). The riddle of anosognosia: Does unawareness of hemiplegia involve a failure to update beliefs? Cortex, 49(7), 1771–1781.
Vocat, R., Staub, F., Stroppini, T., & Vuilleumier, P. (2010). Anosognosia for hemiplegia: A clinical-anatomical prospective study. Brain, 133, 3578–3597.
Vogeley, K., & Fink, G. R. (2003). Neural correlates of the first-person perspective. Trends in Cognitive Sciences, 7, 38–42.
Vogeley, K., May, M., Ritzl, A., Falkai, P., Zilles, K., & Fink, G. R. (2004). Neural correlates of first-person perspective as one constituent of human self-consciousness. Journal of Cognitive Neuroscience, 16, 817–827.
Vuilleumier, P. (2004). Anosognosia: The neurology of beliefs and uncertainties. Cortex, 40, 9–17.
Wolpert, D. M. (1997). Computational approaches to motor control. Trends in Cognitive Sciences, 1, 209–216.

7

Predictive Processing in the “Second Brain” From Gut Complex to Meta-Awareness Tony Cheng, Lynn Chiu, Linus Huang, Ying-Tung Lin, Hsing-Hao Lee, Yi-Chuan Chen, and Su-Ling Yeh

7.1  Predictive Processing, the Brain, and the “Second Brain”

“Predictive processing” (PP) is frequently cited as one of the most widely debated unified theories of how the brain functions with regard to cognition and consciousness (Hohwy & Seth, 2020). It sets out to explain perception, action, and the psychological episodes in between, such as emotions, thoughts, and bodily awareness, to name just a few. Within this framework, all the relevant aspects of the mind are explained with a single idea of “prediction error minimisation” (PEM) (Hohwy, 2013; for doubt, see Williams, 2020). What’s more, predictive processes are achieved by hierarchical active inference and precision weighting. The origin of such inferentialist ideas is often traced back to von Helmholtz (1867), McClelland and Rumelhart (1981), Miall and Wolpert (1996), and Rao and Ballard (1999), but the present discussions of this framework were initiated by Hinton (2007), Friston (2010), Hohwy (2013), and Clark (2013, 2016). There are many useful overviews of various aspects of this framework, for example, Metzinger and Wiese (2017), Wiese (2018), Kirchhoff and Kiverstein (2019), and Cheng, Sato, and Hohwy (2023), where readers can find abundant further references. Although the PP framework is quite controversial (see discussions in Miłkowski & Litwin, 2022; Sims, 2016; Sun & Firestone, 2020), in this chapter we do not seek to defend it from objections. Rather, assuming such a framework is generally correct for the brain, we are going to explore the hypothesis that it might also be correct for the so-called second brain, i.e., the gut neuro-immune-endocrine network and its surroundings, especially gut microorganisms. It is well established that the gut can influence moods (Appleton, 2018) and many other psychological phenomena, as we shall see below, but whether it can be modeled by the PP framework is an open empirical question that has not been explored. In what follows, we will examine the issues and distinctions that need to be clarified in order to establish a PP framework for the “gut brain.”
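Nothing in what follows hangs on a particular formalism, but the core PEM idea can be made concrete with a toy example. The sketch below is ours, under invented values for the precisions, learning rate, and input; it is not a model drawn from any of the works cited above. A single “belief” is nudged until the precision-weighted errors between what is predicted and what is received are jointly minimised:

```python
# Minimal sketch of precision-weighted prediction error minimisation.
# A belief mu about a hidden cause is updated by gradient descent so as
# to reduce two precision-weighted errors: one against sensory input,
# one against a higher-level prior. All numbers are illustrative.

def pem_step(mu, observation, prior_mean, pi_sensory, pi_prior, lr=0.1):
    sensory_error = observation - mu   # bottom-up prediction error
    prior_error = prior_mean - mu      # error against the prior
    # Precisions act as gains: more reliable signals pull harder.
    return mu + lr * (pi_sensory * sensory_error + pi_prior * prior_error)

mu = 0.0  # initial belief about the hidden cause
for _ in range(50):
    mu = pem_step(mu, observation=2.0, prior_mean=0.0,
                  pi_sensory=4.0, pi_prior=1.0)
print(round(mu, 2))  # ~1.6: a compromise weighted towards the more
                     # precise (4:1) sensory input
```

Precision weighting is what lets one and the same scheme either trust or discount a given signal, a point that will matter when we ask which signals the gut complex might weight.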

To anticipate potential objections: we are not attempting to provide strong arguments for this hypothesis, but we believe that the exploration of this thesis is worth taking seriously. The key theoretical motivation is that the “gut complex” – the integrated enteric nervous, immune, endocrine, and microbiota systems (NIEM, see Greslehner, Boem, Chiu, & Konsman, 2023) – has been argued to be the “first brain,” one that developed and evolved before the central nervous system (Furness & Stebbing, 2017) to organize an organism’s informational interface with its environment. In organisms with brains, its functionality and organization are largely preserved, connecting with the brain through the gut-brain axis. Our conjecture is that if PP is the right model for the brain, it should also be right for its predecessor and collaborator – the gut complex. This chapter lays out the preliminary groundwork for such a project. In the next section, we will look into aspects of the gut complex from a PP perspective. Since our emphasis is on the hypothesis and its plausibility, the discussion of the gut microbiota and the gut systems will be highly selective and stay at the conceptual level.

7.2  The Gut Complex from a PP Perspective

The gut is classically understood as a passive reflex system that controls the motility of the gastrointestinal tract (Hansen, 2003). In contrast, the gut, in our view, is a proto-sensory and cognitive system. The gut complex is not just the enteric nervous system but also the gut immune system, the gut endocrine system, and the gut microbiota system (Carabotti, Scirocco, Maselli, & Severi, 2015; Greslehner et al., 2023). That is why it is called the gut “complex.” It has been established that these systems work closely together – e.g., stimulation from the microbiota can facilitate the development and functioning of various components of the gut complex, and vice versa. However, their joint networks and computational powers have not been fully investigated. In the past decade, there has been an explosion of research investigating the unexpected connections between the gut microbiota and the brain (Galland, 2014) as well as psychological disorders and neuropathology (Dinan, 2022; Strandwitz et al., 2019). Some even call for a “psychobiotic revolution” (Anderson, Cryan, & Dinan, 2017); that is, to treat mental illness and mood disorders by directly treating the gut and its trillions of resident microbes. Such a revolution, however, is still far from finished. The majority of experimental work is based on rodent models, with no direct translation to human conditions, especially in clinical contexts (Dinan & Cryan, 2017). The purported causal associations between the gut microbiome and the brain are also highly questionable, with key concepts such as causation (Hooks, Konsman, & O’Malley, 2019) and dysbiosis under heavy scrutiny (Hooks & O’Malley, 2017).

However, with more and more work dedicated to the connecting mechanisms and pathways along the so-called microbiota-gut-brain axis, the hope is that by better understanding how the gut can influence the brain and hence the mind, and vice versa, we can better understand the role gut microbiota can play in the psychological life of their host individuals. This “axis” is currently the main framework used to clarify the bidirectional influence between gut microbiota and the brain. Traditionally, research on the gut-brain axis concerns the pathways that transmit brain signals to the gut and vice versa. The pathways that constitute the axis include the vagus nerve, the sympathetic and parasympathetic autonomic nervous systems, and connections through the neuroendocrine and immune systems (Breit, Kupferberg, Rogler, & Hasler, 2018). The microbiota-gut-brain axis is an extension of the gut-brain axis. Research in this area elucidates how gut microbiota can impact the brain through the gut-brain axis (e.g., through directly produced and host-derived metabolites and neurotransmitters) and, conversely, how brain signaling can impact gut microbiota through the gut-brain axis either directly (e.g., through neurotransmitters) or indirectly (e.g., by affecting the gut environment through changes to gut motility). A common but imperfect conceptual framework of the microbiota-gut-brain axis is an input-output communication model, a framework that models the connections between the gut microbiota and the brain as transmission pathways that transmit and modulate inputs/outputs from the brain and the gut. This “input-output transmission model” treats gut microbiota and the brain as signal sources and destinations. The other components of the gut-brain axis – the enteric nervous system, the gut endocrine and immune systems, other gut microbiota, etc. – are transmitters that connect the two (e.g., Grenham, Clarke, Cryan, & Dinan, 2011). Under this input-output model, the nervous, immune, endocrine, and microbiota systems of the gut are not seen as processors in their own right, but as a medium that carries and modulates signals from the brain or the gut microbiota. Beyond the input-output model, we propose that the gut complex is more complex than usually assumed. The gut complex – the enteric nervous, immune, endocrine, and microbiota systems that co-develop, co-evolve, and co-function (the NIEM system, Greslehner et al., 2023; Boem et al., 2023) – is a crucial node that not only modulates and mediates but also initiates and processes incoming information from the luminal contents of the gut and the brain. The deep integration between the component parts of the gut complex has led to hypotheses that the complex constitutes a proto-cognitive system (Boem et al., 2023). To better understand the psychological and psychiatric influences of gut microbiota, we need to take seriously the potential psychological roles of the gut complex.
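The architectural difference at stake can be put schematically. The sketch below is purely illustrative – the signal names, coefficients, and update rule are our assumptions rather than a physiological model – but it shows what changes when the axis is treated as a processing node instead of a channel: a processor has a state of its own, so its output is not a function of its current inputs alone.

```python
def relay(signal, gain=0.9):
    """Input-output transmission model: the pathway merely carries
    (perhaps scales) a signal between two endpoints."""
    return gain * signal

class GutComplexNode:
    """Gut complex as processor: incoming brain and luminal signals
    are integrated with the node's own running state."""
    def __init__(self):
        self.state = 0.0

    def step(self, brain_signal, luminal_signal):
        # Output depends on the node's history, not just its inputs.
        self.state = (0.8 * self.state
                      + 0.1 * brain_signal
                      + 0.1 * luminal_signal)
        return self.state

node = GutComplexNode()
# Identical inputs yield different outputs as internal state accumulates:
print([round(node.step(1.0, 0.5), 3) for _ in range(3)])  # [0.15, 0.27, 0.366]
```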

The gut complex pre-exists the evolution of the brain, and this might be one reason why its psychological role has been ignored: traditionally, what it does was thought to be mere reflexes. But as we know, the gut complex is a crucial element of the interoceptive system (Mayer, 2011), so it is quite unlikely that it has nothing to contribute psychologically. The surprising connection between gut microbes and psychological capacities has reignited interest in the psychological role of the gut. Research on the gut-brain-microbe axis reveals new and non-neural signaling pathways (Bauer, Huus, & Brett Finlay, 2016). The gut modulates multicellular organism-environment relations and sensory motor responses (Furness & Stebbing, 2017), making it a likely proto-cognitive system. Now, is the gut still operating like a proto-cognitive system with the advent of the central nervous system? Does it have a “mind” of its own, however primitive? Is the gut a sensory or perceptual organ? Is it an originator of sensory states, perceptual states, emotive states, or motivational states? Does it count as interoception or exteroception? Is there a “what-it-is-like” that involves the gut as an information-processing center? What kinds of psychological states directly involve the gut? Can we become attentive to or even exert control over gut processes, e.g., through mindfulness and meditative practices? Is there a distinct “gut feeling” that can be assessed through meta-cognition and meta-awareness? We are unable to answer these questions at this stage, but they serve as background for further research. However, at least we do know this much: the gut is often treated as a “black box” operating well beyond our awareness and conscious experiences, its activities entering awareness only when something has gone wrong (Leder, 1990). Part of the reasoning is that the gut operates autonomously through reflex feedback loops: conscious interventions are not needed for the gut to work. Yet the gut does not just invoke literal gut feelings; it is also involved in emotions, the sense of self, and more. But before going up to those levels, we must remember that gut sensations are primarily low-level. Visceral pains (including inflammation), gut distension (discomfort pains), and gut motility are typical types of low-level gut sensations (Farmer & Aziz, 2013). There are different ways of measuring awareness of low-level gut sensations; the rectal distension test, for instance, evokes responses to balloon distension in the rectum (a sensitivity test). From these low-level sensations, higher-level psychological states can be generated. For example, there is a strong connection between gut motility and emotions such as anger, stress, and excitement. The key question is whether we can distinguish emotions that arise from or relate to the gut. There are also ways of manipulating gut–emotion relations, such as hypnotic relaxation interventions on the balloon distension test (Houghton, Calvert, Jackson, Cooper, & Whorwell, 2002).
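One standard way to quantify performance in such sensitivity tests is signal detection theory, which reappears in Section 7.3. The following sketch assumes an invented set of trial counts for a hypothetical balloon distension detection task; it is an illustration of the standard analysis, not data from any study cited here:

```python
# Sensitivity d' = z(hit rate) - z(false alarm rate); counts invented.
from statistics import NormalDist

z = NormalDist().inv_cdf
hits, misses = 38, 12              # distension present: 50 trials
false_alarms, correct_rej = 9, 41  # distension absent: 50 trials

d_prime = (z(hits / (hits + misses))
           - z(false_alarms / (false_alarms + correct_rej)))
print(round(d_prime, 2))  # ~1.62: moderate visceral detection sensitivity
```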

As for visceral pains, gut microbiota can modulate sensitivity (visceral hypersensitivity) and nociception, with dramatic impacts on normal pain sensations (Lomax, Pradhananga, Sessenwein, & O’Malley, 2019; Pusceddu & Gareau, 2018). How are these related to PP? Among the various signals that contribute to interoception, the role of the immune system has been investigated somewhat more under the PP framework. For instance, Bhat, Parr, Ramstead, and Friston (2021) apply active inference to the immune system and introduce the notion of “immunoceptive inference.” Analogous to the nervous system, the immune system can be seen as “furnishing predictions of – and acting upon – sensory input, forming ‘beliefs’ about whether an antigen belongs to the category of ‘self’ or ‘non-self’.” The immune system and the central nervous system, with the hypothalamus as the interface between them, are seen as parts of a larger Markov blanket (i.e., the statistical boundary or information separation of a system) and jointly optimize a shared generative model. Another example is Kiverstein, Kirchhoff, and Thacker’s (2022) embodied PP theory of pain experience. According to their theory, the immune system, the neuroendocrine system, and the autonomic system work as a complex adaptive system – the neural endocrine immune (NEI) system – in a coordinated and coherent manner to maintain homeostasis. Pain is the outcome of PP that takes place not just in the brain but in the entire neural axis, in continuous reciprocal interaction with the systems that compose the NEI system. These processes function by predicting the states that must be maintained within a range of values consistent with the integrity of the body, as well as by correcting for prediction errors when they arise. Individually, the immune system, autonomic system, and endocrine system can correct prediction errors in their own ways, through the inflammatory responses of the immune system and the stress responses of the endocrine system and the autonomic nervous system. Moreover, the immune system can habituate to repeated stimulation, and this fits well with the PP framework. What about the gut complex? How do we understand its role from the perspective of PP? This is an issue that is yet to be explored. Khalsa, Berner, and Anderson (2022) adopt a PP approach to understanding gastrointestinal interoception in eating disorders. They consider food consumption a process that involves a series of anticipatory processing steps, from the cephalic phase through the gastric phase to the intestinal phase. The cephalic phase involves exteroceptive senses (e.g., viewing and smelling) and motor actions (e.g., chewing and swallowing), whereas various episodes of interoception (e.g., esophageal interoception, colorectal interoception) are involved in the following phases. They suggest that eating disorders arise from abnormal interoception which can manifest at each step “via

dysregulated bottom-up and top-down neural circuit interactions influenced by innate and developmental predisposing factors and various cognitive, valuative, and affective functions” (p. 51). Now what is prediction error in this context? Any cerebral or physiological input can serve as an element of gut complex predictions. With these in mind, we now move one step up from the gut complex to interoception more generally. This will pave the way for a further step up to meta-awareness and other meta-states.

7.3  A Step Up: Homeostasis and Interoception

Interoception underlies not just homeostatic reflexes but also motivational states and emotional reactions (Bonaz et al., 2021). These higher-level gut feelings may not enter awareness as such, or may not be experienced as originating from the gut. One key difficulty with studying interoception is that it is unclear to us – the experiencing subjects – what is going on and where. The feelings tend to be indistinct and barely perceptible. Interoception, compared to exteroception, has reduced qualities, is spatially ambiguous, and exhibits spatiotemporal discontinuity (Leder, 2005). While interoception has a limited set of sensations compared to exteroception, the visceral perceptions are even cruder. At most, we can only directly experience our intestines and stomach cramping, bloating, moving, or defecating. We can feel the heartburn of acid reflux in the esophagus and the gripping pain and rumbles of hunger or diarrhea. Even these experiences, however, are difficult to pinpoint. Is it gut-related? Where is it happening? Why is it happening? Which kinds of pain or tightness are happening? Although these questions are hard to answer, the signaling pathways of interoception have been well studied. Traditionally, vagal and sympathetic afferent fibers have been documented, but now microbes and their metabolites are also studied (Mayer, Nance, & Chen, 2022), including food-related metabolites, endogenous metabolites from the microbes, and signals from microbial components (e.g., the cell wall). As we already know, PP is a kind of Bayesian framework, which operates on certain priors (see, e.g., Hohwy, 2013, pp. 34–37, on Bayes rule). As Allen and Tsakiris (2019) point out, “Bayesian inference is always bootstrapped from within the subjective needs of the agent and its embodied econiche within the world (Bruineberg, Kiverstein, and Rietveld, 2016; Bruineberg and Rietveld, 2014)” (p. 30). They go on to elaborate visceral influences on precision-weighted inferences. In particular, they discuss interoceptive (including visceral) predictions and how they contribute to body ownership, the metacognitive self-model, and the multisensory self. Below we will follow their general guideline that the first priors of PP are in the body. Let’s go over some more elements in Allen and Tsakiris (2019).
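Before doing so, it may help to state the Bayes rule just mentioned compactly. The interoceptive gloss on the symbols below is our own illustration, not Hohwy’s or Allen and Tsakiris’s notation:

```latex
% Bayes' rule, glossed for interoceptive inference:
%   h : a hypothesised bodily cause (e.g., gastric distension)
%   e : the current interoceptive evidence
\[
P(h \mid e) = \frac{P(e \mid h)\, P(h)}{P(e)}
\]
```

The prior P(h) is where the bodily and homeostatic “first priors” discussed below enter the scheme.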

Allen and Tsakiris focus on “interoceptive predictive processing” (IPP, Herbert & Pollatos, 2019; Quadt, Critchley, & Garfinkel, 2019), which is relatively rare in the general PP literature. They mention several varieties of embodied sensations, such as “the beating of the heart, air filling the lungs, or the skin of the chest pulling taut with each breath” (ibid., p. 27), but they do not mention what the gut complex would contribute to our consciousness, which we have covered in the previous section. In the past, interoception was not often accommodated in the PP framework, but the situation has changed recently: for example, cardiac and respiratory control and related functions have been incorporated into the hierarchical optimization of precision in visceromotor brain areas (Sennesh et al., 2022). There are also “ways in which visceral, tactile, and proprioceptive bodily signals might be afforded some privileged status within the cortical hierarchy” (Allen & Tsakiris, 2019, p. 30). When we say one’s body contains first priors, what is more specific are homeostatic first priors: “salience is literally defined by whatever has the most (or least) impact on visceral and automatic homeostasis” (Allen & Tsakiris, 2019, p. 31), which goes beyond the similar “somatic marker hypothesis” (Damasio, 2005). Such a hypothesis allows us to integrate various topics that were studied separately in the past, including the multisensory self and body ownership (Apps & Tsakiris, 2014), and leads to the suggestion that “if exteroceptive influences highlight the malleability of body awareness, interoceptive signals seem to serve the stability of body awareness in response to exteroceptive stimulation, reflecting a psychological consequence of the biologically necessary function of homeostasis” (Allen & Tsakiris, 2019, p. 36). They further relate this to the distinction between meta-cognition and visceral precision, which is analogous to the distinction between metacognitive confidence and perceptual precision in vision science (Cheng, 2022a; Denison, 2017; Morrison, 2016, 2017). Finally, Allen and Tsakiris (2019) consider the ontogenetic origins of interoceptive precisions. This is “about the ontogenetic development of self-models [across] the lifespan, as well as about the role of visceral signals for social awareness” (p. 39). This remains poorly understood, but as we have seen above, the gut complex can be accommodated within the PP framework. Microbiota and PP homeostasis maintenance are linked in the following ways: visceral influence serves as an indicator of the status of host–microbiome relations; host–microbiome relations are part of the interoceptive and hence bodily self (see Section 7.5); and the joint homeostasis of host and microbiome may differ from the homeostasis of other body parts. Homeostasis is therefore a very important prior in our overall PP architecture. In the remainder of this section, we will look into three levels at which one can analyze interoception (de Vignemont, 2018): the physiological level, at which interoceptive signals originate from internal organs; the

phenomenological level, at which interoceptive sensations or feelings (e.g., feeling one’s bladder or stomach full; feeling hunger, thirst, satiety, nausea, etc.) occur; and the introspective level, which includes introspective accuracy (as measured by heartbeat counting, for example), introspective sensibility (as measured by the confidence in one’s interoceptive accuracy), and interoceptive awareness (as measured by the relationship between accuracy and sensibility; Garfinkel, Seth, Barrett, Suzuki, & Critchley, 2015). Let’s begin with interoceptive signals originating from internal organs. Sometimes such signals are invoked to define interoception: “Interoception is the body-to-brain axis of signals originating from the internal body and visceral organs (such as gastrointestinal, respiratory, hormonal, and circulatory systems)” (Tsakiris & de Preester, 2019, emphasis added). It should be noted, however, that not all definitions of interoception invoke interoceptive signals explicitly, though arguably they might be behind the scenes. Let’s see the four definitions summarized by Frédérique de Vignemont (2019, p. 260):

The sole-object definition: Interoception consists of information that is exclusively about one’s own body.
The insider definition: Interoception consists of information about what is internal to the body and not about what is at its surface.
The regulator definition: Interoception consists of information that plays a role in internal regulation and monitoring.
The visceral definition: Interoception consists of information coming from internal organs.

As de Vignemont points out, none of the above definitions is free of problems. For our purposes, though, what is crucial is that although none of them mentions “signal” explicitly, they all rely on certain notions of “information.” To be sure, signals and information are different: while information is often understood in Shannon’s terms (Shannon, 1948), which conceptualize information as a basic quantity calculated from the probability of a particular event occurring for a random variable, a signal is often understood by contrast with noise, via the signal-to-noise ratio (SNR) in the context of signal detection theory (SDT; Tanner & Swets, 1954; for the relation to the current topic, see Allen et al., 2016). The two are related in the sense that SDT seeks to measure the ability to differentiate between information-bearing patterns and random patterns that distract from the information (i.e., noise). In this way, we see that signals in the relevant sense are implicated in all the definitions. When we speak of “bodily priors,” we are referring to interoceptive signals, which might be one of the phylogenetically oldest of all sensory motor systems, and they “rise and fall according to a highly predictable pattern of slowly oscillating

circadian and hormonal biorhythms” (Allen & Tsakiris, 2019, p. 33). As first priors, interoceptive signals might provide the necessary basis of our bodily phenomenology, indicating that it is the same body all along. The appraisals of interoceptive signals have significant consequences for subjective well-being (Farb & Logie, 2019). A recent meta-analysis of the relevant fMRI studies showed significant overlaps in systems co-activated by interoceptive signals, emotion regulation, and low-level social cognition (Berntson, Gianaros, & Tsakiris, 2019), and converging evidence can also be found in lesion studies (Adolfi et al., 2017). It has also been pointed out that interoceptive signals and exteroceptive signals are integrated to produce a unified sense of the bodily self (Tsakiris, Tajadura-Jiménez, & Costantini, 2011; also see the idea of visceral inputs as self-specifying signals, Babo-Rebelo & Tallon-Baudry, 2019). In addition to the above contributions, interoceptive signals also shape our perception or awareness of time (Wittmann & Meissner, 2019). For more on the neurobiology of interoceptive signals and gut feelings, see Aziz and Ruffle (2019). What about the phenomenological level, at which interoceptive sensations or feelings occur? Again, Aziz and Ruffle (2019) critically discuss the bidirectional brain-gut axis, which underpins so-called gut feelings, and this nicely connects back to what we have covered above. Visceral sensations are the dominant basis of both perceptual and value-based computations (Allen & Tsakiris, 2019). According to their model, visceral sensations are regarded as on a par with other sensory motor hierarchies, and selfhood “emerges solely from the metacognitive nature of the deep hierarchy” (ibid., p. 36). Within the hierarchy, visceral sensations often have a certain privileged status, as the relevant physiology allows for hyperprecision. This kind of “high priority of visceral sensations may provide a naturally reliable ‘anchor’ on which to lodge a more permanent feeling of bodily self” (ibid., p. 41). In unfortunate cases, some of these sensations can be anxiety-related (Khalsa & Feinstein, 2019) and can also correlate with eating disorders (Herbert & Pollatos, 2019), though “training interoceptive accuracy could lead to a reduction in [anxiety] symptoms” (Quadt et al., 2019, p. 138). “Interoceptive emotional evaluation (IE) represents subjective appraisal related to sensations and perceptions of interoceptive signals” (ibid., p. 166). They also point out that under special circumstances, interoceptive sensations can be shut down due to aversive body appreciation and negative evaluation of interoceptive cues. In relation to PP, several models have suggested that awareness of symptoms can result from “an automatic process across several hierarchical levels in which the brain interprets interoceptive sensations, informed by ‘predictions’ (priors) about the cause of the sensations” (Van den Bergh, Zacharioudakis, & Petersen, 2019, p. 220).
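A toy calculation illustrates how such priors can dominate symptom perception. The sketch uses conjugate Gaussian updating, and all values are invented for illustration rather than taken from the models just cited:

```python
def posterior_mean(prior_mean, prior_precision, obs, obs_precision):
    """Precision-weighted average of prior expectation and evidence."""
    return ((prior_precision * prior_mean + obs_precision * obs)
            / (prior_precision + obs_precision))

# Strong prior of bodily distress (0.8) vs weak, ambiguous interoceptive
# evidence (0.3), with the prior weighted nine to one:
print(round(posterior_mean(0.8, 9.0, 0.3, 1.0), 2))  # 0.75, near the prior
```

With the prior weighted so heavily, even clearly discrepant evidence moves the percept only slightly, which is one way to picture how priors can “explain away” contrary interoceptive signals.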

Now, finally, we look into the introspective level, which primarily includes introspective accuracy as measured by (say) heartbeat counting and introspective sensibility as measured by the confidence in one’s interoceptive accuracy. “Interoceptive accuracy refers to the objective measure of how well people perform on interoceptive tasks, such as heartbeat discrimination or counting tasks” (Quadt et al., 2019, p. 126). Some have argued that it “requires a reflective act of attention that turns to one’s bodily signals in order to perform a cognitive task (e.g., counting one’s heartbeat)” (de Vignemont, 2019, p. 261). It is well known that psychological studies of interoceptive awareness mainly invoke tasks which quantify accuracy levels in detecting single heartbeats. It was shown decades ago “how higher levels of interoceptive accuracy that is typically quantified in behavioral tasks that require participants to pay attention to interoceptive states such as heartbeats (Schandry, 1981), respiration (Daubenmier et al., 2013), or feelings of fullness and gastric sensitivity (Herbert et al., 2012) influence emotional processing” (Berntson et al., 2019, p. 14). More recent studies further indicate that right anterior insula activity correlates with performance in interoceptive accuracy tasks (Critchley, Wiens, Rotshtein, Öhman, & Dolan, 2004). It has also been shown that “[a]ctivation in the right anterior insula predicted accuracy in the heartbeat-detection task” (Wittmann & Meissner, 2019, p. 68). The insula and anterior cingulate cortex (ACC) play crucial roles in the joint processing of interoception and emotion. Moreover, “levels of explicit interoceptive accuracy were negatively correlated with the strength of the illusory experience of alterations in self-awareness such that individuals with lower interoceptive accuracy tend to experience a stronger RHI [rubber hand illusion] (Schauder, Mash, Bryant, & Cascio, 2015; Tsakiris et al., 2011)” (Allen & Tsakiris, 2019, p. 35). This might suggest that with poorer interoceptive accuracy, one’s self-model would be predominantly exteroceptive. Those who exhibit higher interoceptive accuracy, by contrast, tend to perform more accurately in duration-reproduction tasks (Wittmann & Meissner, 2019). Similarly, “the higher the interoceptive accuracy, the more sensitive one is to the emotions of another person (Terasawa et al., 2014)” (Quadt et al., 2019, p. 128). And higher interoceptive accuracy is also indicative of higher anxiety traits (Domschke, Stevens, Pfleiderer, & Gerlach, 2010). However, this seems to be incompatible, or at least in tension, with the idea that “higher interoceptive accuracy (precision) is suggested to be critical for ensuring psychophysiological stability of the organism and the bodily self as well as its embodied unity in a changing environment with alternating challenges (Herbert and Pollatos, 2012; Tsakiris, 2017)” (Herbert & Pollatos, 2019, p. 177). This is an empirical controversy that remains to be tackled further.
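For concreteness, heartbeat-counting accuracy is commonly scored with a formula of roughly the following shape (after Schandry, 1981); the trial data below are invented for illustration:

```python
def heartbeat_accuracy(recorded, counted):
    """1 = perfect counting; the score falls as the miscount grows."""
    return 1 - abs(recorded - counted) / recorded

trials = [(43, 38), (58, 51), (36, 35)]  # (recorded, counted) per interval
score = sum(heartbeat_accuracy(r, c) for r, c in trials) / len(trials)
print(round(score, 2))  # ~0.91: fairly accurate cardiac interoception
```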

This completes our all-too-brief and selective review of interoceptive PP and the three levels of interoception. The aim of this section is not to offer anything like a comprehensive overview; it only serves to complement our earlier discussions of the hypothesis that the gut complex is a predictive processor, and to pave the way for another step up to various meta-states.

7.4  Another Step Up: Meta-Awareness and Other Meta-States

In the previous sections, we have seen that there are many current discussions of the gut complex and interoception, and it is no trivial matter to settle any of those issues. Now when it comes to the meta-level, things can get even messier. One reason is that at this level there are multiple terminologies, and each of them has multiple definitions. In what follows, we will not aim for comprehensiveness; rather we will selectively focus on three terms, and on only some of their definitions. The goal here is to figure out several viable options in minding the gut, and how researchers can or should go about conducting further studies. Note that sometimes gut feelings themselves are taken to be meta-cognition for other experiences. What we will focus on, however, is a further meta-level accompanying gut feelings themselves.

Let's begin with meta-awareness. According to Chin and Schooler (2009), "Meta-awareness is a state of deliberate attention toward the contents of conscious thought, serving as an appraisal of experiential consciousness" (p. 33). So first of all, there is deliberate or endogenous attention involved according to this definition. Secondly, the target here is conscious thought, as one main topic of that paper is mind-wandering, which tends to be regarded as a kind of thought. However, there is no principled reason why the targets need to be restricted like that. So if we extend the definition to encompass other conscious episodes, specifically gut feelings, we can say that meta-awareness in this area is deliberate attention toward gut feelings, serving as an appraisal of their experiential consciousness.

There are of course other definitions or characterizations of meta-awareness, and they are often appropriated in ways that reflect the subject domains. For example, in the context of mindfulness (Dunne, Thompson, & Schooler, 2019), it has been proposed that at least for mindfulness, meta-awareness is non-propositional, and therefore less cognitive. This might be one way to separate meta-awareness and meta-cognition: while the former can be less or non-cognitive, the latter is itself a cognitive state (to be defined).

What about meta-cognition? This term has been more widely used in the literature, so it is more difficult to pin down given the diversity of usage. For example, according to Metcalfe and Shimamura (1994), this term was introduced in the 1970s (e.g., Brown, 1978; Flavell, 1976, 1979) to

"characterize changes in self-reflection during early development," and later it "has been used to describe our knowledge about how we perceive, remember, think, and act; that is, what we know about what we know" (p. xi). Is this significantly different from the notion of meta-awareness introduced above? If we agree that the targets can be any first-order mental states and episodes, including not only perception, memory, thought, and action but also mind-wandering and gut feelings, the above definitions or characterizations do not conflict with one another. How about the higher order, i.e., meta-states or episodes? If we agree that "knowledge" here includes both propositional and non-propositional varieties, again there is no salient conflict in these different definitions. How about other definitions? Proust (2014) offers a provisional one as follows: meta-cognition "is the set of capacities through which an operating cognitive subsystem is evaluated or represented by another subsystem in context-sensitive way" (p. 4). With this one, we can say that gut feelings are the targets being evaluated or represented (depending on different theories; e.g., "appraisal" in Chin & Schooler, 2009), and what is higher order or meta is another subsystem that is doing the representing or evaluating.

Note that in this and many other definitions of meta-cognition, consciousness or feelings are not explicitly mentioned. This is crucially different from the various characterizations of meta-awareness, which always have consciousness or feelings as a core component. Therefore, one might hold that there is at least one feature that sets apart meta-awareness and meta-cognition: namely, while the former is itself a conscious experience, the latter does not have to be conscious, though it might be.

But things are not that simple: in the meta-cognition literature, a common notion invoked is "perceptual confidence." For example, in the case of vision, "[o]ur visual perception is typically accompanied by a sense of subjective confidence" (Koizumi, Maniscalco, & Lau, 2015). Again, the targets do not have to be restricted to vision. For our purposes, the definition can be appropriated as follows: our gut feelings are sometimes accompanied by a sense of subjective confidence. Here "typically" is weakened into "sometimes," as arguably our sense toward gut feelings is often less explicit than in vision and perhaps some other exteroceptive senses. "Accompanied by" can be seen as deliberately neutral between evaluation and representation in Proust (2014), and researchers have debated whether this perceptual confidence is an integral element of the perception itself, as indicated in the previous section (Beck, 2019; Cheng, 2022a; Denison, 2017; Morrison, 2016, 2017). Now, the definition also includes "subjective confidence," and with any sensible reading of this term, "subjective" is intended to capture consciousness or feelings. So under this common construal, perceptual confidence, or more generally metacognitive confidence, is itself conscious or experiential. If

this is true, then the tentative view introduced in the previous paragraph is challenged: the difference between meta-awareness and meta-cognition as they are used in the literature cannot simply be that while the former is by definition itself a conscious episode, the latter does not have to be. The reason is that a crucial component of meta-cognition is the relevant kind of confidence, which is itself defined as a kind of subjective experience or feeling. It is possible that we have this paradoxical consequence only because the definition from Koizumi et al. (2015) is biased, but other canonical definitions of perceptual confidence mention feelings too. For example, Balsdon, Wyart, and Mamassian (2020) also have it that it is the "feelings of confidence that reflect the likelihood that the [perceptual] decision was correct" (p. 1; emphasis added). Also see Peters et al. (2017), where the authors state that "perceptual experiences are accompanied by a subjective sense of certainty" (p. 1; emphasis added). Of course it is possible to find definitions that do not include anything like consciousness, feelings, or subjectivity for perceptual confidence (and therefore meta-cognition), but it is unclear that those would be representative of the relevant literature.

The tentative moral is this: on the face of it, meta-awareness is itself conscious while meta-cognition is not; however, the latter is often (though not always) characterized as a kind of conscious episode too. What is the difference, then? Perhaps the difference lies in confidence: while meta-awareness does not usually include confidence in its definition, meta-cognition often (though again not always, e.g., Proust, 2014) includes confidence as its component. These points can be schematized as follows:

Meta-awareness: Always conscious, but is not usually characterized as a kind of confidence.

Meta-cognition: Often characterized as conscious and in terms of confidence.

If this is roughly correct, we can see that the relation between the two is complicated: they cannot be said to be identical, but they have significant overlaps in their characters. Whether researchers have picked out different psychological mechanisms in using these two terms is an empirical question.

The situation is further complicated by the fact that there is yet another term, "metacognitive feelings," that is used in the literature. In the context of developmental psychology, it is postulated for "the transition from infant's earliest abilities concerning physical objects to the acquisition of knowledge of simple facts about physical objects" (Butterfill, 2020, p. 73). Arguing against the popular core cognition proposal (Carey, 2009), Butterfill holds that "4-month-olds' abilities are based on a combination of object indexes, motor representations of objects and

metacognitive feelings of surprise" (2020, p. 87). Here is how Butterfill characterizes metacognitive feelings:

Metacognitive feelings are, or involve, aspects of the overall phenomenal character of experiences which their subjects take to be informative about things that are only distantly related (if at all) to the things that those experiences intentionally relate the subject to. And there is no further phenomenal feature of metacognitive feelings. (p. 80)

According to Butterfill, prominent examples of this kind of feelings include "the feeling of familiarity, the feeling that something is on the tip of your tongue, the feeling of confidence and the feeling that someone's eyes are boring into your back" (ibid., p. 226). What should we make of this for our purposes? To begin with, it is a kind of feeling with phenomenal character, so like meta-awareness, it is by definition conscious. Moreover, the feeling of confidence is one example of metacognitive feelings, so the notion is broad enough to encompass (but is not restricted to) subjective confidence. Therefore, one possibility is that this general notion of metacognitive feelings can accommodate both meta-awareness and meta-cognition discussed above.

Some have emphasized the epistemic character of metacognitive feelings, i.e., they bring with them feelings of correctness and wrongness. For example, Arango-Muñoz (2014a) has proposed that metacognitive feelings are in effect epistemic feelings (E-feelings for short), which "are phenomenal experiences that point towards mental capacities, processes, and dispositions of the subject, such as knowledge, ignorance, or uncertainty (de Sousa, 2008; Dokic, 2012; Arango-Muñoz, 2014b)" (p. 145). So again, metacognitive feelings are conscious, and they can be (though do not have to be) about confidence, which goes hand in hand with uncertainty. He also usefully lists specific examples with references: "the feeling of knowing (henceforth FOK; Reder & Ritter, 1992; Koriat, 2000), the feeling of confidence (Brewer & Sampaio, 2012; Koriat, 2008), the feeling of error (Fernandez Cruz, Arango-Muñoz, & Volz, 2016), the feeling of forgetting (henceforth FOF; Halamish, McGillivray, & Castel, 2011), and the tip-of-the-tongue experience (henceforth TOT; Schwartz & Metcalfe, 2011)" (p. 146); more can be found in Arango-Muñoz and Michaelian (2014). Here they do not include gut feelings as targets, but of course, they are not excluded in principle. So if we follow this general strand, we can tentatively operationalize our target phenomenon as follows: Metacognitive feelings about gut feelings are meta- or higher-order mental episodes that point towards gut feelings, and these meta- or higher-order mental episodes are both conscious and epistemic.

Now, even if we tentatively settle on this operational definition, the metaphor "point towards" needs to be cashed out. This will be the main task of the remainder of this section, and candidates include evaluating, representing (Proust, 2014), accompanying (Koizumi et al., 2015), and monitoring (Block, 1995).

Let's take stock. Our subject matter now is meta-awareness over gut feelings. In Section 7.2, we have covered the gut complex and the "axis," and in Section 7.3, we have covered interoception and related phenomena. Now we want to figure out potential relations between meta-states and the gut, together with the feelings or experiences generated by it. In what follows, we discuss four candidates: meta-awareness accompanies, represents, monitors, or evaluates gut feelings. Note that these candidates do not exclude one another, at least conceptually: A can represent and monitor B at the same time, for example (more on this below).

To begin with, accompaniment is a rather weak relation and seems to appear in various literatures. For example, as mentioned above, it has been claimed that "[o]ur visual perception is typically accompanied by a sense of subjective confidence" (Koizumi et al., 2015; emphasis added). What exactly the relation is seems to be up for grabs. In the context of theories of consciousness, it has been proposed that a first-order state is conscious when it is suitably accompanied by a higher order state (Rosenthal, 2005), and again readers are left wondering what more we can say to cash it out. Shadows of my body often accompany my bodily actions, but shadows themselves do next to nothing relevant to the actions. The accompaniment relation is actually so weak that at least one different theory of consciousness – self-representationalism (Kriegel, 2009) – can readily endorse this claim concerning accompaniment. In the history of philosophy, Kant famously claims that "the 'I think' must be able to accompany all my representations" (1787/2007, B131; emphasis added). It is clear from the context that this notion of accompaniment is a limiting notion: the "I think" needs to have a certain structural feature – for Kant, a formal one – in order to fulfill the possibility of accompanying all one's representations. From these samples, it is safe to conclude that accompaniment by itself, even if not wrong, is too weak a notion to specify the relation(s) between meta-awareness and gut feelings. Without being further supplemented, this notion serves only as a placeholder at best.

How should we make progress here? Again, in theories of consciousness, a typical move taken by higher order theorists is to hold that the relevant higher order state represents the target state. This is why such theories are sometimes called "higher-order representationalism" (Mehta, 2013). Invoking representations, though useful, can also incur many theoretical troubles. Let's take a practical approach, starting with examples. Examples of potential cases of representations include tree rings representing trees' ages,

thermometers representing surrounding temperatures, linguistic expressions representing things and phenomena, and mental episodes representing all sorts of targets, including other mental episodes. Now, should we say meta-awareness represents gut feelings? A usual way to approach this question is to see whether we can sensibly ascribe correctness conditions to the putative states or episodes that are doing the representing (Shea, 2018). Another way to put this is to say that the possibility of misrepresentation is constitutive of the representational relation (Dretske, 1986). It seems at least initially plausible to think that meta-awareness represents gut feelings: gut feelings are themselves conscious episodes that have various phenomenal features, and the relevant meta-awareness can misrepresent them as lacking those phenomenal features or as having other phenomenal features. A further question, if we grant this point tentatively, is in what ways meta-awareness represents gut feelings. As indicated at the beginning of our discussion, representations can be propositional or non-propositional, and relatedly they can be conceptual or nonconceptual (Bermúdez & Cahen, 2020). In the empirical literature, a more familiar way of conceiving the relation is to ask whether this relation of representation is cognitive or not (e.g., APA dictionary of psychology, online). It seems reasonable to hold that meta-awareness can represent gut feelings cognitively or non-cognitively, depending on occasions. When one thinks of one's gut feelings explicitly, one might be deploying one's cognitive resources – such as concepts and working memory – to represent those target gut feelings; but when one is implicitly aware of one's gut feelings, it might be possible at least in principle that one can represent one's gut feelings without invoking cognitive resources such as concepts. It is, to be sure, an empirical question whether such non-cognitive meta-awareness exists in us, and whether it exists in human infants and non-human animals (Rochat, 2003, 2015; Smith, Zakrzewski, & Church, 2016).

This brings us to the third candidate: meta-awareness monitors gut feelings. Monitoring something in the relevant sense seems to be exclusively cognitive (in a different sense, the pituitary gland monitors the concentration of one's thyroid hormone in non-cognitive ways), so when meta-awareness non-cognitively represents gut feelings, the monitoring connection seems to be out of the question. However, when meta-awareness cognitively represents gut feelings, it opens the possibility that meta-awareness is also monitoring gut feelings. The relation here is not entailment: A can certainly (cognitively) represent B without monitoring B; e.g., the expression "being green" represents green properties without monitoring them. However, it seems right to say that for A to monitor B, A has to in some way cognitively represent B first; cognitive representation is a precondition of monitoring, though this can be contested by positions such as naïve realism (Martin, 2002). Setting this slight worry

aside, let's again take inspiration from theories of consciousness: a version of higher order theory of consciousness has it that higher order states monitor first-order states and thereby make the latter conscious (Lycan, 1995). Relatedly, in addition to phenomenal and access consciousness, it has also been proposed that monitoring consciousness is yet another variety of consciousness (Block, 1995). Back to our context, it seems right to say that meta-awareness can certainly monitor gut feelings, and this presupposes that the former cognitively represents the latter; but importantly, meta-awareness can represent gut feelings non-cognitively, and in those cases, meta-awareness does not monitor gut feelings. This echoes our discussion of the definition offered by the Schooler group, where they invoke "deliberate attention" to define meta-awareness: the monitoring act requires not only cognitive representation but also deliberate attention. Here we need to bear in mind that attention comes in many varieties, including endogenous vs. exogenous, focused vs. diffused, and so on (Wu, 2014); not all of them should be equated with deliberate attention.

Finally, evaluation seems to be something even more advanced, as it is possible to represent and monitor something without evaluating it. Evaluation seems to be a sophisticated mental act that is distinctively human. Do other animals evaluate their inner states and episodes? This might depend on what we mean by that term. But normally, we think of other animals as locked in their immediate needs: they are drawn by affordances (Gibson, 1979) and solicitations (Merleau-Ponty, 1945/2012) and follow those flows without evaluating whether, say, some diets are better for one's health. Of course, we can have a less demanding gloss of evaluation, in which other animals evaluate (say) how hungry they are now and whether they should search for food at this point. In any case, it seems reasonable to assume that only humans like us can respond to reasons as such (McDowell, 2005): only we can step back and reflect on what we are undergoing, as it were. If meta-awareness evaluates gut feelings at all, it seems to involve sophisticated mechanisms and cognitive resources (Proust, 2014).

This completes our discussions of meta-states and the ways they relate to gut feelings generated by the gut complex. In the final section, we will conclude with some relevant ramifications concerning the bodily self.

7.5  Toward the Bodily Self

In this concluding section, we will end with a further hypothesis that one's self arises from the processing of a functioning brain and the second brain, bracketing revisionary ideas such as the extended self (Heersmink, 2020) and abstruse philosophical questions such as how a particular person can be me (Nagel, 1986). Given that assumption, modeling the brain and the

gut in a unified way and seeing how that modeling can give rise to the self, specifically the bodily self, become natural. "The bodily self" as a notion has a long history (for various discussions, see Bermúdez, 2018). A complication is that it is often intertwined with another much-debated notion, the "embodied self" (Cassam, 2011; Gallagher, 2005), or the "embodied mind" more generally (Varela, Thompson, & Rosch, 1991). These terms carry much theoretical baggage that we shall not go into. The most basic idea is that the human self is arguably always embodied: for the self to exemplify its psychological capacities, it has to do so with its physical body. Whether the human self is necessarily or essentially embodied, i.e., whether brains in vats do not sustain true selves, is a controversial matter that we will not go into on this occasion.

The bodily self has many dimensions, such as a certain kind of nonconceptual self-consciousness, bodily immunity to error through misidentification, body ownership, and so on (in addition to Bermúdez, 2018, also see Cheng, 2022b; de Vignemont, 2018). We are not the first to note the potential connections between PP and the bodily self (e.g., Newen, 2018; Venter, 2021), but our perspective is distinctive in that we have focused on interoception, microbiota, the immune system, gut feelings, etc., which have been relatively neglected in the relevant literature. If the body contains first priors, we need to recognize that those priors are generated by many internal signals, which need to be investigated carefully. On this occasion, we have provided some materials for further studies in this regard.

Finally, there is always a worry that the PP framework is too post-hoc and general. To this, we reply that this is indeed a concern, but it is a concern for most unifying theories. In this chapter, we do not defend the plausibility of such a framework; rather, we extend the model to the "second brain" to see how far the framework can go. To be sure, this is only a starting point, and it is very likely that what we have proposed above contains much that is false. Still, we believe it is valuable to put the hypothesis on the table, as we should agree that cognition and consciousness should not be overly brain-centered. We hope this initial attempt can generate enough interest from the relevant audience, and that more – perhaps better – work will be forthcoming. Modeling these systems might not solve any problem directly, but it provides a new understanding of how the first and the second brains work together. Perhaps microbiota and the related axis can be considered our Markov blanket in this area, which can take us very far.

References

Adolfi, F., Couto, B., Richter, F., Decety, J., Lopez, J., Sigman, M., … Ibanez, A. (2017). Convergence of interoception, emotion, and social cognition: A twofold fMRI meta-analysis and lesion approach. Cortex, 88, 124–142.

Allen, M., Frank, D., Schwarzkopf, D. S., Fardo, F., Winston, J. S., Hauser, T. U., & Rees, G. (2016). Unexpected arousal modulates the influence of sensory noise on confidence. eLife, 5, e18103.
Allen, M., & Tsakiris, M. (2019). The body as first prior: Interoceptive predictive processing and the primacy of self-models. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Anderson, S. C., Cryan, J. F., & Dinan, T. (2017). The psychobiotic revolution: Mood, food, and the new science of the gut-brain connection. Washington, DC: National Geographic Society.
Appleton, J. (2018). The gut-brain axis: Influence of microbiota on mood and mental health. Integrative Medicine, 17(4), 28–32.
Apps, M. A. J., & Tsakiris, M. (2014). The free-energy self: A predictive coding account of self-recognition. Neuroscience and Biobehavioral Reviews, 41, 85–97.
Arango-Muñoz, S. (2014a). The nature of epistemic feelings. Philosophical Psychology, 27(2), 1–19.
Arango-Muñoz, S. (2014b). Metacognitive feelings, self-ascriptions and mental actions. Philosophical Inquiries, 2(1), 145–162.
Arango-Muñoz, S., & Michaelian, K. (2014). Epistemic feelings and epistemic emotions. Philosophical Inquiries, 2(1), 97–122.
Aziz, Q., & Ruffle, J. K. (2019). The neurobiology of gut feelings. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Babo-Rebelo, M., & Tallon-Baudry, C. (2019). Interoceptive signals, brain dynamics, and subjectivity. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Balsdon, T., Wyart, V., & Mamassian, P. (2020). Confidence controls perceptual evidence accumulation. Nature Communications, 11, 1753.
Bauer, K. C., Huus, K. E., & Brett Finlay, B. (2016). Microbes and the mind: Emerging hallmarks of the gut microbiota-brain axis. Cellular Microbiology, 18(5), 632–644.
Beck, J. (2019). On perceptual confidence and "completely trusting your experience." Analytic Philosophy, 61(2), 174–188.
Bergh, O. van den, Zacharioudakis, N., & Petersen, S. (2019). Interoception, categorization, and symptom perception. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Bermúdez, J. L. (2018). Ownership and the space of the body. In The bodily self: Selected essays. Cambridge, MA: MIT Press.
Bermúdez, J. L., & Cahen, A. (2020). Nonconceptual mental content. In E. N. Zalta (Ed.), Stanford encyclopedia of philosophy (Summer 2020 ed.). https://plato.stanford.edu/archives/sum2020/entries/content-nonconceptual/
Berntson, G. G., Gianaros, P. J., & Tsakiris, M. (2019). Interoception and the autonomic nervous system: Bottom-up meets top-down. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Bhat, A., Parr, T., Ramstead, M., & Friston, K. (2021). Immunoceptive inference: Why are psychiatric disorders and immune responses intertwined? Biology and Philosophy, 36, 27.

Block, N. (1995). On a confusion about a function of consciousness. Behavioral and Brain Sciences, 18(2), 227–247.
Boem, F., Greslehner, P. G., Konsman, J. P., & Chiu, L. (2023). Minding the gut: Extending embodied cognition and perception to the gut complex. Frontiers in Neuroscience, 17.
Bonaz, B., Lane, R. D., Oshinsky, M. L., Kenny, P. J., Sinha, R., Mayer, E. A., & Critchley, H. D. (2021). Diseases, disorders, and comorbidities of interoception. Trends in Neurosciences, 44, 39–51.
Breit, S., Kupferberg, A., Rogler, G., & Hasler, G. (2018). Vagus nerve as modulator of the brain-gut axis in psychiatric and inflammatory disorders. Frontiers in Psychiatry, 9, 44.
Brewer, W. F., & Sampaio, C. (2012). The metamemory approach to confidence: A test using semantic memory. Journal of Memory and Language, 67(1), 59–77.
Brown, A. L. (1978). Knowing when, where, and how to remember: A problem of metacognition. Advances in Instructional Psychology, 1, 77–165.
Butterfill, S. (2020). The developing mind: A philosophical introduction. London: Routledge.
Carabotti, M., Scirocco, A., Maselli, M. A., & Severi, C. (2015). The gut-brain axis: Interactions between enteric microbiota, central and enteric nervous systems. Annals of Gastroenterology, 28(2), 203–209.
Carey, S. (2009). The origin of concepts. Oxford: Oxford University Press.
Cassam, Q. (2011). The embodied self. In S. Gallagher (Ed.), The Oxford handbook of the self. Oxford: Oxford University Press.
Cheng, T. (2022a). Post-perceptual confidence and supervaluative matching profile. Inquiry: An Interdisciplinary Journal of Philosophy, 65(3), 249–277.
Cheng, T. (2022b). Bodily awareness. In Internet encyclopedia of philosophy. ISSN: 2161-0002. https://iep.utm.edu/bodily-awareness/
Cheng, T., Sato, R., & Hohwy, J. (2023). Mind and world, predictive style. In T. Cheng, R. Sato, & J. Hohwy (Eds.), Expected experiences: The predictive mind in an uncertain world. London: Routledge.
Chin, J. M., & Schooler, J. W. (2009). Meta-awareness. In W. P. Banks (Ed.), Encyclopedia of consciousness. Oxford: Elsevier.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3). doi: 10.1017/S0140525X12000477
Clark, A. (2016). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford: Oxford University Press.
Critchley, H. D., Wiens, S., Rotshtein, P., Ohman, A., & Dolan, R. J. (2004). Neural systems supporting interoceptive awareness. Nature Neuroscience, 7(2), 189–195.
Damasio, A. (2005). Descartes' error: Emotion, reason, and the human brain. London: Penguin Books.
de Vignemont, F. (2018). Mind the body: An exploration of bodily self-awareness. Oxford: Oxford University Press.
de Vignemont, F. (2019). Was Descartes right after all? An affective background for bodily awareness. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Denison, R. (2017). Precision, not confidence, describes the uncertainty of perceptual experience: Comment on John Morrison's "perceptual confidence." Analytic Philosophy, 58(1), 58–70.

Dinan, T. G. (2022). How do gut microbes influence mental health? Trends in Urology and Men's Health, 13(3), 26–29.
Dinan, T. G., & Cryan, J. F. (2017). The microbiome-gut-brain axis in health and disease. Gastroenterology Clinics of North America, 46(1), 77–89.
Domschke, K., Stevens, S., Pfleiderer, B., & Gerlach, A. L. (2010). Interoceptive sensitivity in anxiety and anxiety disorders: An overview and integration of neurobiological findings. Clinical Psychology Review, 30(1), 1–11.
Dretske, F. (1986). Misrepresentation. In R. Bogdan (Ed.), Belief: Form, content, and function. Oxford: Oxford University Press.
Dunne, J. D., Thompson, E., & Schooler, J. (2019). Mindful meta-awareness: Sustained and non-propositional. Current Opinion in Psychology, 28, 307–311.
Farb, N. A. S., & Logie, K. (2019). Interoceptive appraisal and mental health. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Farmer, A. D., & Aziz, Q. (2013). Gut pain and visceral hypersensitivity. British Journal of Pain, 7(1), 39–47.
Fernandez Cruz, A. L., Arango-Muñoz, S., & Volz, K. G. (2016). Oops, scratch that! Monitoring one's own errors during mental calculation. Cognition, 146, 110–120.
Flavell, J. H. (1976). Metacognitive aspects of problem solving. In L. B. Resnick (Ed.), The nature of intelligence. Hillsdale, NJ: Lawrence Erlbaum.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34(10), 906–911.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11, 127–138.
Furness, J. B., & Stebbing, M. J. (2017). The first brain: Species comparisons and evolutionary implications for the enteric and central nervous systems. Neurogastroenterology and Motility, 30(2). doi: 10.1111/nmo.13234
Gallagher, S. (2005). How the body shapes the mind. Oxford: Oxford University Press.
Galland, L. (2014). The gut microbiome and the brain. Journal of Medicinal Food, 17(12), 1261–1272.
Garfinkel, S. N., Seth, A. K., Barrett, A. B., Suzuki, K., & Critchley, H. D. (2015). Knowing your own heart: Distinguishing interoceptive accuracy from interoceptive awareness. Biological Psychology, 104, 65–74.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston, MA: Houghton, Mifflin and Company.
Grenham, S., Clarke, G., Cryan, J. F., & Dinan, T. G. (2011). Brain-gut-microbe communication in health and disease. Frontiers in Physiology, 2, 94.
Greslehner, G. P., Boem, F., Chiu, L., & Konsman, J. P. (2023). Philosophical perspectives on neuroendocrine-immune interactions: The building block model and complementary neuro-endocrine-immune-microbiota systems approach. In J. P. Konsman & T. M. Reyes (Eds.), Neuroendocrine-immune system interactions. Berlin: Springer Nature.
Halamish, V., McGillivray, S., & Castel, A. D. (2011). Monitoring one's own forgetting in younger and older adults. Psychology and Aging, 26(3), 631–635.

Hansen, M. B. (2003). The enteric nervous system II: Gastrointestinal functions. Pharmacology and Toxicology, 92(6), 249–257.
Heersmink, R. (2020). Varieties of the extended self. Consciousness and Cognition, 85. doi: 10.1016/j.concog.2020.103001
Herbert, B. M., & Pollatos, O. (2019). The relevance of interoception for eating behavior and eating disorders. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11, 428–434.
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Hohwy, J., & Seth, A. (2020). Predictive processing as a systematic basis for identifying the neural correlates of consciousness. Philosophy and the Mind Sciences, 1(II). doi: 10.33735/phimisci.2020.II.64
Hooks, K. B., Konsman, J. P., & O'Malley, M. A. (2019). Microbiota-gut-brain research: A critical analysis. Behavioral and Brain Sciences, 42, e60.
Hooks, K. B., & O'Malley, M. A. (2017). Dysbiosis and its discontents. mBio, 8(5), e01492.
Houghton, L. A., Calvert, E. L., Jackson, N. A., Cooper, P., & Whorwell, P. J. (2002). Visceral sensation and emotion: A study using hypnosis. Gut, 51(5), 701–704.
Kant, I. (1787/2007). Critique of pure reason (N. Kemp Smith, Trans.). London: Macmillan.
Khalsa, S. S., Berner, L. A., & Anderson, L. M. (2022). Gastrointestinal interoception in eating disorders: Charting a new path. Current Psychiatry Reports, 24, 47–60.
Khalsa, S. S., & Feinstein, J. S. (2019). The somatic error hypothesis of anxiety. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Kirchhoff, M. D., & Kiverstein, J. (2019). Extended consciousness and predictive processing: A third wave view. London: Routledge.
Kiverstein, J., Kirchhoff, M. D., & Thacker, M. (2022). An embodied predictive processing theory of pain experience. Review of Philosophy and Psychology, 13, 973–998.
Koizumi, A., Maniscalco, B., & Lau, H. (2015). Does perceptual confidence facilitate cognitive control? Attention, Perception, and Psychophysics, 77(4), 1295–1306.
Koriat, A. (2000). The feeling of knowing: Some metatheoretical implications for consciousness and control. Consciousness and Cognition, 9(2), 149–171.
Koriat, A. (2008). Subjective confidence in one's answers: The consensuality principle. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(4), 945–959.
Kriegel, U. (2009). Subjective consciousness: A self-representational theory. Oxford: Oxford University Press.
Leder, D. (1990). The absent body. Chicago, IL: University of Chicago Press.
Leder, D. (2005). Moving beyond "mind" and "body." Philosophy, Psychiatry, and Psychology, 12(2), 109–113.
Lomax, A. E., Pradhananga, S., Sessenwein, J. L., & O'Malley, D. (2019). Bacterial modulation of visceral sensation: Mediators and mechanisms. American Journal of Physiology, 317(3), 363–372.
Lycan, W. (1995). Consciousness. Cambridge, MA: MIT Press.

Martin, M. G. F. (2002). The transparency of experience. Mind and Language, 17(4), 376–425.
Mayer, E. A. (2011). Gut feelings: The emerging biology of gut-brain communication. Nature Reviews Neuroscience, 12(8), 453–466.
Mayer, E. A., Nance, K., & Chen, S. (2022). The gut-brain axis. Annual Review of Medicine, 73, 439–453.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88(5), 375–407.
McDowell, J. (2005). Conceptual capacities in perception. In G. Abel (Ed.), Kreativität: 2005 congress of the Deutsche Gesellschaft für Philosophie. Berlin: Universitätsverlag der TU.
Mehta, N. (2013). Is there a phenomenological argument for higher-order representationalism? Philosophical Studies, 164(2), 357–370.
Merleau-Ponty, M. (1945/2012). Phenomenology of perception (D. A. Landes, Trans.). New York: Routledge.
Metcalfe, J., & Shimamura, A. P. (Eds.). (1994). Metacognition: Knowing about knowing. Cambridge, MA: MIT Press.
Metzinger, T., & Wiese, W. (2017). Vanilla PP for philosophers: A primer on predictive processing. In T. Metzinger & W. Wiese (Eds.), Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Miall, R., & Wolpert, D. M. (1996). Forward models for physiological motor control. Neural Networks, 9, 1265–1279.
Miłkowski, M., & Litwin, P. (2022). Testable or bust: Theoretical lessons for predictive processing. Synthese, 200, 462.
Morrison, J. (2016). Perceptual confidence. Analytic Philosophy, 57(1), 15–48.
Morrison, J. (2017). Perceptual confidence and categorization. Analytic Philosophy, 58(1), 71–85.
Nagel, T. (1986). The view from nowhere. Oxford: Oxford University Press.
Newen, A. (2018). The embodied self, the pattern theory of self, and the predictive mind. Frontiers in Psychology, 9, 2270.
Peters, M. A. K., Thesen, T., Ko, Y. D., Maniscalco, B., Carlson, C., Davidson, M., … Lau, H. (2017). Perceptual confidence neglects decision-incongruent evidence in the brain. Nature Human Behaviour, 1, 139.
Proust, J. (2014). The philosophy of metacognition: Mental agency and self-awareness. Oxford: Oxford University Press.
Pusceddu, M. M., & Gareau, M. G. (2018). Visceral pain: Gut microbiota, a new hope? Journal of Biomedical Science, 25(1), 73.
Quadt, L., Critchley, H., & Garfinkel, S. N. (2019). Interoception and emotion: Shared mechanisms and clinical implications. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Reder, L. M., & Ritter, F. E. (1992). What determines initial feeling of knowing? Familiarity with question terms, not with the answer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(3), 435–451.

Rochat, P. (2003). Five levels of self-awareness as they unfold early in life. Consciousness and Cognition, 12, 717–731.
Rochat, P. (2015). Layers of awareness in development. Developmental Review, 38, 122–145.
Rosenthal, D. (2005). Consciousness and mind. Oxford: Oxford University Press.
Schauder, K. B., Mash, L. E., Bryant, L. K., & Cascio, C. J. (2015). Interoceptive ability and body awareness in autism spectrum disorder. Journal of Experimental Child Psychology, 131, 193–200.
Schwartz, B. L., & Metcalfe, J. (2011). Tip-of-the-tongue (TOT) states: Retrieval, behavior, and experience. Memory and Cognition, 39, 737–749.
Sennesh, E., Theriault, J., Brooks, D., van de Meent, J.-W., Barrett, L. F., & Quigley, K. S. (2022). Interoception as modeling, allostasis as control. Biological Psychology, 167, 108242.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423.
Shea, N. (2018). Representation in cognitive science. Oxford: Oxford University Press.
Sims, A. (2016). A problem of scope for the free energy principle as a theory of cognition. Philosophical Psychology, 29(7), 967–980.
Smith, J. D., Zakrzewski, A. C., & Church, B. A. (2016). Formal models in animal-metacognition research: The problem of interpreting animals' behavior. Psychonomic Bulletin and Review, 23(5), 1341–1353.
Strandwitz, P., Kim, K.-H., Terekhova, D., Liu, J. K., Sharma, A., Levering, J., … Lewis, K. (2019). GABA-modulating bacteria of the human gut microbiota. Nature Microbiology, 4(3), 396–403.
Sun, Z., & Firestone, C. (2020). The dark room problem. Trends in Cognitive Sciences, 24(5), 346–348.
Tanner, W. P., & Swets, J. A. (1954). A decision-making theory of visual detection. Psychological Review, 61(6), 401–409.
Tsakiris, M., & de Preester, H. (Eds.). (2019). The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Tsakiris, M., Tajadura-Jiménez, A., & Costantini, M. (2011). Just a heartbeat away from one's body: Interoceptive sensitivity predicts malleability of body-representations. Proceedings of the Royal Society B: Biological Sciences, 278(1717), 2470–2476.
Varela, F. J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. Cambridge, MA: MIT Press.
Venter, E. (2021). Toward an embodied, embedded predictive processing account. Frontiers in Psychology, 12. doi: 10.3389/fpsyg.2021.543076
von Helmholtz, H. (1867/2015). Handbuch der physiologischen Optik. Sacramento, CA: Creative Media Partners.
Wiese, W. (2018). Experienced wholeness: Integrating insights from Gestalt theory, cognitive neuroscience, and predictive processing. Cambridge, MA: MIT Press.
Williams, D. (2020). Is the brain an organ for prediction error minimization? PhilSci Archive.
Wittmann, M., & Meissner, K. (2019). The embodiment of time: How interoception shapes the perception of time. In M. Tsakiris & H. de Preester (Eds.), The interoceptive mind: From homeostasis to awareness. Oxford: Oxford University Press.
Wu, W. (2014). Attention. New York: Routledge.

Part II

Related Theoretical Issues Concerning Bayesian Probability

8

Neural Implementation of (Approximate) Bayesian Inference

Michael Rescorla

8.1  Bayesian Modeling of Perception

In recent decades, Bayesian modeling has achieved extraordinary success within perceptual psychology (Knill & Richards, 1996; Rescorla, 2015, 2020a, 2021). Bayesian models posit that the perceptual system assigns subjective probabilities (or credences) to hypotheses regarding distal conditions (e.g. hypotheses regarding possible shapes, sizes, colors, or speeds of perceived objects). The perceptual system deploys its subjective probabilities to estimate distal conditions based upon proximal sensory input (e.g. retinal stimulations). It does so through computations that are fast, automatic, subpersonal, and inaccessible to conscious introspection. More formally, the perceptual system maintains a prior probability p(h), where each h is a different hypothesis about distal conditions. The perceptual system also maintains a prior likelihood p(e | h) that assigns a probability to sensory input e conditional on h (e.g. the probability of receiving retinal input e given that a perceived object has a certain size and is located a certain distance away). Upon receiving input e, the perceptual system computes the posterior probability p(h | e). Bayes's theorem expresses the posterior in terms of the prior probability and the prior likelihood:

p(h \mid e) = k\, p(h)\, p(e \mid h),

where k is a normalizing constant to ensure that probabilities sum to 1. The posterior assigns a probability to h conditional on sensory input e. Based on the posterior, the perceptual system selects a privileged estimate h* that goes into the final percept. In many Bayesian models, though not all, the privileged estimate h* is the maximum a posteriori (MAP) hypothesis:

h^* = \operatorname{argmax}_h\, p(h \mid e).
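To make this pipeline concrete, here is a minimal sketch of the prior–likelihood–posterior–MAP computation on a discretized hypothesis space. All specifics (the grid of candidate speeds, the exponential prior favoring slow speeds, the Gaussian noise level) are illustrative assumptions rather than parameters from any published model; the prior merely anticipates the "slow motion" prior discussed below.

```python
import numpy as np

# Hypothesis space: candidate speeds, discretized for illustration.
speeds = np.linspace(0.0, 10.0, 1001)

# Prior p(h): an assumed exponential fall-off favoring slow speeds.
prior = np.exp(-speeds / 2.0)
prior /= prior.sum()

# Prior likelihood p(e | h): input e is the true speed plus Gaussian noise
# with assumed standard deviation sigma_e.
sigma_e = 1.5
e = 5.0  # observed (noisy) speed measurement
likelihood = np.exp(-(e - speeds) ** 2 / (2 * sigma_e ** 2))

# Posterior p(h | e) = k p(h) p(e | h); normalization fixes k.
posterior = prior * likelihood
posterior /= posterior.sum()

# MAP estimate h* = argmax_h p(h | e).
h_star = speeds[np.argmax(posterior)]
print(h_star)  # roughly 3.9: the slow-speed prior pulls the estimate below e
```

The estimate falls below the raw measurement because prior and likelihood are multiplied pointwise; this is the same compromise between prior and input that Equation (3) below expresses analytically for the Gaussian case.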

The privileged estimate h* is usually accessible to conscious introspection. In contrast, the priors and the posterior are not typically consciously accessible. Neither are the computations that convert the priors into the posterior or that select h*.

Bayesian models supply satisfying explanations for numerous perceptual phenomena. A good example is the motion estimation model given by Weiss, Simoncelli, and Adelson (2002). The model posits a "slow motion" prior, i.e. a prior that favors slow speeds. Citing Bayesian inference based on this prior, the model explains a host of motion illusions that had previously resisted unified explanation. Thanks to such explanatory achievements, the Bayesian framework now enjoys orthodox status within perceptual psychology.

A natural question raised by Bayesian perceptual psychology is how the brain implements Bayesian inference. How do neural states physically realize the priors and the posterior? Which neural operations effectuate the transition from priors to posterior? These questions have been intensively studied in computational neuroscience, and there are now several proposed neural implementation mechanisms. One proposal, well known to philosophers through the work of Clark (2015) and Hohwy (2014), highlights a computational strategy known as predictive coding. Other proposals, less known to philosophers, do not feature predictive coding. This paper canvasses several proposed implementation mechanisms, including both predictive coding and alternatives. I will not try to provide anything like an adequate survey. Nor will I defend one approach over another. Instead, I aim to promote an enhanced appreciation within the philosophical community for the diverse neural implementation mechanisms currently under active investigation.

Reflection on diverse candidate neural implementation mechanisms offers several benefits. First, and most obviously, we gain a more comprehensive vista on current computational neuroscience. Second, we elucidate what it means to attribute subjective probabilities to the perceptual system. Third, we clarify the sense in which the perceptual system may be said to execute Bayesian inferences.

Section 8.2 presents background material on Bayesian inference in physical systems. Sections 8.3–8.4 review various proposals for neural implementation of Bayesian inference. Section 8.5 compares the proposed implementation schemes with neural networks that simulate Bayesian inference. Section 8.6 explores the methodological implications of my discussion.

8.2  Credal States and Transitions

A Bayesian model posits credal states: assignments of credences to hypotheses. It also posits credal transitions: transitions among credal states. The simplest models posit a single credal transition from the prior probability and the prior likelihood to the posterior. More complex models posit iterated credal transitions in response to sequential new sensory input. For example, the object-tracking model in Kwon, Tadin, and Knill (2015) posits iterated credal updates regarding the position and velocity of a moving stimulus.
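To give a feel for iterated credal transitions, here is a toy one-dimensional filter in the same spirit: a Gaussian credal state over position is repeatedly pushed through assumed random-walk dynamics and updated on each noisy measurement. This is a generic sketch, not Kwon, Tadin, and Knill's actual model; the dynamics and all noise variances are made-up choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

q, r = 0.1, 1.0        # assumed process and measurement noise variances
mu, var = 0.0, 10.0    # initial credal state: a Gaussian over position

true_pos = 0.0
for t in range(50):
    true_pos += rng.normal(0.0, np.sqrt(q))      # the world evolves
    e = true_pos + rng.normal(0.0, np.sqrt(r))   # noisy sensory input

    var += q                                     # predict: dynamics widen the credal state
    gain = var / (var + r)                       # update: conjugate Gaussian posterior
    mu = mu + gain * (e - mu)                    #   (a precision-weighted compromise
    var = (1.0 - gain) * var                     #    between prediction and input)

print(mu, var)  # posterior mean tracks the stimulus; variance settles to a fixed point
```

Each pass through the loop is one credal transition: the pair (mu, var) before the update encodes that time step's prior, and the pair afterwards encodes the new credal state.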

Elsewhere, I have defended a realist view of Bayesian perceptual psychology (Rescorla, 2020b). Realism holds that, when a Bayesian perceptual model is empirically successful, we have reason to believe that the model is approximately true. More specifically: when a Bayesian perceptual model is empirically successful, we have reason to believe that there are credal states and transitions resembling those posited by the model. For example, the empirical success of the motion estimation model provides reason to hold that human motion estimation deploys a "slow motion" prior similar to that posited by the model. The model's theoretical apparatus corresponds at least roughly to psychological reality.

Block (2018), Colombo and Seriès (2012), Orlandi (2014), and others espouse an opposing instrumentalist perspective. According to instrumentalists, empirical success of a Bayesian perceptual model provides no reason to believe that the perceptual system executes anything resembling the computations posited by the model. We may only conclude that the perceptual system operates as if it executes those computations. In particular, we have no reason to posit that perception involves credal states or transitions. A Bayesian model is just a useful predictive device that helps us summarize input-output mappings. For example, the motion estimation model specifies a mapping from retinal inputs to motion estimates. According to instrumentalists, the model tells us nothing about the mental processes that mediate between inputs and outputs, save that the processes generate the specified input-output mapping.

To clarify the debate between realism and instrumentalism, I elucidate credal states in Section 8.2.1 and credal transitions in Section 8.2.2. In Section 8.3, I build upon those elucidations to address how the brain might implement credal states and transitions.

8.2.1  Implicit Encoding of Credences

How might a physical system encode an assignment of credences to hypotheses? The most straightforward encoding scheme is explicit enumeration: the system explicitly lists the credence assigned to each hypothesis. Unfortunately, enumeration is not feasible when the hypothesis space is infinite, as it is in most serious scientific applications.

An alternative scheme is parametric encoding: the system encodes a probability distribution through a few parameters.


Figure 8.1 An example of a pdf p(x). The area under the curve between points a and b is the probability assigned to the interval [a, b].

Many examples of parametric encoding involve a probability density function (pdf): a nonnegative function p(x) over ℝ whose integral is 1. We derive a probability distribution from a pdf through integration: the probability assigned to interval [a, b] is the integral of the pdf over the interval [a, b]. See Figure 8.1. Often, although not always, one can encode a pdf through a few parameters. A familiar example is the family of Gaussian distributions. Each Gaussian has a pdf of the form:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.  (1)

See Figure 8.2. Here μ is the mean of the distribution, and σ² is the variance. We can encode a Gaussian through the parameters (μ, σ²).


Figure 8.2  Gaussian pdf with mean μ and variance σ².

When a probability distribution is not finitely parametrizable, another encoding scheme is needed. One widely used encoding scheme involves sampling. Consider a system that draws samples from the hypothesis space. We can delineate an objective chance function c(h) that governs the system's sampling behavior. In the simplest case, c(h) is the objective chance that the system draws sample h. In more complicated cases, c(h) may instead be a density that determines objective chances through integration. Either way, c(h) specifies the system's sampling propensities over the hypothesis space. As several researchers have proposed (Fiser, Berkes, Orbán, & Lengyel, 2010; Icard, 2016; Sanborn & Chater, 2016), sampling propensities can serve as subjective probabilities. We may delineate a subjective probability assignment via the equation

p(h) = c(h),

where the right-hand side specifies an objective probability (or probability density), and the left-hand side specifies the encoded subjective probability (or probability density). Sampling encoding is widely used in statistics (Gelman et al., 2014) and machine learning (Murphy, 2012).

Crucially, parametric and sampling encodings are implicit rather than explicit. When a system encodes a Gaussian through the parameters (μ, σ²), the system does not explicitly enumerate any credences. Instead, credences are implicit in the specification of μ and σ² (on the understanding that the encoded distribution is a Gaussian). Similarly for sampling encoding: credences are implicit in the system's sampling propensities. Implicit encoding of probabilities is crucial for understanding the debate between realism and instrumentalism. The credal states posited by Bayesian perceptual psychology are typically defined over an infinite (indeed, uncountably infinite) hypothesis space. For example, the set of possible speeds is uncountable, so the motion estimation model is defined over an uncountable hypothesis space. When the hypothesis space is infinite, explicit enumeration of credences is not an option. Any plausible realist position must acknowledge that the perceptual system typically encodes credences implicitly rather than explicitly. Credences may be encoded through a parametric scheme, a sampling scheme, or some other scheme.
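The contrast between the two implicit schemes can be made vivid in a few lines of code. In this sketch (all numbers are illustrative assumptions), the very same Gaussian credal state is carried once by two parameters and once by a bank of samples, and either encoding suffices to recover the credence assigned to an interval:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Parametric encoding: the whole Gaussian credal state lives in two numbers.
mu, sigma2 = 3.0, 4.0

# Sampling encoding: the same credal state lives in sampling propensities;
# no credence is ever listed explicitly.
samples = rng.normal(mu, np.sqrt(sigma2), size=100_000)

def gaussian_cdf(x, mu, sigma2):
    """Cumulative probability of a Gaussian, computed from the parameters."""
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * sigma2)))

# Either encoding recovers the credence assigned to an interval [a, b].
a, b = 2.0, 5.0
p_parametric = gaussian_cdf(b, mu, sigma2) - gaussian_cdf(a, mu, sigma2)
p_sampling = np.mean((samples >= a) & (samples <= b))
print(p_parametric, p_sampling)  # agree up to Monte Carlo error (about 0.53)
```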

Given that credences can be encoded in such diverse ways, we naturally ask why a physical state counts as encoding credences. What do all possible encoding schemes have in common such that they count as encodings of credences? A truly satisfying answer would give non-circular necessary and sufficient conditions for a physical system to assign a credence to a hypothesis. Beginning with Ramsey (1931), there have been several attempts to supply the desired necessary and sufficient conditions. Unfortunately, these attempts are now widely regarded as problematic (Eriksson & Hájek, 2007). As a result, we cannot say what it is for a physical state to realize a credal state. Nevertheless, we can assert with great confidence that credences are physically encoded in diverse ways. After all, parametric and sampling encodings are used on a daily basis in practical applications of the Bayesian framework.

8.2.2  Computational Intractability of Bayesian Inference

Sometimes, it is easy to compute the posterior from the prior probability and the prior likelihood. To illustrate, suppose that the prior is a Gaussian of the form (1) and that the prior likelihood has the Gaussian form:

p(y \mid x) = \frac{1}{\sqrt{2\pi}\,\tau}\, e^{-\frac{(y-x)^2}{2\tau^2}}.  (2)

An idealized Bayesian agent who starts with these priors will respond to sensory input y by forming new credences given by the posterior p(x | y). One can show that the posterior p(x | y) is a Gaussian with mean η and variance ρ² given by

\eta = \frac{\dfrac{\mu}{\sigma^2} + \dfrac{y}{\tau^2}}{\dfrac{1}{\sigma^2} + \dfrac{1}{\tau^2}}, \qquad \frac{1}{\rho^2} = \frac{1}{\sigma^2} + \frac{1}{\tau^2}.  (3)

See Gelman et al. (2014, pp. 39–41) for details. The posterior mean η is a weighted average of the prior probability mean μ and the fixed value y, with weights inversely proportional to the respective variances. Intuitively, then, η is a compromise between the prior probability and the sensory input y. To obtain a helpful visualization of (3), we can hold y fixed and note that (2) then yields a one-place function of x:

L(x) = p(y \mid x).

L(x) is called a likelihood function. The posterior

p(x \mid y) = k\, p(x)\, p(y \mid x) = k\, p(x)\, L(x)

is found by multiplying the prior p(x) with the likelihood L(x) and normalizing. See Figure 8.3.


Figure 8.3 The top left panel is the likelihood function defined by Equation (2) for fixed y. The top right panel is the pdf defined by Equation (1). The bottom panel is the posterior determined by Equation (3).
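A minimal sketch of the conjugate update in Equation (3), with illustrative numbers of my own choosing rather than values from the chapter:

```python
def gaussian_posterior(mu, sigma2, y, tau2):
    """Equation (3): posterior mean and variance for a Gaussian prior (mu, sigma2)
    and a Gaussian likelihood with variance tau2, given sensory input y."""
    rho2 = 1.0 / (1.0 / sigma2 + 1.0 / tau2)   # 1/rho^2 = 1/sigma^2 + 1/tau^2
    eta = rho2 * (mu / sigma2 + y / tau2)      # precision-weighted average
    return eta, rho2

# Illustrative numbers (assumptions, not the chapter's):
eta, rho2 = gaussian_posterior(mu=0.0, sigma2=4.0, y=3.0, tau2=1.0)
print(eta, rho2)  # eta = 2.4 sits between prior mean 0 and input 3; rho2 = 0.8
```

Because the input's variance (1.0) is smaller than the prior's (4.0), the posterior mean lands much closer to the input than to the prior mean, exactly as the inverse-variance weighting in (3) dictates.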

Computing the posterior is not usually as easy as in (3). A neat self-contained description of the posterior may not exist. Even when a self-contained description exists, finding it may require computational resources beyond those available to a realistic agent (Kwisthout, Wareham, & van Rooij, 2011). Specifically, calculating the normalizing constant k in Bayes' theorem may be a computationally intractable task.1 In general, then, a physical system with limited time and memory may not be able to compute the posterior from the priors. The standard solution within Bayesian decision theory is to settle for approximate Bayesian inference. Even when Bayesian inference is computationally intractable, there may be a tractable algorithm that comes close. There are two main approximation strategies:

• Variational algorithms approximate the posterior using a probability distribution drawn from a nicely behaved family (e.g. Gaussian distributions). The basic idea is to pick the distribution from this family that is "closest" to the actual posterior.
• Sampling algorithms approximate the posterior by drawing samples from the hypothesis space. In response to input e, the system alters its sampling propensities regarding each hypothesis h. (A minimal sampling sketch follows below.)
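As a toy illustration of the sampling strategy, here is a bare-bones Metropolis–Hastings sampler. The target posterior, the proposal width, and the chain length are all assumed for illustration. The key point is that the algorithm only ever evaluates the unnormalized product p(h)p(e | h), so the intractable normalizing constant k never has to be computed:

```python
import numpy as np

rng = np.random.default_rng(0)

def unnormalized_posterior(h, e):
    """p(h) * p(e | h), up to the intractable constant k (toy choices)."""
    if h < 0:
        return 0.0
    prior = np.exp(-h / 2.0)               # assumed slow-speed prior
    like = np.exp(-(e - h) ** 2 / 2.0)     # unit-variance Gaussian noise
    return prior * like

def metropolis_samples(e, n_samples=50_000, step=1.0):
    """Draw samples whose long-run propensities approximate p(h | e)."""
    h = 1.0                                # arbitrary starting hypothesis
    out = np.empty(n_samples)
    for i in range(n_samples):
        proposal = h + rng.normal(0.0, step)
        # Accept with probability min(1, ratio); k cancels in the ratio.
        ratio = unnormalized_posterior(proposal, e) / unnormalized_posterior(h, e)
        if rng.random() < ratio:
            h = proposal
        out[i] = h
    return out

samples = metropolis_samples(e=5.0)
print(samples.mean())  # approximate posterior mean, pulled below e = 5 by the prior
```

The resulting sample bank is itself a sampling encoding of the approximate posterior, in the sense of Section 8.2.1.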

In both cases, the physical system instantiates credal states and transitions. It begins with a prior probability p(h) and a prior likelihood p(e | h). In response to input e, it transitions to a new credal state pnew(h) that approximates the posterior p(h | e). The relevant credal assignments may be implicit rather than explicit. For example, the posterior may be encoded by sampling propensities. See Murphy (2012) for detailed discussion of variational and sampling approximation algorithms.2

Bayesian perceptual models commonly posit priors that support tractable Bayesian inference. However, the human perceptual system need not instantiate such mathematically convenient priors. For example, numerous perceptual models posit Gaussian priors, but we know that the human perceptual system sometimes uses priors with heavier tails than Gaussians (Stocker & Simoncelli, 2006). As Bayesian perceptual psychology develops, it will doubtless assign greater prominence to approximate rather than exact Bayesian inference. An example is the model of binocular rivalry given by Gershman, Vul, and Tenenbaum (2012), which posits a sampling approximation to an intractable Bayesian inference. The model explains a range of perceptual phenomena that arise during binocular rivalry, such as the distribution of switching times among percepts.

My realist viewpoint extends straightforwardly to perceptual models that postulate approximate Bayesian inference. Realism holds that, when an approximately Bayesian model is empirically successful, we have reason to hold that the model is approximately true. We have reason to hold that the perceptual system instantiates credal states and transitions resembling those postulated by the model.

Block (2018) suggests that the intractability of Bayesian inference poses a problem for realism about Bayesian perceptual psychology. Once realists concede that the perceptual system executes approximate Bayesian inference rather than exact Bayesian inference, how does their position ultimately differ from instrumentalism? As Block (2018, p. 8) puts it, "[W]hat is the difference between approximate implementation of Bayesian inference and behaving roughly as if Bayesian inference is being implemented …? Until this question is answered, the jury is out on the dispute between realist and anti-realist views." My reply: there is a huge difference between physical systems that approximately execute Bayesian inference and physical systems that merely behave as if they approximately execute Bayesian inference. A system that approximately executes Bayesian inference instantiates credal states and transitions:

• The system begins with a prior probability and a prior likelihood.
• In response to sensory input e, the system transitions to a new credal state that approximates the posterior.
• The system can then deploy the approximate posterior in further computation, such as selection of a privileged estimate h*.


Figure 8.4 Causal structure of approximate Bayesian inference. Arrows represent the direction of causal influence.


See Figure 8.4. In contrast, a system that merely behaves as if it approximately executes Bayesian inference need not instantiate any credal states. For example, a system might simulate Bayesian estimation through a (very large!) look-up table. In response to input e, the look-up table system selects an output h* close to the output that a Bayesian estimator or approximately Bayesian estimator would select. The look-up table system does not instantiate any credal states, let alone credal transitions.
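As a toy rendering of the look-up table idea (the grid, model parameters, and helper names are invented): precompute Bayesian estimates for a grid of inputs, then answer queries by table look-up alone, with no credal states or transitions at runtime.

```python
import numpy as np

mu, sigma2, tau2 = 0.0, 4.0, 1.0

def bayes_estimate(y):
    """Posterior-mean estimate for the Gaussian model of Equations (1)-(3)."""
    return (mu / sigma2 + y / tau2) / (1 / sigma2 + 1 / tau2)

# Precompute estimates over a grid of inputs: the "(very large!) look-up table".
grid = np.linspace(-10, 10, 2001)
table = np.array([bayes_estimate(e) for e in grid])

def lookup_estimate(y):
    """Select h* by table look-up alone: no priors, no posterior, no inference."""
    return table[np.abs(grid - y).argmin()]

print(bayes_estimate(3.0), lookup_estimate(3.0))  # same h*, different causal structure
```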

A physical system that approximately executes Bayesian inference has a different internal causal structure than a system that merely simulates approximate Bayesian inference (Rescorla, 2020b). The difference in causal structure has important methodological implications. Realists posit credal states embedded in the causal structure depicted by Figure 8.4. Since mental states and processes are physically realized in the brain, it becomes a pressing task to investigate how credal states and transitions are neurally realized. We must illuminate how neural activity implements the causal structure depicted by Figure 8.4. From a realist perspective, the search for neural implementation mechanisms of (approximate) Bayesian inference looks like a vital research endeavor (Ma, 2019). From an instrumentalist perspective, we have no reason to suspect that the brain implements a causal structure remotely like Figure 8.4, so we have no reason to take Figure 8.4 as a guide to neural mechanisms. If there are no credal states and no credal transitions, then it is a waste of time to investigate how credal states and transitions are neurally realized.

Evidently, the dispute between realism and instrumentalism is not just an abstract "philosophical" debate about how to interpret a fixed scientific theory. The dispute has major implications for which research avenues in neuroscience look promising and which do not. My goal for the rest of the paper is to gain insight into these methodological implications and, thereby, into the dispute between realism and instrumentalism. In Section 8.3, I examine some neural network models that implement approximate Bayesian inference. The models vary in biological plausibility, but some of them are under active consideration by computational neuroscientists. In Section 8.4, I examine neural network models that merely simulate approximate Bayesian inference. By comparing the models from Section 8.3 with the models from Section 8.4, I hope to clarify the diverging theoretical and methodological commitments of realism and instrumentalism.

8.3 Some Proposed Neural Implementation Schemes

There are several elements we should expect from any complete theory of how the brain implements approximate Bayesian inference. To begin, our theory will identify a neural variable U that encodes the prior probability p(h). Each value u of U corresponds to a possible neural state (e.g. a profile of firing rates across a neural population). U's value determines the prior, assuming appropriate background conditions. So U satisfies counterfactuals of the form:

(4) If neural variable U were to have value u in background conditions B, then the perceptual system would assign credence p(h) to h.

In principle, different values of u may encode the same prior. In practice, different values of u usually encode different priors. The qualifier regarding background conditions B is crucial because neural state taken on its own does not usually determine credal state. A credal state assigns subjective probabilities to hypotheses. In the case of perception, the hypotheses concern distal properties, such as shape, size, color, location, speed, and so on. The brain represents distal properties only due to causal relations that it bears to the distal environment (Burge, 2007, 2010), perhaps lying within the organism's developmental history or its evolutionary past. The requisite causal relations do not supervene upon internal neurophysiology. In principle, an organism with identical neurophysiological properties could be embedded differently in the physical world, bearing such different causal relations to the distal environment that it represents different distal properties (Burge, 2007, 2010; Egan, 2010). Different represented properties entail a different credal state—an assignment of credences to different hypotheses regarding different distal properties. Since credal state does not supervene upon internal neurophysiology, it would be futile to seek a neural variable that determines credal state on its own.

The best we can do is find a neural variable that determines credal state assuming certain background conditions, including certain causal relations to the distal environment. Ultimately, we would like to illuminate the assumed background conditions. What background conditions must obtain for neural state u to guarantee that a given credence is assigned to h? Answering that question would require progress towards necessary and sufficient conditions for physical realization of credal states. It would also require progress on the problem of intentionality, i.e. the problem of what it is for mental states to have representational properties (Loewer, 1997). Fortunately, scientific theorizing about neural implementation mechanisms need not await progress on these deep questions. Lots of research in computational neuroscience addresses neural implementation while tacitly assuming whatever background conditions are needed to ensure suitable counterfactuals (4).

Formally speaking, we may summarize the connection between priors and neural states through an equation of the form:

$$p(h) = \Phi(u), \quad (5)$$

where p(h) is either a probability distribution or a pdf; u is a possible value of neural variable U; and Φ is a function that carries each u to p(h).3 Φ is sometimes called a decoder: it shows how to "decode" the probabilistic import of a neural state.

A complete implementation theory will also address how prior likelihoods are encoded. It will identify a neural variable V that satisfies counterfactuals of the form:

(6) If neural variable V were to have value v in background conditions B, then the perceptual system would assign credence p(e | h) to e conditional on h.

In parallel to (5), we may formalize the encoding through an equation of the form:

$$p(e \mid h) = \Psi(v). \quad (7)$$

In practice, different values of V usually encode different prior likelihoods.

A complete implementation theory will additionally specify how the brain responds to input e. Here we must distinguish between deterministic versus stochastic transitions. In the deterministic case, we want a function f of the form:

$$w = f(u, v, e),$$

where w is a possible value of some neural variable W. In the stochastic case, we want the chance distribution governing the transition from u, v, and e to w. Whether deterministic or stochastic, our model will describe operations that transform u, v, and e into w. The operations must be biologically plausible, i.e. real neural populations must be able to execute them. Since our goal is to model credal transitions, each value w of W will encode the new credal state induced by sensory input e. So W will satisfy counterfactuals of the form:

(8) If neural variable W were to have value w in background conditions B, then the perceptual system would assign credence pnew(h) to h.

In parallel with (5) and (7), we can formalize the encoding through an equation of the form:

$$p_{new}(h) = \Gamma(w). \quad (9)$$

The decoder Γ specifies the new credal state corresponding to each w. When the system executes exact Bayesian inference, we have pnew(h) = p(h | e), i.e. the new credal state is the true posterior. In general, the system may only approximate the posterior, in which case we have pnew(h) ≈ p(h | e). In practice, different choices of e usually induce different values of W, and different values of W usually encode different credal states pnew(h). Figure 8.5 visualizes how the various components of a complete implementation theory fit together.

A major goal of contemporary computational neuroscience is to identify decoders and neural operations satisfying something like Figure 8.5. Researchers pursue that goal by constructing neural networks: simplified models of how idealized neural populations evolve. The neural networks vary greatly in their biological realism, but at least some of them are fairly realistic. A suitable neural network, coupled with suitable decoders Φ, Ψ, and Γ, models how the brain might implement approximate Bayesian inference.4 I will now examine some specific neuroscientific models along these lines. My aims are conceptual rather than empirical: I want to highlight the diverse ways in which credal states and transitions might in principle be neurally realized. For that reason, I will not address evidence for or against the neuroscientific models I discuss.


Figure 8.5 Schematic form for a neural implementation theory. Single arrows represent the direction of causal influence. Double arrows represent decoders, which map neural states to credal states.

8.3.1  Probabilistic Population Codes

Certain neurons preferentially respond to specific values of a perceived stimulus (Dayan and Abbott, 2001, pp. 14–16), such as the orientation of a bar. We may associate each such neuron with a tuning curve fi(x), which summarizes the average response of neuron i to stimulus value x. A common posit that fits many neurons fairly well is that fi(x) is an unnormalized Gaussian. See Figure 8.6, which depicts Gaussian tuning curves for a population of neurons tuned to a one-dimensional distal variable. Each tuning curve peaks at a preferred value of the variable.

The core idea behind probabilistic population codes (PPCs) is that, when a neural population is tuned to distal variable X, the firing profile over the population can encode a probability distribution over X (Knill & Pouget, 2004; Pouget, Dayan, & Zemel, 2003). Figure 8.7 illustrates. The horizontal axis groups neurons according to preferred values of X. The vertical axis gives each neuron's firing rate on some occasion in response to some fixed stimulus. Let the neural population contain n neurons. ri is the firing rate for the neuron with preferred value xi. r = ⟨r1, …, rn⟩ is the profile of firing rates over the population. A particularly straightforward decoder is

$$p(x_i) = \frac{r_i}{\sum_{j=1}^{n} r_j}, \quad (10)$$


Figure 8.6 A collection of Gaussian tuning curves. The horizontal axis contains possible values of a one-dimensional continuous stimulus. Each tuning curve depicts the average response of the corresponding neuron to possible stimulus values. fi(x) is the average firing rate elicited by stimulus x in neuron i. The tuning curve fi(x) with preferred stimulus value xi is thickened. Firing rate is typically measured in spikes per second. Shapes and maximum values for fi(x) vary with the neural population.

Figure 8.7 Firing activity in a hypothetical neural population on a given occasion. The horizontal axis groups neurons according to preferred stimulus value. The vertical axis gives the firing rate for each neuron.

so that probabilities are concentrated at the preferred values xi in proportion to the corresponding firing rates and are 0 elsewhere. (The denominator is a normalization constant.) It is often desirable to smooth out the credal assignments so as to avoid concentration of credal mass at preferred values xi. This can be done through a more sophisticated decoder of the form:

$$p(x) = \frac{\sum_{j=1}^{n} r_j \phi_j(x)}{\sum_{j=1}^{n} r_j}, \quad (11)$$

where ϕj is a pdf associated with neuron j (Zemel, Dayan, & Pouget, 1998). Intuitively, neuron j votes for its preferred pdf ϕj with weight proportional to its firing rate. Collectively, the firing rates encode the pdf p(x) defined by (11). Decoders (10) and (11) can be used for likelihood functions or approximate posteriors, with L(x) or pnew(x) replacing p(x).

To see (10) in action, consider a neural network with three neural populations N1, N2, and N3. Neural population N1 responds to sensory input y with firing rate profile r. These responses encode likelihood function L(x) via the decoder:

$$L(x_i) = \frac{r_i}{\sum_{j=1}^{n} r_j}.$$

Neural population N2 has firing rate profile s, to which we apply the decoder:

$$p(x_i) = \frac{s_i}{\sum_{j=1}^{n} s_j}.$$

The neural network multiplies firing rates in N1 and N2 to determine firing rates in N3. The firing rate profile t over N3 is

$$t = \langle r_1 s_1, r_2 s_2, \ldots, r_n s_n \rangle.$$

If we use the decoder

$$p_{new}(x_i) = \frac{t_i}{\sum_{j=1}^{n} t_j},$$

then we have

$$p_{new}(x_i) = \frac{r_i s_i}{\sum_{j=1}^{n} r_j s_j}.$$

So the network implements (normalized) multiplication of a prior with a likelihood function. See Figure 8.8, and see Gershman and Beck (2017), Knill and Pouget (2004) for discussion.
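The following sketch spells out this three-population computation; the grid of preferred values and the firing rates are invented for illustration:

```python
import numpy as np

# Preferred stimulus values for n = 5 neurons. N1's firing rates encode the
# likelihood function, N2's encode the prior; both via decoder (10).
x_grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
r = np.array([1.0, 4.0, 9.0, 4.0, 1.0])      # N1: likelihood peaks at x = 0
s = np.array([6.0, 4.0, 2.0, 1.0, 0.5])      # N2: prior favors negative values

t = r * s                                     # N3: neuron-by-neuron multiplication

likelihood = r / r.sum()
prior = s / s.sum()
posterior = t / t.sum()                       # decoder applied to N3

# The decoded posterior is the normalized product of prior and likelihood.
assert np.allclose(posterior, prior * likelihood / (prior * likelihood).sum())
print(dict(zip(x_grid, posterior.round(3))))
```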

Figure 8.8 Activity in N1 (encoding the likelihood function), N2 (encoding the prior probability), and N3 (encoding the posterior). In N3, each neuron's firing rate is obtained by multiplying the firing rates of the corresponding neurons in N1 and N2. The vertical axis for N3 has been rescaled for greater legibility. Note the similarity with Figure 8.3.

This analysis can be extended to decoder (11), though the needed neural operations are more complicated than multiplication of firing rates (Barber, Clark, & Anderson, 2003; Pouget et al., 2003; Zemel & Dayan, 1997).

One element missing from Figure 8.8 is encoding of the prior likelihood. In Figure 8.8, N1 encodes a likelihood function L(x). Encoding a likelihood function L(x) is not the same as encoding a prior likelihood p(y | x): L(x) is a function of x, while p(y | x) is a function of x and y. Nothing that I have said addresses realization of the two-place function p(y | x). Consequently, I am not inclined to say that Figure 8.8 depicts genuine Bayesian inference (a transition from the prior probability, the prior likelihood, and sensory input to the posterior), although it certainly depicts Bayesian computation (computing the normalized product of a prior and a likelihood function).

In a notable contribution, Ma, Beck, Latham, and Pouget (2006) exploit the stochastic nature of neural responses to analyze how prior likelihoods are encoded. Neural response to a stimulus is governed by an objective chance distribution. More formally, there is a conditional distribution c(r | x), where r is firing activity over a neural population and x is the stimulus. Although c(r | x) is an objective chance distribution, we may regard it as encoding a subjective probability distribution. The decoder for the prior likelihood then has the form:

$$p(r \mid x) = c(r \mid x). \quad (12)$$

On this approach, stochastic firing propensities of the neural population encode conditional credences (Echeveste & Lengyel, 2018). A widely used posit, which fits the neurophysiological data fairly well, is that neuron i samples from a Poisson distribution with mean determined by tuning curve fi and stimulus x:

$$c(r_i \mid x) = \frac{f_i(x)^{r_i}\, e^{-f_i(x)}}{r_i!},$$

where ri is the spike count of neuron i during a fixed time interval. Assuming the Poisson distributions are independent of one another, we may write

$$c(r \mid x) = \prod_i \frac{f_i(x)^{r_i}\, e^{-f_i(x)}}{r_i!}, \quad (13)$$

where r is the profile of spike counts over the population. Other choices for c(r | x) are possible. Assuming (13), a neural population governed by decoder (12) encodes the prior likelihood:

$$p(r \mid x) = \prod_i \frac{f_i(x)^{r_i}\, e^{-f_i(x)}}{r_i!}. \quad (14)$$

Different choices for c(r | x) will yield different encoded prior likelihoods.

(14) has an important virtue: it can support genuine Bayesian inference (Ma et al., 2006). Suppose that the prior likelihood is encoded by neural population N1 via the decoder (14), and assume that all tuning curves are Gaussians with variance σ²_tc. Holding r fixed, one can show under mild assumptions that the likelihood p(r | x) is a (possibly unnormalized) Gaussian with mean y and variance τ² given by

$$y = \frac{\sum_{i=1}^{n} r_i x_i}{\sum_{i=1}^{n} r_i}, \qquad \frac{1}{\tau^2} = \frac{\sum_{i=1}^{n} r_i}{\sigma_{tc}^2}. \quad (15)$$

y is a weighted average of the preferred values xi, with weights given by the spike counts ri. τ² is inversely proportional to aggregate activity in N1: more spike counts entail lower variance. We may encode the prior probability in the spike count profile for a separate neural population N2 and the posterior in the spike count profile for a third neural population N3. Let s be the spike count profile for N2. Consider a decoder that maps s to a Gaussian prior p(x) with mean µ and variance σ² given by

$$\mu = \frac{\sum_{i=1}^{n} s_i x_i}{\sum_{i=1}^{n} s_i}, \qquad \frac{1}{\sigma^2} = \frac{\sum_{i=1}^{n} s_i}{\sigma_{tc}^2}. \quad (16)$$

Applying (3) to (15) and (16), one can easily show that the posterior p(x | r) is a Gaussian with mean η and variance ρ² given by

$$\eta = \frac{\sum_{i=1}^{n} (r_i + s_i)\, x_i}{\sum_{i=1}^{n} (r_i + s_i)}, \qquad \frac{1}{\rho^2} = \frac{\sum_{i=1}^{n} (r_i + s_i)}{\sigma_{tc}^2}.$$

Thus, the neural network can compute the posterior by adding together spike counts in N1 and N2 to determine spike counts in N3. The spike count profile t over N3 is

$$t = \langle r_1 + s_1, r_2 + s_2, \ldots, r_n + s_n \rangle, \quad (17)$$

and the encoded Gaussian has parameters given by

$$\eta = \frac{\sum_{i=1}^{n} t_i x_i}{\sum_{i=1}^{n} t_i}, \qquad \frac{1}{\rho^2} = \frac{\sum_{i=1}^{n} t_i}{\sigma_{tc}^2}. \quad (18)$$

See Figure 8.9. This analysis can be extended beyond Gaussians to a more general family of parametrized distributions (Beck, Ma, Latham, & Pouget, 2007; Ma et al., 2006; Sokoloski, 2017). In the general case, the posterior's parameters are given by linear combination of spike counts rather than by mere addition.
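A small sketch of (15)–(18) with invented spike counts: adding spike counts neuron by neuron yields exactly the Gaussian posterior that the conjugate update (3) prescribes.

```python
import numpy as np

sigma_tc2 = 1.0                               # shared tuning-curve variance
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])     # preferred values
r = np.array([0, 2, 6, 2, 0])                 # N1 spike counts (likelihood)
s = np.array([5, 3, 1, 0, 0])                 # N2 spike counts (prior)

def decode(counts):
    """Decode Gaussian mean and variance per Equations (15), (16), and (18)."""
    mean = np.sum(counts * x) / counts.sum()
    var = sigma_tc2 / counts.sum()
    return mean, var

y, tau2 = decode(r)                           # likelihood parameters, (15)
mu, sigma2 = decode(s)                        # prior parameters, (16)
eta, rho2 = decode(r + s)                     # posterior from added counts, (18)

# Cross-check against the conjugate update of Equation (3).
eta_exact = (mu / sigma2 + y / tau2) / (1 / sigma2 + 1 / tau2)
rho2_exact = 1 / (1 / sigma2 + 1 / tau2)
assert np.isclose(eta, eta_exact) and np.isclose(rho2, rho2_exact)
print(eta, rho2)
```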



Figure 8.9 Activity in hypothetical neural populations N1, N2, and N3. In N3, each neuron's spike count is obtained by adding the spike counts of the corresponding neurons in N1 and N2. The spike count profile r over N1 encodes an unnormalized Gaussian likelihood with mean y and variance τ² given by (15). The spike count profile s over N2 encodes a Gaussian prior with mean µ and variance σ² given by (16). The spike count profile t over N3 encodes a Gaussian posterior with mean η and variance ρ² given by (18). Note that the mapping from r to the unnormalized Gaussian is not a decoder Ψ in the sense of (7), because it carries the spike count profile to a likelihood function rather than a prior likelihood.

As Ma et al. (2006, p. 1435) note, one potential disadvantage of their model is that the encoded prior p(x) varies across trials due to the stochastic nature of neural firing. The decoder (16) is not well suited to situations where a stable prior persists across many trials. An alternative decoder proposed by Ganguli and Simoncelli (2014) avoids this problem and also dispenses with a separate neural population for the prior. We still consider a population N whose neural responses conform to (13). The prior likelihood is still encoded via (14). Rather than posit a separate population that encodes the prior, Ganguli and Simoncelli posit that relatively stable properties of N itself encode the prior. The core intuition underlying their model is that more neural resources should be associated with more probable stimulus values. An optimally efficient allocation of neural resources will not feature tuning curves spread homogeneously across all possible stimulus values (as they are in Figure 8.6). Rather, tuning curves will be arranged so that preferred values xi are clustered more densely around probable values of the stimulus. This promotes accurate encoding of more probable stimulus values while downgrading accurate encoding of less probable stimulus values. Ganguli and Simoncelli formalize these intuitions with a tuning curve density function d(x), which governs the allocation of tuning curves across the neural population: higher density around x entails more neurons whose preferred stimulus value is near x. Under mild assumptions, each tuning curve fi can be written as:

$$f_i(x) = k f(D(x) - i), \quad (19)$$

where k is a constant that modulates maximum average firing rate; f is a fixed function, such as an unnormalized Gaussian, that peaks at 0; and D(x) comes from integrating d(x). f serves as a tuning curve template. d(x) warps the template as described by (19), yielding a tuning curve fi with preferred value D⁻¹(i). Ganguli and Simoncelli show that, according to a natural criterion of optimality, the optimal density function satisfies the following equation:

$$p(x) = \frac{d(x)}{n}, \quad (20)$$

where n is the total number of neurons in neural population N; and p(x) is the prior over stimulus values. Accordingly, they propose a model on which the prior is encoded by the density function via (20). In the model, there is


Figure 8.10 The bottom left panel depicts a collection of tuning curves fi warped by a density function d(x) via equation (19). The pdf p(x) determined by decoder (20) is depicted in the top left panel. The bottom right panel depicts a collection of tuning curves warped by a different density function. The top right panel depicts the encoded pdf.

no need for a separate population that encodes the prior. Instead, the prior is encoded by the allocation of resources across the population N whose stochastic behavior encodes the prior likelihood. See Figure 8.10. Note that the proposed decoder (20) is nonparametric: it maps the density function d(x) to a unique prior p(x), without any restriction as to parametric form. Ganguli and Simoncelli (2014, pp. 2117–2118) show that their decoder supports approximate computation of the posterior. The posterior is approximated by a discrete distribution:

$$p_{new}(x_i) = \frac{e^{\sum_{j=1}^{n} r_j \log f(i-j)}}{\sum_{k=1}^{n} e^{\sum_{j=1}^{n} r_j \log f(k-j)}}. \quad (21)$$

By (19), f(i−j) is proportional to neuron j's average response to neuron i's preferred stimulus value. (21) uses logarithms of these average responses to form a weighted sum of spike counts, then exponentiates and normalizes.

A neural network that executes the mandated operations can instantiate firing rate profile t over a separate neural population, where

$$t_i = \frac{e^{\sum_{j=1}^{n} r_j \log f(i-j)}}{\sum_{k=1}^{n} e^{\sum_{j=1}^{n} r_j \log f(k-j)}}. \quad (22)$$

Firing rate profile t encodes pnew via the decoder:

$$p_{new}(x_i) = t_i. \quad (23)$$

Thus, a neural network that employs the density-based encoding scheme (20) can approximate the posterior.
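In code, (21) and (22) amount to a log-domain weighted sum followed by exponentiation and normalization (a softmax); the template f and the spike counts below are invented for illustration:

```python
import numpy as np

n = 9
f = lambda u: np.exp(-u**2 / 8.0)             # tuning curve template, peaks at 0
r = np.array([0, 0, 1, 3, 5, 3, 1, 0, 0])     # illustrative spike counts

# Equations (21)/(22): for each candidate i, a weighted sum of log template
# responses, then exponentiation and normalization (a softmax).
idx = np.arange(n)
log_scores = np.array([np.sum(r * np.log(f(i - idx))) for i in idx])
t = np.exp(log_scores - log_scores.max())     # max-subtraction for stability
p_new = t / t.sum()                           # firing rates t encode p_new, (23)
print(p_new.round(3))
```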

Computational neuroscientists have proposed several other PPC implementation mechanisms (Beck, Heller, & Pouget, 2012; Orhan & Ma, 2017; Pouget, Beck, Ma, & Latham, 2013). No doubt further PPC models will emerge in the near future.

8.3.2 Sampling

I now discuss an alternative neural implementation strategy centered on sampling. An early example is the Boltzmann machine (Ackley, Hinton, & Sejnowski, 1985). A Boltzmann machine consists of the following elements: a collection of n neuron-like units that can turn on and off (zi = 1 means that unit i is on, zi = 0 means that it is off); weights wij, codifying the connection strength between units i and j, such that wij = wji and wii = 0; and bias terms bi, codifying the propensity of unit i to take value 1. We may construe zi as the neural network's current vote regarding the true value of some binary random variable Xi (e.g. whether a perceived object is concave or convex). The weights and bias terms encode a discrete probability distribution over the random variables X1,…, Xn via the decoder:

$$p(x_1, \ldots, x_n) = \eta\, e^{H(x_1, \ldots, x_n)}, \quad (24)$$

where

$$H(x_1, \ldots, x_n) = \sum_{i<j} w_{ij} x_i x_j + \sum_i b_i x_i,$$

and η is a normalization constant.

Suppose that the variables X1,…, Xn fall into two categories: observable (X1,…, Xk) and unobservable (Xk+1,…, Xn). We wish to form a new credence over the unobservable variables given observed values x1,…, xk of the observable variables. The posterior is

$$p(x_{k+1}, \ldots, x_n \mid x_1, \ldots, x_k). \quad (25)$$

In principle, (25) can be computed directly from (24) using the ratio formula for conditional probabilities:

$$p(a \mid b) = \frac{p(a, b)}{p(b)}.$$

However, the computation is not typically tractable. We may instead approximate the posterior as follows. First, "clamp" units 1 through k to the values z1 = x1, z2 = x2, …, zk = xk. Second, assign arbitrary values to the remaining units. Third, sample a new value zi from unit i > k according to the conditional chance distribution:

$$c(z_i = 1 \mid z_{\setminus i}) = \frac{1}{1 + e^{-b_i - \sum_{j \neq i} w_{ij} z_j}}, \quad (26)$$

where z\i is the profile of values currently assigned to all the units besides unit i. Cycle through all the remaining units in the same way, holding fixed the clamped units. Continue in this way for some time, sampling values for the non-clamped units according to (26). At each stage, there is an objective chance c(zk+1,…, zn) of sampling values zk+1,…, zn from units k + 1 through n. c(zk+1,…, zn) will change as we continue to draw samples. One can show that c(zk+1,…, zn) converges to the posterior p(xk+1,…, xn | x1,…, xk). If we run the sampling procedure for a sufficient "burn in" period and subsequently set

$$p_{new}(x_{k+1}, \ldots, x_n) = c(z_{k+1}, \ldots, z_n), \quad (27)$$

then pnew(xk+1,…, xn) approximates the posterior p(xk+1,…, xn | x1,…, xk). This is an example of a sampling procedure known as Gibbs sampling, which itself is a special case of a more general sampling strategy known as Metropolis-Hastings (Murphy, 2012, pp. 839–876). See Icard (2016) for extended discussion of the Boltzmann machine and sampling propensities.
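A compact Gibbs-sampling sketch of this procedure, for an arbitrary three-unit machine with one clamped observable unit (the weights and biases are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 3, 1                                   # three units; unit 0 is clamped
w = np.array([[0.0, 1.5, -1.0],
              [1.5, 0.0, 0.5],
              [-1.0, 0.5, 0.0]])              # symmetric weights, zero diagonal
b = np.array([0.0, -0.5, 0.2])                # bias terms

z = np.array([1, 0, 0])                       # clamp z_1 = x_1 = 1; rest arbitrary
burn_in, n_samples = 1000, 20000
counts = {}

for sweep in range(burn_in + n_samples):
    for i in range(k, n):                     # cycle through non-clamped units
        # Equation (26); w[i, i] = 0, so unit i's own value drops out of the sum.
        prob_on = 1 / (1 + np.exp(-b[i] - w[i] @ z))
        z[i] = rng.random() < prob_on
    if sweep >= burn_in:
        key = tuple(z[k:])
        counts[key] = counts.get(key, 0) + 1

# After burn-in, sampling frequencies approximate p(x_2, x_3 | x_1 = 1).
print({key: c / n_samples for key, c in sorted(counts.items())})
```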

The Boltzmann machine is not very realistic from a neurophysiological perspective. It does not even model the basic fact that neurons emit spikes. Still, it nicely illustrates how neural networks can implement approximate Bayesian inference through sampling. Similar sampling implementations are achievable by far more biologically realistic neural networks. An example is the neural network given by Buesing, Bill, Nessler, and Maass (2011), which models in a biologically plausible way the stochastic interactions within a collection of n spiking neurons. zi = 1 at time t signifies that neuron i has fired in a small time interval ending at t. zi = 0 signifies that neuron i has not fired in that time interval. The weights and bias terms encode a probability distribution over n binary random variables X1,…, Xn through the decoder (24).5 Neuron i has membrane potential ui, which is related to spiking activity by the equation:

$$u_i = b_i + \sum_{j \neq i} w_{ij} z_j,$$

where the bias term bi codifies neuron i's excitability; wij is the connection strength between neurons i and j; and zi reflects the current spiking behavior (or lack thereof) of neuron i. To approximate the posterior p(xk+1, …, xn | x1, …, xk), the network proceeds in roughly the same fashion as the Boltzmann machine: it clamps the values of z1 = x1, z2 = x2, …, zk = xk, then serially samples values zi of the remaining neurons. Samples are drawn stochastically, in a way that depends upon membrane potentials along with other neurophysiological details. Buesing et al. (2011) prove that their stochastic sampling procedure converges to the posterior p(xk+1,…, xn | x1,…, xk). Just as with the Boltzmann machine, we may run the sampling procedure for a "burn in" period and then set pnew(xk+1, …, xn) = c(zk+1, …, zn). The new credal state pnew approximates the true posterior.

One disadvantage shared by the Boltzmann machine and the Buesing et al. (2011) model is that they only encode probability distributions over binary random variables. A model given by Nessler, Pfeiffer, Buesing, and Maass (2013) uses sampling to compute the posterior over a discrete random variable X that takes k possible values x1, …, xk. The neural network contains n input neurons, whose spiking behavior is modeled by n binary random variables Y1, …, Yn. These neurons can code values of non-binary

discrete sensory variables as long as n is large enough (where each input neuron corresponds to a distinct value of some sensory variable). Sample values of X are encoded by a population of k output neurons: a spike by output neuron i codes a sample xi. Output neuron i's spiking propensity is determined by its membrane potential ui. Membrane potential in turn depends upon input neuron spikes as follows:

$$u_i = b_i - I + \sum_{j=1}^{n} w_{ij} y_j,$$

where the bias term bi codifies output neuron i's excitability; wij is the connection strength between output neuron i and input neuron j; and I is an inhibition signal. The prior p(xi) is encoded by the neuron excitability profile via the decoder:

$$p(x_i) = e^{b_i}. \quad (28)$$

The prior likelihood is encoded by the weights wij via the decoder:

$$p(y \mid x_i) = e^{\sum_{j=1}^{n} w_{ij} y_j}, \quad (29)$$

where y = ⟨y1, …, yn⟩. Here we must assume that the bias terms and weights meet normalization conditions, so that (28) and (29) yield normalized probabilities. Nessler et al. (2013) show that, under some additional assumptions, spiking propensities among the output neurons match the posterior p(xi | y). If pnew(xi) is encoded by the chances governing output neuron spikes, then pnew(xi) is simply the posterior p(xi | y).6
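Under decoders (28) and (29), Bayes' theorem reduces to normalizing e^{bi + Σj wij yj} over the output neurons. The toy computation below (invented biases, weights, and input spikes) makes that concrete:

```python
import numpy as np

b = np.array([-1.2, -0.9, -1.5])              # excitabilities encode the prior, (28)
w = np.array([[0.2, 1.0, 0.1, 0.3],
              [0.8, 0.1, 0.9, 0.2],
              [0.1, 0.3, 0.2, 1.1]])          # weights encode the likelihood, (29)
y = np.array([1, 0, 1, 0])                    # observed input spikes

# p(x_i | y) is proportional to p(x_i) p(y | x_i) = e^{b_i + sum_j w_ij y_j}.
log_post = b + w @ y
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()
print(posterior.round(3))                     # output spiking propensities
```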

Most variables encountered in perception are continuous (e.g. shape, size, color, location) rather than discrete. The literature offers several neural network models that sample from the (approximate) posterior for a continuous random variable (e.g. Aitchison & Lengyel, 2016; Hennequin, Aitchison, & Lengyel, 2014; Moreno-Bote, Knill, & Pouget, 2011; Savin & Denève, 2014). The basic idea is usually that samples are encoded by values of a continuous neural variable, such as a neuron's membrane potential (Orbán, Berkes, Fiser, & Lengyel, 2016). The objective chance function governing this neural variable encodes the (approximate) posterior. For example, the decoder might take the form:

$$p_{new}(x) = c(f(x)), \quad (30)$$

where f(x) is the membrane potential that encodes stimulus value x and c is an objective chance function governing membrane potentials.

Sampling neural networks are under active investigation, so we may expect the coming years to bring forth additional models.7

8.3.3 Predictive Coding

Recent philosophical literature places great emphasis on predictive coding (Clark, 2015; Hohwy, 2014). The basic idea is that the neural network generates a prediction α about sensory input. Upon receipt of actual sensory input y, the network computes a prediction error term. Typically (e.g. Rao & Ballard, 1999), prediction error ε is the difference:

$$\varepsilon = y - \alpha.$$

Alternatively (e.g. Spratling, 2016), prediction error may be the quotient:

$$\varepsilon = y / \alpha.$$

Either way, prediction error figures prominently in subsequent computation. For example, it may influence future predictions. Many predictive coding models have hierarchical structure: higher levels of the network pass predictions down to lower levels, and lower levels pass prediction errors back to higher levels.8

There is nothing inherently Bayesian about predictive coding (Aitchison & Lengyel, 2017). However, if one sets up the neural network in the right way, then predictive coding can implement approximate Bayesian inference. Consider the hierarchical neural network given by Spratling (2016). Each level of the hierarchy contains three neural populations: the first computes a vector α of sensory input predictions; the second combines α with sensory input y to compute prediction error ε; the third uses ε to update the estimate x of the underlying distal variable. The update of x, which depends upon a matrix W of feedforward weights, is used to update sensory prediction α. The prior probability is encoded as a scaling factor that modulates the feedforward weight matrix W. The prior likelihood is encoded by a population of input neurons with independent Poisson variability. The posterior is encoded by firing rates in the prediction neurons, where each prediction neuron has a preferred stimulus value. Under this decoding scheme, Spratling shows that the network can compute an approximate posterior for Gaussian priors and some non-Gaussian priors.
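To convey the flavor of prediction-error dynamics, here is a schematic sketch in the spirit of Rao-and-Ballard-style updating (not a reproduction of any cited model), applied to the Gaussian setup of (1)–(3): the estimate is repeatedly nudged by a sensory prediction error and a prior error, and it settles at the posterior mean η from (3).

```python
mu, sigma2 = 0.0, 4.0    # prior over the distal variable x
tau2, y = 1.0, 3.0       # sensory noise variance and sensory input
lr = 0.1                 # step size for the iterative update

x = mu                   # initialize the estimate at the prior mean
for _ in range(200):
    eps_y = y - x        # sensory prediction error (the prediction is x itself)
    eps_x = x - mu       # error of the estimate relative to the prior
    x += lr * (eps_y / tau2 - eps_x / sigma2)   # gradient ascent on log posterior

eta = (mu / sigma2 + y / tau2) / (1 / sigma2 + 1 / tau2)
print(x, eta)            # the fixed point matches the posterior mean from (3)
```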

The literature offers various alternative predictive coding implementations of approximate Bayesian inference. For example, Lee and Mumford (2003) offer a sampling-based predictive coding implementation of iterated approximate Bayesian inference, while Friston (2005, 2010) develops a predictive coding implementation that computes a variational approximation to the posterior. See Spratling (2017) for an overview.

8.4 Morals

The previous section canvassed several theories of how the brain implements approximate Bayesian inference. I will not consider neurophysiological evidence for or against the theories. Instead, I want to advance five morals that we can draw quite apart from which theory (if any) turns out to be correct.

First moral: There are diverse biologically plausible candidate neural realizers for credal states. A neural realizer for the prior probability is a neural variable U that, at a bare minimum, satisfies appropriate counterfactuals of the form (4). A neural realizer for the prior likelihood is a neural variable V that, at a bare minimum, satisfies appropriate counterfactuals of the form (6).9 A neural realizer for the approximate posterior is a neural variable W that, at a bare minimum, satisfies appropriate counterfactuals of the form (8). Section 8.3 canvassed several candidate neural realizers:

• firing rate profile over a neural population: Equations (10), (11), and (23)
• spike count profile over a neural population: Equations (16) and (18)
• chance distribution governing neural response to a stimulus: Equation (14)
• tuning curve density function: Equation (20)
• sampling propensities: Equations (27) and (30)
• neuron excitability profile: Equation (28)
• weights in a neural network: Equations (24) and (29)

These candidates are all under active scientific investigation, as are other candidates. The candidates range from the relatively concrete (e.g. spike count profile) to the highly abstract (e.g. tuning curve density function).

Second moral: Credal assignments may be implicit. None of the models we have considered feature explicit enumeration of prior probabilities. The closest is Equation (10), where ri may be construed as encoding the probability assigned to xi. But even (10) does not feature true explicit enumeration: first, firing rates are normalized to yield probabilities; second, the encoding scheme implicitly specifies that stimulus values other than preferred values xi receive probability 0. In the other encoding schemes, credal assignments are even more implicit. An extreme example of implicit encoding is the tuning curve density function d(x). We theorists represent d(x), but the neural network itself does not represent d(x). Assuming decoder (20), the prior is not explicitly recorded anywhere in the network's

computations. Instead, it is implicitly enshrined by the neural network's allocation of resources.

Third moral: The prior probability and the posterior may have very different neural realizers. In Ganguli and Simoncelli (2014), the prior probability is realized by tuning curve density d(x) via Equation (20), while the posterior is encoded by firing rate profile via Equation (23). In Nessler et al. (2013), the prior probability is encoded by the neuron excitability profile via Equation (28), while the posterior is encoded by sampling propensities. These examples demonstrate that a single neural network may realize credal assignments in different ways at different stages of computation. The examples vividly illustrate multiple realizability, a crucial mark of the mental first highlighted by Putnam (1967). A psychological state type (in this case, a credal assignment over a hypothesis space) may have distinct tokens that are quite diverse at the neural level. Our examples show that distinct tokens may be neurally diverse even within a single biologically plausible neural system.

Fourth moral: The prior probability and the prior likelihood may or may not be separately encoded. They are separately encoded in most of the models I considered. In Ma et al. (2006), for example, the prior probability is encoded by spike counts in a neural population via Equation (16), while the prior likelihood is encoded by objective chances via Equation (14). In some models, though, a single neural state encodes both the prior probability and the prior likelihood. The Boltzmann machine encodes a prior p(x1,…, xn) via Equation (24), and the encoded prior determines all relevant unconditional and conditional probabilities—including the prior likelihood p(x1,…, xk | xk+1,…, xn). Similarly for the Buesing et al. (2011) model. Nothing about the Bayesian framework requires separate encoding of the prior p(h) and the prior likelihood p(e | h). One can instead encode a prior p(e, h) and then define conditional probabilities via the ratio formula. In that case, the prior probability and the prior likelihood are encoded by the same neural variable. In terms of clauses (5) and (7): U = V, and Φ maps u to the prior probability p(h), while Ψ maps v (= u) to the prior likelihood p(e | h).

Fifth moral: There are diverse biologically plausible candidate neural implementation mechanisms for approximate Bayesian inference. Physical implementation of approximate Bayesian inference can be achieved, at least in principle, through diverse neural operations falling squarely within the repertoire of the human brain. In Ma et al. (2006), approximate Bayesian inference is implemented by linear combination of spike counts. In Ganguli and Simoncelli (2014), it is implemented by linear combination of spike counts, exponentiation, and normalization. In sampling models, it is implemented by a sampling algorithm. In Spratling (2016), it is implemented by computation of prediction errors. These implementation

schemes are under active scientific investigation, with various pieces of empirical evidence for or against each candidate scheme. In particular, predictive coding is just one proposed neural implementation among others. Many proposed implementations do not involve anything like computation of prediction error. At least some of those proposed implementations have just as much empirical support as any known predictive coding implementation (Aitchison & Lengyel, 2017).

8.5 Simulation, Not Implementation

I have been discussing the diverse ways that a neural network might implement approximate Bayesian inference. I will now discuss neural networks that simulate rather than implement approximate Bayesian inference. In a typical Bayesian perceptual model, the new credal state pnew(h) is not the final output but instead is used to select a privileged estimate h*. The model determines a deterministic or stochastic mapping from inputs e to estimates h*. In principle, there are several ways a neural network might instantiate the desired mapping from e to h* without implementing the credal transition depicted by Figures 8.4 and 8.5:

• No priors, no approximate posterior (Figure 8.11). As noted in Section 8.2, the mapping from e to h* could be implemented by a machine that consults a look-up table. Alternatively, we can sometimes train a neural network to implement the mapping (Simoncelli, 2009). There are even circumstances where unsupervised learning enables a system to emulate a Bayesian estimator (Raphan & Simoncelli, 2007). Thus, a neural network can mimic Bayesian estimation without instantiating any credal states or transitions.
• Prior likelihood, approximate posterior, but no prior probability (Figure 8.12). A system may encode the prior likelihood and an approximate posterior but not the prior probability. To illustrate, consider a simplified version of the Ma et al. (2006) model. As we have seen, the prior likelihood p(r | x) is encoded by the objective chance distribution governing a neural population N, via Equation (14). Assuming a flat prior, one can show that the posterior p(x | r) is a Gaussian with parameters determined by (15). So, assuming a flat prior, there is no need for

Figure 8.11 Causal structure of a neural network that does not encode priors or a posterior.


Figure 8.12 Causal structure of a neural network that encodes a prior likelihood and an approximate posterior but not a prior probability.

separate encoding of the prior or separate computation of the posterior: the spike count profile r over N itself already encodes the posterior. See Ma et al. (2006, pp. 1433–1444) for discussion.
• Prior probability, approximate posterior, but no prior likelihood (Figure 8.13). A system may encode the prior probability and the approximate posterior but not the prior likelihood. Consider the model given by Rullán Buxó and Savin (2021), which combines sampling and parametric encoding. Firing rates in neural population N1 encode samples from a probability distribution over random variables X1,…, Xn. At first, N1 samples spontaneously from the prior p(x1,…, xn). Upon receiving sensory input e, N1 samples from the posterior p(x1,…, xn | e). Samples produced by N1 serve as input to a second neural population N2, which computes parameters for an approximate posterior pnew(xi) over a single variable of interest Xi. Although the prior probability and the approximate posterior are implicitly encoded, the prior likelihood p(e | x1,…, xn) is not. No neural variable realizes p(e | x1,…, xn). Instead, the prior likelihood is embedded in the sampling dynamics for N1.
• Priors, but no approximate posterior (Figure 8.14). A system may encode the prior probability and the prior likelihood but transform input e

Figure 8.13 Causal structure of a neural network that encodes a prior probability and an approximate posterior but not a prior likelihood.


Figure 8.14 Causal structure of a neural network that encodes a prior probability and a prior likelihood but not an approximate posterior.

into estimate h* without computing an approximate posterior. Consider the predictive coding model given by Rao and Ballard (1999). The neural network encodes the prior probability and the prior likelihood. In response to input e, the network uses a predictive coding algorithm to select the MAP estimate. As Rao (2004, pp. 29–30) notes, the network does not compute p(h | e) or any approximation to p(h | e). It only computes argmax_h p(h | e). So it does not implement a credal transition (a transition among credal states).
• Approximate posterior, but no priors (Figure 8.15). We might train a system to compute p(h | e) in response to input e even though the system does not encode the prior probability or the prior likelihood. An example is the neural network given by Echeveste, Aitchison, Hennequin, and Lengyel (2020), which was trained to respond to input e with a sampling-based encoding of the posterior p(h | e).

Figure 8.15 Causal structure of a neural network that encodes an approximate posterior but not a prior probability or a prior likelihood.

A neural network that implements the mapping from e to h* need not instantiate Figure 8.5. It might instead instantiate one of Figures 8.11–8.15. A key feature that differentiates Figure 8.5 from Figures 8.11–8.15 is the presence of neural realizers for credal states. Neural networks conforming to Figure 8.5 feature neural realizers for the prior probability, the prior likelihood, and the approximate posterior. Figures 8.11–8.15 depict situations where at least one of those three credal states lacks a neural realizer. Genuine implementation of approximate Bayesian inference requires that all three credal states have neural realizers.

If a neural network maps inputs e to estimates h* in accord with an approximately Bayesian model, then the network's activity must reflect the model's priors in some way. The question is how it reflects them. According to Figure 8.5, priors are encoded by states of the neural network. The network applies general neural operations (e.g. linear combination of spike counts), yielding a new neural state that encodes a new credal state pnew(h). Figures 8.11–8.15 diverge from that picture to varying degrees. Take Figure 8.12. Here the prior probability is not encoded by any neural state but is instead subsumed into the network dynamics: the transition from the prior likelihood to the approximate posterior is only appropriate assuming the fixed prior probability. Depending on the details, it may be difficult or impossible to change the network dynamics to reflect a different prior. Figure 8.5 posits a flexible dynamics that can accommodate different priors, while Figure 8.12 posits a dynamics tailored to a specific prior. Figure 8.5 views the prior as an adjustable parameter that can change even as the network dynamics remains fixed. Figure 8.12 recognizes no such adjustable parameter.

To illustrate, compare the Ma et al. (2006) model in two versions: the version from Section 8.3.1, with a separate neural population N2 that encodes the prior; and the version captured by Figure 8.12, in which the dynamics is tailored to a flat prior. In the first version, the prior is an adjustable parameter. We can change it (by changing spike counts in N2) without changing the network dynamics. In the second version, the flat prior is not an adjustable parameter. Incorporating a non-flat prior would require radical changes to the network dynamics. A similar contrast applies to Figure 8.5 versus Figure 8.13: the former posits a flexible dynamics that can accommodate different prior likelihoods, while the latter posits a dynamics tailored to a specific prior likelihood. The contrast is even starker for Figures 8.11 and 8.15, which posit a dynamics tailored to a specific prior probability and a specific prior likelihood.

Now compare Figure 8.5 with Figure 8.14. Figure 8.5 computes the full approximate posterior, while Figure 8.14 only computes an estimate h*. Networks conforming to Figure 8.5 can, at least in principle, support

computations that networks conforming to Figure 8.14 cannot. A neural network that encodes the approximate posterior can in principle execute (or be supplemented so as to execute) the following computations, the first two of which are sketched in code after the list:

• expected value computation relative to the approximate posterior
• probability matching, i.e. stochastically selecting a privileged estimate h* with objective chance given approximately by the approximate posterior
• further approximate Bayesian inference, with the approximate posterior serving as the new prior

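For instance, given an encoded approximate posterior (here an invented discrete distribution), expected value computation and probability matching each take a single step:

```python
import numpy as np

rng = np.random.default_rng(3)
h_grid = np.array([-1.0, 0.0, 1.0, 2.0])      # hypothesis values
p_new = np.array([0.1, 0.2, 0.5, 0.2])        # encoded approximate posterior

expected_value = np.sum(p_new * h_grid)        # expected value computation
h_star = rng.choice(h_grid, p=p_new)           # probability matching

# A network retaining only a privileged estimate h* could do neither:
# the distributional information these computations need would be gone.
print(expected_value, h_star)
```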
These computations may not always be possible in practice. But the implementation schemes surveyed in Section 8.3 and Section 8.4 support at least some of the computations through biologically plausible neural operations. For example, the Ganguli and Simoncelli (2014) model supports expected value computation, and sampling models trivially support probability matching. In contrast, a neural network that does not encode the approximate posterior cannot execute any such computations. Crucial information is irretrievably lost when a network encodes only a privileged estimate h* rather than an approximate posterior. Thus, the contrast between Figures 8.5 and 8.14 has significant implications for future computation.

Our discussion highlights a crucial advantage offered by neural networks that implement approximate Bayesian inference versus neural networks that merely simulate approximate Bayesian inference: flexibility. Neural realization of the priors enables a flexible dynamics that can remain fixed as the priors change. It thereby supports Bayesian estimation across changing environmental conditions. Neural realization of the approximate posterior enables flexibility regarding which future computations can be executed. It thereby supports more computational options than are supported by selection of a privileged estimate h*. Hence, implementation offers greater computational flexibility than mere simulation.10

8.6 Methodological Implications of Realism

The debate between realism and instrumentalism has major methodological implications for computational neuroscience. From the realist perspective, we should expect to find neural realizers for the credal states posited by empirically successful Bayesian perceptual models. We should take Figure 8.5 as a guide to underlying neural activity. From the instrumentalist perspective, there is no reason to take Figure 8.5 as a guide. There is no reason to expect that we will find neural realizers for priors or the approximate posterior.

Figures 8.11–8.15 decline to extend the realist viewpoint towards the prior probability, the prior likelihood, or the approximate posterior. Importantly, though, only Figure 8.11 embodies total rejection of realism. Each other figure extends the realist viewpoint towards either the prior probability, the prior likelihood, or the approximate posterior. So each other figure embodies what one might call local realism regarding specific elements of Bayesian models (e.g. realism regarding the prior probability but not the prior likelihood or the approximate posterior). Local realism conflicts with instrumentalism, which declines to extend the realist viewpoint towards any credal states posited by Bayesian models.11

My analysis hinges upon neural realization of credal states, yet I have not said what makes it the case that a neural variable realizes a credal state. To illustrate, assume for the sake of argument that spike count profile realizes the prior according to decoder (16). Why that particular decoder rather than some other decoder or no decoder at all? Lacking concrete answers to such questions, some readers may feel that I have not identified any substantive difference between implementing versus merely simulating approximate Bayesian inference. It might seem that one can always read the causal structure from Figure 8.5 into a physical system that emulates approximate Bayesian inference. One need merely isolate variables U, V, and W that mediate in the appropriate way between e and h*. One can then interpret those variables using whatever decoders Φ, Ψ, and Γ one pleases, thereby depicting the system as conforming to Figure 8.5. Apparently, the contrast I have drawn between realism and instrumentalism evaporates.12

I find this line of thought misplaced for several reasons. First, it is hardly obvious that we can find suitable neural variables U, V, and W that mediate between e and h*. We should allow only variables that a neuroscientist would take seriously—e.g. spike count, firing rate, synaptic weight, etc. We should disallow disjunctive or gerrymandered variables. This restriction severely limits our ability to read the causal structure from Figure 8.5 into a physical system. Second, one cannot simply interpret a neural variable using whatever decoder one pleases. Whether a neural state realizes a credal state is not a matter of interpretation. Either the neural state realizes the credal state or it does not. Admittedly, I have not given necessary and sufficient conditions for realizing a credal state. But this does not mean that anything goes. As mentioned in Section 8.3, a neural state can realize a prior over a distal variable only if the neural state bears appropriate causal connections to the distal variable. Quite plausibly, the neural state must also figure or potentially figure in some characteristic Bayesian computations, such as computation of expected value or of an approximate posterior. More generally, a neural state realizes a credal state only if the neural state is appropriately related to

the distal environment and to other neural states. Given these restrictions on variables and decoders, it is not so easy to read Figure 8.5 into any arbitrary system that emulates approximate Bayesian estimation. For example, there is no evident way to impose Figure 8.5 upon a system that emulates Bayesian estimation using a look-up table. To take a more realistic example, there is no evident way to depict the Rao and Ballard (1999) predictive coding model as including a neural realizer for the approximate posterior.

Obviously, we would like to clarify physical realization of credal states. The key point for present purposes is that, even lacking the desired clarification, realism entrains fundamentally different methodological commitments than instrumentalism. If we adopt a realist viewpoint towards a credal state, then we should seek a neural realizer for the credal state. If we extend the realist viewpoint towards the prior probability, the prior likelihood, and the posterior, then we should seek neural realizers for all three credal states. Instrumentalists see no need to seek neural realizers for any credal states. In practice, then, realists and instrumentalists will tend to pursue very different models of neural computation.

Should we adopt a realist viewpoint towards the credal states posited within Bayesian perceptual psychology? In my opinion, there is strong evidence that perceptual computation exhibits the flexibility characteristic of Figure 8.5:

• Change in the prior probability. To illustrate, consider a well-known perceptual illusion: when a moving line is viewed at low contrast, its perceived direction of motion is biased towards the perpendicular. The Weiss et al. (2002) motion estimation model explains this illusion through the “slow motion” prior: the illusory perpendicular velocity is slower than the true velocity. Sotiropoulos, Seitz, and Seriès (2011) exposed subjects to fast-moving parallel lines. After exposure, subjects tended to perceive the lines as moving obliquely rather than perpendicularly, corresponding to a faster speed. The change in motion perception is well-explained by a shift in the “slow motion” prior to favor faster speeds.13

• Change in the prior likelihood. In many cases, sensory adaptation is well-explained by a change in the prior likelihood. Consider the ventriloquism illusion: if there is a conflict between visual and auditory cues to stimulus location, the visual system heavily favors the visual cue when forming a unified location estimate. The ventriloquism illusion can be explained in Bayesian terms, as an inference based on the visual cue and the auditory cue (Alais & Burr, 2004). Repeated exposure to the ventriloquism illusion induces the ventriloquism aftereffect, in which location estimates based solely on the auditory cue are systematically altered. Sato, Toyoizumi, and Aihara (2007) show that the ventriloquism aftereffect is well-explained by a shift in the prior likelihood relating location estimates to auditory cues. Intuitively: sustained exposure to ventriloquism changes the auditory stimulation that the perceptual system expects from a given stimulus location.

• Computations that exploit the (approximate) posterior. There is strong evidence that the perceptual system can sometimes execute computations exploiting the approximate posterior (Koblinger, Fiser, & Lengyel, 2021). A good example is the object-tracking model given by Kwon et al. (2015). The model posits sequential Bayesian estimation of position and velocity in response to sequential sensory input. At each stage, the Bayesian estimator executes a new probabilistic inference, taking the posterior from the previous stage as the new prior (see the sketch at the end of this section). The model explains a range of motion illusions that otherwise resist unified explanation.

These are just some representative examples. Overall, the scientific literature offers strong psychophysical evidence for flexible perceptual computations that fit better with a realist approach to credal states than with an instrumentalist approach (Rescorla, 2020b).

Instrumentalists may hope to explain the psychophysical evidence through alternative anti-realist explanations. In that spirit, Block (2018) suggests that one might explain apparent changes in the prior probability through a model that simulates Bayesian estimation and also simulates a changing prior caused by changing environmental conditions. However, it is not enough merely to suggest that some possible theory might explain the change in direction perception documented by Sotiropoulos et al. (2011). One must propose an actual theory that explains the observed phenomena without positing a changed prior. One must then compare the proposed theory with the realist alternative. So far, this has not happened. Instrumentalists have not proposed alternative explanations that abjure credal states and transitions, let alone argued that such explanations can equal or surpass explanations that posit credal states and transitions.

I think that we currently have good reason to favor realism over instrumentalism. We have good reason to take Figure 8.5 as a guide to neural activity. As I have documented, lots of research within computational neuroscience pursues precisely that realist agenda. The agenda has proved fruitful, with several recent studies supplying suggestive neurophysiological evidence for neural realization of credal states (Berkes, Orbán, Lengyel, & Fiser, 2011; Sohn & Narain, 2021; Walker, Cotton, Ma, & Tolias, 2020). Future scientific developments will reveal whether the realist agenda yields well-confirmed models of the brain.
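To make the posterior-to-prior hand-off in the third bullet concrete, here is a minimal sketch of sequential Gaussian estimation. It is not the Kwon et al. (2015) model itself: the one-dimensional random-walk dynamics and the noise parameters are invented for illustration.

```python
import numpy as np

def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """One Bayesian stage: combine a Gaussian prior with a Gaussian
    likelihood for the new observation; the posterior is again Gaussian."""
    k = prior_var / (prior_var + obs_var)   # weight placed on the new evidence
    post_mean = prior_mean + k * (obs - prior_mean)
    post_var = (1 - k) * prior_var
    return post_mean, post_var

rng = np.random.default_rng(0)
mean, var = 0.0, 10.0            # initial prior over position
drift_var, obs_var = 0.5, 2.0    # invented random-walk and sensory noise
true_pos = 0.0
for t in range(20):
    true_pos += rng.normal(0, np.sqrt(drift_var))     # the stimulus moves
    var += drift_var                                  # prediction step widens the prior
    obs = true_pos + rng.normal(0, np.sqrt(obs_var))  # noisy sensory input
    mean, var = gaussian_update(mean, var, obs, obs_var)
    # the posterior (mean, var) now serves as the prior for the next stage
print(mean, var)
```

Each pass through the loop is a fresh probabilistic inference whose prior is simply the preceding posterior, widened by a prediction step; this is the flexibility the bullet describes.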

Acknowledgments

I thank Rosa Cao, Thomas Icard, Jiarui Qu, Susanna Schellenberg, Nicholas Shea, and the editors for helpful feedback on an earlier draft of this paper. I am also grateful to Jiarui Qu for preparing Figures 8.1–8.15.

Notes

1 Roughly speaking, a computation is tractable when it can be executed by a physical system with limited time and memory at its disposal. A computation is intractable when it is not tractable. For discussion of computational tractability in relation to cognitive science, see van Rooij et al. (2019).

2 For some sampling algorithms, it is natural to view credences as encoded by the distribution of samples rather than by sampling propensities. Suppose that the algorithm draws n samples v_1, v_2, …, v_n. For i ≤ n, let δ_{v_i} be the Dirac measure centered at v_i (i.e. the measure that allocates all probability mass at v_i). We may regard the samples as encoding a probability distribution q via the equation q = (1/n) ∑_{i=1}^{n} δ_{v_i}. This encoding scheme figures in the particle filter, which approximates iterated Bayesian inference given sequential inputs (Crisan & Doucet, 2002; Murphy, 2012, pp. 825–837). At each time, the particle filter responds to new input by drawing a new set of n samples; the new samples encode a new probability distribution q. As the number of samples goes to infinity, the distribution q encoded at each time converges to the true posterior at that time.

3 I intend the left-hand side of (5) to denote a function that assigns a probability (or a probability density) to each possible h. To be more careful, I should use lambda notation and write (5) as λh.p(h) = Φ(u). Similarly for Equations (7) and (9) below. However, lambda notation seems to me needlessly fussy for present purposes. Throughout the text, I sloppily use the expression “p(h)” sometimes to denote a function and sometimes to denote a specific real number assigned to a specific h. Context should make clear which denotation I have in mind.

4 Some readers may worry that Figure 8.5 suggests a problematic “causal overdetermination,” whereby credal assignment p(h) and neural state u overdetermine privileged estimate h*. I want to resist any such interpretation of Figure 8.5. There is really just one channel of causal influence described twice over: at the psychological level (by citing the credal assignment) and at the neural level (by citing neural state u). The neural level realizes the psychological level, so there is no causal overdetermination. The literature offers several avenues for developing this intuitive diagnosis in more rigorous terms. See Bennett (2007), Rescorla (2014a), Woodward (2008), and Woodward (2015) for discussion of the complex interrelations between mental causation and neural realization.

5 The model can be generalized to handle other decoders (Buesing et al., 2011, p. 4). See also Pecevski et al. (2011) for further generalizations.

6 See also the sampling-based neural network given by Huang and Rao (2016), which models iterated approximate Bayesian inference for arbitrary probability distributions over a finite space.

7 Some particle filter neural implementation models feature an encoding scheme along the lines of note 2. A good example is the model given by Kutschireiter et al. (2017), which implements iterated approximate Bayesian inference for finitely many continuous random variables.
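To make note 2’s encoding scheme concrete, here is a minimal bootstrap particle filter sketch. The Gaussian drift and observation models are invented for illustration rather than taken from Crisan and Doucet (2002) or Kutschireiter et al. (2017); the point is only that, at each time, the current sample set encodes the distribution q = (1/n) ∑_{i=1}^{n} δ_{v_i}.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
samples = rng.normal(0.0, 3.0, n)      # n samples encode the prior q

def particle_step(samples, obs, drift_sd=0.7, obs_sd=1.0):
    """One particle-filter step: propagate samples through the assumed
    dynamics, weight each by the likelihood of the new observation,
    and resample. The returned samples encode the new posterior as
    q = (1/n) * sum_i delta_{v_i}."""
    proposed = samples + rng.normal(0, drift_sd, samples.size)
    weights = np.exp(-0.5 * ((obs - proposed) / obs_sd) ** 2)
    weights /= weights.sum()
    idx = rng.choice(proposed.size, size=proposed.size, p=weights)
    return proposed[idx]

for obs in [0.5, 1.1, 0.8, 1.4]:       # sequential sensory inputs
    samples = particle_step(samples, obs)
print(samples.mean())                   # posterior mean estimate
```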

8 See Cao (2020) for critical discussion of talk about “prediction” and “prediction error” in this context.

9 In some models, such as Ma et al. (2006) and Ganguli and Simoncelli (2014), the neural network executes an approximate Bayesian inference based on spike count profile r over a neural population. r is caused by proximal sensory input e. However, e does not enter directly into the inference. Accordingly, the neural network realizes a prior likelihood p(r | x) defined over spike count r rather than proximal sensory input e. The rationale here is that neural computation has direct access to r rather than e.

10 In the machine learning literature, it is standard to distinguish between generative and discriminative models (Murphy, 2012, pp. 270–279). Basically, a generative model uses the prior p(h) and the prior likelihood p(e | h) to compute the posterior p(h | e), while a discriminative model computes the posterior or some function of the posterior but does not encode the priors. Generative models correspond to Figure 8.5. Discriminative models correspond either to Figure 8.11 or to Figure 8.15, depending on whether the model merely maps e to a function of p(h | e) or whether the model computes p(h | e) itself. Increased computational flexibility is widely recognized as an advantage offered by generative models over discriminative models (Murphy, 2012, p. 271).

11 Sohn and Narain (2021) distinguish two perspectives on neural implementation of Bayesian inference: the modular perspective and the transform perspective. According to the modular perspective, “probabilistic computations are carried out using independent representations of likelihood, prior, and posterior distributions, followed by the generation of an estimate” (p. 123). They cite Ma et al. (2006) as an exemplar of the modular perspective. According to the transform perspective, “uncertain sensory measurements can be directly mapped into Bayesian estimates via latent processes within which prior distributions are embedded. This process does not mandate encoding of probabilistic distributions on each trial” (pp. 122–123). The transform perspective does not require encoding of priors or prior likelihoods, nor does it require computation of the posterior. It only requires that the system emulates Bayesian estimation. Thus, it encompasses Figures 8.11–8.15. Sohn and Narain (p. 124) also cite the Ganguli and Simoncelli (2014) model as an example of the transform perspective rather than the modular perspective. In that model, the prior and the likelihood functions are not independently encoded: the density d(x) warps the tuning curves f_i(x) and thereby influences the prior likelihood (14); so a change in the prior will generally induce a change in the likelihoods. I believe that the contrast between the modular and transform perspectives, while useful for some purposes, blurs vital distinctions. There is a significant difference between the Ganguli and Simoncelli (2014) model and a neural network with the causal structure of Figure 8.11: the former uses an implicitly encoded prior and prior likelihood to compute the posterior; the latter does not. Even though the Ganguli and Simoncelli (2014) model does not encode the prior and the prior likelihood independently, it seems closer in many important respects to the Ma et al. (2006) model than to a neural network that merely emulates Bayesian estimation.
12 This worry is closely connected to triviality arguments regarding computational implementation, propounded by Putnam (1988) and Searle (1990). For critical discussion of triviality arguments, see Rescorla (2013, 2014b).

13 Another example of flexibility: new priors can transfer from one perceptual task to another (Adams, Graf, & Ernst, 2004; Maloney & Mamassian, 2009). See Rescorla (2020b) for discussion in support of realism.
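Notes 9 and 11 mention models that infer a stimulus x from a spike count profile r via a prior likelihood p(r | x). For readers who want the mechanics, the following sketch assumes independent Poisson neurons with Gaussian tuning curves and a flat prior; all parameters are illustrative, and the code is not a reconstruction of Ma et al. (2006) or Ganguli and Simoncelli (2014).

```python
import numpy as np

x_grid = np.linspace(-10, 10, 201)          # hypothesis space for stimulus x
prefs = np.linspace(-8, 8, 12)              # preferred stimuli of 12 neurons

def tuning(x, pref, gain=20.0, width=2.5):
    """Gaussian tuning curve f_i(x): expected spike count of neuron i."""
    return gain * np.exp(-0.5 * ((x - pref) / width) ** 2)

def log_likelihood(r, x):
    """log p(r | x) for independent Poisson neurons, up to a constant:
    sum_i [ r_i * log f_i(x) - f_i(x) ]."""
    f = tuning(x, prefs)
    return np.sum(r * np.log(f) - f)

r = np.random.default_rng(2).poisson(tuning(3.0, prefs))    # spikes for x = 3
log_post = np.array([log_likelihood(r, x) for x in x_grid]) # flat prior assumed
post = np.exp(log_post - log_post.max())
post /= post.sum()
print(x_grid[post.argmax()])                # posterior mode, near 3
```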

References

Ackley, D., Hinton, G., & Sejnowski, T. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.
Adams, W., Graf, E., & Ernst, M. (2004). Experience can change the “light-from-above” prior. Nature Neuroscience, 7, 1057–1058.
Aitchison, L., & Lengyel, M. (2016). The Hamiltonian brain: Efficient probabilistic inference with excitatory-inhibitory neural circuit dynamics. PLoS Computational Biology, 12, e1005186.
Aitchison, L., & Lengyel, M. (2017). With or without you: Predictive coding and Bayesian inference in the brain. Current Opinion in Neurobiology, 46, 219–227.
Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14, 257–262.
Barber, M., Clark, J., & Anderson, C. (2003). Generating neural circuits that implement probabilistic reasoning. Physical Review E, 68, 041912.
Beck, J., Ma, W. J., Latham, P. E., & Pouget, A. (2007). Probabilistic population codes and the exponential family of distributions. In P. Cisek, T. Drew, & J. F. Kalaska (Eds.), Computational neuroscience: Theoretical insights into brain function. New York: Elsevier.
Beck, J., Heller, K., & Pouget, A. (2012). Complex inference in neural circuits with probabilistic population codes and topic models. In P. Bartlett (Ed.), Advances in neural information processing systems. Cambridge, MA: MIT Press.
Bennett, K. (2007). Mental causation. Philosophy Compass, 2, 316–337.
Berkes, P., Orbán, G., Lengyel, M., & Fiser, J. (2011). Spontaneous cortical activity reveals hallmarks of an internal model of the environment. Science, 331, 83–87.
Block, N. (2018). If perception is probabilistic, why does it not seem probabilistic? Philosophical Transactions of the Royal Society B, 373, 20170341.
Buesing, L., Bill, J., Nessler, B., & Maass, W. (2011). Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology, 7, e1002211.
Burge, T. (2007). Foundations of mind. Oxford: Clarendon Press.
Burge, T. (2010). Origins of objectivity. Oxford: Oxford University Press.
Cao, R. (2020). New labels for old ideas: Predictive processing and the interpretation of neural signals. Review of Philosophy and Psychology, 11, 517–546.
Clark, A. (2015). Surfing uncertainty. Oxford: Oxford University Press.
Colombo, M., & Seriès, P. (2012). Bayes on the brain – On Bayesian modeling in neuroscience. The British Journal for the Philosophy of Science, 63, 697–723.
Crisan, D., & Doucet, A. (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions on Signal Processing, 50, 736–746.
Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience. Cambridge, MA: MIT Press.
Echeveste, R., Aitchison, L., Hennequin, G., & Lengyel, M. (2020). Cortical-like dynamics in recurrent circuits optimized for sampling-based probabilistic inference. Nature Neuroscience, 23, 1138–1149.
Echeveste, R., & Lengyel, M. (2018). The redemption of noise: Inference with neural populations. Trends in Neurosciences, 41, 767–770.
Egan, F. (2010). Computational models: A modest role for content. Studies in History and Philosophy of Science, 41, 253–259.

Eriksson, L., & Hájek, A. (2007). What are degrees of belief? Studia Logica, 86, 183–213.
Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (2010). Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Science, 14, 119–130.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360, 815–836.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11, 127–138.
Ganguli, D., & Simoncelli, E. (2014). Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Computation, 26, 2103–2134.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., & Rubin, D. (2014). Bayesian data analysis (3rd ed.). New York: CRC Press.
Gershman, S., & Beck, J. (2017). Complex probabilistic inference: From cognition to neural computation. In A. Moustafa (Ed.), Computational models of brain and behavior. Hoboken: Wiley-Blackwell.
Gershman, S., Vul, E., & Tenenbaum, J. (2012). Multistability and perceptual inference. Neural Computation, 24, 1–24.
Hennequin, G., Aitchison, L., & Lengyel, M. (2014). Fast sampling-based inference in balanced neuronal networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems 2: pp. 2240–2248.
Hohwy, J. (2014). The predictive mind. Oxford: Oxford University Press.
Huang, Y., & Rao, R. (2016). Bayesian inference and online learning in Poisson neuronal networks. Neural Computation, 28, 1503–1526.
Icard, T. (2016). Subjective probability as sampling propensity. The Review of Philosophy and Psychology, 7, 863–903.
Knill, D., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neuroscience, 27, 712–719.
Knill, D., & Richards, W. (Eds.). (1996). Perception as Bayesian inference. Cambridge: Cambridge University Press.
Koblinger, Á., Fiser, J., & Lengyel, M. (2021). Representations of uncertainty: Where art thou? Current Opinion in Behavioral Sciences, 38, 150–162.
Kutschireiter, A., Surace, S. C., Sprekeler, H., & Pfister, J.-P. (2017). Nonlinear Bayesian filtering and learning: A neuronal dynamics for perception. Scientific Reports, 7, 8722.
Kwisthout, J., Wareham, T., & van Rooij, I. (2011). Bayesian intractability is not an ailment that approximation can cure. Cognitive Science, 35, 779–784.
Kwon, O.-S., Tadin, D., & Knill, D. (2015). Unifying account of visual motion and position perception. Proceedings of the National Academy of Sciences, 112, 8142–8147.
Lee, T. S., & Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20, 1434–1448.
Loewer, B. (1997). A guide to naturalizing semantics. In C. Wright, & B. Hale (Eds.), A companion to the philosophy of language. Oxford: Blackwell.
Ma, W. J. (2019). Bayesian decision models: A primer. Neuron, 104, 164–175.
Ma, W. J., Beck, J., Latham, P., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9, 1432–1438.

Maloney, L., & Mamassian, P. (2009). Bayesian decision theory as a model of human visual perception: Testing Bayesian transfer. Visual Neuroscience, 26, 147–155.
Moreno-Bote, R., Knill, D., & Pouget, A. (2011). Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences, 108, 12491–12496.
Murphy, K. (2012). Machine learning: A probabilistic perspective. Cambridge, MA: MIT Press.
Nessler, B., Pfeiffer, M., Buesing, L., & Maass, W. (2013). Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity. PLoS Computational Biology, 9, e1003037.
Orbán, G., Berkes, P., Fiser, J., & Lengyel, M. (2016). Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron, 92, 530–542.
Orhan, A. E., & Ma, W. J. (2017). Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback. Nature Communications, 8(1), 14.
Orlandi, N. (2014). The innocent eye: Why vision is not a cognitive process. Oxford: Oxford University Press.
Pecevski, D., Buesing, L., & Maass, W. (2011). Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons. PLoS Computational Biology, 7, e1002294.
Pouget, A., Dayan, P., & Zemel, R. (2003). Inference and computation with population codes. Annual Review of Neuroscience, 26, 381–410.
Pouget, A., Beck, J., Ma, W. J., & Latham, P. (2013). Probabilistic brains: Knowns and unknowns. Nature Neuroscience, 16, 1170–1178.
Putnam, H. (1967). Psychophysical predicates. In W. Capitan, & D. Merrill (Eds.), Art, mind, and religion. Pittsburgh, PA: University of Pittsburgh Press.
Putnam, H. (1988). Representation and reality. Cambridge, MA: MIT Press.
Ramsey, F. P. (1931). Truth and probability. In R. B. Braithwaite (Ed.), The foundations of mathematics and other logical essays. London: Routledge and Kegan Paul.
Rao, R. (2004). Bayesian computation in recurrent neural circuits. Neural Computation, 16, 1–38.
Rao, R., & Ballard, D. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Raphan, M., & Simoncelli, E. (2007). Learning to be Bayesian without supervision. In B. Schölkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems (Vol. 19). Cambridge, MA: MIT Press.
Rescorla, M. (2013). Against structuralist theories of computational implementation. The British Journal for the Philosophy of Science, 64, 681–704.
Rescorla, M. (2014a). The causal relevance of content to computation. Philosophy and Phenomenological Research, 88, 173–208.
Rescorla, M. (2014b). A theory of computational implementation. Synthese, 191, 1277–1307.
Rescorla, M. (2015). Bayesian perceptual psychology. In M. Matthen (Ed.), The Oxford handbook of the philosophy of perception. Oxford: Oxford University Press.
Rescorla, M. (2020a). Perceptual co-reference. The Review of Philosophy and Psychology, 11, 569–589.

Rescorla, M. (2020b). A realist perspective on Bayesian cognitive science. In A. Nes, & T. Chan (Eds.), Inference and consciousness. New York: Routledge.
Rescorla, M. (2021). Bayesian modeling of the mind: From norms to neurons. WIREs Cognitive Science, 12, e1540.
Rullán Buxó, C., & Savin, C. (2021). A sampling-based circuit for optimal decision-making. Advances in Neural Information Processing Systems, 34, 14163–14175.
Sanborn, A., & Chater, N. (2016). Bayesian brains without probabilities. Trends in Cognitive Science, 20, 883–893.
Sato, Y., Toyoizumi, T., & Aihara, K. (2007). Bayesian inference explains perception of unity and ventriloquism aftereffect: Identification of common sources of audiovisual stimuli. Neural Computation, 19, 3335–3355.
Savin, C., & Denève, S. (2014). Spatio-temporal representations of uncertainty in spiking neural networks. Advances in Neural Information Processing Systems, 27, 2024–2032.
Searle, J. (1990). Is the brain a digital computer? Proceedings and Addresses of the American Philosophical Association, 64, 21–37.
Simoncelli, E. (2009). Optimal estimation in sensory systems. In M. Gazzaniga (Ed.), The new cognitive neurosciences (4th ed.). Cambridge, MA: MIT Press.
Sohn, H., & Narain, D. (2021). Neural implementations of Bayesian inference. Current Opinion in Neurobiology, 70, 121–129.
Sokoloski, S. (2017). Implementing a Bayes filter in a neural circuit: The case of unknown stimulus dynamics. Neural Computation, 29, 2450–2490.
Sotiropoulos, G., Seitz, A., & Seriès, P. (2011). Changing expectations about speed alters perceived motion direction. Current Biology, 21, R883–R884.
Spratling, M. (2016). A neural implementation of Bayesian inference based on predictive coding. Connection Science, 28, 346–383.
Spratling, M. (2017). A review of predictive coding algorithms. Brain and Cognition, 112, 92–97.
Stocker, A., & Simoncelli, E. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9, 578–585.
van Rooij, I., Blokpoel, M., Kwisthout, J., & Wareham, T. (2019). Cognition and intractability. Cambridge: Cambridge University Press.
Walker, E., Cotton, R. J., Ma, W. J., & Tolias, A. (2020). A neural basis of probabilistic computation in visual cortex. Nature Neuroscience, 23, 122–129.
Weiss, Y., Simoncelli, E., & Adelson, E. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5, 598–604.
Woodward, J. (2008). Mental causation and neural mechanisms. In J. Hohwy, & J. Kallestrup (Eds.), Being reduced. Oxford: Oxford University Press.
Woodward, J. (2015). Interventionism and causal exclusion. Philosophy and Phenomenological Research, 91, 303–347.
Zemel, R., & Dayan, P. (1997). Combining probabilistic population codes. In IJCAI-97: 15th international joint conference on artificial intelligence. San Francisco, CA: Morgan Kaufmann.
Zemel, R., Dayan, P., & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation, 10, 403–430.

9

Realism and Instrumentalism in Bayesian Cognitive Science Danielle J. Williams and Zoe Drayson

9.1 Introduction

There are two distinct approaches to Bayesian modeling in cognitive science. Black-box approaches use Bayesian theory to model the relationship between the inputs and outputs of a cognitive system without reference to the mediating causal processes, while mechanistic approaches make claims about the neural mechanisms that generate the outputs from the inputs. This chapter concerns the relationship between these two approaches. We argue that the dominant trend in the philosophical literature, which characterizes the relationship between black-box and mechanistic approaches to Bayesian cognitive science in terms of the dichotomy between instrumentalism and realism, is misguided. We propose that the two distinctions are orthogonal: black-box and mechanistic approaches to Bayesian modeling can each be given either an instrumentalist or a realist interpretation. We argue that the current tendency to conflate black-box approaches with instrumentalism and mechanistic approaches with realism stems from unwarranted assumptions about the nature of scientific explanation, the ontological commitments of scientific theories, and the role of abstraction and idealization in scientific models. We challenge each of these assumptions to reframe the debates over Bayesian modeling in cognitive science.

This chapter proceeds as follows. In Section 9.2, we introduce Bayesian cognitive science and highlight the widespread tendency among philosophers to assume that all black-box approaches are instrumentalist and that all mechanistic approaches are realist. In Section 9.3, we outline the distinction between realism and instrumentalism in philosophy of science and argue that scientific realism is compatible with a wider range of explanatory practices than some philosophers would have us believe. We use these findings in Section 9.4 to demonstrate that the distinction between black-box and mechanistic approaches to Bayesian cognitive science does not map neatly onto the distinction between instrumentalist and realist interpretations of Bayesian models, and we show why the two issues should not be conflated.

In Section 9.5, we identify and explore three sources of the problematic conflation relating to ideas about mechanistic explanation, Marr’s levels of analysis, and the role of representation in Bayesian computation.

9.2 Bayesian Cognitive Science

9.2.1 Bayesian Inference and Cognitive Science

Bayesian inference is a method of statistical inference on which probability is understood as measuring degrees of belief in a hypothesis. Hypotheses are updated in light of new evidence or information according to Bayes’ rule of conditionalization, which specifies how to calculate the posterior probability of a hypothesis based on its prior probability, the evidence, and the likelihood of the evidence given the hypothesis.1

Models of Bayesian inference have been successfully applied to a wide variety of domains: to make stock market predictions, to analyze differential gene expression, to monitor water quality conditions, and to measure the diagnostic accuracy of medical tests, for example. In these cases, Bayesian models are used to characterize the relationship between the inputs and outputs of a formal system. While we often rely on physical computers to perform the complex likelihood calculations on large datasets, there is no suggestion that the mechanisms of the stock market, genetic expression, water quality, or medical testing are themselves physical machines performing Bayesian computations. We are instead taking a “black-box” approach, on which we apply Bayesian theorizing to the inputs and outputs of a system without making any claims about the nature of the mechanisms that mediate between the inputs and outputs.

In cognitive science, black-box approaches to Bayesian models are exemplified by the project of rational analysis. Rational analysis uses Bayesian models of conditional probabilities to calculate the optimal input-output function for cognitive tasks, ranging from low-level sensorimotor tasks to high-level reasoning. Bayesian rational analysis models are computational in the sense that they characterize input-output functions, but they make no assumptions that cognizers are themselves physical computers that perform the calculations between input and output.2

Cognitive science has an interest in computational models, however, which are not restricted to input-output functions. The brain itself can be characterized as a physical computer: a machine that performs these computational functions by transforming the inputs into outputs according to an algorithmic process. This way of modeling cognition takes a mechanistic approach rather than a black-box approach, targeting the computational processes that causally mediate between the inputs and outputs. In the case of Bayesian cognitive science, the mechanistic approach suggests that the nervous system implements Bayesian computational functions: it carries and updates information in a way that approximates Bayesian models of probabilistic inference.
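Note 1 sets the formal details aside, but a minimal worked example may help: the discrete form of the rule computes the posterior as likelihood times prior, renormalized. The hypotheses and numbers below are invented; the calculation is a pure input-output mapping of the kind black-box approaches employ, with no claim about mediating mechanisms.

```python
def bayes_update(priors, likelihoods):
    """Posterior over hypotheses h given evidence e:
    p(h | e) = p(e | h) * p(h) / sum over h' of p(e | h') * p(h')."""
    joint = {h: likelihoods[h] * p for h, p in priors.items()}
    total = sum(joint.values())                 # p(e), the normalizer
    return {h: v / total for h, v in joint.items()}

# Invented example: two hypotheses about a stimulus, one piece of evidence.
priors = {"moving_slow": 0.8, "moving_fast": 0.2}
likelihoods = {"moving_slow": 0.3, "moving_fast": 0.9}   # p(e | h)
print(bayes_update(priors, likelihoods))
# {'moving_slow': 0.571..., 'moving_fast': 0.428...}
```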

Some cognitive scientists apply Bayesian models only to particular cognitive functions (e.g. sensory processing, language learning), while others take the “Bayesian brain hypothesis” to provide a unified account of all cognition, perception, and action. Mechanistic approaches to Bayesian modeling in cognitive science can also differ in their details: whether they apply a single Bayesian model or a hierarchy of many Bayesian models, for example, and whether they involve prediction error minimization and data compression strategies.

This distinction between black-box approaches and mechanistic approaches to Bayesian cognitive science is widely acknowledged in the literature under a variety of different labels. Jones and Love (2011), for example, use the label “Bayesian Fundamentalism” for the black-box approach and the label “Bayesian Enlightenment” for the mechanistic approach. In much of the philosophical literature on Bayesian cognitive science, however, there is a tendency to frame the distinction between mechanistic and black-box approaches to Bayesian models as a version of the distinction between realism and instrumentalism about scientific theories. Once we have provided evidence of this tendency, we will argue that a clearer understanding of scientific realism demonstrates that the realism/instrumentalism distinction is orthogonal to the distinction between mechanistic and black-box approaches to cognition.

9.2.2 Philosophical Interpretations of Bayesian Cognitive Science

Philosophical discussions of approaches to Bayesian cognitive science often liken the mechanistic approach to scientific realism, and the black-box approach to instrumentalism. Sprevak, for example, proposes that the mechanistic “Bayesian brain” approach is realist, on the grounds that it interprets the central terms of Bayesian models as “picking out real (and as yet unobserved) entities and processes in the human brain” (Sprevak, 2016, p. 94). He contrasts the mechanistic approach with black-box approaches such as rational analysis, which he takes to be instrumentalist because they are “formal devices”, which do not refer to neural entities and processes (Sprevak, 2016, p. 94).3 Rescorla (2019) explicitly defends a realist interpretation of Bayesian cognitive science by appealing to the mechanistic approach to Bayesian models, on which causal structures implementing Bayesian inferences mediate between input-output mappings. He contrasts the “realism” of his mechanistic approach with the “instrumentalism” of black-box approaches on which Bayesian models are useful fictions: “predictively useful devices that do not accurately depict psychological reality” (Rescorla, 2019, p. 57). Conversely, Block (2018) argues that Bayesian cognitive science is not committed to the sorts of physically implemented internal representations associated with mechanistic approaches, and he uses this to justify taking an instrumentalist interpretation of Bayesian cognitive models.

The literature thus seems to assume that only mechanistic approaches to Bayesian models, with their commitment to concrete neural entities and causal processes, are realist: black-box approaches to Bayesian models, which model the formal relationship between the inputs and outputs, are taken to be instrumentalist. We will now argue that these assumptions in the literature conflate several different dimensions of theory interpretation, with problematic consequences. We will first explore the debate between realism and instrumentalism more generally, before looking at how it applies to Bayesian cognitive science.

9.3 Scientific Realism

Scientific realism is the position that our scientific theories and models provide us with knowledge of the mind-independent world.4 Most scientific realists make the following related claims: the semantic claim that scientific theories should be taken at face value as making truth-evaluable claims; the epistemological claim that accepting a theory involves believing that it is true; and the metaphysical claim that a theory is ontologically committed to the entities that it posits, whether observable or unobservable.5 Scientific anti-realism can take a number of different forms, depending on which of these commitments it rejects. Most prominent is instrumentalism, which claims that our best scientific theories do not provide us with knowledge of the unobservable world and, instead, are merely useful tools or instruments for practicing science.6

Scientific realism proposes that scientific explanations provide us with knowledge of the objective world, ontologically committing us to the entities that do explanatory work in a scientific theory or model. Following Psillos, we can call this the “explanatory criterion” on reality: “something is real if its positing plays an indispensable role in the explanation of well-founded phenomena” (Psillos, 2005, p. 389). It is important to understand that the explanatory criterion itself is a permissive one, which does not place any restrictions on the kinds of things that are real, beyond their explanatory role. In particular, the explanatory criterion does not require that real entities are concrete entities.7

Some scientific realists add further constraints to the explanatory criterion, proposing that scientific explanations must be causal explanations, and that only concrete entities can figure in causal explanations. But these further constraints require additional argument and should not be mistaken for necessary conditions on scientific realism itself. It is widely accepted that science makes use of non-causal explanation in addition to causal explanation: Saatsi (2021), for example, argues that physics features a “menagerie” of non-causal explanations, which appeal to geometry, symmetry, and intertheoretic relations (see also Lange, 2016; Reutlinger & Saatsi, 2018).

Even if we focus specifically on causal explanations, it is unclear that we must be committed only to concrete entities: it might be suggested that abstract entities can figure in causal explanations (e.g. Kersten, 2020) or be explanatorily relevant without being causally relevant (e.g. Pincock, 2015). As Psillos (2005) emphasizes, the explanatory criterion on scientific realism should not be confused with a causal criterion.

These concerns are familiar from Quine’s “indispensability argument”, which was originally used to argue that the abstract mathematical structures that are essential to so much scientific theorizing are real. Versions of the indispensability argument have been used to argue that we should be ontologically committed to other non-concrete entities where they play an essential explanatory role in scientific theorizing. Psillos (2011), for example, proposes that if non-concrete entities such as frictionless planes, ideal gases, perfectly spherical objects, and mass-points play an indispensable role in our best scientific theories, then such entities are real.

The permissiveness of the explanatory criterion on scientific realism also allows that at least some forms of abstraction and idealization are compatible with scientific realism. There is a sense in which all scientific models involve a process of abstraction: we use models to theorize about real-world phenomena because models are simpler and easier to manipulate than the phenomena themselves, allowing us to focus on particular entities, properties, and relations at the expense of others.8 Leaving out details does not entail saying anything false or inaccurate, and thus, abstraction alone does not seem to pose any challenges to scientific realism. Some scientific models, however, also involve a process of idealization: they distort the nature of certain parameters, deliberately misrepresenting the world. While some forms of idealization are doubtless incompatible with realism, scientific realists can allow for idealizations insofar as they maintain approximate truth.9 As Elliott-Graves and Weisberg point out, “[r]ealists can argue that judicious idealizations are sensitive to the way that the world really is” (Elliott-Graves & Weisberg, 2014, p. 183).

We propose that questions about abstraction and idealization are largely orthogonal to questions about realism and instrumentalism. Following Danks (2014), we suggest that the debate between realists and instrumentalists concerns how to interpret the commitments of a theory or model, while questions about abstraction and idealization concern the dimension of approximation: what falls within the scope of a theory and what is excluded?10

In this section, we have focused on the explanatory criterion for scientific realism and suggested that scientific realism per se is compatible with non-causal explanation and non-concrete entities, as well as some forms of abstraction and idealization. In the following section, we will apply these considerations to the debate over Bayesian models in cognitive science to demonstrate that black-box approaches to Bayesian inference can be given a realist interpretation rather than a merely instrumentalist interpretation.

9.4 Reconsidering Bayesian Realism

As we saw in Section 9.2.2, there is a tendency for philosophers to interpret black-box approaches to Bayesian cognitive science (such as rational analysis) as instrumentalist: these approaches are treated merely as predictive tools, rather than as explanatory theories with ontological commitments. We propose here that black-box approaches to Bayesian models in cognitive science can be genuinely explanatory and, therefore, open to a realist interpretation.

First, notice that there is nothing essentially non-causal about black-box approaches in general: they can characterize a causal relationship between inputs and outputs even where they abstract away from (or “screen off”) the mediating mechanisms. In Bayesian cognitive science, however, black-box approaches are usually proposed as formal models, which involve abstracting from the causal relations to focus on formal relations. Even if we take causal explanations to be the norm in the physical sciences, this is less obviously the case in the special sciences: psychological explanations, as Weiskopf (2011) emphasizes, seem to come in causal and non-causal varieties. Once we accept that the explanatory criterion on scientific realism is not necessarily a causal criterion, then there is a prima facie case to be made that formal models are genuinely explanatory and not merely predictive.11

There is, however, a further motivation to give an instrumentalist interpretation of black-box approaches to Bayesian cognitive models. Several philosophers (e.g. Block, 2018; Colombo & Seriès, 2012) have proposed that the idealizations involved in Bayesian modeling motivate an instrumentalist interpretation of Bayesian models in cognitive science. Do black-box approaches to Bayesian cognitive models involve the sort of distortions that would make them incompatible with scientific realism? Rational analysis models, for example, seem to rely on the notion of optimal or ideal reasoning: cognitive processes that minimize expected costs with respect to a specific cost function.12 But as we suggested in Section 9.3, at least some forms of idealization are compatible with scientific realism. Optimality explanations are widely accepted in biological sciences, for example, as genuinely explanatory.13 If frictionless planes and ideal gases can be posited in scientific theories without leading to anti-realism, as Psillos (2011) suggests, then why think that positing ideal reasoners is any more problematic? A second sort of idealization associated with Bayesian models concerns their computational intractability. Block (2018) and Mandelbaum (2019) suggest that where the processes involved in calculating Bayesian likelihoods are computationally intractable, a Bayesian model cannot be given a realist interpretation.

But the appeals to approximate truth (considered in Section 9.3) that are common throughout scientific realism would seem to address this concern: where our psychological models approximate idealized Bayesian inference through tractable computations, there is no need to resort to instrumentalism.14 We thus follow Danks in concluding that “the close tie between rational analyses and instrumentalist theories is unwarranted” (Danks, 2008, p. 67).15

In this section, we have suggested that black-box approaches to Bayesian modeling in cognitive science need not be understood as merely predictive: formal Bayesian models like rational analysis can be genuinely explanatory, therefore deserving of a realist interpretation rather than an instrumentalist one. The fact that a black-box approach abstracts away from the mediating mechanisms does not entail that it lacks ontological commitments. We propose that the onus is on the instrumentalist to establish that the kinds of idealization involved in Bayesian cognitive science are any more problematic than the sorts of idealization involved in scientific models of ideal gases and frictionless planes.

Conversely, we propose that mechanistic approaches to Bayesian cognitive science do not have to be given a realist interpretation. Unlike black-box approaches, mechanistic approaches to Bayesian cognition focus on modeling the information-processing that mediates between inputs and outputs of the cognitive system. While these models are often given a realist interpretation, it is also possible to construe them instrumentally such that we are merely talking as if there are neural representations and unconscious inference: it is possible to give instrumentalist, fictionalist, and eliminativist interpretations of mechanistic information-processing models.16

The upshot of this is that the distinction between realist and instrumentalist interpretations of a Bayesian theory is logically independent of the distinction between black-box and mechanistic approaches to Bayesian modeling. In the following section, we consider why the Bayesian debate in cognitive science between black-box and mechanistic approaches has become misleadingly characterized in terms of instrumentalism and realism. We propose that there are three main reasons: the first related to recent work on mechanistic explanation, the second related to Marr’s levels of analysis, and the third related to ideas about representation.

9.5 The Sources of the Conflation

9.5.1 Mechanistic Misunderstandings

There is a recent trend in philosophy of science to focus on the role of mechanisms in scientific explanation. A mechanism, for these purposes, is a concrete system composed of causal entities organized in such a way

that their activities and interactions produce a scientific phenomenon of interest. The proponents of this “new mechanist” approach focus on a particular subset of causal explanation: scientific discovery and explanation are taken to be the discovery and explanation of causal mechanisms (Machamer, Darden, & Craver, 2000).17 Some proponents of the “new mechanist” approach go so far as to suggest that all scientific explanations are mechanistic, or that a scientific theory is explanatory in virtue of its mechanistic nature.18 According to this view, formal models are not genuinely explanatory: they merely provide a framework or schema that does not become genuinely explanatory until it is cashed out with mechanistic details. Applied to Bayesian cognitive science, this would suggest that rational analysis models are not explanatory unless they are accompanied by “Bayesian brain” models of neural mechanisms, and thus, that we cannot give a realist interpretation of rational analysis models alone. We acknowledge the importance of mechanistic explanations in science but reject the claim that only mechanistic models explain, for reasons already discussed in Section 9.3. Cognitive science, in particular, has a history of embracing both mechanistic and non-mechanistic explanations.19

9.5.2 Marrian Misunderstandings

Some philosophers propose that black-box approaches to Bayesian cognitive science must be given an instrumentalist interpretation on the grounds that they focus on what Marr (1982) calls the “computational level” of analysis. We propose that once we get clear about realism and instrumentalism concerning physical computation and the correct understanding of Marr’s levels of analysis, it should be obvious that there is nothing essentially instrumentalist about Marr’s computational level.

Consider how the distinction between realism and instrumentalism can be applied specifically to physical computation. A realist about physical computation proposes that when we describe a physical system as performing a computational function, we are making a claim about the mind-independent world:

realism about [physical] computation […] is the view that whether or not a particular physical system is performing or implementing a particular computation is at least sometimes a fact that obtains independently of human beliefs, desires and intentions. (Ladyman, 2009, p. 377)

An instrumentalist about physical computation can thus be characterized as claiming that when we describe a physical system as performing a computational function, we are making a claim that is in some sense relative to our own interests, goals, or background assumptions.

Hardcastle (1995) proposes such a view:

whether a physical system is actually computing […] depend[s] upon the interests and aims of the people involved in the investigation. […] whether the assignment of a function to a physical system counts as an explanation depends upon the contingent interests of the relevant community. (Hardcastle, 1995, p. 314)

How does this distinction between realism and instrumentalism about physical computation relate to Marr’s levels of computational analysis? Marr (1982) proposed that physical computers or information-processing systems can be described and analyzed at three distinct levels. We can ask what formal function the system is performing (the computational level), what specific algorithms or programs it is using to perform the function (the algorithmic level), and which specific hardware is implementing these programs (the implementation level). An important feature of physical computation is that one computational function can be performed by many different algorithms, which in turn can be implemented by many distinct kinds of hardware. When we specify information-processing systems at Marr’s computational level, therefore, facts about the algorithmic and implementation levels remain underdetermined. But this underdetermination should not be confused with agnosticism or skepticism about whether there is a physical implementation of the computational model. As a result, there is nothing essentially instrumentalist about Marr’s computational level.

In the literature on Bayesian cognitive science, arguments for instrumentalism sometimes proceed via claims about Marr’s computational level.20 Such arguments tend to start from the claim that black-box approaches to Bayesian inference (rational analysis, in particular) have a special connection to Marr’s computational level.21 We have focused here on the next step of these arguments: the claim that Marr’s computational level deserves an instrumentalist rather than a realist interpretation. We reject this move for the reasons articulated above. Each of Marr’s levels of computational analysis can be given a realist or an instrumentalist interpretation: even if Bayesian rational analysis has a special connection to Marr’s computational level, there is no straightforward argument for an instrumentalist interpretation of these Bayesian models.22 Marr’s levels of analysis provide a methodological tool that allows us to focus on different ways to understand computational systems in cognitive science. Both the realist and the instrumentalist about physical computation can adopt Marr’s framework and make claims at each of the three levels of analysis, because Marr’s framework is largely neutral with respect to these questions about theory interpretation.23 It is therefore a mistake to think that computational level claims must be given an instrumentalist interpretation.
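A toy illustration of the underdetermination point, with invented code: the two procedures below compute the same input-output function (one computational-level description) via different algorithmic-level processes, and nothing in the shared function fixes which algorithm, let alone which hardware, implements it.

```python
def sum_to_n_iterative(n):
    """Algorithm 1: accumulate 1 + 2 + ... + n step by step."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_to_n_closed_form(n):
    """Algorithm 2: Gauss's closed form n(n+1)/2."""
    return n * (n + 1) // 2

# Same computational-level description, i.e. the same input-output
# function, realized by two different algorithmic-level processes.
assert all(sum_to_n_iterative(n) == sum_to_n_closed_form(n) for n in range(100))
```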

9.5.3 Representational Misunderstandings

Bayesian cognitive models are sometimes considered to be instrumentalist if they are not ontologically committed to the existence of representational vehicles (e.g. neurons) that explicitly encode probabilities: Block, for example, proposes an instrumentalist interpretation of Bayesian cognitive models which “are not committed to the representation in real visual systems of priors or likelihoods or their multiplication within the system” (Block, 2018, p. 8). We propose that a realist interpretation of Bayesian models does not require the explicit encoding of priors and likelihoods. As Ma (2012) points out, we can distinguish Bayesian inference from computing probability distributions: the fact that a brain performs Bayesian inference (and even does so optimally) does not imply that neurons encode probabilities. A similar point is made by Rescorla, who argues that realism only requires approximate conformity to Bayesian norms and concludes that “[i]n rejecting explicit enumeration of credences, Block is not rejecting realism” (Rescorla, 2019, p. 59).

As we have suggested above, to be a realist about physical computation is to think that whether a physical system implements an abstract computation is a mind-independent matter. Philosophers have widely differing views on what it is to physically implement a computation, however, and each of these views will result in a different variety of realism.24 Some philosophers propose that realism about Bayesian cognitive models does not commit one to any claims about representation. Orlandi, for example, argues that one can be a “fairly robust realist” about Bayesian models of perception without thinking that such models posit representations (Orlandi, 2016, p. 342): Bayesian priors and likelihoods might merely be functional features or biases operating over non-representational causal states.25 Anderson (2017) proposes that our brains could implement Bayesian computation by reconfiguring or guiding the parameters of a control system, rather than by updating internal representations. These non-representational versions of Bayesian realism are, however, controversial: if we assume that inference is a semantically evaluable process, then Bayesian inference would seem to require some form of representation. But even if one commits to a representational view of Bayesian computational models, there is still room for debate about the nature of those representations, depending on how inflationary or deflationary a notion of representation we adopt.26 Some deflationary notions of representation can allow hypotheses or credences to be implicitly represented rather than explicitly encoded.
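Ma’s (2012) distinction can be illustrated with an invented Gaussian cue-combination case: a system can match Bayesian behavior exactly while encoding no probability distributions at all, only a fixed weight; whether that weight counts as implicitly representing priors and likelihoods is precisely the interpretive question at issue.

```python
def bayes_combine(cue_a, var_a, cue_b, var_b):
    """Explicitly Bayesian: the posterior mean for two Gaussian cues is
    the reliability-weighted average of the cue values."""
    w = (1 / var_a) / (1 / var_a + 1 / var_b)
    return w * cue_a + (1 - w) * cue_b

def fixed_weight_combine(cue_a, cue_b, w=0.8):
    """No distributions anywhere: a fixed linear rule. With w chosen to
    match the reliabilities below, its input-output behavior is
    indistinguishable from the Bayesian computation."""
    return w * cue_a + (1 - w) * cue_b

# var_a = 1, var_b = 4  =>  the Bayesian weight on cue_a is 0.8
print(bayes_combine(2.0, 1.0, 5.0, 4.0))    # 2.6
print(fixed_weight_combine(2.0, 5.0))       # 2.6
```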

9.5.4 Summary

In Section 9.5, we have explored some of the motivations that drive certain philosophers to conflate black-box Bayesian approaches with instrumentalism, or to conflate mechanistic Bayesian approaches with realism. Some philosophers assume that scientific explanations must be mechanistic, and thus, that non-mechanistic approaches can only be instrumental; some assume that non-mechanistic theories are at Marr’s computational level and that Marr’s computational level must be given an instrumentalist interpretation; and some assume that realism about physical computation is only compatible with a particularly strong claim about the sorts of representations involved in Bayesian computational models. Each of these claims requires further argumentation and does not follow from the fact that both black-box and mechanistic approaches to Bayesian cognitive science exist.

9.6 Conclusion

This chapter has explored ways of understanding black-box and mechanistic approaches to Bayesian models in cognitive science. As we highlighted in Section 9.2, the philosophical literature has exhibited a tendency to map these two different approaches onto the distinction in philosophy of science between realist and instrumentalist interpretations of a theory or model. Our main contention in this chapter is that this tendency should be resisted, because there are two separate issues in play. One issue concerns the relationship between input-output models of computational functions and mechanistic models of physical computation, while a second issue concerns the ontological, semantic, and epistemological interpretations of these models.

We have argued that both black-box and mechanistic approaches to Bayesian cognitive science can be given realist or instrumentalist interpretations. To be a realist about either a mechanistic or a non-mechanistic model is to think that the model provides explanations of the objective world, which are ontologically committed to the existence of mind-independent entities. Scientific realism alone does not logically entail that all entities are concrete or that all explanations describe causal mechanisms; many self-professed scientific realists reject one or both of these constraints. To be an instrumentalist about either a black-box or a mechanistic model is to think that the model is not ontologically committed to mind-independent entities: perhaps because the model fails to refer, perhaps because we are not justified in forming beliefs about the world on the basis of these models, or perhaps because the entities involved are relative to our interests.

The mere fact that a model is mechanistic or non-mechanistic does not tell us whether it should be given a realist or instrumentalist interpretation.

We argued for this conclusion by first considering scientific realism and instrumentalism more generally and then applying these considerations to Bayesian cognitive science. Considerations of the explanatory criterion and indispensability arguments suggest that a realist Bayesian model can posit abstract as well as concrete entities, provide explanations that go beyond descriptions of causal mechanisms, and incorporate processes of abstraction and idealization without sacrificing its realist credentials. None of these claims is particularly controversial in the literature on scientific realism: notice that they are compatible with acknowledging that some abstract entities are not scientifically explanatory, that causal explanations can have benefits that non-causal explanations lack, and that some forms of idealization are in tension with scientific realism.

So why do Bayesian approaches to cognitive science take a narrower view of scientific realism, on which realism is aligned with descriptions of concrete mechanisms, and anything else is considered instrumentalist? We have proposed that this unnecessarily narrow construal of realism is motivated by one or more misunderstandings about cognitive science. It is a mistake, we argued, to think that explanations in cognitive science must be wholly mechanistic, or that the instrumentalism of non-mechanistic computational descriptions is written into Marr’s framework, or that realism about physical computation dictates specific requirements about the nature of representation.

Debates about scientific realism and instrumentalism are philosophical debates about the metaphysical, epistemological, and semantic interpretations of scientific theories and models. But there is more to theory interpretation than the realist/instrumentalist dimension: questions about optimality, approximation, and abstraction, for example, are not answered simply by labeling a theory as realist or instrumentalist. This point is nicely articulated by Danks, who argues that the claims of cognitive science do not themselves dictate the complex continuum of commitments that interest us when interpreting the theories in question:

The specification of a cognitive theory—whether framework, architecture, or model—almost never (in isolation) commits one to any particular picture of the world, or constrains the ways that theory could be implemented, or determines how we could confirm or learn about the truth of that theory. (Danks, 2014, p. 16)

Notes

1 The precise details of Bayes’ theorem are not relevant to our arguments here. For a thorough introduction to Bayes’ theorem, see Joyce (2008).
2 For more on the theoretical framework of rational analysis, see Anderson (1991) and Chater and Oaksford (1999).
3 Danks (2014) also notes this tendency to interpret rational analysis approaches to Bayesian modeling as instrumentalist; like us, however, he thinks this conflation should be avoided. We discuss this further in Section 9.4.
4 We will remain largely neutral with respect to the relationship between scientific theories and scientific models. For a more nuanced discussion, see Frigg and Hartmann (2020).
5 Structural realism is a form of scientific realism that reconsiders the ontological claim to suggest that we should be committed not to entities but only to the structural content of our theories. We will set aside structural realism for the rest of this chapter, but see Ladyman (2014) for an overview.
6 There are different routes to this conclusion: see Stanford (2016) for further discussion.
7 As Psillos puts it, the explanatory criterion “does not dictate the status of entities that are explanatorily indispensable; in particular it does not disallow abstract entities from being real” (Psillos, 2005, p. 389).
8 Determining the target system of a scientific model is a matter of identifying the domain of study and determining which parameters to focus on and which to omit: see Elliott-Graves (2020) on target systems, and the importance of deciding the level of grain at which to partition the domain. See also Frigg and Nguyen (2017).
9 Cashing out what approximate truth might be is a further challenge, which we will not address here. See Chakravartty (2011) for further discussion of both formal and informal explications of the concept.
10 See Chapter 2 of Danks (2014) for further discussion. A similar point is made by Weiskopf (2011), who distinguishes between the level of precision or “grain” of a theory and its correctness. Elliott-Graves and Weisberg (2014) propose that neither the realist nor the anti-realist can appeal to idealization to make their case.
11 Bechtel and Shagrir, for example, take proponents of rational analysis to be offering probabilistic models of cognition which “provide explanatory mathematical theories of a cognitive capacity without referring to specific psychological and neural mechanisms” (Bechtel & Shagrir, 2015, p. 314, our italics). Reijula similarly claims that “[r]ational analysis is an account of how probabilistic modeling can be used to construct non-mechanistic but self-standing explanatory models of the mind” (Reijula, 2017, p. 2975, our italics).
12 Notice that there is nothing about black-box approaches to Bayesian cognitive models that demands their optimality: while all optimal inference is Bayesian, it is not the case that all Bayesian inference is optimal (Ma, 2012). Insofar as rational analysis models require optimality, however, concerns about idealization will arise.
13 The best explanation of the life-cycles of cicada populations, for example, refers to the evolutionary optimality of mathematically prime periods for minimizing intersection with other creatures’ life-cycles. Rice (2012) considers both causal and non-causal interpretations of optimality explanations.
14 See Rescorla (2019) for further discussion. A similar point is made by Kirchhoff, Kiverstein, and Robertson (2022).

15 Two further concerns about idealization are sometimes leveled at Bayesian cognitive models. The first draws on the connection between Bayesian inference and rational normativity to suggest that Bayesian cognitive science does not offer descriptive scientific theories (see, e.g., Mandelbaum, 2019); for a response, see Rescorla (2016). A second concern appeals to the competence/performance distinction in cognitive psychology to suggest that Bayesian theories idealize away from performance limitations (see Franks, 1995); Patterson (1998) provides a response.
16 For examples of instrumentalist, fictionalist, and eliminativist interpretations of neural information processing, see Sprevak (2013) and Drayson (2022).
17 Mechanistic explanation goes beyond mere causal explanation of entities and their activities: it must “further describe how those entities and activities are organized (e.g., spatially and temporally) into a mechanism” (Craver, 2006, p. 373).
18 Craver allows that perhaps not all explanations are mechanistic but proposes that “in many cases [...] the distinction between explanatory and non-explanatory models seems to be that the latter, and not the former, describe mechanisms” (Craver, 2006, p. 367).
19 Levy and Bechtel (2013) make a similar point when they argue, in direct contrast to Craver, that abstract models play a role in explaining the behavior of particular systems: the process of abstraction both identifies the relevant causal organization and facilitates generalization. Weiskopf (2011) argues that we should not be misled into thinking that cognitive models are mechanistic even where their structure resembles that of mechanistic models.
20 Block, for example, claims that neural processes “can be considered Bayesian but only on an instrumentalist interpretation pitched at Marr’s computational level rather than the algorithmic level” (Block, 2018, p. 8).
21 See, for example, Griffiths et al. (2012), Tenenbaum, Griffiths, and Kemp (2006, p. 206), Jones and Love (2011), Oaksford and Chater (2007), and Icard (2018).
22 While we focus on rejecting the link between Marr’s computational level and instrumentalism, there may also be reason to question the link between rational analysis and Marr’s computational level: see Kitcher (1988) and Bechtel and Shagrir (2015). Danks also suggests that instrumentalist interpretations of rational analysis derive “largely from the connection with the computational level of Marr’s trichotomy” which he proposes is “neither necessary nor desirable” (Danks, 2008, p. 67).
23 Similar points are made by Danks and Egan. Danks proposes that questions of theory interpretation (e.g. realism, scope, optimality) are not settled by Marr’s trichotomy or by computational models in cognitive science more generally: instead, “we need to do some philosophy to really understand what the cognitive science means” (Danks, 2014, p. 16). Egan points out (also in relation to Marr’s framework) that if theory interpretation could be read so easily off our computational models, “much of the philosophy of science would be out of business” (Egan, 1995, p. 186).
24 As Ladyman argues, “unless we have a precise account of implementation it will not be possible to decide whether or not realism is correct just because it will not be clear what ‘computation’ means” (Ladyman, 2009, p. 377). Williams (2022) further explores the role that implementation theories play with respect to realism about physical computation.
25 In a later paper, Orlandi (2018) proposes that where a system updates according to Bayes’ theorem, this suggests a representational picture.

26 See Ramsey (2021) for a discussion of how Chomsky’s deflationary view of representation and Egan’s quasi-deflationary view of representation compare to more robust notions of representation in cognitive science.

References

Anderson, J. R. (1991). Is human cognition adaptive? Behavioral and Brain Sciences, 14(3), 471–485.
Anderson, M. L. (2017). Of Bayes and bullets: An embodied, situated, targeting-based account of predictive processing. In T. Metzinger & W. Wiese (Eds.), Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Bechtel, W., & Shagrir, O. (2015). The non-redundant contributions of Marr’s three levels of analysis for explaining information processing mechanisms. Topics in Cognitive Science, 7(2), 312–322.
Block, N. (2018). If perception is probabilistic, why doesn’t it seem probabilistic? Philosophical Transactions of the Royal Society B, 373(1755), 1–10.
Chakravartty, A. (2011). Scientific realism. In E. N. Zalta (Ed.), Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/sum2017/entries/scientific-realism/
Chater, N., & Oaksford, M. (1999). Ten years of the rational analysis of cognition. Trends in Cognitive Sciences, 3(2), 57–65.
Colombo, M., & Seriès, P. (2012). Bayes in the brain—On Bayesian modelling in neuroscience. British Journal for the Philosophy of Science, 63(3), 697–723.
Craver, C. F. (2006). When mechanistic models explain. Synthese, 153(3), 355–376.
Danks, D. (2008). Rational analyses, instrumentalism, and implementations. In N. Chater & M. Oaksford (Eds.), The probabilistic mind: Prospects for Bayesian cognitive science (pp. 59–75). Oxford: Oxford University Press.
Danks, D. (2014). Unifying the mind: Cognitive representations as graphical models. Cambridge, MA: MIT Press.
Drayson, Z. (2022). What we talk about when we talk about mental states. In T. Demeter, T. Parent, & A. Toon (Eds.), Mental fictionalism: Philosophical explorations. London: Routledge.
Egan, F. (1995). Computation and content. Philosophical Review, 104(2), 181–203.
Elliott-Graves, A. (2020). What is a target system? Biology and Philosophy, 35(2), 1–22.
Elliott-Graves, A., & Weisberg, M. (2014). Idealization. Philosophy Compass, 9(3), 176–185.
Franks, B. (1995). On explanation in cognitive science: Competence, idealization, and the failure of the classical cascade. British Journal for the Philosophy of Science, 46(4), 475–502.
Frigg, R., & Hartmann, S. (2020). Models in science. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/spr2020/entries/models-science/
Frigg, R., & Nguyen, J. (2017). Models and representation. In L. Magnani & T. Bertolotti (Eds.), Springer handbook of model-based science (pp. 49–102). Switzerland: Springer Nature.

Griffiths, T. L., Chater, N., Norris, D., & Pouget, A. (2012). How the Bayesians got their beliefs (and what those beliefs actually are): Comment on Bowers and Davis (2012). Psychological Bulletin, 138(3), 415–422.
Hardcastle, V. G. (1995). Computationalism. Synthese, 105(3), 303–317.
Icard, T. F. (2018). Bayes, bounds, and rational analysis. Philosophy of Science, 85(1), 79–101.
Jones, M., & Love, B. C. (2011). Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34(4), 169–188.
Joyce, J. (2008). Bayes’ theorem. In E. N. Zalta (Ed.), Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/fall2021/entries/bayes-theorem/
Kersten, L. (2020). How to be concrete: Mechanistic computation and the abstraction problem. Philosophical Explorations, 23(3), 251–266.
Kirchhoff, M., Kiverstein, J., & Robertson, I. (2022). The literalist fallacy and the free energy principle: Model-building, scientific realism and instrumentalism. The British Journal for the Philosophy of Science. DOI: 10.1086/720861.
Kitcher, P. (1988). Marr’s computational theory of vision. Philosophy of Science, 55(1), 1–24.
Ladyman, J. (2009). What does it mean to say that a physical system implements a computation? Theoretical Computer Science, 410(4–5), 376–383.
Ladyman, J. (2014). Structural realism. In E. N. Zalta & U. Nodelman (Eds.), The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/sum2023/entries/structural-realism/
Lange, M. (2016). Because without cause: Non-causal explanations in science and mathematics. Oxford: Oxford University Press.
Levy, A., & Bechtel, W. (2013). Abstraction and the organization of mechanisms. Philosophy of Science, 80(2), 241–261.
Ma, W.-J. (2012). Organizing probabilistic models of perception. Trends in Cognitive Sciences, 16(10), 511–518.
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about mechanisms. Philosophy of Science, 67(1), 1–25.
Mandelbaum, E. (2019). Troubles with Bayesianism: An introduction to the psychological immune system. Mind and Language, 34(2), 141–157.
Marr, D. (1982). Vision. New York: W. H. Freeman.
Oaksford, M., & Chater, N. (2007). Bayesian rationality: The probabilistic approach to human reasoning. Oxford: Oxford University Press.
Orlandi, N. (2016). Bayesian perception is ecological perception. Philosophical Topics, 44(2), 327–351.
Orlandi, N. (2018). Predictive perceptual systems. Synthese, 195(6), 2367–2386.
Patterson, S. (1998). Competence and the classical cascade: A reply to Franks. British Journal for the Philosophy of Science, 49(4), 625–636.
Pincock, C. (2015). Abstract explanations in science. British Journal for the Philosophy of Science, 66(4), 857–882.
Psillos, S. (2005). Scientific realism and metaphysics. Ratio, 18(4), 385–404.
Psillos, S. (2011). Living with the abstract: Realism and models. Synthese, 180(1), 3–17.

Ramsey, W. (2021). Defending representation realism. In J. Smortchkova, K. Dołęga, & T. Schlicht (Eds.), What are mental representations? (pp. 54–78). Oxford: Oxford University Press.
Reijula, S. (2017). How could a rational analysis model explain? In COGSCI 2017, Proceedings of the 39th annual conference of the cognitive science society (pp. 2975–2980).
Rescorla, M. (2016). Bayesian sensorimotor psychology. Mind and Language, 31(1), 3–36.
Rescorla, M. (2019). A realist perspective on Bayesian cognitive science. In A. Nes & T. Chan (Eds.), Inference and consciousness. New York: Routledge.
Reutlinger, A., & Saatsi, J. (Eds.). (2018). Explanation beyond causation: Philosophical perspectives on non-causal explanations. Oxford: Oxford University Press.
Rice, C. C. (2012). Optimality explanations: A plea for an alternative approach. Biology and Philosophy, 27(5), 685–703.
Saatsi, J. (2021). Non-causal explanations in physics. In E. Knox & A. Wilson (Eds.), The Routledge companion to philosophy of physics. New York: Routledge.
Sprevak, M. (2013). Fictionalism about neural representations. The Monist, 96(4), 539–560.
Sprevak, M. (2016). Philosophy of the psychological and cognitive sciences. In P. Humphreys (Ed.), The Oxford handbook of philosophy of science. Oxford: Oxford University Press.
Stanford, P. K. (2016). Instrumentalism: Global, local, scientific. In P. Humphreys (Ed.), The Oxford handbook of philosophy of science. Oxford: Oxford University Press.
Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7), 309–318.
Weiskopf, D. A. (2011). Models and mechanisms in psychological explanation. Synthese, 183(3), 313–338.
Williams, D. (2022). Markov blankets: Realism and our ontological commitments. Behavioral and Brain Sciences, 45, e217.

10 Bayesian Psychiatry and the Social Focus of Delusions

Daniel Williams and Marcella Montagnese

10.1 Introduction

It is common to think that psychiatric disorders are caused by dysfunctions in or disturbances to the neural mechanisms that underlie human psychology. If so, significant progress in our understanding of psychiatric disorders demands a model of how the healthy or typical brain functions. In recent decades, a large and growing body of research in cognitive science has sought to model the brain as a statistical inference mechanism, constructing and refining probabilistic models and hypotheses about the world from the streams of noisy and ambiguous information the world leaves on our sensory transducers (Doya, Ishii, Pouget, & Rao, 2007; Knill & Pouget, 2004). For example, theorists have drawn on Bayesian statistics to illuminate learning and inference across a wide variety of cognitive domains, including perception, motor control, intuitive theories, and more (see Doya et al., 2007). A prominent manifestation of this work has been predictive coding, an influential theory that models the brain as a hierarchically structured prediction machine, comparing internally generated predictions of sensory information against the sensory information generated by the body and environment and striving to minimise the difference between the two (see Clark, 2013; Friston, 2005; Hohwy, 2013; Rao & Ballard, 1999).
Such ideas increasingly provide the framework for understanding healthy brain function that guides research in computational psychiatry (Friston, Stephan, Montague, & Dolan, 2014; Teufel & Fletcher, 2016). Specifically, researchers have sought to model a large range of psychiatric disorders by appealing to dysfunctions or aberrations in the neural mechanics of statistical inference and decision-making, including schizophrenia (Adams, Stephan, Brown, Frith, & Friston, 2013), autism (Lawson, Rees, & Friston, 2014), Parkinson’s disease (O’Callaghan et al., 2017), anorexia (Gadsby & Hohwy, 2019), addiction (Schwartenbeck et al., 2015),
depression (Barrett, Quigley, & Hamilton, 2016), and more. As Griffin and Fletcher put it,

The growing understanding of the brain as an organ of predictive inference has been central to establishing computational psychiatry as a framework for understanding how alterations in brain processes can drive the emergence of high-level psychiatric symptoms. (Griffin & Fletcher, 2017, p. 265)

Some proponents of this approach are extremely optimistic about its explanatory reach. Carhart-Harris and Friston (2019, p. 334), for example, argue that “most, if not all, expressions of mental illness can be traced to aberrations in the normal mechanics of hierarchical predictive coding” (our emphasis).
We have two principal aims in this chapter. First, we will identify and clarify some of the core theoretical attractions of what we call “Bayesian psychiatry” as a research programme. Second, we will argue that this research programme is often hindered by a focus on content-neutral, domain-general inferential processes that abstract away from much that is distinctive about human psychology. Drawing on psychosis and the social nature of clinical delusions to illustrate, we will argue that this focus likely blinds Bayesian psychiatry to many specific ways in which the human mind can break down and malfunction.
We structure the chapter as follows. In Sections 10.2 and 10.3, we introduce Bayesian psychiatry (S2) and outline applications of this research programme to understanding psychosis (S3). In Section 10.4, we draw on the distinctive social phenomenology of psychosis to argue that such applications seem inadequate, and in Section 10.5, we suggest that combining Bayesian modelling with information about the functional specialisations of the human brain might help to address this problem. We conclude in Section 10.6 by summarising our conclusions and highlighting important areas for future research.

10.2 Bayesian Psychiatry

Computational psychiatry seeks to build computational models of the dysfunctions and aberrations that underlie psychiatric disorders. It is built on two central ideas: first, psychiatry should strive to trace psychiatric disorders to dysfunctions in neural mechanisms; second, neural mechanisms are computational mechanisms, that is, mechanisms that extract and process information through transformations of and operations over information-encoding states and structures. Computational modelling of psychiatric disorders brings many theoretical benefits. For example, it provides an explanatorily
illuminating link between neurobiological and psychological levels of description, it forces theories – and thus the predictions of theories – to be explicit and mathematically precise, and it grounds psychiatric explanation in independently well-established models of brain function in computational and cognitive neuroscience (see Friston et al., 2014; Teufel & Fletcher, 2016).
Consonant with their broader influence in neuroscience, computational psychiatry has been dominated by neural network models, reinforcement learning models, and Bayesian models, the latter of which constitute our focus here. Bayes’ theorem is an implication of probability theory that specifies the optimal procedure for redistributing the probabilities assigned to hypotheses in light of new information. Specifically, the aim of Bayesian inference is to calculate the posterior probability of a hypothesis conditional on novel evidence, p(hypothesis | evidence). Bayes’ theorem states that this is proportional to how well the hypothesis predicts the evidence, i.e. the likelihood p(evidence | hypothesis), multiplied by the hypothesis’ probability before encountering the evidence, i.e. the prior p(hypothesis). To calculate the posterior, this product is then divided by the categorical probability of the evidence, the marginal likelihood p(evidence), which is typically calculated as a sum (for discrete states) or integration (for continuous states) over the product of the priors and likelihoods for all possible hypotheses.
Importantly, exact Bayesian inference of this sort is often infeasible when dealing with large or continuous hypothesis spaces. Thus, researchers in statistics and artificial intelligence have developed various algorithms for approximating Bayesian inference, the most influential of which are stochastic sampling approximations (Chater et al., 2020) and deterministic variational approximations (Friston, 2005).
The growing importance of Bayesian inference and its approximations as a model of neural information processing can be traced to two principal factors. First, neuroscientists have increasingly recognised that inductive inference under profound uncertainty is a fundamental problem that the brain confronts. Bayesian inference provides the optimal method for solving this problem. Thus, it is argued that we should expect – perhaps on evolutionary grounds – that the brain implements some form of this solution. As Mathys, Daunizeau, Friston, and Stephan put it,

Since a Bayesian learner processes information optimally, it should have an evolutionary advantage over other types of agents, and one might therefore expect the human brain to have evolved such that it implements an ideal Bayesian learner. (2011, p. 1)

Second, experimental neuroscientists and cognitive psychologists have uncovered evidence across a wide variety of domains that human inference is approximately Bayes-optimal (see Clark, 2013; Knill & Pouget, 2004).
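For reference, the update rule just described can be written compactly in the chapter’s own notation, where h ranges over the hypotheses under consideration (the summation form applies to discrete hypothesis spaces; for continuous spaces the sum becomes an integral):

p(hypothesis | evidence) = p(evidence | hypothesis) × p(hypothesis) / p(evidence)

p(evidence) = Σ_h p(evidence | h) × p(h)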

Both factors have motivated the Bayesian brain hypothesis, the hypothesis that information processing in the brain – or at least certain parts of the brain – is approximately Bayesian (Knill & Pouget, 2004). However, this hypothesis is silent on how the brain implements approximate Bayesian inference. One of the most influential theories for addressing this issue is hierarchical predictive coding.
Strictly speaking, predictive coding is an encoding strategy in which only the unpredicted elements of a signal are fed forward for further stages of information processing. In neuroscience, this encoding strategy was advanced as a model of visual processing by Rao and Ballard (1999); their model proposes that cortical networks acquire and update probabilistic models of the causes of sensory signals through a process in which successive levels of cortical hierarchies attempt to minimise the error in their predictions of activity registered at the level below them. In recent work, however, “predictive coding” often refers to an extension and elaboration of this model of perceptual processing to encompass neural information processing more generally. Thus, Sterzer et al. (2018) write that,

Predictive coding conceives of the brain as a hierarchy whose goal is to maximize the evidence for its model of the world by comparing prior beliefs with sensory data, and using the resultant prediction errors (PEs) to update the model (our emphasis).

We will use the term “predictive processing” to refer to this more global theory of brain function (see, e.g., Clark, 2013; Friston, 2005, 2010; Hohwy, 2013).
There are two aspects of predictive processing that will be important in what follows. The first is a conception of neural information processing in terms of hierarchical precision-weighted prediction error minimisation. “Prediction error” refers to the divergence between the brain’s predictions of incoming information and the information itself. “Precision” names the inverse of the variance of a probability density and, thus, the degree of certainty or confidence associated with it. Precision-weighting therefore adjusts the degree to which predictions are updated in light of prediction errors as a function of the relative uncertainty associated with prior expectations and incoming evidence. Consonant with the hierarchical structure of the neocortex, this process of uncertainty-weighted prediction error minimisation is thought to be iterated up an inferential hierarchy, with each successive level attempting to predict activity at the level below it.
The second component is the idea that prediction error minimisation constitutes the overarching function, goal, or “imperative” of the brain (Friston, 2010; Hohwy, 2013). This radical view is often motivated by or grounded in the broader idea, central to the free energy principle (Friston,
2010), that all self-organising systems obey an overarching imperative to minimise surprise or maximise model evidence. On this view, action – typically described as active inference – is also modelled as a form of prediction error minimisation, except that rather than updating predictions to match incoming sensory information, action involves intervening on the world to match sensory information to the brain’s expectations, the most fundamental of which are thought to be installed by evolution (see Hohwy, 2013).
These broad ideas about brain function have played a central role in a large and growing body of research in computational psychiatry (see Friston et al., 2014; Teufel & Fletcher, 2016). We will henceforth use the term “Bayesian psychiatry” to refer to this broad research programme, which draws on concepts of hierarchical Bayesian inference and predictive processing to illuminate psychiatric disorders.1
Setting aside the details and potential criticisms of specific theories and hypotheses within Bayesian psychiatry, we believe that it constitutes a promising framework for modelling psychopathologies for reasons over and above the more general theoretical attractions of computational psychiatry. First, the Bayesian brain and related ideas were not developed to explain psychiatric disorders but rather have compelling independent support as models of neural information processing (see Knill & Pouget, 2004). Second, conceptualising the brain as an inferential organ provides an explanatorily illuminating link between the biological mechanisms that implement information processing and the intentionality and the role of misrepresentation essential to many psychiatric disorders (Friston et al., 2014). Third, Bayesian psychiatry draws attention to the important role of uncertainty and uncertainty management in psychiatric disorders (Hohwy, 2013). Fourth, the emphasis on bi-directional hierarchical processing within theories such as predictive coding constitutes a promising framework for understanding the complex and often bi-directional interplay among percepts, beliefs, and more abstract self-narratives that are central to many psychiatric disorders (Sterzer et al., 2018). Finally, and most importantly, Bayesian psychiatry has undeniably been explanatorily fecund, generating myriad novel conceptualisations and surprising predictions (Teufel & Fletcher, 2016). In the next section, we will illustrate these attractions by appealing to influential predictive coding models of psychosis.

10.3 Psychosis and Bayesian Psychiatry

Psychosis is a complex and heterogeneous functional disorder that is generally understood as impairment in reality testing. The term is used as an umbrella category for a cluster of symptoms that comprise hallucinations
and delusions that can occur across many psychiatric, neurodevelopmental, and neurodegenerative disorders. Thus, psychotic symptoms have been widely researched in affective disorders such as bipolar disorder (Shinn et al., 2012) and in neurodegenerative ones such as Parkinson’s disease (Fénelon, Soulas, Zenasni, & de Langavant, 2010) and dementia with Lewy bodies (Waters et al., 2014). The clinical manifestation of psychosis is varied and heterogeneous across these nosological categories, as well as within each disorder. Here we will focus mostly on psychosis in schizophrenia, on which much of the research on psychosis within Bayesian psychiatry has concentrated.

10.3.1 Psychosis in Schizophrenia

Schizophrenia is a mental disorder affecting 0.3–0.7% of the population worldwide (American Psychiatric Association, 2013). Patients diagnosed with schizophrenia can show a heterogeneity of symptoms, which are classified as either positive or negative, where positive symptoms include hallucinations and delusions and negative symptoms include a lack of useful goal-directed behaviours (Patel, Cherian, Gohil, & Atkinson, 2014), anhedonia (i.e. a lack of anticipation and seeking of rewards), poverty of speech, and asociality (Frith, 2005). Research has shown that the aetiological roots of schizophrenia span from genetic risk factors (Tsuang, 2000) to social and environmental ones (Mortensen et al., 1999; see Section 10.5), including complex interactions between them (Ursini et al., 2018). Further, it is important to attend to changes in patients’ psychopathology across time. For example, chronic patients with schizophrenia tend to have fixed delusions, whilst these tend to be less immovable in those at early stages of the disorder, such as in first-episode psychosis, where individuals often retain insight about the implausibility of delusional thoughts (see Sterzer et al., 2018). Even though the exact causes of schizophrenia are still not well understood, there is abundant evidence implicating different neurotransmitters (especially dopamine and glutamate) and multiple brain areas (see McCutcheon et al., 2020).
Hallucinations take different forms in schizophrenia, with heterogeneous manifestations across different sensory modalities, although auditory hallucinations are the most studied (Montagnese et al., 2020). These tend to revolve around hearing voices, either individually or in conversation, which often generate a running commentary on the individual’s behaviour. Although the specific contents of delusions vary widely across individuals and cultures, they tend to cluster in a surprisingly small subset of themes, almost all of which concern the individual’s standing in the social world (Bentall, Kaney, & Dewey, 1991; Gold, 2017). Here, we will focus
largely on the most common form of delusions, persecutory delusions, an extreme form of paranoia which involves the unsubstantiated belief that an agent or group of agents wants to harm the delusional individual (Freeman, 2016).

10.3.2 Predictive Coding and Psychosis

The most important precursors to predictive coding models of psychosis are those that posit a dysfunction in the integration of sensory experience, learned expectations, and higher level explanations of such experiences (see Sterzer et al., 2018). For example, Maher (1974) famously proposed that delusions are best understood as reasonable responses to anomalous experiences caused by dysfunctions in or damage to perceptual mechanisms. Building on this research and on the aforementioned work implicating dopamine in psychosis, Kapur (2003) suggested that dopaminergic dysregulation in schizophrenia might disrupt the attribution of salience to stimuli. According to this influential aberrant salience hypothesis, seemingly irrelevant events and stimuli elicit excessive attributions of salience, and delusions are understood as the individual’s attempts to make sense of and explain such anomalous experiences.
Another influential precursor comes from the model of control of intended action developed by Frith, Blakemore, and Wolpert (2000). Here, one’s sense of agency can be seen as emerging from the integration of different agency cues, including both internal (e.g., from processes serving motor control) and external cues (e.g., feedback from sensory systems), as well as prior information, where each kind of information is weighted by its reliability. According to this model, to feel like the agent of one’s actions, an agent must be able to reliably anticipate the sensory consequences of those actions. A failure in such prediction will render one’s own behaviour surprising, thus suggesting an external cause. When this framework is extended to psychosis more generally (Moore & Fletcher, 2012), positive symptoms can be seen as emerging from repeated confusion between external and internal origins of sensory data. Experimental evidence confirms this loss of normal attenuation of sensory feedback for motor action in patients with psychosis (Blakemore, Smith, Steel, Johnstone, & Frith, 2000; Shergill, Samson, Bays, Frith, & Wolpert, 2005).
Such ideas have laid the groundwork for the development of what Sterzer et al. (2018) call the “canonical predictive coding account of psychosis.” According to this model, the emergence of psychosis can be explained in terms of a dysfunction in the interaction between and integration of top-down expectations and bottom-up information. As noted above, optimal prediction error minimisation necessitates that prediction errors are effectively weighted by their precision or certainty. The canonical predictive
coding account posits that this process of precision-weighting is disrupted in psychosis, such that sensory data is assigned too much precision relative to higher level, more abstract expectations, leading to maladaptive statistical inference and learning and thus the development of inaccurate models of the world (see Adams et al., 2013; Clark, 2016). Further, because of the bi-directional interaction between perceptual experiences and higher level beliefs within predictive coding, inaccurate inferences at lower levels of the inferential hierarchy both influence and are influenced by maladaptive higher level expectations, driving both hallucinations and delusions and a complex interplay between them (see Sterzer et al., 2017).
This canonical predictive coding model of psychosis thus diverges from traditional models of delusions that posit either a perceptual dysfunction, a reasoning dysfunction, or – as with influential “two-factor” models of delusions (Davies, Coltheart, Langdon, & Breen, 2001) – a combination of the two (see Fletcher & Frith, 2009). Not only does the canonical predictive coding model propose a single underlying dysfunction – albeit one with potentially highly varied manifestations – but it also repudiates a sharp distinction between perception and cognition and assumes a significant degree of top-down cognitive influence on perception (Corlett & Fletcher, 2015). Except where stated otherwise, references to the predictive coding model of psychosis in what follows are to this canonical model.

10.4 The Social Contents of Delusions

The canonical predictive coding model of psychosis has many well-advertised attractions (see Sterzer et al., 2018). For example, predictive coding comes with an implementational theory in which precision-weighting is regulated by the action of neuromodulators such as dopamine, and, as noted, there is substantial independent evidence that dopamine dysregulation plays a causal role in psychosis. Further, there is compelling neuro-imaging and behavioural evidence that individuals with psychosis do exhibit deficits in prediction error-driven learning and probabilistic reasoning. Finally, there are interesting simulations demonstrating that aberrations in precision-weighting generate effects similar to those observed in individuals with psychosis, including in psychological domains such as visual tracking distinct from psychosis itself (see Adams et al., 2013).
Nevertheless, this theory also faces several objections and challenges (see Bell, Raihani, & Wilkinson, 2019; Sterzer et al., 2018; Williams, 2018 for a review). Here, we focus on just one: namely, how to reconcile the hypothesis that psychosis results from a domain-general dysfunction of the sort posited by this theory with the apparent domain specificity of psychosis itself.
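To fix ideas about what such a domain-general disruption of precision-weighting amounts to computationally, consider the following toy sketch. It is our illustration rather than any published model, and its function name and parameter values are invented for the purpose: a one-dimensional belief is nudged along each precision-weighted prediction error, and inflating the precision assigned to sensory input makes the estimate chase noise that a better-calibrated observer would largely discount.

```python
import random

def track(obs_precision, prior_precision=4.0, n_steps=20, seed=1):
    """Track a stable hidden cause (fixed at 0 here) by shifting a point
    estimate along precision-weighted prediction errors. For simplicity,
    the prior precision is held fixed, so the gain is constant."""
    rng = random.Random(seed)
    estimate = 0.5                             # start slightly off-target
    path = []
    for _ in range(n_steps):
        obs = rng.gauss(0.0, 1.0)              # noisy sensory sample of the cause
        prediction_error = obs - estimate
        gain = obs_precision / (obs_precision + prior_precision)
        estimate += gain * prediction_error    # precision-weighted update
        path.append(round(estimate, 2))
    return path

calibrated = track(obs_precision=1.0)   # sensory input weighted modestly
aberrant = track(obs_precision=50.0)    # sensory input treated as near-certain:
                                        # the estimate now tracks the noise itself
```

Note that nothing in this dysfunction concerns any particular subject matter: the same inflated gain applies whatever the signal happens to be about. That content-neutrality is precisely what generates the tension with the social specificity of delusions discussed below.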

To see this problem, first consider how schematic the proposed account of psychosis is. Summarising this explanation, for example, Clark (2013, p. 197) writes that

[U]nderstanding the positive symptoms of schizophrenia requires understanding disturbances in the generation and weighting of prediction error… [M]alfunctions within that complex economy … yield wave upon wave of persistent and highly weighted “false errors” that then propagate all the way up the hierarchy forcing, in severe cases … extremely deep revisions in our model of the world. The improbable (telepathy, conspiracy, persecution, etc.) then becomes the least surprising … (our emphasis).

However, this explanation leaves it opaque why the contents of common delusional themes such as persecution and conspiracy should constitute the least surprising hypotheses about the world in light of aberrant precision-weighting. Specifically, although dysfunctions in precision-weighting and prediction error-driven processing can explain why individuals process information in abnormal ways and thus form beliefs that appear implausible to those not suffering from the relevant dysfunction, an adequate explanation of delusions must explain why individuals form the highly specific delusional beliefs that they come to hold (Parrott, 2019). That is, psychosis demands an explanation of the distinctive way in which psychotic experience is abnormal out of the vast space of possible ways in which it could deviate from normal perception and belief but does not.
Focusing specifically on delusions, the predictive coding model conforms to the standard view in the psychiatric literature that the explanandum should be characterised in a way that is content-neutral. Thus, the DSM-5 defines clinical delusions as “fixed beliefs that are not amenable to change in light of conflicting evidence” (American Psychiatric Association, 2013, p. 87). Setting aside the problem that this definition subsumes many widespread non-delusional (e.g., religious, ideological, self-serving) beliefs, it characterises delusions in a way that focuses on their purely formal characteristics, specifically their irrationality. It therefore invites the view that delusions result from inferential or reasoning abnormalities (see Gold, 2017). Further, because the definition is content-neutral, it strongly suggests that such abnormalities afflict domain-general inferential processes ranging over all possible contents of thought. This is a deep problem, however, because – as highlighted above – the distribution of delusional beliefs is not a random sample of all possible abnormal beliefs, but a highly specific subset, almost all of which concern the individual’s standing in the social universe (see Bell et al., 2019; Gold & Gold, 2015).

Further, it is not clear that delusional subjects do exhibit any significant domain-general inferential impairments or reasoning abnormalities (see Bell et al., 2019; Gold, 2017). At best, the voluminous body of empirical research attempting to identify such impairments is inconclusive. Perhaps the most influential proposal in this area – often taken as support for the predictive coding model of psychosis (Adams et al., 2013) – is that delusional subjects suffer from a “jumping to conclusions” bias (Garety, 1991). In the famous “beads task,” for example, participants are told that there are two jars, A and B, with jar A containing 85% red beads and 15% black beads and jar B containing the reverse. On the basis of drawing beads from a jar, participants are asked to judge which of the jars the beads come from. The core finding is that individuals with psychosis tend to form a judgement on the basis of fewer beads than controls (Garety, 1991). In addition, recent meta-analyses also indicate small-to-moderate effect sizes when it comes to other reasoning biases (McLean, Mattiske, & Balzan, 2017).
There are problems with this research, however. For example, often the alleged differences between delusional subjects and healthy individuals disappear when controlling for general cognitive function, which is known to be reduced in individuals with psychotic symptoms (Bell et al., 2019). In some meta-analyses, such as McLean et al.’s (2017), theorists do not control for possible confounds of this kind. Further, the domain-general reasoning differences between delusional subjects and healthy controls are typically small, especially when compared to the striking deviations from normality observed in psychosis. Thus, the relevant question is not whether delusional subjects exhibit domain-general differences in inference relative to neurotypical controls, but whether – and, if so, in what way – such differences are causally responsible for the formation and entrenchment of delusional beliefs. The relatively small differences in domain-general inference that have been discovered in the empirical literature suggest that such differences might be better understood as effects of other underlying factors associated with but not responsible for psychosis, or else factors that function as necessary but not sufficient causes of psychotic experience and delusions.
Importantly, proponents of the predictive coding model of psychosis are aware of at least the first of these problems. Thus, Griffin and Fletcher (2017, p. 272) refer to

the paradox of why, given that we are positing a very domain-general problem with weighting information by its reliability in Bayesian inference, delusions tend to be domain specific in their content, which usually “seem to concern the patient’s place in the social universe” (Bentall et al., 1991, p. 14).
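As an aside, the ideal-observer computation behind the beads task described above is simple enough to state exactly. The sketch below is our own illustration, not part of the original studies; the function name and the uniform 0.5 prior over jars are our assumptions, while the 85/15 proportions are those of the task.

```python
def posterior_jar_a(draws, p_red_a=0.85, prior_a=0.5):
    """Posterior probability that the beads come from jar A, given a
    sequence of draws (each 'red' or 'black'). Jar B has the reversed
    proportions: 15% red, 85% black."""
    p_red_b = 1.0 - p_red_a
    likelihood_a = likelihood_b = 1.0
    for bead in draws:
        likelihood_a *= p_red_a if bead == "red" else 1.0 - p_red_a
        likelihood_b *= p_red_b if bead == "red" else 1.0 - p_red_b
    evidence = likelihood_a * prior_a + likelihood_b * (1.0 - prior_a)
    return likelihood_a * prior_a / evidence

posterior_jar_a(["red"])          # 0.85 after a single red bead
posterior_jar_a(["red", "red"])   # ~0.97 after two red beads
```

With these proportions, two same-coloured beads already take an ideal observer’s posterior above 0.95.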

There are various responses available to proponents of the predictive coding approach. One strategy is to appeal to the contents of specific experiences. As noted above, an influential theory dating back to Maher (1974) is that delusions constitute attempts to explain – and thus derive their contents from – anomalous experiences. This seems highly applicable in many cases. For example, Capgras delusion has famously been connected to a dysfunction in which facial recognition is disconnected from interoceptive mechanisms in such a way that individuals cognitively recognise loved ones but fail to experience any of the typical autonomic (i.e., affective) cues that accompany such recognition (Langdon & Coltheart, 2000). This violation of expectations cries out for explanation, thus generating the thought that perhaps the “loved one” is really an imposter. Similarly, influential precursors to predictive coding described above trace the hallucinated voices and illusions of control that are common in psychosis to dysfunctions in sensory predictive mechanisms that make the individual’s own voice and actions seem surprising, thus suggesting an external cause.
Nevertheless, although it is extremely likely that anomalous experience plays an important causal role in delusion formation, there are two problems with locating delusional contents wholly in perceptual experiences. The first is that even in cases such as Capgras where one can identify a specific anomalous experience, there is still the question of why delusional subjects gravitate towards specific delusional hypotheses. As has been widely noted, for example, positing an imposter looks like an exceptionally implausible explanation in such cases (see Parrott, 2019). Not only is the belief in tension both with many other beliefs that people hold in general (e.g., about the limits of disguise) and with the testimony of doctors and trusted loved ones, but there appear to be many other, more plausible explanations of the relevant experience (e.g., “there’s something wrong with me”).
Second, the canonical predictive coding account of psychosis is supposed to apply in cases where there are no specific anomalous experiences over and above those generated by aberrant precision-weighting. One might respond that aberrant precision-weighting provides a computational level description of – and can thus draw on the explanatory resources of – Kapur’s (2003) influential “aberrant salience” model of psychosis described above, according to which dopaminergic dysfunction (here understood as aberrant precision-weighting) causes otherwise irrelevant stimuli and connections between stimuli to strike the agent as highly salient and thus in need of explanation. Once again, however, tracing delusions to a domain-general aberration in salience attribution predicts that delusional beliefs will range freely over all possible topics of attention (Gold & Gold, 2015). Further, it is unclear why hyper-attention to otherwise irrelevant low-level sensory stimuli (driven by highly weighted low-level sensory
prediction errors) does not merely generate an immersion in the sensory world of the sort observed in autism. Indeed, as has been noted (Sterzer et al., 2018), the dominant predictive coding account of autism (Lawson et al., 2014) looks highly similar to the canonical predictive coding account of psychosis, which is a problem given the substantial dissimilarities in their associated symptoms.
Another suggestion is that social cognition is likely to be differentially impaired by a domain-general dysfunction in precision-weighting. For example, Griffin and Fletcher write that

[S]ocial cues may be inherently more uncertain than non-social ones, because they rely on inferring intentions from ambiguous physical acts. Consequently, representations of the social world could be the first to break down when the system encounters a relatively minor impairment in uncertainty-weighting inference…. (2017, p. 276)

Even if one accepts that social inference is more difficult than non-social inference, however, the social focus of delusions is not characterised by a general breakdown in social inference. For example, persecutory delusions are distinctive not just because they diverge from ordinary, non-delusional beliefs about the social world, but because of the malign and self-directed intentions that they attribute to other agents. Why should a paranoid stance towards the social world result from greater uncertainty or difficulty in social inference? One suggestion is that “aberrant predictive coding could render other people unreliable, to be treated with suspicion” (Griffin & Fletcher, 2017, p. 276; our emphasis). To quote Griffin and Fletcher again,

Just as reduced discriminability in PE [prediction error] signalling could lead to a consistent sense of unease or surprise, so too could reduced discriminability between social sources make everything (and everyone) seem uniformly unreliable, even suspicious. (2017, p. 276)

Again, however, even granting that aberrant precision-weighting might make other people seem unreliable – and it is not clear why the substantial divergence between the individual’s beliefs and other people’s does not make her question her own reliability – unreliability need not entail suspicion or the attribution of malign intentions. Astrologers are unreliable, but we do not generally assume that they are part of a hidden plot to do us harm. Further, in the case of persecutory delusions, people’s unreliability manifests itself primarily in disagreement over the veracity of
the delusions, suggesting that the paranoia is the cause of the epistemic estrangement from other people, not the effect of such estrangement.
Finally, in recent work, theorists have posited a novel kind of domain-general inferential difference as a potential driver of paranoia. In a set of fascinating experimental studies, Reed et al. (2020) demonstrate that paranoid individuals expect greater volatility relative to non-paranoid controls in a non-social learning task, and they show that this greater expectation of volatility can be reproduced in rats exposed to methamphetamine, a drug that is known to increase paranoia in humans. They take this as “evidence of fundamental, domain-general learning differences in paranoid individuals” (Reed et al., 2020, p. 1) and thus hypothesise “that aberrations to these domain-general learning mechanisms underlie paranoia” (p. 2; our emphasis).
Granting the existence of such domain-general differences, however, the explanatory connection between a greater expectation of volatility and the specific focus and contents of paranoia and persecutory beliefs is opaque. Reed et al. (2020, p. 2) write that “since excessive unexpected uncertainty is a signal of change, it might drive the recategorization of allies as enemies” (our emphasis). Why should higher levels of expected volatility drive the recategorisation of allies as enemies rather than the reverse, however, or no change in their status at all? Reed et al. (2020, p. 29) suggest that “when humans experience non-social volatility … they appeal to the influence of powerful enemies, even when those enemies’ influence is not obviously linked to the volatility,” but positing malevolent agency as the explanation of volatility without sufficient evidence constitutes an implausible – and so presumably unlikely – explanation of volatility. They also suggest that “with a well-defined persecutor in mind, a volatile world may be perceived to have less randomly distributed risk” (Reed et al., 2020, p. 29). It is not clear how connecting volatility to the seemingly unrelated actions of agents with hidden and inexplicably malevolent intentions towards oneself – intentions which must be as volatile as the events they cause – is supposed to reduce uncertainty, however. Further, it is opaque why populating the world with malevolent agency directed towards oneself should be a desirable psychological outcome even if it did reduce uncertainty.
Importantly, our point here is not to deny that human beings might be biased towards suspicion and paranoia of the sort highlighted in these explanations. Our point is that these biases are independent of any proposed difference in domain-general statistical inference and thus illicitly imported from contingent assumptions about the human mind. These assumptions might be correct, but they reflect aspects of human psychology that are not themselves logical consequences of domain-general aberrations in statistical inference.
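For readers unfamiliar with how expected volatility enters such learning models, the following generic sketch may help. It is a textbook-style random-walk (Kalman) filter of our own devising, not Reed et al.’s actual model, and the names and parameter values are invented: the expected-volatility term inflates the estimate’s uncertainty at every step, which keeps the learning rate high and so makes current beliefs quicker to overturn.

```python
def filter_step(mean, variance, obs, obs_noise=1.0, volatility=0.0):
    """One update of a scalar random-walk filter. `volatility` is the
    variance the hidden state is expected to drift by per step; larger
    values keep the posterior uncertain and hence the gain high."""
    variance += volatility                    # the state may have changed
    gain = variance / (variance + obs_noise)  # learning rate
    mean += gain * (obs - mean)               # precision-weighted error
    variance *= 1.0 - gain
    return mean, variance
```

With volatility set to zero the gain shrinks as evidence accumulates and beliefs stabilise; with high expected volatility the gain stays large, so each new observation can overturn the current estimate. What such a mechanism does not supply, as argued above, is any reason why overturned beliefs should settle on persecution in particular.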

10.4.1 Summary

To summarise, the predictive coding account of psychosis is both attractive and problematic. Although there is compelling evidence that some form of dysfunction in uncertainty estimation plays a causal role in psychosis, it is difficult to reconcile such a domain-general explanation with the conspicuous domain specificity of psychotic symptoms, especially when it comes to delusions. Attempts to avoid this conclusion are either unconvincing or end up importing contingent assumptions about human psychology external to the model itself and beyond the scope of content-neutral, domain-general learning differences.
Crucially, this problem seems to stem directly from the emphasis on abstract, domain-general inferential processes within Bayesian psychiatry more generally. As noted above, this framework is often aligned with predictive processing, a global theory of brain function in which the brain is viewed as a general-purpose uncertainty management mechanism operating in the service of a single, overarching epistemic goal – namely, minimising (long-term, average) prediction error or maximising model evidence (see Friston, 2010; Hohwy, 2013). Thus, Adams et al.’s (2013, p. 10) article outlining the canonical predictive coding account of psychosis involves “[s]tarting with the assumption that the brain is trying to maximize the evidence for its model of the world …” (our emphasis). Given this assumption, it is difficult to see how the account that they develop could locate psychosis in anything but a content-neutral, domain-general dysfunction in statistical inference. This assumption abstracts away from almost all of the distinctive functions, motives, interests, and concerns of the human mind, however. Thus, perhaps by integrating such contingent features of human psychology back into the framework and its starting assumptions, one might be able to address the explanatory gap described in this section. We turn to this possibility next.

10.5 A Bayesian Social Theory of Delusions

In recent years, a prominent social framework for understanding delusions has emerged (see, e.g., Bell et al., 2019; Gold, 2017; Gold & Gold, 2015; Raihani & Bell, 2019). Although highly schematic, the unifying idea underlying this approach is that we should understand delusions not primarily in terms of domain-general inferential impairments but rather in terms of an evolved social psychology adapted to the recurring features, opportunities, and risks encountered in human social life. It is easy to see why this framework has been opposed to the predictive coding account – or, more generally, accounts – of psychosis (see, e.g., Bell et al., 2019). In this section, we briefly outline this framework and then argue that it can in
fact be reconciled with Bayesian psychiatry once the latter’s focus on abstract, domain-general inferential processes is replaced with a richer view of human psychology in which statistical inference mechanisms operate in the context of the distinctive and often idiosyncratic functions of the human mind.

10.5.1 The Social Approach to Delusions

The social approach to delusions is motivated by some of the facts outlined above: for example, that evidence of significant domain-general inferential differences between delusional subjects and neurotypical controls is weak, and that the actual delusional themes that occur cluster in a tiny region of the vast space of possible themes, with the overwhelming majority concerning the social world (Bell et al., 2019; Gold & Gold, 2015). According to proponents of a social approach, these and other explananda suggest that delusions are better understood in terms of dysfunctions in psychological mechanisms specialised for the distinctive problems and opportunities of human social life. As Gold and Gold (2015, p. 289) put it, “To understand delusions, one has to understand the history of human sociality.” Thus, this approach takes its inspiration from an evolutionary framework for understanding human psychology, according to which the human mind is best understood not as a general-purpose statistical inference mechanism but as a mosaic of specialised mechanisms adapted to the distinctive features, opportunities, and risks of human life (see Del Giudice, 2018).2
Although human social dynamics exhibit massive variation across place and time, this variation is underpinned by certain core characteristics. Most fundamentally, human social life is characterised by a complex interplay between cooperation and competition at multiple scales, including both within and between groups. Success within such environments is thus dependent on substantial social support, protection, and interpersonal coordination in the service of shared goals, but such cooperation is always fragile given the diverse and often divergent interests of individuals and groups competing for dominance, prestige, and resources. Further, the difficulties of navigating such opportunities and risks are amplified by the suite of unique human traits that underpin cooperation and competition, including sophisticated communication abilities (along with the attendant risk of deliberate deception), flexible and reliable mindreading, and highly developed reasoning capacities that facilitate long-term plans and complex behavioural strategies.
How might such characteristics have selected for a psychological apparatus vulnerable to delusion? One proposal concerns the evolution of psychological mechanisms concerned with detecting and responding to social threats (see Gold & Gold, 2015). “Social threat” here names a
One proposal concerns the evolution of psychological mechanisms concerned with detecting and responding to social threats (see Gold & Gold, 2015). "Social threat" here names a heterogeneous category of costs imposed by other agents and coalitions of agents, including those generated by outright violence, exploitation, betrayal, free riding on one's investments, and more. Such threats are ubiquitous and have likely constituted the most significant danger to individual survival and reproductive success throughout our ancestral past (Dunbar, 1998). It is thus highly unlikely that the human mind has evolved to learn about the costs, cues, and sources of such threats wholly from experience. Such a blank slate would be quickly outcompeted by agents structured in advance of experience to detect, respond to, and actively learn about this recurring risk of human social life.

What characteristics would one expect from psychological mechanisms specialised for navigating social threats? First, one would expect them to err on the side of caution (see Gold & Gold, 2015). That is, given the high – and potentially catastrophic – risk of social threats, false positives are likely to be less costly than false negatives. Further, this cost asymmetry is exacerbated by the fact that an absence of evidence of social threat does not imply evidence of its absence, especially given that the sources of such threats have the capacity and motivation to deliberately conceal their intentions from us. Thus, once a genuine suspicion of threat is activated, one would expect this suspicion to be difficult to assuage, and one would expect agents to downgrade their level of trust in threat-related testimony. In these ways, the structural characteristics of threat detection might have selected for a mild form of paranoia – or at least hypervigilance – even in properly functioning mechanisms (Raihani & Bell, 2019).

Second, one would also expect the threshold for threat detection – and, by corollary, social trust – to be calibrated to the characteristics of the social environment that individuals encounter. That is, just as social threat detection mechanisms should motivate individuals to learn about and detect specific cues of potential threats, they should also modulate the threshold for threat detection in response to the more general statistical characteristics of the environment. There is considerable evidence for conditional adaptation of this kind, which involves adjustments to the structural development of mechanisms (including information-processing mechanisms) in response to environmental cues, especially during sensitive periods such as childhood (see Del Giudice, 2018). Thus, early and/or recurrent exposure to social stressors and exploitation would be expected to lower the threshold for social threat detection, sometimes in ways that are extremely difficult to change.
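The decision-theoretic logic behind these first two expectations can be conveyed with a minimal sketch. It is not drawn from Gold and Gold's own presentation; the numbers and function names are hypothetical, chosen only to show how a cost asymmetry between misses and false alarms, together with a prior raised by early adversity, yields a hair-trigger disposition to act on threat:

```python
# Minimal sketch: threat detection as Bayesian decision-making under
# asymmetric costs. All quantities are illustrative assumptions.

def posterior_threat(prior, p_cue_given_threat, p_cue_given_safe, cue_present):
    """Bayes' rule for P(threat | cue observation)."""
    if cue_present:
        like_t, like_s = p_cue_given_threat, p_cue_given_safe
    else:
        like_t, like_s = 1 - p_cue_given_threat, 1 - p_cue_given_safe
    joint_t = prior * like_t
    joint_s = (1 - prior) * like_s
    return joint_t / (joint_t + joint_s)

def act_on_threat(p_threat, cost_miss, cost_false_alarm):
    """Treat the situation as threatening iff the expected cost of ignoring
    a real threat exceeds the expected cost of a false alarm:
    p * cost_miss > (1 - p) * cost_false_alarm."""
    return p_threat * cost_miss > (1 - p_threat) * cost_false_alarm

# A miss (ignoring a real threat) is assumed far costlier than a false alarm.
COST_MISS, COST_FA = 100.0, 1.0

# The decision threshold is cost_fa / (cost_fa + cost_miss), roughly 0.0099:
# even a ~1% posterior probability of threat warrants vigilance.
for prior in (0.005, 0.02):  # low baseline vs. adversity-raised prior
    p = posterior_threat(prior, p_cue_given_threat=0.7,
                         p_cue_given_safe=0.1, cue_present=True)
    print(f"prior={prior:.3f}  posterior={p:.3f}  "
          f"act={act_on_threat(p, COST_MISS, COST_FA)}")
```

On these made-up numbers, a single ambiguous cue pushes even the low-baseline agent over the threshold for vigilance; conditional adaptation can then be read as raising the prior (or the cost ratio) so that a properly functioning mechanism already behaves hypervigilantly.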

Third, mechanisms of social threat detection need to combine fast and automatic detection of threats – and attention to the potential cues of threats – posed by the immediate environmental context with a powerful motivation to ruminate and reflect on the possibility of more distant threats generated by complex, future-oriented, and deliberately concealed intentions (see Gold & Gold, 2015). That is, threat detection is not merely – or even mostly – a perceptual function but must draw on the resources of reasoning capacities capable both of integrating information from diverse sources and of exploring complex hypothetical risk scenarios and possibilities.

Finally, one consideration that has not – to the best of our knowledge, at least – been explored in the psychiatric literature is the fact that one's beliefs about social threats provide important information to other agents. Thus, beliefs about the likelihood of social exploitation might be influenced by social signalling pressures that adjust one's level of suspicion not just to the available evidence but to the deterrent effect of one's suspicion on others (see Williams, 2021a). In The Godfather, for example, Don Corleone exemplifies how deterrence can motivate a kind of strategic irrationality when he informs fellow mafia bosses of his willingness to jump to conclusions without evidence:

… I'm a superstitious man, and if some unlucky accident should befall [my son], if he should get shot in the head by a police officer, or if he should hang himself in his jail cell, or if he's struck by a bolt of lightning, then I'm going to blame some of the people in this room.

Such considerations help to clarify what is meant by "functional specialisation." Gold and Gold (2015; see also Gold, 2017) posit a "suspicion system" for detecting and responding to social threats, but this terminology carries the unfortunate connotation of a discrete, self-contained cognitive module. As the foregoing suggests, adaptive threat detection requires mechanisms that integrate information from a variety of different sources, that are capable of substantial learning, and that modulate the activity of other psychological mechanisms involved in attention, deliberation, action, and so on. Such information-processing mechanisms and procedures are thus not self-contained and are certainly not realised in a discrete anatomical module at the macroscopic level of brain structures. Nevertheless, such mechanisms are still specialised insofar as their characteristics would not be appropriate for many other cognitive tasks, such as estimating the spatial layout of the environment, forecasting the weather, or parsing the syntactic structure of a sentence.

As noted, even properly functioning mechanisms of social threat detection might exhibit signs of paranoia. Now, however, consider a dysfunction in such mechanisms. Such a dysfunction could make individuals less capable of detecting and responding to social threats and, thus, extremely vulnerable to exploitation. Equally, however, it could make individuals overly sensitive to the possibility of social threat, driving their attention towards, and rumination on, the possibility of such threats in a way that will appear to other agents wholly disconnected from objective evidence.

At first, this hypersensitivity to social threat might be reined in by conscious reflection on the implausibility of paranoid thoughts. Over time, however, hyperactive threat detection might result in an accumulation of evidence that overcomes such rational defences and gives rise to entrenched persecutory beliefs, driving conscious reasoning away from challenging paranoid thoughts and towards integrating them with the rest of the individual's worldview.

This is the essence of the model of persecutory delusions advanced by Gold and Gold (2015) and Gold (2017). As they note, it has a myriad of attractions. First, it explains why persecutory delusions have the specific theme that they do. Of course, contingent features of the relevant individual's time and cultural milieu will no doubt influence what kinds of social threats are salient to them, but this model has the advantage of explaining why social threat in general is such a common theme of delusional ideation. Second, it accounts for the relatively weak differences in domain-general inference found in the empirical literature. Although this model is consistent with such differences (see below), it suggests that they are not the sole driver of delusions. Third, it illuminates the powerful correlations found between various forms of social adversity (e.g., trauma, abuse, exploitation) and the risk of clinical paranoia (see Raihani & Bell, 2019). As noted above, conditional adaptation might have selected for a lower threshold for threat detection in response to such circumstances. Finally, there is some direct – albeit fairly limited – evidence that social threat detection is specifically impaired in conditions such as schizophrenia (see Gold, 2017; Gold & Gold, 2015).

Of course, as sketched here and as found in Gold and Gold's proposal, this model is highly schematic. For example, it may be that the dysfunctional mechanisms underlying persecutory delusions do not track social threat as such but – at least in many cases – specific coalitional threats, which could account for why severe paranoia often involves misperceptions of coalitional boundaries and collective action (Raihani & Bell, 2017). Further, persecutory delusions are obviously not the only kind of delusion. Nevertheless, this model illustrates a much more general approach to understanding delusions, one that explicitly connects the contents of delusional ideation and beliefs to the distinctive concerns, motives, and functions of the human mind, and to the psychological mechanisms specialised for such distinctive characteristics (Bell et al., 2019; Del Giudice, 2018; Gold & Gold, 2015).

10.5.2 A Bayesian Social Approach to Delusions

This social approach to understanding delusions appears to conflict with the predictive coding model – or, more generally, models – of psychosis (see Bell et al., 2019).

Nevertheless, the apparent domain specificity of delusions need not be in tension with Bayesian modelling as such. Indeed, there are various ways in which Bayesian inference generally – and uncertainty-weighted prediction error minimisation specifically – could accommodate domain specificity. Most obviously, domain-specific mechanisms might themselves make use of Bayesian computations that infer social threat from ambiguous cues. That is, even if psychological mechanisms are "function-specific, their algorithms needn't be" (Carruthers, 2006, p. 62). As Sperber puts it,

[T]he fact that the formal properties of a learning procedure are best specified without assigning to it any specific domain or goal does not entail that the use of such a procedure in an organism or a machine cannot be tied and adjusted to specific goals. (2019, p. 36)

Such adjustment to specific goals or functions might take various forms. For example, it might involve domain-specific priors (Sperber, 2019). Given the ubiquity and risks of social threats, it is highly likely that humans have priors concerning the presence of such threats, both in general and in specific contexts, that need not be acquired wholly from experience. More subtly, a central issue for Bayesian inference concerns the hypothesis space itself (Parrott, 2019). In principle, an infinite number of hypotheses could explain any given observation, and a real-life Bayesian inference machine cannot consider all of them. Thus, evolution might have endowed the human brain with constraints that narrow and structure the hypothesis space within which Bayesian inference takes place, including the procedures for generating candidate hypotheses. For example, people might instinctively consider threat-related hypotheses as explanations for events, especially those that strike the person as anomalous or distressing. Further, Bayesian decision theory provides a formal framework for explicitly modelling how asymmetries in the costs of false positives and false negatives modulate judgement and decision-making in different domains (see Williams, 2021b). Finally, all of these features of Bayesian mechanisms could be adjusted in accordance with conditional adaptation, such that individuals exposed to early social stressors and exploitation might have higher social threat-related priors, a greater motivation to generate social threat-related hypotheses, and a lower threshold for social threat detection.
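The cost-asymmetry point can be made explicit. On a simple Bayesian decision-theoretic rendering – our notation, sketching the kind of framework Williams (2021b) discusses rather than reproducing it – an agent should act on the hypothesis that a threat \(T\) is present, given evidence \(e\), just in case the expected cost of a miss exceeds the expected cost of a false alarm:

\[
p(T \mid e)\, C_{\mathrm{miss}} \;>\; \bigl(1 - p(T \mid e)\bigr)\, C_{\mathrm{fa}}
\quad\Longleftrightarrow\quad
p(T \mid e) \;>\; \frac{C_{\mathrm{fa}}}{C_{\mathrm{fa}} + C_{\mathrm{miss}}}.
\]

If misses are, say, two orders of magnitude costlier than false alarms, the threshold falls below 0.01, so domain-specific priors or adversity-lowered thresholds need only nudge \(p(T \mid e)\) slightly to produce threat-consistent judgement.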

Given such considerations, there is nothing in the social approach to delusions that is in tension with the idea that the computational architecture underlying delusions makes use of Bayesian inference or prediction error minimisation. Indeed, one might view these frameworks as highly complementary, with the social approach proposing distinctive functions and dysfunctions that underlie delusional cognition at a conceptual level and the Bayesian approach generating hypotheses about how such phenomena are implemented in the brain's information-processing mechanisms.

Return to the canonical predictive coding model of psychosis, for example. At the core of this model is the idea that psychosis in schizophrenia is driven by aberrant uncertainty estimation, with a bias towards assigning greater precision to lower-level sensory prediction errors relative to higher-level, more abstract expectations. As we have seen, there is compelling evidence for this proposal, but it struggles to account for the domain specificity of delusional ideation. Adams et al. (2013, pp. 1–2), for example, propose that the failure of precision-weighting that they posit can be "understood intuitively by considering classical statistical inference," where "if we overestimate the precision of the data … we expose ourselves to false positives." As noted above, however, positing such abstract, domain-general failures in statistical inference as the cause of psychosis fails to account for the highly specific focus of delusional ideation and belief.

Now consider how such a domain-general difference in statistical inference might interact with the functionally specialised machinery for social threat detection outlined in the previous sub-section. First, we have already seen that such machinery is likely biased towards false positives independently of any aberration in precision-weighting, perhaps especially so in individuals previously exposed to social stressors. Thus, an additional – and perhaps initially domain-general – bias towards false positives might have a disproportionate effect on social threat processing, with threat-related cues coming to seem even more salient and thus capturing the individual's attention. Further, as noted, it is similarly plausible that people will have an inherent bias to generate and search for hypotheses positing social threats when confronted with anomalous experiences in general. The tendency to generate such hypotheses is likely to be amplified given the anxiety known to be associated with paranoia and psychosis (see Freeman, 2007). For example, Pezzulo (2014) has argued that interoceptive cues of anxiety (e.g., an increased heart rate and galvanic skin response) provide evidence that can bias Bayesian updating towards paranoid inferences that might seem deeply implausible to those without the relevant interoceptive evidence, just as the paranoid hypotheses that occur to us after watching a horror film at night might seem comically implausible to us when we awake the next morning.3
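Before turning to how these dynamics unfold over time, it is worth seeing concretely how the aberrant precision-weighting at issue biases updating. Here is a minimal Gaussian sketch – the textbook precision-weighted update rule with illustrative values, not Adams et al.'s full hierarchical model:

```python
# Minimal sketch: precision-weighted belief updating for a single
# Gaussian belief. Precision = 1 / variance. Illustrative values only.

def update(prior_mean, prior_precision, obs, sensory_precision):
    """Posterior mean is a precision-weighted average of prior and data;
    equivalently, the prior mean shifted by a precision-weighted
    prediction error."""
    gain = sensory_precision / (prior_precision + sensory_precision)
    prediction_error = obs - prior_mean
    posterior_mean = prior_mean + gain * prediction_error
    posterior_precision = prior_precision + sensory_precision
    return posterior_mean, posterior_precision

prior_mean, prior_precision = 0.0, 4.0  # confident prior: "nothing is amiss"
obs = 2.0                               # an anomalous sensory sample

# Well-calibrated sensory precision: the prior largely absorbs the anomaly.
mean, _ = update(prior_mean, prior_precision, obs, sensory_precision=1.0)
print(mean)  # 0.4

# Aberrantly inflated sensory precision: the anomaly dominates the posterior.
mean, _ = update(prior_mean, prior_precision, obs, sensory_precision=16.0)
print(mean)  # 1.6
```

Nothing in this update rule is domain-specific, which is precisely the present point: the same inflated gain has its most consequential effects where it feeds mechanisms – such as threat detection – that are already biased towards false positives.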

Once the possibility of social threat is seriously entertained as a consequence of one or both of these factors, the considerations about threat detection described above – for example, the difficulties in finding evidence of the absence of threat, the risks of wilful deception, and the potential source of threats in complex and concealed plans – will motivate individuals to differentially search out, attend to, and ruminate on threat-related information and possibilities, in addition to being on greater guard against the possibility of wilful deception. In this way, the motivated search for threat-related information and possibilities might interact with a general oversensitivity to low-level sensory prediction errors to provide additional evidence that fuels the paranoia. Whilst at first such paranoid forms of information sampling and hypothesis generation might be reined in by more global, integrative systems of reflection, over time the apparent accumulation of evidence might overpower such defences, changing the focus of conscious reasoning away from reasonable scepticism and towards the development of explanations that rationalise the evidence of social threat.4 This might explain the transition from a prodromal phase in schizophrenia, in which individuals retain insight concerning the implausibility of their paranoia, towards the entrenchment of more fixed persecutory beliefs.

Finally, as the estimated risk of social threat and potential exploitation increases, the motivation for increasing confidence in one's paranoid thoughts might be further incentivised by the deterrent effects of such paranoia on others. Here, the willingness to identify persecutors and conspiracies in a world that has become increasingly distressing serves a protective function, signalling to others hypervigilance for potential exploitation. Although such conspicuous paranoia might serve this protective function well, it will also further alienate the individual from others and erode social trust, thereby further reinforcing the paranoia and its evidential basis.

Our repeated use of the word "might" in these suggestions should be emphasised. We do not intend these extremely schematic and highly speculative suggestions as a serious model of clinical paranoia and the onset of persecutory delusions. Instead, we have advanced them to illustrate how augmenting a Bayesian approach to understanding psychosis with the content-rich, domain-specific biases and concerns of the human mind helps to broaden the hypothesis space for this approach, thus providing a greater range of potential explanations when it comes to accounting for some of the distinctive features of psychosis and delusional ideation.

10.6 Conclusion

We are convinced of the explanatory power and fecundity of the Bayesian brain and predictive coding when it comes to modelling the information-processing dysfunctions and aberrations that underlie psychiatric disorders. Nevertheless, we also believe that the emphasis within much of Bayesian psychiatry on highly abstract, domain-general inferential processes likely blinds it to many distinctive features of human psychology and psychopathology.

The human brain is not a general-purpose blank slate employing statistical algorithms in the service of dispassionate inference, but the control centre of a unique primate that evolved to navigate a distinct world of opportunities and risks. This control centre might make extensive use of sophisticated statistical learning and inference, but such strategies must be understood in the context of the distinctive features, functions, and interests of the human mind. We have sought to illustrate this lesson by appealing to a highly influential predictive coding model of psychosis, which – we have argued – is currently unable to capture the specific contents of delusional ideation precisely because of its exclusive focus on aberrations in content-neutral, domain-general statistical inference. As noted, we are aware of how schematic and speculative our proposals for integrating this application of Bayesian psychiatry with a richer, evolutionary framework for understanding human psychology have been. Nevertheless, we hope that this chapter motivates more extensive, detailed investigations into this subject in the future.

Notes

1 We use the term "Bayesian psychiatry" to capture the fact that although predictive processing has played a central role in this research programme (see Friston et al., 2014), some research within it posits forms of Bayesian approximation different from those typically associated with predictive processing (e.g., Gershman, 2019).
2 Importantly, however, the embrace of functional specialisation does not imply the existence of discrete, informationally encapsulated "modules" visible at the level of macroscale brain structures. Instead, it assumes that the brain evolved to solve a multiplicity of distinctive social and ecological tasks, not a single, generic task, and that both the structure of the brain and the information that it encodes are specialised for the performance of such tasks. As stressed above, this is consistent with the exploitation of common computational principles such as Bayesian inference across diverse tasks (see Carruthers, 2006; Del Giudice, 2018).
3 The role of anxiety in biasing individuals towards paranoid hypotheses is central to Freeman's (2007) threat anticipation model of paranoia and persecutory delusions.
4 Note that this might also occur due to more direct damage to those regions of the brain that subserve higher-level belief integration and evaluation (see Langdon & Coltheart, 2000).

References

Adams, R. A., Stephan, K. E., Brown, H. R., Frith, C. D., & Friston, K. J. (2013). The computational anatomy of psychosis. Frontiers in Psychiatry, 4, 47.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.; DSM-5). Washington, DC: American Psychiatric Publishing.
Barrett, L. F., Quigley, K. S., & Hamilton, P. (2016). An active inference theory of allostasis and interoception in depression. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1708), 20160011.

Bell, V., Raihani, N., & Wilkinson, S. (2019). De-rationalising delusions. Clinical Psychological Science, 9(1), 24–37.
Bentall, R. P., Kaney, S., & Dewey, M. E. (1991). Paranoia and social reasoning: An attribution theory analysis. British Journal of Clinical Psychology, 30(1), 13–23.
Blakemore, S. J., Smith, J., Steel, R., Johnstone, E. C., & Frith, C. D. (2000). The perception of self-produced sensory stimuli in patients with auditory hallucinations and passivity experiences: Evidence for a breakdown in self-monitoring. Psychological Medicine, 30(5), 1131–1139.
Carhart-Harris, R. L., & Friston, K. J. (2019). REBUS and the anarchic brain: Toward a unified model of the brain action of psychedelics. Pharmacological Reviews, 71(3), 316–344.
Carruthers, P. (2006). The architecture of the mind. Oxford: Oxford University Press.
Chater, N., Zhu, J. Q., Spicer, J., Sundh, J., León-Villagrá, P., & Sanborn, A. (2020). Probabilistic biases meet the Bayesian brain. Current Directions in Psychological Science, 29(5), 506–512.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Clark, A. (2016). Surfing uncertainty. Oxford: Oxford University Press.
Corlett, P. R., & Fletcher, P. C. (2015). Delusions and prediction error: Clarifying the roles of behavioural and brain responses. Cognitive Neuropsychiatry, 20(2), 95–105.
Davies, M., Coltheart, M., Langdon, R., & Breen, N. (2001). Monothematic delusions: Towards a two-factor account. Philosophy, Psychiatry, & Psychology, 8(2), 133–158.
Del Giudice, M. (2018). Evolutionary psychopathology: A unified approach. Oxford: Oxford University Press.
Doya, K., Ishii, S., Pouget, A., & Rao, R. P. (Eds.). (2007). Bayesian brain: Probabilistic approaches to neural coding. Cambridge, MA: MIT Press.
Dunbar, R. I. (1998). The social brain hypothesis. Evolutionary Anthropology: Issues, News, and Reviews, 6(5), 178–190.
Fénelon, G., Soulas, T., Zenasni, F., & de Langavant, L. (2010). The changing face of Parkinson's disease-associated psychosis: A cross-sectional study based on the new NINDS-NIMH criteria. Movement Disorders, 25(6), 763–766.
Fletcher, P. C., & Frith, C. D. (2009). Perceiving is believing: A Bayesian approach to explaining the positive symptoms of schizophrenia. Nature Reviews Neuroscience, 10(1), 48–58.
Freeman, D. (2007). Suspicious minds: The psychology of persecutory delusions. Clinical Psychology Review, 27(4), 425–457.
Freeman, D. (2016). Persecutory delusions: A cognitive perspective on understanding and treatment. The Lancet Psychiatry, 3(7), 685–692.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
Friston, K. J., Stephan, K. E., Montague, R., & Dolan, R. J. (2014). Computational psychiatry: The brain as a phantastic organ. The Lancet Psychiatry, 1(2), 148–158.

Frith, C. (2005). The cognitive neuropsychology of schizophrenia (1st ed.). London: Psychology Press.
Frith, C. D., Blakemore, S. J., & Wolpert, D. M. (2000). Explaining the symptoms of schizophrenia: Abnormalities in the awareness of action. Brain Research Reviews, 31(2–3), 357–363.
Gadsby, S., & Hohwy, J. (2019). Why use predictive processing to explain psychopathology? The case of anorexia nervosa. In D. Mendonca, M. Curado, & S. Gouveia (Eds.), The philosophy and science of predictive processing. London: Bloomsbury Academic.
Garety, P. (1991). Reasoning and delusions. The British Journal of Psychiatry, 159(S14), 14–18.
Gershman, S. J. (2019). The generative adversarial brain. Frontiers in Artificial Intelligence, 2, 18.
Gold, J., & Gold, I. (2015). Suspicious minds. Riverside: Free Press.
Gold, I. (2017). Outline of a theory of delusion: Irrationality and pathological belief. In T.-W. Hung & T. J. Lane (Eds.), Rationality: Constraints and contexts (pp. 95–119). Cambridge, MA: Elsevier Academic Press.
Griffin, J. D., & Fletcher, P. C. (2017). Predictive processing, source monitoring, and psychosis. Annual Review of Clinical Psychology, 13, 265–289.
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Kapur, S. (2003). Psychosis as a state of aberrant salience: A framework linking biology, phenomenology, and pharmacology in schizophrenia. American Journal of Psychiatry, 160, 13–23.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27(12), 712–719.
Langdon, R., & Coltheart, M. (2000). The cognitive neuropsychology of delusions. Mind and Language, 15(1), 184–218. https://doi.org/10.1111/1468-0017.00129
Lawson, R. P., Rees, G., & Friston, K. J. (2014). An aberrant precision account of autism. Frontiers in Human Neuroscience, 8, 302.
Maher, B. A. (1974). Delusional thinking and perceptual disorder. Journal of Individual Psychology, 30, 98–113.
Mathys, C., Daunizeau, J., Friston, K. J., & Stephan, K. E. (2011). A Bayesian foundation for individual learning under uncertainty. Frontiers in Human Neuroscience, 5, 39.
McCutcheon, R. A., Krystal, J. H., & Howes, O. D. (2020). Dopamine and glutamate in schizophrenia: Biology, symptoms and treatment. World Psychiatry, 19(1), 15–33.
McLean, B. F., Mattiske, J. K., & Balzan, R. P. (2017). Association of the jumping to conclusions and evidence integration biases with delusions in psychosis: A detailed meta-analysis. Schizophrenia Bulletin, 43(2), 344–354. https://doi.org/10.1093/schbul/sbw056
Montagnese, M., Leptourgos, P., Fernyhough, C., Waters, F., Larøi, F., Jardri, R., … Urwyler, P. (2020). A review of multimodal hallucinations: Categorization, assessment, theoretical perspectives, and clinical recommendations. Schizophrenia Bulletin. https://doi.org/10.1093/schbul/sbaa101
Moore, J. W., & Fletcher, P. C. (2012). Sense of agency in health and disease: A review of cue integration approaches. Consciousness and Cognition, 21(1), 59–68.

Mortensen, P., Pedersen, C., Westergaard, T., Wohlfahrt, J., Ewald, H., Mors, O., … Melbye, M. (1999). Effects of family history and place and season of birth on the risk of schizophrenia. New England Journal of Medicine, 340(8), 603–608.
O'Callaghan, C., Hall, J. M., Tomassini, A., Muller, A. J., Walpola, I. C., Moustafa, A. A., & Lewis, S. J. (2017). Visual hallucinations are characterized by impaired sensory evidence accumulation: Insights from hierarchical drift diffusion modeling in Parkinson's disease. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2(8), 680–688.
Parrott, M. (2019). Delusional predictions and explanations. The British Journal for the Philosophy of Science, 72(1). https://doi.org/10.1093/bjps/axz003
Patel, K. R., Cherian, J., Gohil, K., & Atkinson, D. (2014). Schizophrenia: Overview and treatment options. Pharmacy and Therapeutics, 39(9), 638.
Pezzulo, G. (2014). Why do you fear the bogeyman? An embodied predictive coding model of perceptual inference. Cognitive, Affective, & Behavioral Neuroscience, 14(3), 902–911.
Raihani, N. J., & Bell, V. (2017). Paranoia and the social representation of others: A large-scale game theory approach. Scientific Reports, 7(1), 4544.
Raihani, N. J., & Bell, V. (2019). An evolutionary perspective on paranoia. Nature Human Behaviour, 3(2), 114–121.
Rao, R., & Ballard, D. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.
Reed, E. J., Uddenberg, S., Suthaharan, P., Mathys, C. D., Taylor, J. R., Groman, S. M., & Corlett, P. R. (2020). Paranoia as a deficit in non-social belief updating. eLife, 9, e56345.
Schwartenbeck, P., FitzGerald, T. H., Mathys, C., Dolan, R., Wurst, F., Kronbichler, M., & Friston, K. (2015). Optimal inference with suboptimal models: Addiction and active Bayesian inference. Medical Hypotheses, 84(2), 109–117.
Shergill, S. S., Samson, G., Bays, P. M., Frith, C. D., & Wolpert, D. M. (2005). Evidence for sensory prediction deficits in schizophrenia. American Journal of Psychiatry, 162(12), 2384–2386.
Shinn, A., Pfaff, D., Young, S., Lewandowski, K., Cohen, B., & Öngür, D. (2012). Auditory hallucinations in a cross-diagnostic sample of psychotic disorder patients: A descriptive, cross-sectional study. Comprehensive Psychiatry, 53(6), 718–726.
Sperber, D. (2019). Instincts or gadgets? Not the debate we should be having. Behavioral and Brain Sciences, 42.
Sterzer, P., Adams, R., Fletcher, P., Frith, C., Lawrie, S., Muckli, L., … Corlett, P. (2018). The predictive coding account of psychosis. Biological Psychiatry, 84(9), 634–643.
Teufel, C., & Fletcher, P. C. (2016). The promises and pitfalls of applying computational models to neurological and psychiatric disorders. Brain, 139(10), 2600–2608.
Tsuang, M. (2000). Schizophrenia: Genes and environment. Biological Psychiatry, 47(3), 210–220.
Ursini, G., Punzi, G., Chen, Q., Marenco, S., Robinson, J., Porcelli, A., … Weinberger, D. R. (2018). Convergence of placenta biology and genetic risk for schizophrenia. Nature Medicine, 24(6), 792–801.

Waters, F., Collerton, D., ffytche, D., Jardri, R., Pins, D., Dudley, R., … Larøi, F. (2014). Visual hallucinations in the psychosis spectrum and comparative information from neurodegenerative disorders and eye disease. Schizophrenia Bulletin, 40(Suppl_4), S233–S245.
Williams, D. (2018). Hierarchical Bayesian models of delusion. Consciousness and Cognition, 61, 129–147.
Williams, D. (2021a). Socially adaptive belief. Mind and Language, 36(3), 333–354.
Williams, D. (2021b). Epistemic irrationality in the Bayesian brain. The British Journal for the Philosophy of Science, 72(4), 913–938.

11 Higher-Order Bayesian Statistical Decision Theory of Consciousness, Probabilistic Justification, and Predictive Processing

Tony Cheng

11.1 First-Order Theories, Higher-Order Theories, and Statistical Decision

Jakob Hohwy has suggested that "the free energy principle may fit with a contemporary theory of consciousness, namely a Bayesian metacognitive theory recently proposed by Hakwan Lau" (2015, p. 295), according to which "perceptual consciousness depends on our Bayesian decisions, i.e., criterion setting, based on… higher-order representations [of the variance of probability density functions]" (Lau, 2008, p. 39). Hohwy then goes into some detail in making this connection, while setting aside "the higher-order thought theory with which it is initially presented and, instead, via the notion of active inference" (Hohwy, 2015, p. 295). In this chapter, I shall not repeat Hohwy's argumentation and will instead develop a possible direction that is more in line with Lau's original setting. One crucial point to be noted is that the version of higher-order theory explored here should be understood neither as a higher-order thought theory nor as a higher-order perception theory.1 This will become clear only after the positive view is presented.2

Consciousness research has been one of the central themes in both philosophical and scientific studies of the mind for at least a quarter of a century. In studying consciousness, epistemological considerations have by and large been dismissed as irrelevant or distracting. As Jerry Fodor states, "it is indeed Very Bad Practice to run your psychology with epistemological malice aforethought. People who do so generally make the worst of both" (2008, p. 170).3 Fodor's observation might be correct with regard to the actual situation, but I believe that it is normatively incorrect: psychology and epistemology should be considered together, at least in principle; they should constrain each other.4 In this mutual constraining relation, empirical psychology might have priority, at least sometimes, as the sciences in general have established themselves as solid ways of gaining knowledge. In this chapter I do not argue for this meta-philosophical view directly; instead, I shall focus on a specific example: what higher-order statistical decision theory, a version of the higher-order theory of consciousness, could and should say about epistemic justification, and whether these views are plausible. I begin by explaining the contrast between first-order and higher-order theories, and then the kernel of higher-order statistical decision theory (henceforth HOSDT).

Both first-order and higher-order theories agree that there are first-order states or representations when it comes to sensations, perceptions, and other mental episodes. They disagree when it comes to the key question: what distinguishes conscious from unconscious first-order states?5 The answer from first-order theories can vary, but they all deny that it is higher-order states that explain the target first-order state's being conscious (e.g., Block, 1995; Dretske, 1995; Tye, 1995). Higher-order theories hold that the relevant higher-order states are the correct explanation of consciousness (e.g., Carruthers, 2005; Lycan, 1996; Rosenthal, 2005), though they disagree about the nature of those states (e.g., thought vs. perception, occurrent vs. dispositional) and about the nature of the relation between higher-order states and first-order states (e.g., accompaniment, representation, and other psychological relations). Details aside, the basic contrast between these two groups of views is quite straightforward.

What, then, is HOSDT? "Statistical decision" in this context is neither thought nor perception; it is decision at the subpersonal level.6 How exactly we can draw the distinction between the personal and the subpersonal is a matter of controversy (Bermúdez, 2005), but here the meaning is quite clear: statistical decisions are made by physiological-informational systems; in the case of humans, by the human brain, not the person. Even for identity theories, explanations at the brain/cerebral level are distinguished from explanations at the personal level ("levels of analysis"). More specifically, parts of the brain (presumably parts of prefrontal cortex, PFC) decide what we consciously perceive given the current perceptual inputs. The decision is statistical because certainty does not apply here: in most or even all cases, the system can only decide that, given the inputs and other factors, it is most likely that the current scene is like this.

This characterisation connects directly to a basic argument for the view.7 That argument goes like this: neural representations are noisy. It does not really make sense to say that the neurons representing a face are firing and therefore you have the representation of a face. We need to know what the baseline firing is, how variable things are, and so on. The brain has to have a mechanism for making these statistical decisions, deciding which first-order representations are reliable enough to be considered further, especially in downstream syntactic processes where errors propagate in a serial manner (von Neumann's problem).
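The flavour of such a subpersonal decision can be conveyed with a toy signal-detection sketch. This is my construction for illustration, not Lau's own model, and all names and numbers are hypothetical: a higher-order mechanism that knows the baseline firing rate and its variability can decide whether a first-order population's activity is reliable enough to be treated as signal.

```python
# Toy sketch: a "higher-order" check on a noisy first-order representation.
# Values are illustrative; real neural statistics are far messier.
import statistics

def is_reliable(samples, baseline_mean, baseline_sd, criterion=2.0):
    """Decide that a first-order representation is reliable iff its mean
    firing rate exceeds the baseline by `criterion` standard deviations.
    The criterion plays the role of a decision criterion set on the basis
    of higher-order estimates of variance."""
    mean_rate = statistics.mean(samples)
    z = (mean_rate - baseline_mean) / baseline_sd
    return z > criterion

baseline_mean, baseline_sd = 10.0, 2.0  # spikes/s when no face is present

weak_response = [11, 12, 10, 13, 11]    # barely above baseline: discard
strong_response = [17, 19, 16, 18, 20]  # well above baseline: rely on it

print(is_reliable(weak_response, baseline_mean, baseline_sd))    # False
print(is_reliable(strong_response, baseline_mean, baseline_sd))  # True
```

On HOSDT, it is a decision of roughly this shape – made subpersonally, with the criterion itself tuned to estimated variability – that marks the difference between first-order states that are conscious and those that are not.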

Together with other empirical considerations discussed in Lau and Rosenthal (2011), I suggest that this statistical decision mechanism, which very likely exists, is the one by which conscious and unconscious first-order states are distinguished. This is by no means supposed to be a decisive argument for HOSDT; far from it. The debate between first-order theories and higher-order theories is an ongoing one, and amongst higher-order theories HOSDT is not the mainstream view. For more empirical arguments for higher-order views in general, see Carruthers and Gennaro (2001/2020). The above semi-empirical argument serves only to establish the minimal plausibility of the view recommended here. I only mention and discuss an empirical argument with the conviction that although all kinds of argument should be taken into account, at least methodologically speaking empirical arguments should have a certain priority in this kind of debate. Again, this meta-philosophical view is not argued for directly here; instead, I start from these initial empirical considerations and move to the main concern on this occasion, i.e., what this version of the higher-order theory could and should say about epistemic justification, and whether those epistemic views are plausible. Towards the end I will consider potential connections between this view and the predictive processing framework.

11.2 HOSDT and Probabilistic Justification

Epistemic justification has many facets; here I will focus on three of them.

11.2.1 The Contents of Perceptual Experiences

It is natural to think, on the one hand, that our conscious perceptions at least sometimes justify relevant beliefs ("minimal empiricism," McDowell, 1996). For example, I currently believe that it is raining outside, and this belief is justified by my seeing that it is raining outside. This natural thought has been seriously challenged in the analytic tradition: for example, Donald Davidson famously argues that "nothing can count as a reason for holding a belief except another belief" (1986, p. 310). His intricate arguments aside, many have found this position too strong.8 To preserve the crucial insight, John McDowell (1996), for example, has argued for a slightly weaker view (than Davidson's) according to which perceptual experiences can justify beliefs, but with the proviso that the contents of perceptions are conceptual all the way out, because only conceptual contents can figure in the space of reasons and justification. On the other hand, it seems equally natural to hold that at least some contents of perceptual experiences are non-conceptual, given considerations concerning the richness and fineness of grain of experiences, the cognitive impenetrability of some experiences, and experiences in animals and human infants (Evans, 1982; Peacocke, 1992; Tye, 2006). These two lines of thought are obviously in tension.

Now, there are dozens of potential ways to deal with this tension in the literature, and it is not my intention here to survey the current terrain. What I will do is assume that these two natural lines of thought are by and large correct and see how HOSDT can cope with both of them. Before meeting this challenge, let's remind ourselves of the shape of higher-order theories. At bottom, this group of views holds that first-order representations (be they vehicle properties, dispositions, or something else) are not sufficient for any conscious experience to ever occur; something further downstream, i.e., higher-order representations, is necessary for making the first-order states conscious. A mere change in some downstream process, keeping the first-order representation constant, is sufficient for leading to a change in the conscious experiences themselves. The specific version of higher-order views I tentatively favour, HOSDT, goes like this: the relevant higher-order representation says something like "I am having this representation, which is statistically reliable," where "this" demonstratively points to a particular first-order representation. The charge of overintellectualisation here is misguided, as the higher-order representation is not full-fledged thought. It consists of statistical decisions made by physiological-informational systems, including mature human brains, infant human brains, and varieties of animal brains.

Now the proposed resolution of the tension is this: we can have first-order representations with non-conceptual contents, and yet the relevant higher-order representations can play the justificatory role. Whether these higher-order representations are conceptual is a further, open empirical question, and partially depends on what one means by "conceptual." The justification would look something like this: because I am having this representation, which is statistically reliable, whatever this representation says is what I believe. Conscious perceptions are individuated in part by the first-order non-conceptual contents and in part by higher-order representations. Again, whether these higher-order representations are conceptual is a further issue; what is crucial is that they are statistical decisions: although they are not conscious, personal-level decisions, they have assertoric force. They assert, subpersonally, that this should be the case given the current environment. This is how, at least preliminarily, HOSDT responds to the conceptualism/non-conceptualism debate.

A vivid way to illustrate the differences among HOSDT, HOT, and first-order theories is to focus on the following argument:

P1. Conscious experiences with non-conceptual contents in principle cannot be used to justify beliefs.
P2. Some conscious experiences do have non-conceptual contents.
C. Some conscious experiences in principle cannot be used to justify beliefs.

Now, our version of the HO theory rejects P1, as explained above. This has the advantage that we can acknowledge non-conceptual contents of conscious experiences, which is compatible with many important theories of consciousness. HOT theories have to reject P2: for them, experiences are determined by higher-order thoughts, which are conceptual. For FO theories, the situation is more complicated. Many of them use different kinds of content to explain the conscious/non-conscious divide; in that case, it is harder for them to reject P2, as both conscious and unconscious mental states might have non-conceptual contents. They can of course reject P1: that is a huge debate that goes beyond theories of consciousness. Or they can simply accept C, though this move carries some distinctive theoretical burdens.

11.2.2 The Role of First-Person Access

Another huge debate concerning epistemic justification is the disagreement between externalism and internalism. Reliabilism, arguably the most popular externalism in this context, has it that the nature of justification lies in reliable processes (e.g., Goldman, 1975, 2008). Internalism (e.g., BonJour, 1985) argues that first-person access is required for justification: this is what makes knowledge valuable. HOSDT is closer to reliabilism, since it requires the higher-order representations to make reliable statistical decisions. This view does not directly address the internalist insistence that first-person access is crucial for justification. But HOSDT is perfectly compatible with the plausible idea that higher-order statistical decisions play a crucial role in belief formation: together with other psychological mechanisms, perceptual beliefs can be formed and justified on the basis of the relevant higher-order representations. Those further downstream processes can be accessed first-personally. Therefore, the way to reconcile externalism and internalism about justification, according to HOSDT, is to locate them at different stages: first we have higher-order statistical decisions made by subpersonal systems, where first-person access has no application; only after that does first-person access come in and secure the subjective aspect of justification.

11.2.3 Speckled Hen, Peripheral Vision, and so on

When viewing many items at the same time, what we see seems to be indeterminate in some way (Chisholm, 1942; Stazicker, 2011). The same is true of peripheral vision: many things in the periphery are seen in a sense, but the situation is crucially different from foveal vision. In addition to physiological explanations, HOSDT adds a good supplementary explanation: we normally feel that we see lots of detail in the periphery because of higher-order statistical decisions. Under normal circumstances, those decisions tend to overestimate what we actually see (Lau & Rosenthal, 2011).
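Framed in the signal-detection terms sketched earlier, this overestimation can be read as the higher-order mechanism adopting a more liberal criterion for peripheral inputs. Here is a toy rendering with made-up numbers, offered only as one way of cashing out the claim:

```python
# Toy sketch: a liberal detection criterion "inflates" peripheral vision.
# Numbers are illustrative only.

def detected(signal_strength, noise_sd, criterion):
    """Report 'seen' iff the signal exceeds criterion * noise_sd."""
    return signal_strength > criterion * noise_sd

noise_sd = 1.0
weak_peripheral_signal = 1.2  # low-fidelity input from the periphery

# A conservative foveal criterion would discard this weak signal...
print(detected(weak_peripheral_signal, noise_sd, criterion=2.0))  # False
# ...but a liberal criterion for the periphery reports detail anyway.
print(detected(weak_peripheral_signal, noise_sd, criterion=1.0))  # True
```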

11.3 Objections and Replies

This is, roughly, how HOSDT answers several questions concerning epistemic justification. In what follows I consider some potential objections.

Objection 1: How is this view different from saying that a conscious experience causes a higher-order representation that in turn does the work of justification? Any first-order theory can allow such a higher-order representation to occur without its being a constitutive part of the experience.

Reply: On this account, first-order views cannot explain why experience seems to give such immediate and consistent justificatory force. On the current higher-order view, the higher-order representation is not an effect based on a causal mechanism that can sometimes fail: if it fails, there will be no conscious experience to begin with. On the current view, whenever you have an experience, you will be quite ready to believe what you perceive, and very often justifiably so. On first-order views, the higher-order representation occurs only via causal mechanisms, and this is crucially different from the view I am recommending. Mere causation cannot explain justification.

Objection 2: What if we take a first-order view and do a two-step justification? First, we justify the belief that "this first-order representation that I am having is statistically reliable," and then from there we justify that, because of the reliability of this first-order representation, I should believe whatever this representation says. The first step is not necessarily merely causal because, if it is a belief concerning an internal representation demonstratively referred to, it may very well be the most reasonable thing for the brain to introspectively look inside and make all those statistical decisions. And then the second step is trivial. This would mean one can still do the justificatory job within a first-order framework.

Reply: If one has to accept a first-order framework, this may well be the way to go, but there are several reasons to be sceptical about this view. First, just because you are justified in forming the higher-order belief, it does not mean you always do. So, when you fail to form the first justified belief, your second step will not be justified, which means you are not always justified in holding the final belief you are aiming for. Second, in cases where you have hallucinatory experiences but do not form the corresponding justified beliefs, my tentative view gives a better account of the conscious phenomenology. Say you have a very strong prior expectation that you should not be seeing flying human heads, but you seem to be consciously seeing some now. Naturally, you believe you are hallucinating rather than believing that you are seeing flying human heads. This seems to be reasonable.

On the current higher-order view, what happens is that you experience the hallucination because the higher-order mechanism is at fault and mistakenly considers the first-order representation of flying heads to be reliable. This higher-order representation is in conflict with your strong prior belief, so the belief that there are flying heads is not justified. You form the justified belief that you are hallucinating vividly instead. But the fact that the higher-order representation and the prior belief are in conflict explains the bewildering nature of hallucination. On the version of the first-order view described above, you are not justified in forming the belief that your first-order representation is reliable, because in doing so it would only be reasonable for you to take into account everything you already know, including that you should not be seeing flying human heads. Therefore, the only thing you are justified in believing is that there is no flying human head. At the propositional level, as far as your justified beliefs are concerned, there is no conflict. There is admittedly a conflict between what you justifiably believe and what you experience, but the experience loses its justificatory force entirely. On purely phenomenological grounds I submit that this is implausible. I believe that there is still a sense, given the full vividness of the hallucination in this hypothetical case, in which it would seem to you the most reasonable thing to take the hallucination seriously and to act accordingly (e.g., run like hell), even though, at the same time, you are also justified in believing that there is no flying head. One should acknowledge the limits of this kind of phenomenological argument in general, but together with other considerations, there should be a good case for the view I tentatively favour.

Objection 3: Why not take a more standard, well-established higher-order view, such as Rosenthal's HOT?

Reply: In a sense, HOSDT is closer to the thought version (Rosenthal) than to the perception version (Lycan); most of Rosenthal's arguments against higher-order perception theory will also count in favour of the current view. The difference between Rosenthal's view and mine mainly concerns whether the higher-order representation itself determines the nature of the experience. On his view, the higher-order thought goes something like "I am in the mental state of seeing red," i.e., the relevant perceptual content is within the higher-order state. This attracts criticisms based on cases of misrepresentation/mismatch: what if the higher-order state says red and the first-order state says blue (Block, 2011)? The current view bypasses this problem. Above all, the consideration is empirical: I doubt that the neural mechanisms supporting higher-order representations have the capacity and luxury to duplicate the contents of first-order states, which may well be determined by rather fine-grained first-order non-conceptual contents. These replies to potential objections serve to further strengthen my tentative case for HOSDT and its implications concerning epistemic justification.

I recognise that what has been offered above is only a starting point; much more needs to be said in support of the view I recommend here.

11.4 Subpersonal Statistical Decision as an Element in Predictive Processing

In this concluding section, I put HOSDT into the context of predictive processing. To be sure, they are logically independent theses: holding one does not commit a theorist to holding the other. They also differ in scope: while HOSDT's subject matter is consciousness, predictive processing is a universal framework seeking to describe the cognitive and conscious brain. Nevertheless, there are obvious connections between the two, and I will highlight two of them.

To begin with, one key element of HOSDT is subpersonal statistical decision, and that fits well with predictive processing: the latter describes the brain as constantly making statistical predictions about the immediate environment, including one's own body, and one way to cash this out is to say that the kind of subpersonal statistical decision identified by HOSDT is one such mechanism. Obviously, there are many open empirical questions here, e.g., are there other kinds of predictions the brain makes that are not related to consciousness at all? Or perhaps there is only one kind of prediction, but only some predictions are relevant to consciousness because they need to be coupled with something else, for example, some appropriate first-order states. Researchers in this area, such as Hohwy (2013), Clark (2015), and Seth (2021), all have their own views concerning how their variants of the predictive processing framework should account for consciousness. Here, without fully endorsing the predictive processing framework myself, I propose that the subpersonal statistical mechanism identified by HOSDT can be one such candidate.

Secondly, as predictive processing is a naturalistic theory, if its proponents wish to take a stand on epistemology at all, externalist reliabilism might be a natural way to go. Hohwy (2013) discusses how his variant of the predictive processing framework faces sceptical challenges and how he can deal with them. I propose that the challenges can be better met if the framework is coupled with a certain version of reliabilism, and the picture sketched above can be a natural fit. Again, there is no logical entailment between these views; but in empirical arguments, one should not expect deductively valid inferences; after all, they are not mathematics or any other a priori investigation. Still, these views seem to chime well with one another, yet given the current specialisation and fine division of labour, the connections between them are seldom in view. On this occasion, I have proposed a sketch of one version of the higher-order theory of consciousness and explored its relevance to probabilistic justification and predictive processing; more work needs to be done in this direction, or so I shall urge.

Notes

1 There are many other versions I will not consider, such as the one in Renero and Brown (2022), though note also that such a theory might be about consciousness that is introspective, not phenomenal.
2 This chapter began as a joint project by Hakwan Lau and myself, when he wished to extend his views on consciousness to some issues in epistemology. After several rounds of back and forth, he decided to opt out owing to his intention to focus on empirical work. His contribution to this chapter, though, should be fairly obvious and is fully acknowledged. As for my own view, I am sympathetic to both higher-order theories of consciousness and the predictive processing framework, at least in outline, but for now I do not fully commit to either of them.
3 David Rosenthal has expressed similar misgivings in conversation.
4 Notable examples in this regard include Dretske (1981), McDowell (1996), Siegel (2017), Silins (2016), Schellenberg (2018), and Smithies (2019), to name just a few.
5 To minimise metaphysical complications, in what follows I do not distinguish among mental states, episodes, events, processes, and acts, but it should be borne in mind that distinctions between them can become important for other purposes (Steward, 1997).
6 For caveats on applying the personal/subpersonal distinction, see Drayson (2012).
7 This might have implications concerning the prefrontal/anterior vs. posterior theories of the neural correlates of consciousness (NCC for short; Odegaard, Knight, & Lau, 2017; Michel & Morales, 2020; Raccah, Block, & Fox, 2021). I am developing a twofold theory of NCC elsewhere, according to which only reason-related consciousness has its PFC correlates, and this might be in tension with what I say in this chapter. However, since at this stage it is unclear what exactly I want to say concerning that debate, I will stay neutral about it on this occasion.
8 For a more recent challenge, from within the higher-order perspective, see Berger (2020). Relatedly, there is a debate concerning the possibility of unconscious perceptual justification (Berger, Nanay, & Quilty-Dunn, 2018), which I entirely bypass here.

References

Berger, J. (2020). Perceptual consciousness plays no epistemic role. Philosophical Issues, 30(1), 7–23.
Berger, J., Nanay, B., & Quilty-Dunn, J. (2018). Unconscious perceptual justification. Inquiry: An Interdisciplinary Journal of Philosophy, 61(5–6), 569–589.
Bermúdez, J. L. (2005). Philosophy of psychology: A contemporary introduction. New York: Routledge.
Block, N. (1995). On a confusion about a function of consciousness. Behavioral and Brain Sciences, 18, 227–287.
Block, N. (2011). The higher order approach to consciousness is defunct. Analysis, 71(3), 419–431.
BonJour, L. (1985). The structure of empirical knowledge. Cambridge, MA: Harvard University Press.

Carruthers, P. (2005). Consciousness: Essays from a higher-order perspective. Oxford: Oxford University Press.
Carruthers, P., & Gennaro, R. (2001/2020). Higher-order theories of consciousness. In E. N. Zalta (Ed.), Stanford encyclopedia of philosophy. Retrieved from https://plato.stanford.edu/entries/consciousness-higher/
Chisholm, R. (1942). The problem of the speckled hen. Mind, 51(204), 368–373.
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford: Oxford University Press.
Davidson, D. (1986). A coherence theory of truth and knowledge. In E. Lepore (Ed.), Truth and interpretation: Perspectives on the philosophy of Donald Davidson. New York: Blackwell.
Drayson, Z. (2012). The uses and abuses of the personal/subpersonal distinction. Philosophical Perspectives, 26(1), 1–18.
Dretske, F. (1981). Knowledge and the flow of information. Cambridge, MA: MIT Press.
Dretske, F. (1995). Naturalizing the mind. Cambridge, MA: MIT Press.
Evans, G. (1982). The varieties of reference. Oxford: Oxford University Press.
Fodor, J. A. (2008). LOT 2: The language of thought revisited. Oxford: Oxford University Press.
Goldman, A. (1975). The nature of natural knowledge. In S. D. Guttenplan (Ed.), Mind and language. Oxford: Clarendon Press.
Goldman, A. (2008). Immediate justification and process reliabilism. In Q. Smith (Ed.), Epistemology: New essays. Oxford: Oxford University Press.
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Hohwy, J. (2015). Prediction error minimization, mental and developmental disorder, and statistical theories of consciousness. In R. J. Gennaro (Ed.), Disturbed consciousness: New essays on psychopathology and theories of consciousness. Cambridge, MA: MIT Press.
Lau, H. (2008). A higher order Bayesian decision theory of consciousness. Progress in Brain Research, 168, 35–48.
Lau, H., & Rosenthal, D. (2011). Empirical support for higher-order theories of conscious awareness. Trends in Cognitive Sciences, 15(8), 365–373.
Lycan, W. (1996). Consciousness and experience. Cambridge, MA: MIT Press.
McDowell, J. (1996). Mind and world. Cambridge, MA: Harvard University Press.
Michel, M., & Morales, J. (2020). Minority reports: Consciousness and the prefrontal cortex. Mind and Language, 35(4), 493–513.
Odegaard, B., Knight, R. T., & Lau, H. (2017). Should a few null findings falsify prefrontal theories of conscious perception? The Journal of Neuroscience, 37(40), 9593–9602.
Peacocke, C. (1992). A study of concepts. Cambridge, MA: MIT Press.
Raccah, O., Block, N., & Fox, K. C. R. (2021). Does the prefrontal cortex play an essential role in consciousness? Insights from intracranial electrical stimulation of the human brain. Journal of Neuroscience, 41(10), 2076–2087.
Renero, A., & Brown, R. (2022). A HOROR theory for introspective consciousness. Journal of Consciousness Studies, 29(11–12), 155–173.
Rosenthal, D. (2005). Consciousness and mind. Oxford: Oxford University Press.

Schellenberg, S. (2018). The unity of perception: Content, consciousness, evidence. Oxford: Oxford University Press.
Seth, A. (2021). Being you: A new science of consciousness. London: Faber & Faber.
Siegel, S. (2017). The rationality of perception. Oxford: Oxford University Press.
Silins, N. (2016). Cognitive penetration and the epistemology of perception. Philosophy Compass, 11, 24–42.
Smithies, D. (2019). The epistemic role of consciousness. Oxford: Oxford University Press.
Stazicker, J. (2011). Attention, visual consciousness and indeterminacy. Mind and Language, 26(2), 156–184.
Steward, H. (1997). The ontology of mind: Events, processes, and states. Oxford: Oxford University Press.
Tye, M. (1995). Ten problems of consciousness: A representational theory of the phenomenal mind. Cambridge, MA: MIT Press.
Tye, M. (2006). Nonconceptual content, richness, and fineness of grain. In T. S. Gendler, & J. Hawthorne (Eds.), Perceptual experience. Oxford: Oxford University Press.

Index

action 1, 2, 9, 14, 16–19, 21–23, 27–29, 31–33, 35–37, 47, 49, 56–60, 62–64, 76–85, 87–90, 119, 128–131, 149–150, 152, 154–155, 161, 170, 175, 184, 242, 261, 263–264, 267, 269, 274; mental 38, 130
active inference 1, 9–11, 15, 17–20, 23–25, 27, 29–31, 35–36, 38, 57, 59, 63–64, 88, 170, 174, 261, 283
affordance 19, 21, 29–32, 36–37, 186
agency 11, 60, 77, 89–90, 158, 162, 263, 269
agent 14, 16–23, 28–33, 36–37, 51, 89, 147, 176, 202–203, 259, 263, 267–269, 272–273
amnesia 47, 52, 56, 61
anosognosia 140–143, 148–150, 152–153, 155, 157–159, 161–163
anxiety 178–179, 276, 278
asomatognosia 142
attention 1, 3, 18, 49–52, 58–59, 61, 96–97, 103–109, 113, 120, 124–134, 144, 150, 155–156, 159, 179–180, 186, 261, 267, 272–273, 276; affect-biased 113, 126, 131–132; endogenous 127–128, 131, 134, 180; exogenous 127, 134; feature-based 107; spatial 125, 127–130; voluntary 125, 129–130, 132
audition 98–99, 159–160
autism 48, 158, 257, 268

awareness 49, 54, 84, 140, 143–145, 149, 152–155, 158, 162–163, 170, 173, 176–177, 179
Bayesian brain 47, 57, 153–154, 163, 242, 247, 260–261, 277
binocular rivalry 2, 84
Block, Ned 108, 184, 186, 199, 204, 233, 242, 245, 249, 253, 284, 289, 291
bodily feeling 17, 22, 160
body ownership 142, 162, 176, 187
Boltzmann machine 219–221, 225
BonJour, Laurence 287
Burge, Tyler 206
Carruthers, Peter 275, 278, 284–285
central nervous system 171, 173–174
Cheng, Tony 1, 108, 170, 176, 181, 187, 283
Chisholm, Roderick 287
Clark, Andy 1, 3–4, 9, 17–18, 33, 47–48, 57, 62–63, 108, 112–113, 122–123, 130–131, 133–134, 170, 198, 223, 257, 259–260, 264–265, 290
cognition 1, 3, 47, 57, 62–65, 77, 79, 84, 112, 118, 134, 142, 152, 182, 187, 241–242, 246, 252, 264, 268, 275
Coltheart, Max 143, 264, 267, 278
computational psychiatry 2, 30, 154, 257–259, 261
concept 47, 51, 55, 57, 62, 114–115, 172, 185, 252, 261

confidence 19, 21, 32–34, 36–37, 152, 155, 176–177, 179, 181–184, 202, 260, 277
consciousness 1–3, 9–11, 24, 28–30, 34–35, 38, 135, 151, 170, 176, 180–182, 184, 186–187, 283–284, 287, 290–291; neural correlate of 291
Craver, Carl 247, 253
credence 197–199, 201–202, 206–208, 213, 220, 234, 249–250
Damasio, Antonio 160, 176
Danks, David 244, 246, 251–253
dark room problem 133–135
Davidson, Donald 285
decision-making 1–2, 97, 123, 257, 275
Dehaene, Stanislas 56
delusion 47, 56, 62–63, 89, 141–143, 154, 161–162, 257–258, 262–271, 274–278
depersonalization 30
depression 11, 24, 27, 29–38, 48, 134, 258
desire 63, 88, 128, 130–131, 134, 247
dopamine 2, 262–264
Dretske, Fred 185, 284, 290
embodied self 154, 187
emotion 2, 37, 127, 140, 145, 163, 170, 173–174, 179
empathy 11, 145
Evans, Gareth 285
expected experience 2, 47, 58
exteroception 160, 173, 175
feeling 22, 28–29, 31, 34, 89, 142, 159–162, 175, 177–179, 181–184; bodily 17, 22, 160; epistemic 183; existential 29, 32; gut 173, 178, 180–181, 183–187; metacognitive 182–183
Firestone, Chaz 133, 135, 170
fMRI 82, 119, 178
Fodor, Jerry A. 283
Fotopoulou, Aikaterini 140–141, 143–145, 148, 150, 152–163
free energy principle 9–14, 22, 24, 38, 62, 260, 283

Friston, Karl J. 1–2, 9–15, 17–19, 22–23, 26, 30–32, 38, 48, 57–58, 61–62, 64–65, 81, 86, 88, 105, 112–115, 126, 130, 133, 135, 140, 153–155, 160, 162, 170, 174, 223, 257–261, 270, 278
Gallagher, Shaun 10, 187
Gibson, James J. 3, 19, 186
Goldman, Alvin 287
gut complex 170–176, 180, 184, 186
gut-brain axis 171–172
Haggard, Patrick 89
hallucination 47, 51–52, 55–56, 60–62, 261–262, 264, 289
Heidegger, Martin 10–11, 28–29
Helmholtz, Hermann von 1, 57, 170
Heyes, Cecilia 76, 85, 88
higher-order thought 56, 62–64, 283, 287, 289
Hohwy, Jakob 1–4, 9, 48, 57, 62, 105, 108, 112–115, 126, 128–130, 154, 160, 170, 175, 198, 223, 257, 260–261, 270, 283, 290
homoeostasis 160
Husserl, Edmund 10, 29
hypnosis 47–55, 59, 64
illusion 2, 25, 54, 56, 149–151, 153, 198, 232–233; rubber hand 53, 179
imagination 49, 56, 114
imaginative suggestion 47–51, 53–56, 58–59, 62–64
immune system 32, 171, 174, 187
instrumentalism 199, 201, 204–206, 230–232, 240, 242–244, 246–248, 251, 253
intention 54, 56, 63–64, 128, 142, 149, 161, 247, 268–269, 272, 286
intentionality 14, 207, 261
interoception 17, 48, 160, 173–177, 180, 184, 187
interoceptive signal 18, 160, 162, 176–178
James, William 25, 58, 76
justification 284–289, 291; probabilistic 283, 285

Kant, Immanuel 57, 184
Lau, Hakwan 181, 283–284, 288, 291
Lycan, William 186, 284, 289
machine learning 18, 201, 235
Markov blanket 12, 14, 23, 174, 187
Marr, David 241, 246–251, 253
Martin, Michael G. F. 185
McDowell, John 3, 186, 285, 291
memory 112, 119–121, 124–127, 131, 134, 144, 181, 203, 234
mental image 56
mental imagery 60–61
Merleau-Ponty, Maurice 29, 158, 186
meta-awareness 170, 173, 175, 180–186
metacognition 45, 48, 56, 152, 157, 173, 176, 180–183
Metzinger, Thomas 1, 134, 160, 170
microbiome 172, 176–177
microbiota 171–174, 176, 187
mind-wandering 180–181
mood 32–33, 35, 133, 170, 172
Nagel, Thomas 186
neglect 149, 155–157; spatial 143; unilateral 157; visual 150; visuospatial 149, 155
Neisser, Ulric 129–130
neurophenomenology 9–10
noise 19, 58, 79–80, 86, 96–97, 99, 102–103, 107, 115, 123, 126–127, 177–178
pain 30, 49, 53–55, 130, 145–148, 160, 173–175
paralysis 47, 51, 61, 140–141, 143–144, 148, 161
Peacocke, Christopher 285
perception 1–2, 15, 17, 24, 28, 35–38, 47, 57, 63, 76–82, 85–90, 96–98, 102, 104–105, 108–109, 112–115, 124, 128, 134, 143, 152, 159–160, 170, 175, 178, 181, 184, 199, 206, 222, 233, 242, 249, 257, 264–265, 283–286, 289
perceptual clarity 98, 100, 103, 107–108

perceptual experience 27, 47, 76–78, 87–88, 97–101, 103–104, 107, 264, 267, 285
peripersonal space 125–126
phenomenological control 47–48, 51–54, 56–57, 59–65
placebo 49, 55, 63
Posner, Michael 64, 127–128
posterior 47, 57, 77, 88, 96, 102, 197–199, 202–205, 208, 211–215, 217–235, 259
posterior probability 15, 96, 114–115, 197, 241, 259
precision 1–2, 15, 17, 19–22, 31, 34, 36–37, 48, 57–61, 88, 97–99, 102–103, 105–108, 123–124, 126–127, 129–133, 154–155, 163, 170, 176, 179, 252, 260, 263–265, 276
precision-weighting 48, 57–59, 98–99, 105–106, 130, 264–265, 267–268, 276
prediction error minimization 1, 57, 113, 115, 133–134, 170, 242, 260, 263, 275
predictive coding 57, 59–61, 63, 155, 160, 198, 223–224, 226, 228, 232, 257–258, 260–261, 263–268, 270, 274, 276–278
predictive processing 1, 47, 57–58, 62, 77, 87–88, 105, 112, 134, 170, 176, 260, 278, 283, 285, 290
prefrontal cortex 119, 123–124, 284
prior 16, 23, 25–26, 31, 47–48, 58–64, 77, 88, 96, 102–103, 114–115, 123, 132, 135, 161, 175–179, 187, 198–199, 202–204, 206–207, 212–215, 217–218, 222–233, 235, 249, 259, 275
prior probability 96, 114, 197–198, 202, 204, 206, 212–213, 223–229, 231–233, 241
proprioception 58, 60, 143, 149
psychosis 3, 89, 258, 261–268, 270, 276–278
Putnam, Hilary 225, 235

Quine, Willard V. O. 244
Ratcliffe, Matthew 27, 29–30, 32, 35–36
realism 199, 201, 204–206, 208, 230–233, 235, 240, 242–245, 247–253; naïve 185; scientific 240, 242–246, 250–252
representation 3, 76, 78–79, 82, 85–86, 88, 97–98, 100, 102–103, 126, 158–159, 183–186, 235, 241, 243, 246, 249–251, 254, 268, 283–284, 286–289
representationalism 184
Rescorla, Michael 108, 197, 199, 205, 233–235, 242, 249, 252–253
Rosenthal, David 56, 184, 284, 288–289, 291
Schellenberg, Susanna 234, 291
schizophrenia 30, 48, 89–90, 141, 262–263, 265, 274, 276–277
self-awareness 140, 144, 151, 153–154, 162–163, 179
selfhood 140, 162, 178
sensation 14, 26, 53, 59, 140, 149, 159, 161, 163, 173–179, 284

Seth, Anil 17–18, 31, 47–48, 50, 57, 154, 160, 170, 177, 290
Siegel, Susanna 108–109, 291
somatoparaphrenia 142
specious present 25–26, 36, 38
Sprevak, Mark 2, 242, 253
statistical decision 283–288, 290
synaesthesia 53
touch 53–54, 58, 82, 145–146, 160–162
Treisman, Anne 127
Tye, Michael 31, 284–285
uncertainty 22, 32–33, 37–38, 86, 96–99, 101–103, 105–109, 152, 154, 183, 259–261, 268–270
ventriloquism 232–233
ventriloquist effect 98–99
visual cortex 82, 85, 99, 103
visual field 51, 59, 100, 102–103, 108, 125, 129, 132
volition 51, 127–128
Wiese, Wanja 1, 9–10, 24–26, 134, 170