An Introduction to Model-Based Cognitive Neuroscience [2 ed.] 3031452704, 9783031452703

The main goal of this edited collection is to promote the integration of cognitive modeling and cognitive neuroscience.


English Pages 394 [384] Year 2024


Table of contents:
Contents
General Introduction to Model-Based Cognitive Neuroscience
1 Introduction
1.1 What Is Model-Based Cognitive Neuroscience?
1.2 Neural Data Constrain the Behavioral Model
1.3 Behavioral Model Predicts Neural Data
1.4 Simultaneous Modeling
2 Prominent Models and Measures in the Field of Model-Based Cognitive Neuroscience
2.1 Types of Behavioral Measures
2.2 Types of Neural Measures
2.3 Types of Cognitive Models
3 Applications in the Field of Model-Based Cognitive Neuroscience
4 Future Directions
5 Open Challenges
References
Linking Models with Brain Measures
1 Introduction
2 Some Functions of Models in Science
3 Levels of Analysis
4 Other Types of Models Useful in Analysing Brain Data
5 General Comparison of Model and Brain Data
6 Cognitive Model as Integral Part of the Data Analysis
7 Individual Differences
8 Models Can Uncover Useful Latent States
9 Comparing Model and Brain Representations
10 Multiple Levels of Representation
11 Conclusions
Questions for Consideration
Further Reading
References
Reinforcement Learning
1 Introduction
2 Reinforcement Learning
2.1 Pavlovian Conditioning
2.1.1 Temporal-Difference Learning
2.2 Instrumental Conditioning
2.2.1 Actor-Critic Model
3 Model-Based fMRI
3.1 Univariate Approach
3.2 Multivariate Analyses
4 Considerations When Linking RL and fMRI Models
4.1 Evaluating Model Quality
4.2 Addressing Model Considerations
5 Bridging Levels of Analyses
5.1 Neural Correlates of Computational Processes
5.2 Leveraging fMRI to Adjudicate Between Models
5.3 Future Directions
6 Exercises
7 Further Reading
References
An Introduction to the Diffusion Model of Decision-Making
1 Historical Origins
2 Diffusion Processes and Random Walks
3 The Standard Diffusion Model
4 Components of Processing
5 Bias and Speed-Accuracy Tradeoff Effects
6 Mathematical Methods for Diffusion Models
7 The Representation of Empirical Data
8 Fitting the Model to Experimental Data
9 Diffusion Models of Continuous Outcome Decisions
10 Conclusion
11 Suggestions for Further Reading
12 Exercises
References
Discovering Cognitive Stages in M/EEG Data to Inform Cognitive Models
1 Introduction
2 Part 1: The Discovery of Processing Stages in M/EEG Data
2.1 The HsMM-MVPA Method
2.2 Discovering Cognitive Processing Stages in Associative Recognition
3 Part 2: A Symbolic Process Model
3.1 The Cognitive Architecture ACT-R
3.2 A Model of Associative Recognition
4 General Discussion
Exercises
Answers
Further Reading
References
Spiking, Salience, and Saccades: Using Cognitive Models to Bridge the Gap Between "How" and "Why"
1 Introduction
1.1 Dimensions of Constraint
2 A Case Study: SCRI
2.1 Phenomena to Be Explained
2.2 The Model
2.2.1 Motivating Principles
2.2.2 Conceptual Outline
2.2.3 Formal Description
2.3 Applying the Model
2.3.1 Structure of the Data
2.3.2 Parameter Estimation
2.3.3 Model Comparison
2.4 Insights
2.4.1 Quality of Fit
2.4.2 Most Important SCRI Mechanisms Across Neurons
2.4.3 Most Important SCRI Mechanisms for Individual Neurons
2.5 Closing the Loop
3 Discussion
3.1 Turning Points
3.1.1 Designing the Model
3.1.2 Fitting and Comparing Models
3.1.3 History Effects in Neural Spiking
3.1.4 Parameters for Unrecorded Neurons
3.1.5 Joint Modeling
3.2 Prospects
A Exercises
B Recommended Reading
References
Ultrahigh Field Magnetic Resonance Imaging for Model-Based Neuroscience
1 Ultrahigh Field MRI
1.1 How Functional Imaging Changes at UHF
1.2 Analysis of UHF Data
1.3 Alternative Functional Contrast Sources
2 Pushing the Limits for Cerebellum, Subcortex, and Within-Cortex Imaging
2.1 In the Subcortex: The Example of the Subthalamic Nucleus and the Locus Coeruleus
2.2 In the Cerebellum: Highly Folded Lobules and Small Subcortical Nuclei
2.3 In the Cerebral Cortex: Layers and Columns
3 UHF Neuroanatomy with Quantitative MRI
3.1 Common qMRI Models
3.2 qMRI at UHF
4 Structure-Function Relationships for Modeling
References
An Introduction to EEG/MEG for Model-Based Cognitive Neuroscience
1 Introduction
2 What Are We Measuring with EEG/MEG?
3 Practical Considerations and Pre-processing Steps
4 Event-Related Potentials
5 Time-Frequency Modulations
6 Estimating Source Locations
7 Advanced Signal Processing
8 Concluding Remarks
Exercises
Answers
Further Reading
References
Advancements in Joint Modeling of Neural and Behavioral Data
1 Introduction
2 Overview of the Joint Modeling Approach
3 Directed
4 Covariance
4.1 FA NDDM
4.2 Trivariate Modeling
4.3 Gaussian Process Joint Modeling
5 Integrative Modeling
6 Practical Concerns
6.1 Accessibility
6.2 Adaptability
6.3 Computation
6.4 Utility
6.5 Constraint
7 Conclusions
8 Suggested Readings
9 Thought-Provoking Questions
References
Cognitive Models as a Tool to Link Decision Behavior with EEG Signals
1 Introduction
2 A Linking Tutorial: EEG and Accumulator Models of Decision Making
2.1 Established Linking Approaches
2.1.1 Correlation-Based Linking
2.1.2 Regression-Based Linking
2.2 Modern Linking Approaches: Joint Models
3 Linking in Practice: EEG and Reinforcement Learning Models of Decision Making
3.1 Correlations and Qualitative Comparisons
3.2 Reinforcement Learning Model Estimates as Regressors in EEG
3.3 Advanced Approaches
3.4 Summary Caveats
4 Future Directions and Outstanding Questions
5 Further Reading
6 Exercises
References
Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses
1 Introduction
1.1 Working Memory
1.2 The Stability-Flexibility Tradeoff
2 Revealing the Subprocesses of WM
2.1 Closing the Gate (Entering Maintenance Mode)
2.2 Opening the Gate (Entering Updating Mode)
2.3 Removing Information from WM
2.4 Substituting Items in WM
2.5 Retrieving, Selecting, and Operating on Multiple Items
3 Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses
4 Concluding Remarks
A.1 Exercises
B.1 Further Reading
References
Assessing Neurocognitive Hypotheses in a Likelihood-Based Model of the Free-Recall Task
1 Introduction
2 Overview of the Context Maintenance and Retrieval (CMR) Model
2.1 Basic Operation of the Model
2.2 Evaluating the Model
3 Assessing a Neurocognitive Linking Hypothesis
4 Simulation Exercises
4.1 Exercise 1: Basic Parameter Recovery
4.2 Exercise 2: Fluctuating Temporal Reinstatement and Synthetic Neural Data
5 Conclusion
6 Further Exercises
References
Cognitive Modeling in Neuroeconomics
1 Introduction
2 Attention and Value-Based Decision-Making
2.1 Sequential Sampling Models
2.2 Eye Tracking: A Window into Attention
2.3 The Attentional Drift Diffusion Model
2.3.1 Extensions of the aDDM
2.4 Outstanding Questions on Attention and Value-Based Decision-Making
3 Decisions in Reinforcement Learning
3.1 Modeling of Response Times During Reinforcement Learning
3.2 Reinforcement Learning Diffusion Decision Models
3.2.1 Extensions of the RLDDM
3.2.2 Clinical Applications
3.2.3 Optimality of Behavior
3.2.4 Methodological Advantages of Combined Learning and Choice Models
4 Open Challenges and a Warning
5 Exercises
6 Further Readings
7 Exercises with Answers
References
Cognitive Control of Choices and Actions
1 Introduction
2 Controlling When to Act
3 Controlling Which Actions to Take
3.1 Delay Discounting
3.2 Controlling Spatial Attention
4 Controlling Which Actions to Withhold
4.1 The Stop-Signal Paradigm in Non-human Primates
4.2 The Stop-Signal Paradigm in Humans
4.3 Problems with Modelling Unobserved “Responses”
5 Discussion
A.1 Exercises
B.1 Recommended Reading
References
Index

Birte U. Forstmann · Brandon M. Turner   Editors

An Introduction to Model-Based Cognitive Neuroscience Second Edition


Editors Birte U. Forstmann Department of Psychology University of Amsterdam Amsterdam, The Netherlands

Brandon M. Turner Psychology Department The Ohio State University Columbus, OH, USA

ISBN 978-3-031-45270-3    ISBN 978-3-031-45271-0 (eBook)
https://doi.org/10.1007/978-3-031-45271-0

1st edition: © Springer Science+Business Media, LLC 2015
2nd edition: © Springer Nature Switzerland AG 2024

Contents

General Introduction to Model-Based Cognitive Neuroscience (Birte U. Forstmann and Brandon M. Turner), p. 1

Linking Models with Brain Measures (Bradley C. Love), p. 17

Reinforcement Learning (Vincent Man and John P. O'Doherty), p. 39

An Introduction to the Diffusion Model of Decision-Making (Philip L. Smith and Roger Ratcliff), p. 67

Discovering Cognitive Stages in M/EEG Data to Inform Cognitive Models (Jelmer P. Borst and John R. Anderson), p. 101

Spiking, Salience, and Saccades: Using Cognitive Models to Bridge the Gap Between "How" and "Why" (Gregory E. Cox, Thomas J. Palmeri, Gordon D. Logan, Philip L. Smith, and Jeffrey D. Schall), p. 119

Ultrahigh Field Magnetic Resonance Imaging for Model-Based Neuroscience (Nikos Priovoulos, Ícaro Agenor Ferreira de Oliveira, Wietske van der Zwaag, and Pierre-Louis Bazin), p. 153

An Introduction to EEG/MEG for Model-Based Cognitive Neuroscience (Bernadette C. M. van Wijk), p. 185

Advancements in Joint Modeling of Neural and Behavioral Data (Brandon M. Turner, Giwon Bahg, Matthew Galdo, and Qingfang Liu), p. 211

Cognitive Models as a Tool to Link Decision Behavior with EEG Signals (Guy E. Hawkins, James F. Cavanagh, Scott D. Brown, and Mark Steyvers), p. 241

Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses (Russell J. Boag, Steven Miletić, Anne C. Trutti, and Birte U. Forstmann), p. 265

Assessing Neurocognitive Hypotheses in a Likelihood-Based Model of the Free-Recall Task (Sean M. Polyn), p. 303

Cognitive Modeling in Neuroeconomics (Sebastian Gluth and Laura Fontanesi), p. 327

Cognitive Control of Choices and Actions (Andrew Heathcote, Frederick Verbruggen, C. Nico Boehler, and Dora Matzke), p. 361

Index, p. 387

General Introduction to Model-Based Cognitive Neuroscience
Birte U. Forstmann and Brandon M. Turner

Abstract In this chapter, we provide an overview of the second edition of the Introduction to Model-Based Cognitive Neuroscience (MbCN) book. In so doing, we provide a gentle introduction to motivate the use of cognitive models in understanding data, followed by a high-level summary of the many strategies for linking cognitive models to neurophysiological data. We then discuss a few key applications of MbCN while highlighting specific contributions that will be discussed within the book. We close with an outlook on potential future directions in the field.

Keywords Linking · Model-based cognitive neuroscience

1 Introduction

Welcome to the second edition of "Introduction to Model-based Cognitive Neuroscience." Given the numerous developments that have occurred since the first edition, we felt that a second edition would be a great way to highlight inspiring new approaches and to underscore difficult challenges in the area of model-based cognitive neuroscience. This edition brings together 33 contributors at different stages of seniority, from full professors to postdocs to graduate students. Before we get to the book, we want to provide a general overview of what the field of model-based cognitive neuroscience is all about, some general strategies

This work was supported by a CAREER award from the National Science Foundation (BMT), a Consolidator grant from the European Research Council (BUF), and a Vici grant from the Dutch Research Council (BUF).

B. U. Forstmann, University of Amsterdam, Amsterdam, The Netherlands
B. M. Turner, The Ohio State University, Columbus, OH, USA


and approaches for performing model-based analyses, as well as an overview of the chapters. Finally, we close with an outlook on future directions and open challenges that have yet to be solved.

1.1 What Is Model-Based Cognitive Neuroscience?

We see model-based cognitive neuroscience as a subfield of computational cognitive neuroscience (Kriegeskorte & Douglas, 2018; Ashby & Helie, 2011; O'Reilly et al., 2012) that uses formal mathematical models to articulate theories about cognition. Figure 1 illustrates the various ways of doing cognitive science in the modern era. It all starts with a cognitive process that we assume participants use when completing a task. Despite advanced technology for measuring changes in blood flow and in electrical and magnetic potentials, cognition is inherently unobservable because it is a latent construct. Per the tradition of psychology, latent constructs can only be detected through manifest variables, such as the decisions people make or fluctuations in the activation of brain networks. Areas such as experimental psychology emphasize experimental design to detect changes in cognition over different trial types or different blocks within the

Fig. 1 Model in the Middle. The “model-in-the-middle” approach unifies three different disciplines for understanding the mind through behavioral data (taken with permission from Forstmann et al., 2011)


experiment. For example, if an experimenter wanted to investigate how the cognitive process changes when there is less time to process information, they could create two blocks: one in which the participant is asked to respond "as quickly as possible" and one in which the participant is asked to respond "as accurately as possible." A simple change in the emphasis of the instruction across the two conditions typically motivates participants to adjust how they prioritize the processing of information relative to some held belief about what constitutes a "quick" response. In aggregate, the typical result is that responses are more accurate when accuracy is emphasized and response times are shorter when speed is emphasized. In a similar vein, the area of cognitive neuroscience (i.e., traditional neuroscience) relies heavily on well-designed experiments to detect changes in the cognitive process. This approach differs from that of experimental psychology in that the manifest variables characterize some type of change in the brain. For example, functional magnetic resonance imaging (fMRI) detects changes in the electromagnetic signal emitted by hydrogen atoms when blood flows into different parts of the brain. Relative to a control condition, the changes in this electromagnetic signal can be quantified and statistically evaluated for significant effects. If there is a significant effect, the typical conclusion is that those areas of the brain must have played a role in changing the cognitive process across the two conditions.
Finally, one can embrace the idea that cognition is a latent construct and build statistical and mathematical models that fully specify a theory about how cognition works, an area referred to as "mathematical psychology." These models are referred to as "cognitive" or "behavioral" models, and they typically contain mechanisms intended to instantiate psychological constructs, such as "speed of information processing" or "bias toward a response." Associated with these mechanisms are model parameters: like knobs on a machine, turning a parameter up or down has a direct effect on the model's predictions for the manifest variables. Modifying the parameters so that the model's predictions come as close as possible to the observed data is a process referred to as "fitting a model to data." Once complete, the idea is that if the best-fitting model parameters change across conditions, the mechanisms associated with those parameters are the ones that most likely changed cognition across conditions. This is important because it allows researchers to interpret complex changes in the manifest variables in terms of psychologically meaningful mechanisms. Traditionally, mathematical psychologists have used models to characterize behavioral data exclusively. However, many researchers have realized the massive potential of using models to understand and explain how the brain produces cognition. Brain data provide compelling new constraints on theories of cognition because they offer exquisite temporal and spatial information on the one organ designed to produce mental operations.
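The logic of "fitting a model to data" can be sketched in a few lines of code. The one-parameter model below, the simulated choices, and all names are invented purely for illustration (this is not a model from this book); the point is only the loop of adjusting a parameter until the model's predictions match the observed data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-parameter "cognitive model": the probability of a
# correct response grows with a latent processing-rate parameter v.
def p_correct(v):
    return 1.0 / (1.0 + np.exp(-v))

# Simulate observed choices from a "true" parameter value.
true_v = 1.2
data = rng.random(2000) < p_correct(true_v)  # True = correct response

# Fitting the model to data: choose the parameter value that maximizes
# the likelihood of the observed choices (here, by a simple grid search).
def neg_log_lik(v, data):
    p = p_correct(v)
    return -(data.sum() * np.log(p) + (~data).sum() * np.log(1 - p))

grid = np.linspace(-3, 3, 601)
best_v = grid[np.argmin([neg_log_lik(v, data) for v in grid])]
print(round(best_v, 1))  # recovered value should lie near true_v = 1.2
```

With enough trials, the best-fitting parameter recovers the generating value; comparing such best-fitting values across conditions is what licenses the mechanistic interpretation described above.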
Given this unique potential for the advancement of theories of cognition, many researchers have attempted to engage with brain data. Because different researchers often have different goals, the strategies for leveraging cognitive models to understand brain data can roughly be divided into three approaches: using the neural data to constrain theories of


cognition, using the behavioral model to predict neural data, and developing models that can simultaneously predict both behavioral and neural data. Figure 2 illustrates the general strategies divided into approaches (rows). Each diagram represents how neural data (N) and behavioral data (B) can be related to one another (arrows) through some type of parameters (Greek letters) used within a cognitive model. We now turn to a brief discussion of each approach and an elaboration of what these diagrams convey.

1.2 Neural Data Constrain the Behavioral Model

The first approach centers on the idea that neural data can provide interesting new ways to expand an existing theory into brain space. Although a cognitive theory can sound plausible when described in terms of mechanisms, the ability of the model instantiating that theory to fit behavioral data is often taken as evidence of its validity. However, a good fit to data is sometimes not sufficient for making claims about either the validity or the plausibility of a model (Roberts & Pashler, 2000; Pitt & Myung, 2002). One important consideration is whether there exist data that the model cannot predict; another is whether the model predicts patterns of data that have never been observed. These counterfactuals are important, yet they often go untested at the inception of a new theory or model due to the nature of scientific work: Theories are meant to be tested and scrutinized by the community so as to compile evidence in support of or against them. Another way to build evidence in support of a theory is to represent that theory in the biological substrate of the mind. If a model is a good working characterization of some mental operation, then the dynamics of the model should resemble the dynamics of the brain. Although this may sound reasonable in principle, it is sometimes quite difficult even to characterize brain dynamics. Even with advanced technology, dynamics have spatial (i.e., from one region of the brain to another) and temporal (i.e., from one time point to another) properties that often require complex analyses to fully understand how cognition could emerge. For this reason, it is often also difficult to map these dynamics retrospectively onto a cognitive model that was initially developed to explain behavioral data. The top row of Fig. 2 provides some examples of how brain data can be used to constrain a cognitive model. The first approach is purely theoretical.
In developing a model, understanding brain function can help to specify the type of dynamics that might exist within the model. A now-iconic example is the class of parallel distributed processing (PDP) models, which specify how nodes within layers of the model interact with one another, often through sigmoidal activation functions, competitive pooling dynamics, and recurrence, all of which have been observed by computational neuroscientists studying the behavior of single-unit neurons. The basic behavior of interacting neurons was used to specify a theory about how cognition could emerge from a simplified mathematical model. The second approach is the two-stage behavioral strategy, in which some model parameters that


Fig. 2 Types of Linking Structures. An illustration of several approaches used for linking neural and behavioral data, organized by specific modeling goals. N represents the neural data, B represents the behavioral data, N* represents simulated internal model states, and θ, δ, and Ω represent model parameters. When an approach is procedural, progression through processing stages is represented by arrows of decreasing darkness (e.g., the Latent Input Approach). Dashed lines indicate conceptual constraints (e.g., the Theoretical Approach), whereas solid lines indicate statistical constraints (taken with permission from Turner et al., 2017)

describe the neural data are regressed onto the parameters of a cognitive model. The idea here is that if there is some pattern in the brain data across trials or across subjects, then roughly the same pattern should exist in the parameters of the behavioral model across the same dimension (e.g., trials or subjects). Finally, the direct input approach treats the neural data as a direct reflection of a theory: with the appropriate transformation, the neural data can replace what would otherwise be some kind of mathematical abstraction. One great example of this is the work of Palmeri et al. (2015) and Purcell et al. (2010, 2012), which takes


single-unit activity and attempts to describe behavior through various dynamical transformations (see Chap. 6 for another application).
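The direct input idea can be sketched in miniature: rather than an abstract drift rate, the (here fabricated) neural signal itself drives an evidence accumulator toward a threshold that determines the predicted response time. The ramping trace, the threshold, and all numbers below are invented for illustration and are not taken from the Purcell et al. models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fabricated stand-in for a recorded firing-rate trace (one value per
# millisecond bin): activity ramps up after stimulus onset, plus noise.
t = np.arange(1000)                                      # time in ms
rate = np.clip(0.05 * (t - 200), 0, None) + rng.normal(0, 2, t.size)

# Direct input approach: the neural signal itself is accumulated as
# evidence; the first threshold crossing is the predicted response time.
evidence = np.cumsum(np.clip(rate, 0, None)) * 1e-3
threshold = 10.0
crossed = np.nonzero(evidence >= threshold)[0]
rt = int(crossed[0]) if crossed.size else None           # predicted RT in ms
print(rt)
```

The appeal of this strategy is that the model inherits trial-to-trial variability directly from the recorded activity instead of assuming a parametric noise process.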

1.3 Behavioral Model Predicts Neural Data

The second approach reverses the directionality and emphasis of the first approach and tries to use the cognitive model to predict or explain neural data. The motivating principle of this approach is to determine whether a model's representation can be substantiated within brain data, typically making the approach very confirmatory in nature: A model is fit to behavioral data, and then either the model's internal representation or some parameters of the model are linked (e.g., correlated) to measures of brain activity. In the most prototypical case, the model components are used within a generalized linear model (GLM) to explain patterns of trial-level activations in the brain. The second row of Fig. 2 illustrates some examples of this approach. One very prominent example is the latent input approach, where a model is fit to data and the trial-level representation of the model is used as a regressor in a whole-brain generalized linear model. This approach was initially coined "model-based fMRI" (O'Doherty et al., 2007) because it typically involves a standard GLM with model-based regressors. The most prominent examples arise when the behavioral models are reinforcement learning models: mechanisms within the model, such as the prediction error, are calculated for each trial and regressed against the brain data. Another example is the two-stage neural approach, which is nearly identical to the two-stage behavioral approach, except that the motivation for the analysis begins with a theory or model rather than with the empirical phenomena realized in the neural data.
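A minimal sketch of the latent input approach follows: trial-level prediction errors are computed from a simple Rescorla-Wagner-style value update and then used as a regressor against brain activity. The learning rate, the synthetic "BOLD" amplitudes, and the simulated effect size are all invented for illustration; a real analysis would use parameters estimated from behavior and a full whole-brain GLM:

```python
import numpy as np

rng = np.random.default_rng(2)

# Rescorla-Wagner-style value update: delta_t = r_t - V_t is the
# trial-level prediction error produced by the behavioral model.
alpha, v = 0.3, 0.0
rewards = rng.integers(0, 2, 200).astype(float)
pe = np.empty(200)
for i, r in enumerate(rewards):
    pe[i] = r - v
    v += alpha * pe[i]

# Latent input approach ("model-based fMRI"): regress trial-level brain
# activity on the model-derived prediction errors in a GLM.
bold = 0.8 * pe + rng.normal(0, 1, 200)    # synthetic BOLD amplitudes
X = np.column_stack([np.ones(200), pe])    # intercept + PE regressor
beta = np.linalg.lstsq(X, bold, rcond=None)[0]
print(beta[1])  # slope should be near the simulated effect of 0.8
```

A region whose estimated slope on the prediction-error regressor is reliably nonzero would, under this logic, be implicated in computing or receiving that model quantity.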

1.4 Simultaneous Modeling

The last approach we will discuss here is the simultaneous modeling approach, in which both the neural and behavioral data are explained by a single model. There are good statistical reasons to prefer a simultaneous model, which mainly center on the argument that uncertainty in the model parameters is not properly accounted for when analyses of brain and behavior are done separately (Turner et al., 2013). We will not go into those details here; the point is simply that when using the simultaneous approach, both the neural and behavioral data affect the parameter estimates. Because of this, the mechanisms in the model can detect changes across experimental conditions or trials, allowing for a better interpretation of which aspect of cognition is altered as a consequence of the experiment (e.g., the intended manipulation or the type of stimuli from one trial to the next). The bottom row of Fig. 2 illustrates two approaches. First, the joint modeling approach tries to exploit the covariance structure that occurs between the neural


and behavioral data through hierarchical modeling. Statistically, the information in the brain data directly impacts the estimates of the parameters (δ) specific to the neural "submodel," which in turn affect the "hyperparameters" (Ω). An interesting aspect of hierarchical modeling is that the hyperparameters are also connected to a behavioral submodel, and so they communicate information from the brain data indirectly to the estimates of those parameters (θ). Together, this symbiotic relationship enforces constraint on the two submodels that would not otherwise be there, and as a result, a more complete understanding of how the brain produces behavior can be gleaned. Second, the integrative approach tries to directly explain both neural and behavioral data. This is a much more challenging task because the parameters of the model must explain data that are likely to be very different, such as data on different scales or connected to time in different ways. For example, brain data may provide several measurements per trial, whereas the behavioral data often consist of only one or two measurements at the end of the trial. The usual workaround for these problems is to specify some function over time and allow the model to modulate that function when explaining neural data.
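The covariance idea behind joint modeling can be illustrated with a small generative sketch: each subject's neural parameter (δ) and behavioral parameter (θ) are drawn from a shared distribution whose hyperparameters (Ω) carry the link between them. All numbers below are invented, and a real joint model would estimate Ω from data (e.g., with hierarchical Bayesian sampling) rather than assume it:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hyperparameters Omega: means and covariance linking each subject's
# neural parameter (delta) and behavioral parameter (theta).
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])     # the off-diagonal term is the link

# Generative sketch of a joint model: per-subject (delta, theta) pairs
# are drawn from the shared hierarchical distribution, so neural data
# can inform behavioral parameters (and vice versa) through Omega.
params = rng.multivariate_normal(mean, cov, size=500)
delta, theta = params[:, 0], params[:, 1]

# Here we simply recover the linking correlation from the samples.
r = np.corrcoef(delta, theta)[0, 1]
print(round(r, 2))  # should be near the generating value of 0.6
```

Because the off-diagonal element of Ω is shared, observing a subject's neural data shifts the plausible range of that subject's behavioral parameters, which is exactly the mutual constraint the text describes.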

2 Prominent Models and Measures in the Field of Model-Based Cognitive Neuroscience

If you are interested in doing model-based cognitive neuroscience research, it might also be helpful to know what kinds of measurements and models are prominent in the field. Of course, we cannot do the entire field complete justice in this brief introduction, but we can roughly summarize the types of brain and behavioral measurements you are likely to come across, as well as some of the more common cognitive models that are used and discussed in the field.

2.1 Types of Behavioral Measures

By and large, the dominant measure of behavior is choice. Typically, choices are discrete, signaling a single "best" choice among a set of N options. More recently, there has been considerable interest in collecting continuous measures of choice, such as selecting an area within a space or selecting a color value along a continuum (Smith, 2016; Ratcliff, 2018; Kvam & Turner, 2021). Although moving to continuous response options can provide better insight into how evidence accumulates, it also presents interesting challenges, such as modeling the variance of choices over trials. Although some of this variance is likely due to noise in the decision process, how much of it is due to the apparatus used to collect responses? This question matters because there are known relationships between targeting areas of space and the distance from the initial location of the participant's


response mechanism (e.g., finger or cursor), such as Fitts's law (Fitts & Peterson, 1964). It then becomes important to develop models of the response-generation process as well as of the cognitive process. Another key variable in describing behavior is response time. Over many decades, researchers have repeatedly found that response times provide a considerable amount of constraint on model parameters (Ratcliff, 1978; Luce, 1986; Van Zandt, 2000) because they define a window within which the dynamics of the model must fit in order to be consistent with the data. Response times have also played an important role in cognitive neuroscience because they are argued to provide insight into whether a participant is alert or engaged in the task. For example, response times have been used as regressors in whole-brain GLM analyses to identify activations of the "default mode network," a brain network that is regularly implicated in task disengagement and mind-wandering. Another variable that is used less frequently is confidence. Typically, confidence measures are collected following the response, but there are also experimental paradigms in which choice and confidence measures are obtained simultaneously. Although there is some debate about the relative merits of the two approaches and what cognitive processes may go into them (Moran), the added benefit of confidence is that it provides a slightly more informative measure of how certain the participant is when making a choice. In many ways, confidence provides a measure of response certainty similar to that of response times, but the two are not always directly related (Pleskac & Busemeyer, 2010). Confidence measures also show up fairly regularly in memory research because they give a slightly more nuanced perspective on how items are stored, and whether they are recognized or recalled (Eldridge et al., 2000).
A final measure of behavior worth discussing is process tracing data, such as those obtained from eye trackers or from analyses of cursor position. Whereas eye tracking data provide a direct measure of overt attention, cursor positions are also argued to provide some insight into decision certainty, much like response times and confidence (Stillman et al., 2018). Some styles of modeling use eye tracking data to directly inform decision models (Krajbich et al., 2010; Krajbich & Rangel, 2011), much like the use of single-unit recordings in Purcell et al. (2010) and the Cox chapter. Process tracing methods are much less common in cognitive neuroscience, mainly for reasons of incompatibility. First, the eyeball acts as a large electrical dipole, and when it moves to fixate on items in the field of view, it creates a large change in electrical activity that can produce artifacts that dramatically degrade the quality of neural data (e.g., EEG and fMRI). It is also difficult to obtain eye trackers that are compatible with the fMRI scanner. As such, when eye trackers are used in cognitive neuroscience, they are often used as a data cleaning mechanism to detect when the participant was not directly fixating on some centralized stimulus. Cursor analysis methods are also problematic in cognitive neuroscience because cursor trajectories inherently involve movement, and movement can create artifacts in many types of neuroscientific data.
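The data-cleaning use just described can be sketched in a few lines: flag gaze samples that stray too far from a central fixation point and reject trials with too many flagged samples. The function names, threshold, and units below are illustrative assumptions, not any particular lab’s pipeline:

```python
def flag_fixation_breaks(gaze_xy, center=(0.0, 0.0), max_dev=2.0):
    """Flag gaze samples farther than `max_dev` (e.g., degrees of
    visual angle) from the fixation point; one bool per sample."""
    cx, cy = center
    return [((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 > max_dev
            for x, y in gaze_xy]

def trial_breaks_fixation(gaze_xy, max_frac=0.1, **kwargs):
    """Reject a trial when more than `max_frac` of its gaze samples
    break fixation."""
    flags = flag_fixation_breaks(gaze_xy, **kwargs)
    return sum(flags) / len(flags) > max_frac
```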

General Introduction to Model-Based Cognitive Neuroscience


2.2 Types of Neural Measures

We are lucky to live in a time when many types of neural measurement can be used to quantify different kinds of change in the brain. Most measures in cognitive neuroscience track changes in electrical activity, magnetic fields, or blood flow in the brain, but some participant situations lend themselves to more invasive recordings, such as single- or multi-unit recordings of neurons. Typically, recordings of neurons are not of a single neuron (although there are recordings where electrodes are implanted intracellularly) but of many hundreds or thousands of neurons, depending on the size of the filament of the recording device. Recording neurons is much more common in animal research, such as in rats or non-human primates, but clinical applications of recordings in humans are becoming more common. Regarding blood flow, one of the original measurements was positron emission tomography (PET). Although some PET studies are still being conducted, the modern field of cognitive neuroscience is dominated by structural and functional magnetic resonance imaging (s/fMRI). As briefly mentioned above, fMRI is an indirect measure of neural activity: The principle is that if a brain area is processing information, it consumes energy and must be replenished with oxygenated blood. When this blood arrives, it changes the local electromagnetic field, and this change can be detected by the scanner. Although fMRI can provide incredibly detailed spatial information about cognition, because it relies on blood flow, the measurements are delayed relative to the time at which cognition was actually carried out, and this lag requires careful consideration in experimental design. Faster measurements of brain activity rely on changes in the electrical or magnetic field potential, namely electroencephalography (EEG) and magnetoencephalography (MEG), respectively.
Because of a principle known as volume conduction, when changes in the electrical field potential occur in the brain, they are immediately reflected elsewhere, such as on the scalp. As will be discussed in Chap. 8, many conditions must be met for electrical changes in the brain to be projected onto the scalp, but when they are, cognitive neuroscientists can enjoy brain measures with high temporal precision, although much weaker spatial precision due to the difficulty of source localization from the scalp. MEG is based on magnetic rather than electrical changes, and for good reason: Whereas electrical signals are subject to impedance, a major issue because the skull impedes electrical currents, magnetic fields are not and can readily be detected with an appropriate scanner. Also due to this lack of impedance, MEG provides substantially better spatial resolution than EEG.



2.3 Types of Cognitive Models

There are many types of cognitive models readily available for the aspiring model-based cognitive neuroscientist, but some models have been more actively studied than others. Perhaps the most popular cognitive model that has stood the test of time is the diffusion decision model (DDM; Ratcliff, 1978). The DDM assumes that observers take a noisy sequence of samples of information over time, and the information contained in these samples is added up until a threshold amount of evidence has been obtained. The model’s core is motivated by principles first established in the theory of signal detection (Macmillan & Creelman, 2005) but has since served as the core of more complex models that explain how evidence accumulates for preferential choices (such as Decision Field Theory; Busemeyer & Townsend, 1993) and how choice, response time, and confidence are related, as in the dynamic signal detection model (DSD; Pleskac & Busemeyer, 2010) and the RT-CON model (Ratcliff & Starns, 2009). Other researchers have questioned the specific ways in which evidence accumulates: Some have assumed a more complex, stochastic process with neural inspiration, such as the Leaky Competing Accumulator model (Usher & McClelland, 2001), whereas others have tried to eliminate moment-to-moment variability altogether, as in the Ballistic Accumulator model (Brown & Heathcote, 2005) and the Linear Ballistic Accumulator model (Brown & Heathcote, 2008). Both the DDM and the LBA will be discussed in this book; see Chaps. 4 and 10. Another prominent set of models in the field of model-based cognitive neuroscience is the class of reinforcement learning models.
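The accumulation-to-threshold process at the DDM’s core can be simulated in a few lines. The sketch below is a bare-bones Euler discretisation with illustrative parameter values; it omits components of the full model such as non-decision time and across-trial parameter variability:

```python
import random

def simulate_ddm(drift=0.3, threshold=1.0, noise=1.0, dt=0.001, max_t=10.0):
    """Simulate one trial of a bare-bones diffusion decision model.

    Evidence starts at zero and accumulates noisy increments until it
    crosses +threshold (choice "A") or -threshold (choice "B").
    Returns (choice, response_time). Parameter values are illustrative.
    """
    x, t = 0.0, 0.0
    sd = noise * dt ** 0.5  # scale the noise to the step size dt
    while abs(x) < threshold and t < max_t:
        x += drift * dt + random.gauss(0.0, sd)
        t += dt
    # The sign of the evidence decides (this also scores timed-out trials).
    return ("A" if x > 0 else "B"), t
```

With a positive drift rate, choice “A” wins on most trials and response times form a right-skewed distribution, two signatures of the accumulation process described above.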
Although some commit to a broader definition of what constitutes a reinforcement learning model (Sutton & Barto, 1998), the more typical models are based on “temporal-difference” learning, which adjusts the model’s representation of value based on the difference between the reward that was expected and the reward that was received. Reinforcement learning models will be discussed in greater detail in Chap. 3. There are also many models of memory, such as the Retrieving Effectively from Memory model (REM; Shiffrin & Steyvers, 1997), the Bind-Cue-Decide Model of Episodic Memory (BCDMEM; Dennis & Humphreys, 2001), the Search of Associative Memory model (SAM; Raaijmakers & Shiffrin, 1981), the Temporal Context Model (TCM; Howard & Kahana, 2002), the Context Maintenance and Retrieval model (CMR; Polyn et al., 2009), and the Scale Invariant Memory, Perception, and Learning model (SIMPLE; Brown et al., 2007). These models vary in the way that they construct memory traces and the type of interference that they produce during recognition and recall (Turner et al., 2013; Kahana, 2012). Although there are many types of memory models, one memory model that has been used effectively in model-based cognitive neuroscience work is the CMR model (Kragel et al., 2015), which will be discussed in Chap. 12. A final type of model is the set of cognitive architecture models, such as the PDP models (discussed above; McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986) and the ACT-R model (Anderson, 2007). ACT-R models



take seriously the commitment that individual modules of cognitive processing can interact to create cognition. The modules themselves have been mapped to areas of the brain (see Anderson, 2007), and by simulating the model, patterns of module activity can be linked to brain data by convolving the module activity with a theoretical curve for the blood-oxygen-level-dependent response (i.e., the hemodynamic response function), making the model directly suitable for integratively modeling behavioral and neural data. ACT-R and its utility for cognitive neuroscience will be discussed in more detail in Chap. 5.
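The linking step just described, convolving simulated module activity with a hemodynamic response function, can be sketched as follows. For simplicity, this uses a single-gamma HRF rather than the canonical double-gamma form, and all parameter values are illustrative:

```python
import math

def hrf(t, peak=6.0):
    """Single-gamma hemodynamic response function.

    A simplification of the canonical double-gamma HRF; the curve
    t**peak * exp(-t) has its mode at t = peak seconds.
    """
    if t <= 0:
        return 0.0
    return (t ** peak) * math.exp(-t) / math.gamma(peak + 1)

def predicted_bold(module_activity, dt=0.1):
    """Discretely convolve module on/off activity with the HRF to
    produce a predicted BOLD time course (one value per time step)."""
    kernel = [hrf(i * dt) for i in range(int(30.0 / dt))]  # 30 s kernel
    bold = []
    for i in range(len(module_activity)):
        total = 0.0
        for j in range(min(len(kernel), i + 1)):
            total += module_activity[i - j] * kernel[j]
        bold.append(total * dt)
    return bold
```

For a module active during the first second of a trial, the predicted BOLD response peaks roughly six seconds later, illustrating the measurement lag noted in the discussion of fMRI above.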

3 Applications in the Field of Model-Based Cognitive Neuroscience

In this book, prominent authors in the field of model-based cognitive neuroscience present exciting new applications by focusing on computational memory in Chap. 12, economic decision-making in Chap. 13, and cognitive control of choices and actions in Chap. 14. The chapter by Polyn assesses neurocognitive hypotheses in a likelihood-based model of the free-recall task. In free-recall tasks, participants study a list of words that are usually presented one at a time and recalled afterward in whatever order they come to mind. The crucial measure is the recall sequence itself (i.e., the series of responses made by the participant during the recall period). There are various theoretical models that try to capture the cognitive operations relevant for this task. Polyn presents a tutorial overview of a model-based approach in which he assesses neural signals in terms of their potential correspondence with the cognitive processes embodied in the Context Maintenance and Retrieval model (CMR; Polyn et al., 2009; Kragel et al., 2015). The chapter by Gluth and Fontanesi discusses two model-based applications in the field of economic decision-making. The first application focuses on the role of visual attention in value-based decision-making. Several sequential sampling models (SSMs) are discussed in the context of their formalization of the cognitive processes behind eye movements and choices. The second application also uses SSMs, here in combination with reinforcement learning models. The latter is an exciting new field that addresses questions such as how feedback in the form of rewards and punishments changes the dynamics of decisions. They end with a discussion of existing attempts and future avenues to connect work on attention and reinforcement learning in neuroeconomics. Finally, Heathcote and colleagues review model-based neuroscience work on cognitive control of choices and actions.
This is a nascent field encompassing many theoretical and conceptual modeling frameworks. Here, Heathcote et al. consider formal modeling of both strategically deployed executive processes and more automatic influences in simple and more complex tasks. Included are conflict tasks, where automatic and executive control processes sometimes act in opposition; delay discounting tasks, which require self-control to obtain larger rewards; and the inhibition of action. Interestingly, for all



of these common control tasks, dynamic cognitive models including SSMs have been developed in recent years (see also Chap. 11 from Boag et al.). Moreover, some authors have started to link behavioral cognitive control data with neural data, including EEG and fMRI measurements, as well as joint models. Heathcote et al. end their chapter by emphasizing generative Bayesian estimation methods, exciting new tools that are well suited to the complexities of model-based neuroscience studies.

4 Future Directions

In recent years, exciting new combinations of SSMs and reinforcement learning models have been proposed (Turner, 2019; Fontanesi et al., 2019; Miletić et al., 2020; Pedersen & Frank, 2020; Chap. 13 of this book). Concretely, reinforcement learning models of error-driven learning and SSMs capturing decision-making processes have provided significant insight into the neural basis of a variety of cognitive processes. Notably, model-based cognitive neuroscience research using the two frameworks has evolved separately and independently. Recent efforts have illustrated the complementary nature of both modeling traditions and shown how they can be integrated into a unified theoretical framework. This integration has the power to explain trial-by-trial dependencies in choice behavior as well as response time distributions and can be viewed as a new class of models that holds great merit for model-based cognitive neuroscience. In a similar vein, Boag et al. (2023) as well as Heathcote et al. (Chap. 14) suggest extending the classical SSM modeling approach to capture cognitive processes that underlie observed behavior in applied domains such as air-traffic control (ATC), driving, forensic and medical image discrimination, and maritime surveillance. Excitingly, this new class of models allows researchers to understand how the cognitive system adapts to task demands and interventions, such as task automation and conflict resolution. Ultimately, this approach has the potential to bring about wider adoption of cognitive modeling in Human Factors research. Beyond extensions of SSMs to choice behavior in more applied domains, Pinto et al. (2022) zoom in on the multiple timescales of sensory-evidence accumulation across different brain regions. Understanding the intrinsic, evolving decision-making process over time is important, as it allows us to better understand the hierarchical organization of cognitive behavior.
In particular, decisions requiring the gradual accrual of sensory evidence over time may recruit widespread brain areas. To test this assumption, Pinto et al. (2022) trained mice to accumulate evidence over seconds while navigating in virtual reality and optogenetically silenced the activity of many cortical areas during different brief trial epochs. They found distinct changes in the weighting of sensory evidence during and before silencing, such that frontal inactivations led to stronger deficits on long timescales than posterior cortical ones. In sum, this approach suggests that the intrinsic timescale hierarchy of distributed cortical areas is an important component of evidence-accumulation mechanisms. Finally, powerful new statistical sampling methods have been developed that allow better and faster estimation and simulation of models and model predictions (Gunawan et al., 2020). To improve on current approaches to LBA (Brown & Heathcote, 2008) inference, Gunawan et al. (2020) introduce two methods based on recent advances in particle MCMC methodology that are qualitatively different from existing approaches as well as from each other. One approach is the particle Metropolis-within-Gibbs sampler; the other is density-tempered sequential Monte Carlo. Both new approaches provide very efficient sampling and can be used to estimate the marginal likelihood, which provides Bayes factors for model selection. The first approach is usually faster. The second approach provides a direct estimate of the marginal likelihood, uses the first approach in its Markov move step, and is very efficient to parallelize on high-performance computers.
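Both of Gunawan et al.’s methods build on the basic Metropolis accept-reject step. As a deliberately simplified illustration (a plain random-walk Metropolis sampler for a single parameter, not the particle variant):

```python
import math
import random

def metropolis(log_post, x0=0.0, step=1.0, n=20000, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional posterior.

    `log_post` is the log posterior density up to an additive constant.
    Returns the list of sampled values (including burn-in).
    """
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples
```

For a standard normal target (log density -0.5 * x * x up to a constant), the post-burn-in sample mean and variance approach 0 and 1. The particle and tempered variants replace the exact posterior density with unbiased estimates, which is what makes them applicable to models such as the hierarchical LBA.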

5 Open Challenges

Despite the progress made in the nascent field of model-based cognitive neuroscience, open challenges remain. One of these is the robust estimation of single-trial latent cognitive processes. Traditional evidence-accumulation models assume that drift rates and start points or thresholds vary from trial to trial but only allow for estimating the distributions of drift rates and start points across all trials, not their values on individual trials. This limits our ability to understand sequential effects, which have long been known to account for a substantial part of the variance in response time distributions (e.g., Laming, 1968; Wagenmakers et al., 2004; Gilden, 1997). It also limits our ability to study brain-behavior correlations at the individual-trial level. Various approaches have been suggested to obtain single-trial parameter estimates based on behavioral data alone (e.g., Gluth & Meiran, 2019; van Maanen et al., 2011; Rodriguez et al., 2015) or based on simultaneously acquired behavioral and neural data through joint modeling (e.g., Turner et al., 2015, 2017), but no approach so far has explicitly incorporated a cognitive theory that proposes how and why cognitive processes change over time, for example, due to adaptation effects and mind-wandering.

References

Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.
Ashby, F. G., & Helie, S. (2011). Journal of Mathematical Psychology, 55, 273.
Boag, R. J., Strickland, L., Heathcote, A., Neal, A., Palada, H., & Loft, S. (2023). Trends in Cognitive Sciences, 27, 175.
Brown, S., & Heathcote, A. (2005). Psychological Review, 112, 117.
Brown, S., & Heathcote, A. (2008). Cognitive Psychology, 57, 153.



Brown, G. D. A., Neath, I., & Chater, N. (2007). Psychological Review, 114, 539–576.
Busemeyer, J., & Townsend, J. (1993). Psychological Review, 100, 432.
Dennis, S., & Humphreys, M. S. (2001). Psychological Review, 108, 452.
Eldridge, L., Knowlton, B., Furmanski, C., Bookheimer, S., & Engel, S. (2000). Nature Neuroscience, 3, 1149.
Fitts, P. M., & Peterson, J. R. (1964). Journal of Experimental Psychology, 67, 103–112.
Fontanesi, L., Gluth, S., Spektor, M., & Rieskamp, J. (2019). Psychonomic Bulletin & Review, 26, 1099–1121.
Forstmann, B. U., Wagenmakers, E.-J., Eichele, T., Brown, S., & Serences, J. (2011). Reciprocal relations between cognitive neuroscience and cognitive models: Opposites attract? Trends in Cognitive Sciences, 6, 272–279.
Gilden, D. L. (1997). Psychological Science, 8, 296.
Gluth, S., & Meiran, N. (2019). eLife, 8, e42607.
Gunawan, D., Hawkins, G. E., Tran, M. N., Kohn, R., & Brown, S. D. (2020). Journal of Mathematical Psychology, 96, 102368.
Howard, M. W., & Kahana, M. J. (2002). Journal of Mathematical Psychology, 46, 269.
Kahana, M. J. (2012). Foundations of human memory. Oxford: Oxford University Press.
Kragel, J. E., Morton, N. W., & Polyn, S. M. (2015). Journal of Neuroscience, 35, 2914.
Krajbich, I., & Rangel, A. (2011). Proceedings of the National Academy of Sciences of the USA, 108, 13852.
Krajbich, I., Armel, C., & Rangel, A. (2010). Nature Neuroscience, 13, 1292.
Kriegeskorte, N., & Douglas, P. K. (2018). Nature Neuroscience, 21, 1148–1160.
Kvam, P. D., & Turner, B. M. (2021). Psychological Review, 128, 766–786.
Laming, D. R. (1968). Information theory of choice reaction time. New York: Wiley.
Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide. New Jersey: Lawrence Erlbaum Associates.
McClelland, J., & Rumelhart, D. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. In Psychological and biological models (Vol. 2). Cambridge, MA: MIT Press.
Miletić, S., Boag, R. J., & Forstmann, B. U. (2020). Neuropsychologia, 136, 107261.
O’Doherty, J. P., Hampton, A., & Kim, H. (2007). Annals of the New York Academy of Sciences, 1104, 35.
O’Reilly, R. C., Munakata, Y., Frank, M. J., Hazy, T. E., et al. (2012). Computational cognitive neuroscience (4th ed.). https://CompCogNeuro.org. https://github.com/CompCogNeuro/ed4.
Palmeri, T., Schall, J., & Logan, G. (2015). In J. R. Busemeyer, J. Townsend, Z. J. Wang, & A. Eidels (Eds.), Oxford handbook of computational and mathematical psychology. Oxford: Oxford University Press.
Pedersen, M., & Frank, M. (2020). Computational Brain & Behavior, 3, 458.
Pinto, L., Tank, D. W., & Brody, C. D. (2022). eLife, 11, e70263.
Pitt, M. A., & Myung, J. I. (2002). Trends in Cognitive Sciences, 6, 421.
Pleskac, T. J., & Busemeyer, J. R. (2010). Psychological Review, 117, 864.
Polyn, S. M., Norman, K. A., & Kahana, M. J. (2009). Psychological Review, 116, 129.
Purcell, B., Heitz, R., Cohen, J., Schall, J., Logan, G., & Palmeri, T. (2010). Psychological Review, 117, 1113.
Purcell, B., Schall, J., Logan, G., & Palmeri, T. (2012). Journal of Neuroscience, 32, 3433.
Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Psychological Review, 88, 93.
Ratcliff, R. (1978). Psychological Review, 85, 59.
Ratcliff, R. (2018). Psychological Review, 125, 888–935.
Ratcliff, R., & Starns, J. (2009). Psychological Review, 116, 59.
Roberts, S., & Pashler, H. (2000). Psychological Review, 107, 358.
Rodriguez, C. A., Turner, B. M., Van Zandt, T., & McClure, S. M. (2015). European Journal of Neuroscience, 42(5), 2179–2189.



Rumelhart, D., & McClelland, J. (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1): Foundations. Cambridge, MA: MIT Press.
Shiffrin, R. M., & Steyvers, M. (1997). Psychonomic Bulletin and Review, 4, 145.
Smith, P. L. (2016). Psychological Review, 123, 425.
Stillman, P. E., Shen, X., & Ferguson, M. J. (2018). Trends in Cognitive Sciences, 22, 531.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: The MIT Press.
Turner, B. M. (2019). Psychological Review, 126, 660.
Turner, B. M., Dennis, S., & Van Zandt, T. (2013). Psychological Review, 120, 667.
Turner, B. M., Van Maanen, L., & Forstmann, B. U. (2015). Psychological Review, 122, 312.
Turner, B. M., Forstmann, B. U., Wagenmakers, E. J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). NeuroImage, 72, 193.
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches of analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.
Usher, M., & McClelland, J. L. (2001). Psychological Review, 108, 550.
van Maanen, L., Brown, S. D., Eichele, T., Wagenmakers, E. J., Ho, T., & Serences, J. (2011). Journal of Neuroscience, 31, 17488.
Van Zandt, T. (2000). Psychonomic Bulletin and Review, 7, 424.
Wagenmakers, E. J., Farrell, S., & Ratcliff, R. (2004). Psychonomic Bulletin and Review, 11, 579.

Linking Models with Brain Measures

Bradley C. Love

Abstract Linking models and brain measures offers a number of advantages over standard analyses. Models that have been evaluated on previous datasets can provide theoretical constraints and assist in integrating findings across studies. Model-based analyses can be more sensitive and allow for evaluation of hypotheses that would not otherwise be addressable. For example, a cognitive model that is informed from several behavioural studies could be used to examine how multiple cognitive processes unfold across time in the brain. Models can be linked to brain measures in a number of ways. The information flow and constraints can be from model to brain, brain to model, or reciprocal. Likewise, the linkage from model and brain can be univariate or multivariate, as in studies that relate patterns of brain activity with model states. Models have multiple aspects that can be related to different facets of brain activity. This is well illustrated by deep learning models that have multiple layers or representations that can be aligned with different brain regions. Model-based approaches offer a lens on brain data that is complementary to popular multivariate decoding and representational similarity analysis approaches. Indeed, these approaches can realise greater theoretical significance when situated within a model-based approach.

Keywords Linking · Cognitive models · Multivariate measures of cognition

1 Introduction

Psychology and neuroscience are concerned with theoretical concepts that cannot be directly measured. For example, theoretical concepts like recognition, familiarity, error, learning, replay, receptive field, fear, prejudice, value, and uncertainty need to

B. C. Love () University College London, The Alan Turing Institute, London, UK e-mail: [email protected] © Springer Nature Switzerland AG 2024 B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_2




be operationalised. We cannot directly measure these concepts like we measure the temperature of a room with a thermometer or the length of a bolt with a ruler. To further complicate matters, we are often interested in how processes unfold over time. For example, memory by definition involves processes that extend over time and involve generalisation or similarity structure. Likewise, decision-making processes, such as evidence accumulation for competing options, involve decision variables that change over time (Shadlen & Kiani, 2013). The dynamical nature of cognition is central in many accounts of behaviour (Busemeyer & Townsend, 1993; Tanenhaus et al., 1995; Wijeakumar et al., 2017). To understand the brain basis of theoretical concepts in psychology, we need to measure these concepts and relate our measurements to the brain. Formal models offer one way forward. Models can be used to characterise cognitive processes in terms of the steps people carry out while performing a task. For example, drift-diffusion models (see chapter “Reinforcement Learning: Application to fMRI”) characterise how evidence is accumulated over time for choice options (Ratcliff, 1978). Learning models characterise how knowledge is updated in light of corrective feedback, detailing the nature of error signals (Kruschke, 1992; Love et al., 2004). Cognitive models that have been rigorously evaluated are our best guess of how cognitive processes unfold. By fitting these models, such as to behavioural data, we can operationalise and quantify theoretical concepts of interest, akin to how a thermometer allows us to measure temperature. One research goal in model-based neuroscience is to understand how abstract processes and representations detailed in cognitive models are instantiated in the brain (Forstmann et al., 2011; Palmeri et al., 2015; Turner et al., 2017).
Additionally, as I will discuss, relating theoretical concepts to brain measures may also help advance our understanding of cognition by introducing additional constraints when fitting and selecting among candidate cognitive models. In effect, there can be a two-way street in which cognitive models help us to understand the brain and the brain helps us to develop and evaluate cognitive models. Cognitive models can serve as the bridge between abstract theories and brain measures (Love, 2015). Model-based neuroscience offers the possibility of advancing our understanding along multiple levels of analysis. Linking models with brain measures also creates a number of exciting opportunities. As I will review, there are a number of cases in which brain imaging researchers could not have made an advance without a model-based analysis approach. In this chapter, I will consider several ways in which cognitive models can be related to brain measures and provide illustrative examples. As reviewed in Turner et al. (2017), cognitive models, which are concerned with behaviour, can be related to brain data in a number of ways, including (1) using the brain measures to constrain the cognitive model, (2) using the cognitive model to predict neural data, and (3) considering both the brain and behavioural data simultaneously. These approaches can be univariate or multivariate (i.e. patterns of brain activity are considered).



2 Some Functions of Models in Science

Models can play a number of constructive roles in psychology, neuroscience, and science more broadly. One function is simply organising one’s ideas and making assumptions clear. Formal models require researchers to detail each step, which can reduce wiggle room relative to purely verbal theories. Whatever wiggle room is left (e.g. tuneable parameters) is made explicit. As a consequence, what is predicted under different circumstances is made clear. Rather than debate what a theory predicts, a model can be simulated. For example, early work showing an advantage in processing category prototypes led researchers to believe that abstract prototypes were stored in memory, but subsequent work demonstrated that such effects were compatible with exemplar models that store no abstractions in memory (Medin & Schaffer, 1978). More recently, models have played a related role in the design and interpretation of fMRI (functional magnetic resonance imaging) studies of memory (Caplan & Madan, 2016; Nosofsky et al., 2012). Models can play a constructive role in directing empirical investigations. Science often progresses by evaluating competing theoretical accounts. Models afford the possibility of model comparison in which competing accounts can be pitted against one another, and the model that performs best can be favoured. This approach is standard in mathematical psychology (Pitt et al., 2002) but can also be done in cognitive neuroscience. For example, Mack et al. (2013) formally evaluated whether the representations in an exemplar or prototype model best matched the BOLD (blood-oxygen-level-dependent) response and found that the exemplar model was more consistent (also see Stillesjö et al. (2019)). In such cases, brain data can help adjudicate between competing models when behavioural data alone cannot (Ditterich, 2010; Mack et al., 2013; Purcell et al., 2012).
Recent work evaluating whether the hippocampus learns to associate objects and words incrementally or in an all-or-none fashion used a related approach that favoured the all-or-none account (Berens et al., 2018). Model comparison can even be done in cases in which behavioural data are not analysed. For example, recent work (Bobadilla-Suarez et al., 2019) asks what makes two brain states similar, evaluating a number of basic accounts of similarity, such as Euclidean distance, Mahalanobis distance, and Pearson correlation, and finds that the same similarity measures were operable across brain states but differed across tasks or stimuli. Models can serve a powerful integrative role by linking seemingly disparate findings through common computational mechanisms. For example, a simple model of familiarity and recognition memory captured findings from both fMRI studies of visual categorisation and word list memory (Davis et al., 2014). In my own work, the same clustering approach for capturing behaviour in learning studies has been applied to a number of fMRI studies (Davis et al., 2012a, b; Inhoff et al., 2018; Mack et al., 2016, 2020). Applying the same model to multiple studies helps to theoretically integrate these empirical contributions, which is especially helpful when studies involve different paradigms and dependent measures. More recently, our clustering work (Mok & Love, 2019) has extended these same



model mechanisms to offer an alternative explanation for place and grid cell responses in rodents and humans. This account makes novel predictions for how cell responses should change under different experimental conditions. In summary, cognitive models are useful tools for clarifying one’s thinking, evaluating theoretical proposals, and, as will be discussed here, linking behaviour and brain.
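As an illustration of the brain-state similarity accounts mentioned earlier (Euclidean distance, Pearson correlation, and so on), two of the simpler measures can be computed over a pair of activity patterns in a few lines. This is a sketch: in practice these are applied to voxel patterns, and Mahalanobis distance additionally requires an estimate of the noise covariance:

```python
import math

def euclidean_similarity(u, v):
    """Negative Euclidean distance: larger values mean more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pearson_similarity(u, v):
    """Pearson correlation between two activity patterns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return numerator / denominator
```

Note that the two measures can disagree: scaling one pattern leaves the correlation untouched but changes the Euclidean distance, which is one reason the empirical choice of measure matters.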

3 Levels of Analysis

The aforementioned models can be considered cognitive models. These models are hypothesised to involve the same processes and representations as the human mind. Cognitive models reside at Marr’s (1982) algorithmic level and are well placed to help explain how the brain implements higher-level computations (Love, 2015). As discussed below, the algorithmic level resides between higher-level considerations related to the description or goal of the overall computation and lower-level accounts of the computation’s physical realisation, such as in the brain. Marr’s tripartite hierarchy (Marr, 1982) is perhaps the most well-known and influential organisation of levels in neuroscience. In brief, the computational level is the top level where the problem to be addressed is specified. Rather than detail the form of a potential solution, the computational level simply states the problem (i.e. the input-output mapping desired). For example, for object recognition, a computational-level account could involve naming various images under various conditions. The next level is the algorithmic level. As its name indicates, the algorithmic level is concerned with how the function specified at the computational level is computed (i.e. the processes and representations used). For example, if the computational-level task were to sort an array of numbers in ascending order, then the algorithmic level would specify a possible approach, such as bubble sort or quicksort. Different algorithms may solve the computational task in different ways, have different runtimes, etc., but they should all conform to the computational-level goal (e.g. correctly sort the array). Finally, the implementational level describes the physical substrate for the computation (e.g. the computer that executes quicksort). The previous examples from computer science are apropos as Marr was clearly inspired by abstraction layers, a central concept in computer science (Wing, 2008).
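The sorting example can be made concrete. Bubble sort is one of many algorithms that satisfy the same computational-level specification (quicksort is another), which is exactly the point: the computational level fixes the input-output mapping while leaving the algorithmic level open. A sketch:

```python
def bubble_sort(items):
    """Bubble sort: one algorithmic-level solution to the
    computational-level problem 'sort the array in ascending order'."""
    items = list(items)  # copy so the input is left untouched
    for last in range(len(items) - 1, 0, -1):
        for j in range(last):
            if items[j] > items[j + 1]:  # swap out-of-order neighbours
                items[j], items[j + 1] = items[j + 1], items[j]
    return items
```

Quicksort computes the same input-output mapping with different internal steps and a different runtime, exactly the kind of distinction the algorithmic level is meant to capture.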
Note that Marr’s top two levels, the computational and algorithmic, neatly map onto the top two levels in a common abstraction hierarchy in computing (Fig. 1). Abstraction layers in computing can contain finer-grain levels, including multiple levels describing the physical computing device. In contrast, Marr effectively lumped all of neuroscience into a single implementational level, which might partly explain why some neuroscientists find his hierarchy inadequate (Churchland et al., 1990). Although Marr’s scheme is highly influential, there are alternatives (Pylyshyn, 1984). Moreover, there is no reason to restrict oneself to three levels. For example, there are a number of four-level schemes in cognitive science (Dawson, 2013; Newell, 1980, 1990; Sun, 2009). Indeed, Bechtel and Richardson’s (1993) mechanistic

Linking Models with Brain Measures

[Fig. 1 schematic: a categorisation example (behavioural theory and data of interest on what people learn and generalise, f(task) = behaviour); Marr’s levels (computational level: what? the input-output function; algorithmic level: how? the steps to follow; implementational level: where? the physical substrate); abstraction layers in computing (application, algorithm, programming language, assembly language, machine code, instruction set architecture, microarchitecture, gates and registers, transistors, physics); and a sorting example (f(C, B, D, A) = (A, B, C, D), realised by bubble sort).]
Fig. 1 Marr’s levels compared to abstraction layers in computing with examples of each. Marr’s levels are clearly influenced by abstraction layers in computer science, though Marr’s levels are less fine grain, particularly for levels of interest to many neuroscientists. On the left, an example from category learning is shown in which an algorithmic model (Love et al., 2004) was fit to behaviour and its internal representations are used to interpret BOLD response (Mack et al., 2016). On the right, a sorting algorithm addressed the computational-level problem of sorting and was implemented by a digital computer. The abstraction layers in computing make clear that moving to a lower layer introduces additional detail (more information) about the computation whereas higher layers introduce abstract constructs that can be realised in multiple ways. (Figure and discussion from Love (2020a))

approach can be characterised as a “levels of mechanism” hierarchy in which there is not a fixed number of levels. For example, a car can be seen as a mechanism consisting of interacting parts, such as an engine, drivetrain, steering wheel, brakes, etc. A component of a mechanism can itself be further decomposed into its own mechanism (e.g. the braking system) and so forth, with no limit except those imposed by particle physics. For the present purposes, the important point is that cognitive models reside at an intermediary level that details the “how” of cognition. Given this placement, cognitive models can bridge between input-output descriptions of behaviour and brain implementation.

4 Other Types of Models Useful in Analysing Brain Data

In addition to using cognitive models, neuroscientists also use formal models as data analysis tools. For example, the generalised linear model (GLM) itself is a formal model that has assumptions and tuneable parameters that are fit to data. Of course, the GLM is not a model of how people process and represent information.

B. C. Love

Returning to Marr’s levels, it is clear that the GLM does not lie at the algorithmic level of understanding human cognition, nor at any other level. Instead, the GLM is an analysis tool. Other examples of data analysis tools that are not cognitive models include dynamic causal modelling (Friston et al., 2003), techniques to measure the intrinsic or functional dimensionality of fMRI data (Ahlheim & Love, 2018), and multi-voxel pattern analysis (MVPA). MVPA decoding approaches apply a machine classifier to “mind read” from the BOLD response whether a participant, for example, is viewing a house or a face (Cetron et al., 2019). Although these are not psychological models, they can be used to make interesting behavioural predictions. For example, participants tend to have faster response times for stimuli that are further from the classifier’s decision bound, which indicates the classifier is more confident about its decision (Ritchie & Op de Beeck, 2019a). Decoding approaches can also be used to determine when people are engaging in replay (Lee et al., 2019; Momennejad et al., 2018; Shanahan et al., 2018; Xue, 2018).

There is a lot of room for creativity and innovation in using non-cognitive models, such as decoding procedures. For example, Shen et al. (2019) coupled a decoding approach with a deep convolutional network to visualise the image a person was viewing. Other methodological innovations include hyperalignment, which creates a common brain space for multiple participants to increase decoding performance (Haxby et al., 2011). Hyperalignment is successful because voxels do not exactly align across individuals’ brains, but simple transformations to a common space can reveal commonalities across individuals.

The line distinguishing cognitive models from data analysis tools can be blurred at times. The distinction can depend on the intentions of the researcher using the model. Analogously, a Bayesian model can be taken as a computational-level theory of cognition (i.e.
describing the behaviour that should occur under different circumstances with no recourse to the processes or representations that people use) or as an algorithmic proposal of how people actually solve the task (Jones & Love, 2011a). For example, an algorithmic Bayesian model may predict response times depending on the nature of model updates, which are interpreted as mental operations, not computational-level descriptions. Making clear the nature of the model used is important because it determines how the model should be evaluated (Jones & Love, 2011b).

5 General Comparison of Model and Brain Data

A lot of early brain-inspired work in cognitive science was only loosely informed by findings in neuroscience. For example, the original parallel distributed processing (PDP) movement in the 1980s was motivated by the idea that brain computation is


distributed across neurons and that cognitive models should reflect this observation (Rumelhart & McClelland, 1986). Notice this linkage between PDP models and the brain does not involve fitting neural measures or any other formal coupling. Theoretical assertions of being brain-like or biologically plausible can be controversial in part because they are often underspecified, whereas model selection procedures make claims and results clearer (Love, 2020a). The PDP models neglected many of the details of actual neurons, such as ion channels and spiking activity. Abstracting away details is not necessarily negative – in accord with Occam’s razor, models should be as simple as possible while capturing the data of interest, which may or may not include the specifics of neurons. Again, model selection approaches make clear what data the scientist intends to explain.

The loose coupling of models and brain can be made somewhat more direct in cognitive models that attempt to simulate basic patterns of behaviour across different populations that vary in some key way, such as whether a group has a hippocampal lesion (Love & Gureckis, 2007; Nosofsky & Zaki, 1998). This basic approach is common and has been fruitful in exploring semantic processing impairments (Lambon Ralph et al., 2006; Tyler et al., 2000). Again, in these lines of work, cognitive modelling and analysis of brain data happen separately from one another.

The relation between model simulations and brain measures can become quite rich. For example, recent work relates clustering mechanisms that have been used in concept learning to explain grid and place cell recordings in the rodent brain during navigation tasks (Mok & Love, 2019). In this case, the cognitive model predicts how lower-level cell activity should vary with changes in task and environment.
Although this work is theoretical and links cognitive models to the level of neurons, notice that this linkage does not involve exploiting any joint constraints in the data analysis. For example, the cognitive model is not being used to identify cell types by applying it to neural data. Instead, the model is being simulated and theoretically related to brain activity to help interpret and conceptualise findings.

In some sense, the entire emerging field of computational psychiatry falls under this heading of loosely connecting cognitive models to brain function. In computational psychiatry, cognitive models are routinely fit to behaviour, and fitted parameters for different populations (e.g. depressives vs. non-depressives) are compared (Adams et al., 2015; Blanco et al., 2013).

Certainly, work that provides a general conceptual link between brain and behaviour can be valuable. However, ideally, models would also be integrated into the data analysis. The remainder of this chapter focuses on incorporating cognitive models into the analysis of brain measures. Such model-based neuroscience approaches both theoretically relate cognitive models to the brain (as do the accounts reviewed in this section) and incorporate constraints across levels of analysis when evaluating models and brain data.


6 Cognitive Model as Integral Part of the Data Analysis

In a typical task fMRI (or EEG, MEG, etc.) analysis, experimental conditions are contrasted with one another. For example, one may contrast voxels that are more active for face than for house stimuli. The simplest model-based analyses replace the stimulus condition with some model measure (e.g. prediction error) that varies across trials (Daw et al., 2006). By entering this regressor (e.g. prediction error) from the cognitive model into the GLM, one can evaluate which voxels co-vary with the cognitive construct. As shown in Fig. 2, both the typical contrast approach and simple model-based analyses are univariate. In contrast, standard MVPA starts from a collection of voxels (multivariate) and aims to predict some experimental condition, such as whether the participant is viewing a house or a face. One innovation is to make the target of decoding a model measure, such as item familiarity according to a cognitive model (Mack et al., 2013). The four quadrants shown in Fig. 2 are not an exhaustive taxonomy of how to relate models to the BOLD response (for a more complete treatment, see Turner et al. (2017)). Perhaps because it is relatively straightforward, the univariate model-based approach is the most common in the field. Typically, a model is fit to behavioural data

Fig. 2 The top row illustrates approaches that are not model-based in that they do not leverage a cognitive model of the task. For example, in the top-left panel, a standard analysis might identify voxels that are more active for faces than for house stimuli, whereas in the top-right panel, a decoder might try to classify whether the participant is viewing a house or a face stimulus on each trial. In the bottom row, a cognitive model is at the centre of the analysis. In the bottom-left panel, some measure from the cognitive model (which is usually fit to behavioural data), such as item familiarity, learning update, etc., is entered into the GLM. Such an analysis will identify voxels that show a similar activation profile to the model measure. In contrast, in the bottom-right quadrant, a classifier is applied to the brain to try to decode some internal measure from the cognitive model. In this case, models are favoured to the extent that their internal state is decodable (Mack et al., 2013). (Figure and discussion from Love (2020b))


and then used as a lens on the fMRI data. For example, an associative learning model was fit to behavioural data from a task where people formed impressions of various social groups through trial-by-trial feedback (Spiers et al., 2017). The fitted model provided a trial-by-trial measure of valence or prejudice for each group for the GLM, which tracked activity in the anterior temporal lobe in the model-based analysis. Model-based analysis was critical for capturing changes in memory across study trials.

In a category learning study (Davis et al., 2012a), a model-based analysis with a clustering model of learning was critical to capturing two time courses, one across trials and one within. This study examined the hippocampus’ role in acquiring categories in which most items followed a rule but some items (exceptions) did not. A clustering model (Love et al., 2004) was fit to the behavioural data (i.e. the learning curves), and two model-based measures were entered into the GLM, one for recognition strength or familiarity and one for error correction or learning update. As shown in Fig. 3, the hippocampus tracked the model’s recognition measure at stimulus presentation and the error measure at feedback presentation. Interestingly, a standard analysis contrasting exception and rule-following items found no significant difference – the cognitive model proved critical to capturing how hippocampal response changes over the course of study trials. The same modelling approach can also be used to localise two simultaneous processes (by using two different model-based measures) within the same phase of a trial to draw a distinction between the functions of anterior and posterior hippocampus (Davis et al., 2012b).
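The univariate model-based recipe can be sketched in a few lines. The following Python example is illustrative only: the trial spacing, learning rate, simplified single-gamma HRF, and noise level are assumptions, not values from the studies cited. A Rescorla-Wagner-style prediction error is generated, convolved with an HRF to form a parametric regressor, and entered into a GLM on a simulated voxel:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. A simple error-driven learning model yields a trial-by-trial prediction error
alpha, V = 0.2, 0.0
rewards = rng.integers(0, 2, size=60).astype(float)
pe = np.empty(60)
for t, r in enumerate(rewards):
    pe[t] = r - V            # prediction error on trial t
    V += alpha * pe[t]       # value update

# 2. Build the model-based regressor: PE-weighted sticks convolved with an HRF
tr, n_scans = 2.0, 260
onsets = np.arange(60) * 8                  # one trial every 8 s (assumed design)
stick = np.zeros(n_scans)
stick[(onsets / tr).astype(int)] = pe       # parametric modulation by PE

t_hrf = np.arange(0, 30, tr)
hrf = t_hrf**5 * np.exp(-t_hrf)             # simplified single-gamma HRF, peak ~5 s
hrf /= hrf.sum()
regressor = np.convolve(stick, hrf)[:n_scans]

# 3. Simulate a voxel that tracks PE, then recover its weight by OLS
beta_true = 2.0
y = beta_true * regressor + 0.1 * rng.normal(size=n_scans)
X = np.column_stack([np.ones(n_scans), regressor])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```

Voxels whose fitted weight on the model-based regressor is reliably non-zero are those that co-vary with the cognitive construct; real analyses would of course use a full design matrix and proper HRF modelling.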
Another way to scale up this basic univariate modelling approach is to adopt an encoder approach in which the fitted cognitive model provides a number of model-based regressors to enter into the GLM with the goal of explaining the most variance possible within brain regions of interest (van Gerven, 2017). In the encoding approach, rather than trying to identify voxels that significantly regress on some specific model-based measure (e.g. prediction error), the goal is for multiple model measures to capture the most overall variance possible in the GLM.

Other model-based work (Kragel et al., 2015; Palmeri et al., 2015) reverses the flow of information to incorporate brain measures directly into the operation of the model to better predict behaviour. For example, Kragel et al. (2015) used a variant of the context maintenance and retrieval (CMR) model of free recall (Polyn et al., 2009) that took signals from the medial temporal lobe (MTL) to determine whether contextual reactivation was successful at each potential recall event. The model that incorporated the BOLD input performed better than a baseline model in predicting behaviour. Another example of this approach is replacing parameters in decision models, such as the drift-diffusion model (Ratcliff, 1978) and its variants (Usher & McClelland, 2001), with neural recordings from regions thought to implement the functions of those parameters (Palmeri et al., 2015; Purcell et al., 2010).

Rather than linking from model to brain or brain to model, joint modelling approaches (Turner et al., 2019a, b) simultaneously model the mutual constraints between behavioural and brain measures through an intermediary cognitive model. This approach can deal with multiple brain measures (e.g. fMRI and EEG) and can make predictions about missing measures based on covariance with the observed


Fig. 3 Panels a and b show model-based regressors for a measure of recognition strength (i.e. familiarity) and error correction (i.e. learning update). These model-based regressors track hippocampal activity at the stimulus presentation (panel c) and feedback (panel d) phases of trials, respectively (Davis et al., 2012a). In contrast, a standard contrast of exception > rule-following items (panels e and f) results in no statistically significant voxels, because this contrast does not track the time course of hippocampal activity

measures. This approach can be quite powerful and useful in practice. For example, one could collect behavioural data from a number of participants and more costly neural recordings from only a subset of participants and leverage the constraints across measures and participants through hierarchical Bayesian modelling.

There are a number of other creative ways to link cognitive models to BOLD response. One way is to link a key event, as indexed by the cognitive model, to an operation in the brain. For example, a recent study finds that prediction errors during


study are predictive of later replay events (Momennejad et al., 2018). In other work, a Bayesian model determined the probability that an item would be remembered, which correlated with hippocampal activity during encoding (Gluth et al., 2015). Finally, a cognitive model’s fitted parameters can be related to the BOLD response instead of a trial-by-trial measure from the model. During category learning, models (Love et al., 2004; Nosofsky, 1986) predict that goal-relevant aspects of the stimuli will receive greater weight or attention. A recent study found that the learned attentional weights from category learning models fit to behaviour were predictive of how well those stimulus aspects could be decoded from the BOLD response (Braunlich & Love, 2019). Relatedly, in a study exploring vmPFC (ventromedial prefrontal cortex)-hippocampal interactions during concept learning (Mack et al., 2020), the pattern of goal-directed representation compression in vmPFC paralleled the attention weights from a model fitted to behaviour.
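As a toy illustration of the parameter-replacement idea described earlier in this section, where a neural recording stands in for a decision-model parameter, the Python sketch below feeds a trial-wise "neural" signal into the drift rate of a diffusion process. The linking function and all values are hypothetical, and the Euler simulation is a bare-bones stand-in for a proper drift-diffusion fit:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_trial(drift, threshold=1.0, dt=0.005, noise=1.0, max_t=5.0):
    # Euler simulation of one drift-diffusion trial; returns (correct, rt)
    x, t = 0.0, 0.0
    while abs(x) < threshold and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (x > 0), t

def simulate(neural_signal, gain=1.0):
    # linking assumption: drift on each trial is proportional to the neural signal
    return [diffusion_trial(gain * s) for s in neural_signal]

# hypothetical neural measures for "strong" and "weak" evidence trials
strong = simulate(np.full(300, 2.5))
weak = simulate(np.full(300, 0.5))
```

Trials driven by the stronger signal should be both faster and more accurate, which is the qualitative signature that brain-informed parameters are meant to capture.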

7 Individual Differences

Both behavioural and brain measures, such as fMRI’s BOLD response, tend to be very noisy both within and across individuals. Somewhat surprisingly, cognitive models that are fit to individuals’ behaviour can be used to understand individual differences in brain response. For example, in studies of category learning, individuals learn to attend to relevant stimulus dimensions that discriminate between the category responses (Kruschke, 1992; Love et al., 2004; Nosofsky, 1986). According to the fits of cognitive models, individuals’ attentional strategies differ slightly from one another, which affects how strongly each stimulus dimension is attended. Interestingly, these individual differences in attention weights arising from fitting behaviour can also be observed in brain response – stimulus aspects that are more attended by an individual are easier to decode in visual areas using MVPA (i.e. mind reading) on the fMRI BOLD response (Braunlich & Love, 2019). Relatedly, compression signals found in the ventromedial prefrontal cortex (vmPFC) are thought to relate to attentional allocation and also track individual differences in attentional weighting over the course of learning.

A final example comes from the neuroeconomics literature, from a task patterned after shopping on Amazon. Participants’ willingness to update their beliefs in the face of Amazon reviews was modelled by a Bayesian model fit to behaviour, with the tendency of an individual to update correlating with overall activity in the dorsomedial prefrontal cortex (De Martino et al., 2017).

In the aforementioned analyses, estimates for individuals were independent of one another in that individuals were not linked during the analysis. An alternative approach, such as Bayesian hierarchical modelling, is to assume that individuals belong to a common family such that estimates for one individual inform the estimates for others.
When data are noisy, hierarchical approaches that link estimates may offer advantages and have been used successfully in modelling individual differences in cognitive control (Molloy et al., 2019). Whether one uses an independent or a hierarchical approach, the conclusion that cognitive models can reflect a reality at both the


behavioural and neural levels for individual participants is exciting and demonstrates how modelling can extract fine-grain information.
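The hierarchical idea can be illustrated with a small empirical-Bayes sketch in Python. All values are simulated, and treating the within-participant noise variance as known is a simplifying assumption: per-participant estimates are shrunk toward the group mean in proportion to how noisy they are relative to the spread across participants.

```python
import numpy as np

rng = np.random.default_rng(3)

# simulated per-participant parameter estimates (all values hypothetical)
group_mean, group_sd, noise_sd = 0.6, 0.1, 0.3
n_subj, n_obs = 20, 10
subj_true = rng.normal(group_mean, group_sd, n_subj)
data = subj_true[:, None] + rng.normal(0.0, noise_sd, (n_subj, n_obs))

# independent estimates: each participant's own mean, ignoring everyone else
indep = data.mean(axis=1)

# empirical-Bayes partial pooling: shrink each estimate toward the grand mean,
# weighted by the ratio of between- to within-participant variance
grand = indep.mean()
within_var = noise_sd**2 / n_obs          # known here because the data are simulated
between_var = max(indep.var() - within_var, 1e-6)
shrink = between_var / (between_var + within_var)
pooled = grand + shrink * (indep - grand)
```

Because the shrinkage factor lies between 0 and 1, every pooled estimate sits between the participant's independent estimate and the grand mean, which is the sense in which individuals inform one another's estimates.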

8 Models Can Uncover Useful Latent States

Models can be useful in inferring latent states that can help explain behaviour and its brain basis. One example of latent variables is the clusters in the aforementioned learning models (Anderson, 1991; Love et al., 2004), which detail how related items are stored together in memory (Mack et al., 2018). Models operationalise these hypothesised representational structures, which can be useful in analysing BOLD response. Inferring latent state is more complex when researchers aim to characterise complex mental operations that unfold through time (Wijeakumar et al., 2017). One popular approach is to use hidden Markov models (HMMs) to infer what operations people are currently undertaking and to use this characterisation to interpret the BOLD response (Anderson et al., 2018; Tubridy et al., 2018). The importance of inferring latent state is also becoming appreciated in related fields, such as reinforcement learning (Niv, 2019). Many of the same conceptual issues and brain systems are implicated in these tasks as in goal-directed concept learning. For example, strategic exploration relies on hippocampal-prefrontal cooperation (Wang & Voss, 2014), as is found during memory tasks (Mack et al., 2020).
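To make the HMM idea concrete, here is a minimal Viterbi decoder in Python. The two latent "operations" and all probabilities are invented for illustration; real applications infer far richer operation sequences from continuous neural data.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    # most probable sequence of hidden states given a discrete observation sequence
    T = len(obs)
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((T, len(start)), dtype=int)
    for t in range(1, T):
        cand = logp[:, None] + np.log(trans)   # cand[i, j]: best score ending in j via i
        back[t] = cand.argmax(axis=0)
        logp = cand.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy setup: two latent operations ("encode" = 0, "retrieve" = 1) emitting a
# discretised signal (0 = low, 1 = medium, 2 = high); all probabilities invented
start = np.array([0.9, 0.1])
trans = np.array([[0.8, 0.2],
                  [0.2, 0.8]])
emit = np.array([[0.70, 0.25, 0.05],   # "encode" favours low signal
                 [0.10, 0.15, 0.75]])  # "retrieve" favours high signal
states = viterbi([0, 0, 1, 2, 2, 2, 0], start, trans, emit)
```

The inferred state sequence can then serve the same role as any other model-based measure, for example as labels for interpreting the BOLD response during each inferred operation.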

9 Comparing Model and Brain Representations

In addition to MVPA decoding, multivariate pattern analysis can be used to compare proposed (e.g. model) representations and voxel representations (Haxby, 2001). This pattern comparison analysis is popularly known as representational similarity analysis (RSA) (Dimsdale-Zucker & Ranganath, 2018). RSA correlates two similarity matrices, one from the cognitive model and one from the brain, to assess how well the two similarity spaces align. RSA can be used as confirmatory evidence that a model provides the correct representational account of a brain region or in an exploratory fashion, such as in a whole-brain searchlight analysis. One application of RSA is to compare proposed memory representations acquired by models of concept learning to brain regions thought to implement those functions (Mack et al., 2013; Ritchie & Op de Beeck, 2019b). For example, RSA analyses found that hippocampal representations of objects (see Fig. 4) are modulated by changes in the task goal (Mack et al., 2016).

For an RSA to be model-based, one of the similarity matrices should be generated by a cognitive model. RSA can involve the evaluation of several cognitive models. A variety of models can be considered, and the model whose representations best align with the brain can be favoured (Ritchie & Op de Beeck, 2019b). However,


Fig. 4 Representation similarity analysis (RSA) can be used to compare a cognitive model’s representations to those of the brain. In this example (Mack et al., 2016), a cognitive model was fit to behaviour for different learning problems (shown in red and teal). For each problem, the cognitive model was used to calculate a similarity matrix for the stimulus items. Similarity matrices were also calculated by comparing voxel activity for the stimulus items. In the left anterior hippocampus, the similarity patterns predicted by the model and those observed in the brain agreed

not all RSAs are model-based, and the dividing line can be blurry. For example, technically, finding that hippocampal CA1 codes distance to a goal (Spiers et al., 2018) is not model-based (because distance is specified by the task), whereas coding distance to some model quantity, such as distance to a category prototype (Seger et al., 2015), is model-based (because the prototype is specified by the fitted cognitive model).

For a model-based analysis to be useful, it should add something beyond a standard analysis. Ideally, a model-based analysis would improve both data fit and our understanding of the domain. For example, a model may largely code distance to goal but diverge in informative ways under certain circumstances that could be empirically verified and in turn deepen our understanding of the domain. Certainly, univariate analyses can be rigorous, interesting, and well motivated but not model-based. The same is true in RSA. For example, a recent study (Martin et al., 2018) used similarity matrices designed to capture perceptual or conceptual similarity to home in on the function of perirhinal cortex and other regions. This work is exciting and valuable, but because the similarity matrices were derived from human ratings rather than generated by a model of perceptual or conceptual processing, the analysis is not model-based.

Although RSA is popular and powerful, it is not entirely clear what advantages it offers over general statistical approaches such as canonical correlation analysis (CCA) or related techniques such as partial least squares (PLS). CCA maximises the correlation between two sets of multivariate measurements. For example, one set of measures could be on the brain side, such as a collection of voxels or the time course for an individual voxel, and the other set of measures could be from a cognitive model, a set of experiment ratings, etc.
Although CCA has been used in imaging analysis and software tools exist (Bilenko & Gallant, 2016), it is not as popular as RSA at the present time, though that could change as CCA seems to offer a number


of advantages (e.g. it infers weights for the individual measures in the two domains, takes the reliability of measures into account, etc.) and no disadvantages that I can discern. It is also preferred over RSA for related problems, such as comparing representations from deep learning networks (Morcos et al., 2018).
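Both comparisons discussed in this section are compact to sketch. The Python example below computes an RSA score (Spearman correlation of two representational dissimilarity matrices) and the canonical correlations for the same simulated model and "brain" measures. The whitening-plus-SVD construction of CCA is standard; the data, dimensions, and noise level are assumptions for illustration only.

```python
import numpy as np

def rdm(patterns):
    # representational dissimilarity matrix: 1 - Pearson r between item patterns
    return 1.0 - np.corrcoef(patterns)

def rsa_score(rdm_a, rdm_b):
    # Spearman correlation of the RDMs' upper triangles (rank, then Pearson)
    iu = np.triu_indices_from(rdm_a, k=1)
    ra = rdm_a[iu].argsort().argsort().astype(float)
    rb = rdm_b[iu].argsort().argsort().astype(float)
    return np.corrcoef(ra, rb)[0, 1]

def cca_correlations(X, Y, reg=1e-6):
    # canonical correlations via whitening + SVD, with a small ridge term
    X = X - X.mean(0); Y = Y - Y.mean(0); n = len(X)
    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, reg))) @ V.T
    K = (inv_sqrt(X.T @ X / n + reg * np.eye(X.shape[1]))
         @ (X.T @ Y / n)
         @ inv_sqrt(Y.T @ Y / n + reg * np.eye(Y.shape[1])))
    return np.clip(np.linalg.svd(K, compute_uv=False), 0.0, 1.0)

rng = np.random.default_rng(4)
model_reps = rng.normal(size=(40, 5))            # 40 items in a model space
brain = model_reps @ rng.normal(size=(5, 30))    # voxel patterns linearly related
brain += 0.3 * rng.normal(size=brain.shape)      # plus measurement noise

score = rsa_score(rdm(model_reps), rdm(brain))   # RSA: one alignment number
corrs = cca_correlations(model_reps, brain)      # CCA: a spectrum of correlations
```

Note the contrast: RSA compresses each space into pairwise (dis)similarities before comparing, whereas CCA infers weights over the individual measures in both domains, which is one of the advantages mentioned above.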

10 Multiple Levels of Representation

The advent of deep learning has opened a number of possibilities in model-based neuroscience. Deep learning models are the descendants of connectionist models that were prominent in psychology in the 1980s (Rumelhart & McClelland, 1986). Like those earlier models, the weights in deep learning models are typically trained end-to-end through gradient descent procedures. Through architectural innovations, such as multiple convolutional and pooling layers, these networks display abilities that eclipse their predecessors and excel at computer vision benchmarks (Krizhevsky et al., 2012). Despite being developed for engineering purposes, these models provide leading accounts of computation along the human and monkey ventral stream (Guclu & van Gerven, 2015; Khaligh-Razavi & Kriegeskorte, 2014; Kubilius et al., 2018; Yamins & DiCarlo, 2016). They have also been useful for exploring ideas about the nature of the neural code (Guest & Love, 2017). Because deep learning models can take photographic stimuli as input, they open a number of opportunities for researchers, such as using these networks to derive stimuli that should best drive the response of a brain region (Bashivan et al., 2019).

One positive aspect of these models is that they contain multiple levels of representation (see Fig. 5). Each layer of the model takes as input the output of the previous layer and transforms it, such that the initial input is a photograph and the final output is an object recognition decision. At each step in this transformation, the representations can be compared to the activity patterns in brain regions. One common finding is that the early and late layers in models tend to correspond to early and late regions along the visual ventral stream (Guclu & van Gerven, 2015; Khaligh-Razavi & Kriegeskorte, 2014; Kubilius et al., 2018; Yamins & DiCarlo, 2016). Model representations can be related to brain response using either RSA or encoder approaches.
Although these models have been successful in accounting for object recognition and activity along the ventral stream, one future challenge is to incorporate additional processes, such as top-down, goal-directed attention (Lindsay & Miller, 2018; Roads & Love, 2019).

11 Conclusions

Adopting a model-based approach to analysing brain measures offers a number of advantages. In some cases, one can evaluate hypotheses that otherwise would not be possible with a standard analysis approach. Models, which formalise related


Fig. 5 Deep learning models contain multiple layers or processing stages that transform the stimulus. This enables evaluation of hypotheses that span brain regions, such as that the levels of object recognition deep networks correspond to stages along the ventral visual stream. (Image from Guest and Love (2017))

theories, offer the hope that results will be theoretically grounded. As related models are applied across data sets, models may promote a more systematic and cohesive science. Cognitive models are well positioned to integrate findings across levels of analysis (Love, 2015).

I have reviewed a number of ways to relate cognitive models to brain response. Possibilities include fitting models to behaviour and incorporating derived trial-by-trial measures into the GLM, model decoding approaches (Mack et al., 2013), using brain response to drive the behavioural predictions of the model, joint modelling to simultaneously address brain and behavioural measures, and comparing model representations and brain response. Which approach is suitable is largely a function of the study’s design and the researcher’s aims.

Opportunities and choices in conducting model-based analysis of brain data are rapidly increasing. It is an exciting time as there is latitude to be creative, whether one is applying an existing technique or developing a novel analysis approach to address a new challenge. Although flexibility in inference can lead to false positives, model-based analyses can provide additional constraints by linking measures and multiple datasets. Model-based approaches can offer more stringent tests of theories and the possibility of comparing competing models. As open science initiatives and data repositories, such as OpenNeuro, make more datasets publicly available, the importance of model-based approaches, especially those that link multiple datasets, will only increase. Against this backdrop, modellers should do their part by making their code and details of their analyses publicly available through hosting and version control services such as GitHub.

One key question to consider is: why do model-based analyses work?
Models are not magical nor guaranteed to be helpful, so why are there so many cases in which model-based analyses succeed in pulling more from the data than would be possible through a standard analysis? The answer is that models have the ability to incorporate constraints that are outside the immediate study. In my own work, models are developed over years and honed while being applied to multiple behavioural and fMRI datasets. In this sense, the models have a reality and


value outside their immediate application, which is critical because a model-based analysis is only as credible as the model used.

Acknowledgements This work was supported by the NIH Grant 1P01HD080679, ESRC grant (ES/W007347/1), Wellcome Trust Investigator Award WT106931MA, and Royal Society Wolfson Fellowship 183029 to B.C.L. Although mostly original, this paper draws on some previously published work (Love, 2020a, b; Turner et al., 2017). Thanks to Sebastian Bobadilla-Suarez for helpful comments on a previous draft.

Conflict of Interest Nothing declared.

Questions for Consideration

• Model-based analyses can offer additional theoretical constraints but can also introduce degrees of freedom when choosing which model-based analysis to conduct. How should one choose which model-based analysis to conduct?
• How much should we demand of researchers in terms of verifying their models before conducting a model-based analysis, given that the analysis is only as good as the model used? Will behavioural studies be increasingly valued as one avenue to verify models for model-based neuroscience?
• The motivation for a model-based analysis can involve more than the model itself to include the bridge theory that links model components to brain regions. How does one choose between this focused, top-down approach to model application and a bottom-up, data-driven approach?
• Models can be specified at multiple levels of abstraction (see “levels of mechanism” discussion). Why is it rare to have multiple models for the same task that differ in their level of abstraction?

Further Reading
• Love, B. C. (2020a). Levels of biological plausibility. Philosophical Transactions of the Royal Society B. https://doi.org/10.1098/rstb.2019.0632
• Love, B. C. (2020b). Model-based fMRI analysis of memory. Current Opinion in Behavioral Sciences, 32, 88–93. https://doi.org/10.1016/j.cobeha.2020.02.012
• Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.
• Turner, B. M., Forstmann, B. U., & Steyvers, M. (2019). Joint models of neural and behavioral data. Springer International Publishing. https://doi.org/10.1007/978-3-030-03688-1

Linking Models with Brain Measures


References
Adams, R. A., Huys, Q. J. M., & Roiser, J. P. (2015). Computational psychiatry: Towards a mathematically informed understanding of mental illness. Journal of Neurology, Neurosurgery & Psychiatry. https://doi.org/10.1136/jnnp-2015-310737
Ahlheim, C., & Love, B. C. (2018). Estimating the functional dimensionality of neural representations. NeuroImage, 179, 51–62. https://doi.org/10.1016/j.neuroimage.2018.06.015
Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409–429.
Anderson, J. R., Borst, J. P., Fincham, J. M., Ghuman, A. S., Tenison, C., & Zhang, Q. (2018). The common time course of memory processes revealed. Psychological Science, 29(9), 1463–1474. https://doi.org/10.1177/0956797618774526
Bashivan, P., Kar, K., & DiCarlo, J. J. (2019). Neural population control via deep image synthesis. Science, 364(6439), eaav9436. https://doi.org/10.1126/science.aav9436
Bechtel, W., & Richardson, R. C. (1993). Discovering complexity: Decomposition and localization as strategies in scientific research. Princeton University Press.
Berens, S. C., Horst, J. S., & Bird, C. M. (2018). Cross-situational learning is supported by propose-but-verify hypothesis testing. Current Biology, 28(7), 1132–1136.e5. https://doi.org/10.1016/j.cub.2018.02.042
Bilenko, N. Y., & Gallant, J. L. (2016). Pyrcca: Regularized kernel canonical correlation analysis in Python and its applications to neuroimaging. Frontiers in Neuroinformatics, 10. https://doi.org/10.3389/fninf.2016.00049
Blanco, N. J., Otto, A. R., Maddox, W. T., Beevers, C. G., & Love, B. C. (2013). The influence of depression symptoms on exploratory decision-making. Cognition, 129(3), 563–568. https://doi.org/10.1016/j.cognition.2013.08.018
Bobadilla-Suarez, S., Ahlheim, C., Mehrotra, A., Panos, A., & Love, B. C. (2019). Measures of neural similarity. Computational Brain & Behavior. https://doi.org/10.1007/s42113-019-00068-5
Braunlich, K., & Love, B. C. (2019). Occipitotemporal representations reflect individual differences in conceptual knowledge. Journal of Experimental Psychology: General, 148(7), 1192–1203. https://doi.org/10.1037/xge0000501
Busemeyer, J. R., & Townsend, J. (1993). Decision field theory: A dynamic-cognitive approach to decision-making in an uncertain environment. Psychological Review, 100, 432–459.
Caplan, J. B., & Madan, C. R. (2016). Word imageability enhances association-memory by increasing hippocampal engagement. Journal of Cognitive Neuroscience, 28(10), 1522–1538. https://doi.org/10.1162/jocn_a_00992
Cetron, J. S., Connolly, A. C., Diamond, S. G., May, V. V., Haxby, J. V., & Kraemer, D. J. M. (2019). Decoding individual differences in STEM learning from functional MRI data. Nature Communications, 10(1), 2027. https://doi.org/10.1038/s41467-019-10053-y
Churchland, P. S., Koch, C., & Sejnowski, T. J. (1990). What is computational neuroscience? In Computational neuroscience (pp. 46–55). MIT Press.
Davis, T., Love, B. C., & Preston, A. R. (2012a). Learning the exception to the rule: Model-based fMRI reveals specialized representations for surprising category members. Cerebral Cortex, 22(2), 260–273. https://doi.org/10.1093/cercor/bhr036
Davis, T., Love, B. C., & Preston, A. R. (2012b). Striatal and hippocampal entropy and recognition signals in category learning: Simultaneous processes revealed by model-based fMRI. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(4), 821–839. https://doi.org/10.1037/a0027865
Davis, T., Xue, G., Love, B. C., Preston, A. R., & Poldrack, R. A. (2014). Global neural pattern similarity as a common basis for categorization and recognition memory. Journal of Neuroscience, 34(22), 7472–7484. https://doi.org/10.1523/JNEUROSCI.3376-13.2014
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.


Dawson, M. R. W. (2013). Mind, body, world: Foundations of cognitive science. Athabasca University Press.
De Martino, B., Bobadilla-Suarez, S., Nouguchi, T., Sharot, T., & Love, B. C. (2017). Social information is integrated into value and confidence judgments according to its reliability. The Journal of Neuroscience, 37(25), 6066–6074. https://doi.org/10.1523/JNEUROSCI.3880-16.2017
Dimsdale-Zucker, H. R., & Ranganath, C. (2018). Representational similarity analyses. In Handbook of behavioral neuroscience (Vol. 28, pp. 509–525). Elsevier. https://doi.org/10.1016/B978-0-12-812028-6.00027-6
Ditterich, J. (2010). A comparison between mechanisms of multi-alternative perceptual decision making: Ability to explain human behavior, predictions for neurophysiology, and relationship with decision theory. Frontiers in Neuroscience, 4. https://doi.org/10.3389/fnins.2010.00184
Forstmann, B. U., Wagenmakers, E.-J., Eichele, T., Brown, S., & Serences, J. T. (2011). Reciprocal relations between cognitive neuroscience and formal cognitive models: Opposites attract? Trends in Cognitive Sciences, 15(6), 272–279. https://doi.org/10.1016/j.tics.2011.04.002
Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. NeuroImage, 19(4), 1273–1302. https://doi.org/10.1016/S1053-8119(03)00202-7
Gluth, S., Sommer, T., Rieskamp, J., & Büchel, C. (2015). Effective connectivity between hippocampus and ventromedial prefrontal cortex controls preferential choices from memory. Neuron, 86(4), 1078–1090. https://doi.org/10.1016/j.neuron.2015.04.023
Güçlü, U., & van Gerven, M. A. J. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005–10014. https://doi.org/10.1523/JNEUROSCI.5023-14.2015
Guest, O., & Love, B. C. (2017). What the success of brain imaging implies about the neural code. eLife, 6, e21397. https://doi.org/10.7554/eLife.21397
Haxby, J. V. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425–2430. https://doi.org/10.1126/science.1063736
Haxby, J. V., Guntupalli, J. S., Connolly, A. C., Halchenko, Y. O., Conroy, B. R., Gobbini, M. I., Hanke, M., & Ramadge, P. J. (2011). A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron, 72(2), 404–416. https://doi.org/10.1016/j.neuron.2011.08.026
Inhoff, M. C., Libby, L. A., Noguchi, T., Love, B. C., & Ranganath, C. (2018). Dynamic integration of conceptual information during learning. PLoS One, 13(11), e0207357. https://doi.org/10.1371/journal.pone.0207357
Jones, M., & Love, B. C. (2011a). Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. The Behavioral and Brain Sciences, 34(4), 169–188 (discussion 188–231). https://doi.org/10.1017/S0140525X10003134
Jones, M., & Love, B. C. (2011b). Pinning down the theoretical commitments of Bayesian cognitive models. Behavioral and Brain Sciences, 34(4), 215–231. https://doi.org/10.1017/S0140525X11001439
Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915. https://doi.org/10.1371/journal.pcbi.1003915
Kragel, J. E., Morton, N. W., & Polyn, S. M. (2015). Neural activity in the medial temporal lobe reveals the fidelity of mental time travel. Journal of Neuroscience, 35(7), 2914–2926. https://doi.org/10.1523/JNEUROSCI.3378-14.2015
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems 25 (pp. 1097–1105). Curran Associates, Inc.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.
Kubilius, J., Schrimpf, M., Nayebi, A., Bear, D., Yamins, D. L. K., & DiCarlo, J. J. (2018). CORnet: Modeling the neural mechanisms of core object recognition [Preprint]. https://doi.org/10.1101/408385


Lambon Ralph, M. A., Lowe, C., & Rogers, T. T. (2006). Neural basis of category-specific semantic deficits for living things: Evidence from semantic dementia, HSVE and a neural network model. Brain, 130(4), 1127–1137. https://doi.org/10.1093/brain/awm025
Lee, S.-H., Kravitz, D. J., & Baker, C. I. (2019). Differential representations of perceived and retrieved visual information in hippocampus and cortex. Cerebral Cortex, 29(10), 4452–4461. https://doi.org/10.1093/cercor/bhy325
Lindsay, G. W., & Miller, K. D. (2018). How biological attention mechanisms improve task performance in a large-scale visual system model. eLife, 7, e38105. https://doi.org/10.7554/eLife.38105
Love, B. C. (2015). The algorithmic level is the bridge between computation and brain. Topics in Cognitive Science, 7(2), 230–242. https://doi.org/10.1111/tops.12131
Love, B. C. (2020a). Levels of biological plausibility. Philosophical Transactions of the Royal Society B. https://doi.org/10.1098/rstb.2019.0632
Love, B. C. (2020b). Model-based fMRI analysis of memory. Current Opinion in Behavioral Sciences, 32, 88–93. https://doi.org/10.1016/j.cobeha.2020.02.012
Love, B. C., & Gureckis, T. M. (2007). Models in search of a brain. Cognitive, Affective, & Behavioral Neuroscience, 7(2), 90–108.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111(2), 309–332. https://doi.org/10.1037/0033-295X.111.2.309
Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain’s algorithm for categorization from its neural implementation. Current Biology, 23, 2023–2027.
Mack, M. L., Love, B. C., & Preston, A. R. (2016). Dynamic updating of hippocampal object representations reflects new conceptual knowledge. Proceedings of the National Academy of Sciences, 113(46), 13203–13208. https://doi.org/10.1073/pnas.1614048113
Mack, M. L., Love, B. C., & Preston, A. R. (2018). Building concepts one episode at a time: The hippocampus and concept formation. Neuroscience Letters, 680, 31–38. https://doi.org/10.1016/j.neulet.2017.07.061
Mack, M. L., Preston, A. R., & Love, B. C. (2020). Ventromedial prefrontal cortex compression during concept learning. Nature Communications, 11(1), 46. https://doi.org/10.1038/s41467-019-13930-8
Marr, D. (1982). Vision. W. H. Freeman.
Martin, C. B., Douglas, D., Newsome, R. N., Man, L. L., & Barense, M. D. (2018). Integrative and distinctive coding of visual and conceptual object features in the ventral visual stream. eLife, 7, e31873. https://doi.org/10.7554/eLife.31873
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.
Mok, R. M., & Love, B. C. (2019). A non-spatial account of place and grid cells based on clustering models of concept learning. Nature Communications, 10(1), 5685. https://doi.org/10.1038/s41467-019-13760-8
Molloy, M. F., Bahg, G., Lu, Z.-L., & Turner, B. M. (2019). Individual differences in the neural dynamics of response inhibition. Journal of Cognitive Neuroscience, 31(12), 1976–1996. https://doi.org/10.1162/jocn_a_01458
Momennejad, I., Otto, A. R., Daw, N. D., & Norman, K. A. (2018). Offline replay supports planning in human reinforcement learning. eLife, 7, e32548. https://doi.org/10.7554/eLife.32548
Morcos, A., Raghu, M., & Bengio, S. (2018). Insights on representational similarity in neural networks with canonical correlation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information processing systems 31 (pp. 5727–5736). Curran Associates, Inc. http://papers.nips.cc/paper/7815-insights-on-representational-similarity-in-neural-networks-with-canonical-correlation.pdf
Newell, A. (1980). Physical symbol systems. Cognitive Science, 4(2), 135–183. https://doi.org/10.1207/s15516709cog0402_2
Newell, A. (1990). Unified theories of cognition. Harvard University Press.


Niv, Y. (2019). Learning task-state representations. Nature Neuroscience, 22(10), 1544–1553. https://doi.org/10.1038/s41593-019-0470-8
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Nosofsky, R. M., & Zaki, S. F. (1998). Dissociations between categorization and recognition in amnesic and normal individuals. Psychological Science, 9, 247–255.
Nosofsky, R. M., Little, D. R., & James, T. W. (2012). Activation in the neural network responsible for categorization and recognition reflects parameter changes. Proceedings of the National Academy of Sciences of the United States of America, 109(1), 333–338. https://doi.org/10.1073/pnas.1111304109
Palmeri, T. J., Schall, J. D., & Logan, G. D. (2015). Neurocognitive modeling of perceptual decision making. In J. R. Busemeyer, Z. Wang, J. T. Townsend, & A. Eidels (Eds.) (Vol. 1). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199957996.013.15
Pitt, M. A., Myung, I., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
Polyn, S. M., Norman, K. A., & Kahana, M. J. (2009). A context maintenance and retrieval model of organizational processes in free recall. Psychological Review, 116(1), 129–156. https://doi.org/10.1037/a0014420
Purcell, B. A., Heitz, R. P., Cohen, J. Y., Schall, J. D., Logan, G. D., & Palmeri, T. J. (2010). Neurally constrained modeling of perceptual decision making. Psychological Review, 117(4), 1113–1143. https://doi.org/10.1037/a0020311
Purcell, B. A., Schall, J. D., Logan, G. D., & Palmeri, T. J. (2012). From salience to saccades: Multiple-alternative gated stochastic accumulator model of visual search. Journal of Neuroscience, 32(10), 3433–3446. https://doi.org/10.1523/JNEUROSCI.4622-11.2012
Pylyshyn, Z. W. (1984). Computation and cognition: Toward a foundation for cognitive science. MIT Press.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Ritchie, J. B., & Op de Beeck, H. (2019a). Using neural distance to predict reaction time for categorizing the animacy, shape, and abstract properties of objects. Scientific Reports, 9(1), 13201. https://doi.org/10.1038/s41598-019-49732-7
Ritchie, J. B., & Op de Beeck, H. (2019b). A varying role for abstraction in models of category learning constructed from neural representations in early visual cortex. Journal of Cognitive Neuroscience, 31(1), 155–173. https://doi.org/10.1162/jocn_a_01339
Roads, B. D., & Love, B. C. (2019). Learning as the unsupervised alignment of conceptual systems. arXiv:1906.09012 [cs, stat]. http://arxiv.org/abs/1906.09012
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition; Volume 1: Foundations. MIT Press.
Seger, C. A., Braunlich, K., Wehe, H. S., & Liu, Z. (2015). Generalization in category learning: The roles of representational and decisional uncertainty. Journal of Neuroscience, 35(23), 8802–8812. https://doi.org/10.1523/JNEUROSCI.0654-15.2015
Shadlen, M. N., & Kiani, R. (2013). Decision making as a window on cognition. Neuron, 80(3), 791–806. https://doi.org/10.1016/j.neuron.2013.10.047
Shanahan, L. K., Gjorgieva, E., Paller, K. A., Kahnt, T., & Gottfried, J. A. (2018). Odor-evoked category reactivation in human ventromedial prefrontal cortex during sleep promotes memory consolidation. eLife, 7, e39681. https://doi.org/10.7554/eLife.39681
Shen, G., Horikawa, T., Majima, K., & Kamitani, Y. (2019). Deep image reconstruction from human brain activity. PLoS Computational Biology, 15(1), e1006633. https://doi.org/10.1371/journal.pcbi.1006633
Spiers, H. J., Love, B. C., Le Pelley, M. E., Gibb, C. E., & Murphy, R. A. (2017). Anterior temporal lobe tracks the formation of prejudice. Journal of Cognitive Neuroscience, 29(3), 530–544. https://doi.org/10.1162/jocn_a_01056
Spiers, H. J., Olafsdottir, H. F., & Lever, C. (2018). Hippocampal CA1 activity correlated with the distance to the goal and navigation performance. Hippocampus, 28(9), 644–658. https://doi.org/10.1002/hipo.22813


Stillesjö, S., Nyberg, L., & Wirebring, L. K. (2019). Building memory representations for exemplar-based judgment: A role for ventral precuneus. Frontiers in Human Neuroscience, 13, 228. https://doi.org/10.3389/fnhum.2019.00228
Sun, R. (2009). Theoretical status of computational cognitive modeling. Cognitive Systems Research, 10(2), 124–140. https://doi.org/10.1016/j.cogsys.2008.07.002
Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K., & Sedivy, J. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634. https://doi.org/10.1126/science.7777863
Tubridy, S., Halpern, D., Davachi, L., & Gureckis, T. M. (2018). A neurocognitive model for predicting the fate of individual memories [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7r3jp
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.
Turner, B. M., Forstmann, B. U., & Steyvers, M. (2019a). Joint models of neural and behavioral data. Springer International Publishing. https://doi.org/10.1007/978-3-030-03688-1
Turner, B. M., Palestro, J. J., Miletić, S., & Forstmann, B. U. (2019b). Advances in techniques for imposing reciprocity in brain-behavior relations. Neuroscience & Biobehavioral Reviews, 102, 327–336. https://doi.org/10.1016/j.neubiorev.2019.04.018
Tyler, L. K., Moss, H. E., Durrant-Peatfield, M. R., & Levy, J. P. (2000). Conceptual structure and the structure of concepts: A distributed account of category-specific deficits. Brain and Language, 75(2), 195–231. https://doi.org/10.1006/brln.2000.2353
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3), 550–592. https://doi.org/10.1037/0033-295X.108.3.550
van Gerven, M. A. J. (2017). A primer on encoding models in sensory neuroscience. Journal of Mathematical Psychology, 76, 172–183. https://doi.org/10.1016/j.jmp.2016.06.009
Wang, J. X., & Voss, J. L. (2014). Brain networks for exploration decisions utilizing distinct modeled information types during contextual learning. Neuron, 82(5), 1171–1182. https://doi.org/10.1016/j.neuron.2014.04.028
Wijeakumar, S., Ambrose, J. P., Spencer, J. P., & Curtu, R. (2017). Model-based functional neuroimaging using dynamic neural fields: An integrative cognitive neuroscience approach. Journal of Mathematical Psychology, 76, 212–235. https://doi.org/10.1016/j.jmp.2016.11.002
Wing, J. M. (2008). Computational thinking and thinking about computing. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 366(1881), 3717–3725. https://doi.org/10.1098/rsta.2008.0118
Xue, G. (2018). The neural representations underlying human episodic memory. Trends in Cognitive Sciences, 22(6), 544–561. https://doi.org/10.1016/j.tics.2018.03.004
Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356–365. https://doi.org/10.1038/nn.4244

Reinforcement Learning
Application to fMRI
Vincent Man and John P. O’Doherty

Abstract Over the last two decades, the model-based approach to analysing functional magnetic resonance imaging (fMRI) data has been adopted across the cognitive neurosciences to study how computations are implemented in the brain. In this time, methods have advanced along both computational modelling and neuroimaging domains. This chapter aims to provide an introduction to the general method of integrating computational models into fMRI analyses as well as a discussion of contemporary considerations regarding these recent advances. The chapter begins with an exposition of the formalisation of qualitative psychological hypotheses into quantitatively testable and falsifiable computational models. We use examples from the conditioning and reinforcement learning literature to ground this discussion, given the origin of model-based fMRI in uncovering neural correlates of learning processes. We then provide an overview of the methodological approach underlying model-based fMRI. This extends to pragmatic considerations when working in this domain with an eye towards more recent developments in both fMRI and computational modelling, such as multivariate analyses and assessing model quality, respectively. Finally, we provide examples in which computations described in the first section of the chapter were successfully bridged with fMRI analyses to provide a richer understanding of reinforcement learning in the brain. This chapter is therefore aimed both at the cognitive neuroscientist seeking to adapt computational approaches to their neuroimaging research and at those specifically interested in learning and decision-making across levels of analysis.

V. Man
Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
e-mail: [email protected]

J. P. O’Doherty
Division of Humanities and Social Sciences, Computation and Neural Systems, California Institute of Technology, Pasadena, CA, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_3



Keywords Reinforcement learning · Computational model · Functional magnetic resonance imaging · Model-based fMRI

1 Introduction

The application of computational models to probe neural implementations of cognitive processes is a maturing approach across multiple imaging modalities. In this chapter, we outline the basic approach and review the way it has been used in the domain of reinforcement learning. Our focus on reinforcement learning, a case study of a successful bridge between theory and empirical evidence, serves two general objectives. The first is to demonstrate the utility of computational approaches that formalise falsifiable accounts of mechanisms underlying psychological phenomena. The second is to bridge the computational approach with the treatment of functional magnetic resonance imaging data, such that precise inferences about the nature of cognitive processes conferred by computational models can be extended to understand how the brain implements these processes. To this end, in the first section, we present an introduction to reinforcement learning (RL) as a framework for understanding learning and decision-making through experience. To highlight the core and unique computations across RL model architectures, we review examples from Pavlovian and instrumental conditioning. We discuss the Rescorla–Wagner model that arose from work on animal Pavlovian conditioning and illustrate through example how specifying a core learning mechanism in the form of reward prediction error was conducive to generating testable predictions, and how formalising hypotheses through a computational model in turn led to developments that overcame initial theoretical shortcomings. The position we take throughout the chapter is that qualitative theories rooted in the psychology literature can be expressed using computational frameworks and that doing so is advantageous for theoretical advances.
For example, mirroring the difference between Pavlovian and instrumental conditioning, whereby the latter incorporates the role of action alongside learning in the decision process, computational advances in RL expanded the Rescorla–Wagner learning mechanism to include a component for producing actions. We use this multi-component computational approach as an example of leveraging a formal treatment to better explain the variability of observed behaviour in human and animal data during learning. Our second objective is to motivate the integration of computational models, especially within the RL domain, into analyses of functional magnetic resonance imaging (fMRI) data. This section begins with a brief primer on the model-based fMRI approach in the univariate and multivariate domains, with a focus on methodological considerations when working with RL models specifically. We proceed with issues related to applying RL models to behavioural data, including whether estimates of model parameters are reliable and identifiable, meaning that model parameters have unique solutions and, critically, that different patterns of behaviour can be ascribed to distinct parameter values. Furthermore,


we touch on methods for assessing the quality of a model as a whole. These issues importantly interact with the statistical model employed in bridging computational models with fMRI and bear weight on making sound inferences from neural data generally. This discussion naturally extends to pragmatic considerations when leveraging components of an RL model for fMRI analyses. The practical section of this chapter is anchored in existing examples where RL models have been applied to neuroimaging data, with the goal of highlighting the advantages that computational approaches provide over traditional fMRI analyses. These examples cross multiple temporal scales to make inferences about processes occurring within a trial, between trials, and across individuals. The chapter concludes with an attempt to address the question of what fMRI, and neural data in general, can tell us about psychological processes above and beyond behaviour. We employ illustrative examples from the story of reward prediction error in the RL literature and argue not only that computational models can delineate the neural implementation of separable cognitive processes in a forward ‘brain-mapping’ approach, but also that neural data can contribute to the adjudication of multiple psychological hypotheses when competing candidate models make similar behavioural predictions. Throughout, we argue for a synergistic relationship between computational and neuroimaging approaches.

2 Reinforcement Learning

2.1 Pavlovian Conditioning

The predominant objective of reinforcement learning is to learn through experience how to maximise reward in a given environment. In the psychological literature, the role of reinforcement as a core learning process is underpinned by the range of behavioural phenomena described across the Pavlovian and instrumental conditioning literatures. In Pavlovian (i.e., classical) conditioning, initially neutral cues (the ‘conditioned stimulus’; CS) are repeatedly paired with innately aversive or appetitive stimuli (the ‘unconditioned stimulus’; US), which reflexively elicit responses (the ‘unconditioned response’; UR). Through repeated experience of the paired CS and US, the observing animal learns to associate the innately valenced outcome with the originally neutral CS, such that subsequent sole presentation of the CS elicits the now-conditioned response (CR). This phenomenon is typically exemplified by a scenario in which a dog salivates (UR) in the presence of meat powder (US). If the presentation of the meat powder is repeatedly paired with a bell (CS), over multiple experiences the bell will in turn cause the dog to salivate (CR) (Pavlov & Anrep, 1927). While well described in the behavioural psychology literature, open questions remained regarding the mechanisms underlying this associative learning process. Questions regarding the internal processes that give rise to this observation might include the nature of how the mental representation of the CS changes over


time as it is repeatedly paired in presentation with the US. It is intuitive to posit that this representation changes given that the CS was originally neutral but eventually is able to elicit reflexive behaviour; the contribution of a computational modelling approach to this intuition is that the model makes explicit the experimenter’s hypothesis about the nature of the transformation through a mathematical expression. In the case of Pavlovian conditioning, the Rescorla–Wagner (R-W) model is the preeminent example of this approach to theory. It importantly advanced the idea that the discrepancy between an observation and expectation (Bush & Mosteller, 1951), called the prediction error δ, drives the incremental update of internal variables such as the associative strength of the CS (Rescorla, 1972). The prediction error is formalised as

$$\delta = \lambda_{US} - V_{CS} \qquad (1)$$

λ_US corresponds to the maximum conditioning possible given the appetitiveness or aversiveness of the US. For example, it could represent how appetitive the meat powder would be to the dog, and thus the limit of the value that an initially neutral cue could acquire over learning.¹ V_CS represents the current intrinsic value of the CS and is updated from trial to trial according to the prediction error:

$$V_{CS} \leftarrow V_{CS} + \alpha\delta \qquad (2)$$

Importantly, δ is scaled by a parameter α that controls the rate at which V_CS updates and needs to be estimated (i.e., a ‘free’ parameter). The scaling factor α therefore corresponds to the rate at which the CR acquisition curve reaches its asymptote and is often referred to as the learning rate.² Once learning approaches completion, in that the value of the CS converges to the maximal value conferred by the US, the prediction error will decrease towards 0. Together, the R-W model formalises a hypothesis regarding the mechanism through which a CS acquires the value of a US during Pavlovian conditioning. Qualitatively, the hypothesis posits that the estimated value of a CS is updated sequentially according to a weighted combination of its previous estimated value and the current observation. The model thus gives the experimenter direct quantitative access to each subject’s learning rate, the relative contribution of old and new information

¹ There are a variety of ways in which λ_US might be set depending on the experimental context. A simple coding would be to define λ_US as 1 on trials in which a reward is presented and 0 otherwise, for example, in experiments where reward magnitudes are not manipulated. Alternatively, the magnitude of the UR elicited by the US may be a useful metric.
² It should be noted that the full formulation of the R-W model includes a second learning rate parameter associated with the US, to incorporate the assumption that the rate of learning may also depend on the particular US in the experiment (Rescorla, 1972).
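To make the update rule concrete, the following is a minimal Python sketch of Eqs. (1) and (2) for a single CS repeatedly paired with a US. The function name, trial count, and parameter values are illustrative assumptions, not from the chapter.

```python
def rescorla_wagner(n_trials, alpha=0.3, lam=1.0, v0=0.0):
    """Return the trial-by-trial associative strength V_CS of a single CS.

    alpha: learning rate; lam: lambda_US, the asymptote set by the US;
    v0: initial value of the CS (neutral, so 0).
    """
    v = v0
    values = []
    for _ in range(n_trials):
        delta = lam - v          # prediction error (Eq. 1)
        v = v + alpha * delta    # value update (Eq. 2)
        values.append(v)
    return values

values = rescorla_wagner(n_trials=20)
# V_CS rises toward lambda_US and the prediction error shrinks toward 0,
# tracing the familiar negatively accelerated acquisition curve
print(round(values[0], 3), round(values[-1], 3))
```

Running the sketch with a larger or smaller `alpha` shows directly how the learning rate controls how quickly the acquisition curve approaches its asymptote.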


towards the learning process, and, on a trial-by-trial basis, the amount by which the value of the CS changes through the prediction error. As we will spend the latter half of this chapter discussing, this gives the experimenter an additional tool to probe how learning processes are implemented in the brain. One hallmark of any qualitative or quantitative theory lies in its ability to explain not only a single phenomenon, but also to generalise and be useful for interpreting diverse empirical findings. The ability to do so would suggest that the theory is capturing a core, domain-general process. It was therefore a success of the R-W model to not only provide a mechanistic description of the learning process during Pavlovian conditioning, but also explain a phenomenon known as blocking, in which previous learning about a CS can prevent, i.e., ‘block’, new learning about a new CS when the two are concurrently paired (Kamin, 1969). This effect can be encompassed neatly within the R-W formalism by considering V_CS as a net value V_net^CS that represents the aggregate value of all CS_i:

$$V_{net}^{CS} = \sum_i V^{CS_i} \qquad (3)$$

Because the first pre-trained CS predicts the US and thus already asymptotically approaches the maximal value of the US (i.e., .λU S ), when the new CS is concurrently presented, the net value across both CS would already be very close or the same as .λU S . Consequently, the prediction error .δ would be close to 0. With no update signal, no new learning will occur for the second CS (Rescorla, 1972).
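The R-W account of blocking can be made concrete with a short simulation. This is a sketch under assumed settings: the learning rate, trial counts, and a binary $\lambda_{US}$ are our choices for illustration, not values from the chapter.

```python
# Rescorla-Wagner updating (Eqs. (1)-(3)) illustrating blocking.
# All variable names (alpha, lam_us, v) are our own illustrative choices.

def rw_increment(v_net, lam_us, alpha):
    """One R-W trial: prediction error (Eq. (1)) scaled by the learning rate (Eq. (2))."""
    delta = lam_us - v_net          # prediction error
    return alpha * delta            # increment applied to each CS present on the trial

alpha = 0.3
v = {"cs1": 0.0, "cs2": 0.0}

# Phase 1: CS1 alone is paired with the US (lambda_US = 1) for 30 trials.
for _ in range(30):
    v["cs1"] += rw_increment(v["cs1"], 1.0, alpha)

# Phase 2: CS1 and CS2 in compound; V_net is their sum (Eq. (3)).
for _ in range(30):
    inc = rw_increment(v["cs1"] + v["cs2"], 1.0, alpha)
    v["cs1"] += inc
    v["cs2"] += inc

print(round(v["cs1"], 3), round(v["cs2"], 3))
```

Because $V_{net}^{CS}$ is already near $\lambda_{US}$ when CS2 is introduced, the prediction error is near 0 and CS2 acquires almost no value: learning about CS2 is blocked.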

2.1.1 Temporal-Difference Learning

Like all models, the R-W theory is limited in its explanatory scope. One significant example is the inability of the R-W model to explain second-order conditioning. In second-order conditioning, a first-order CS is initially paired with the US and therefore acquires associative strength. A second CS is subsequently paired with the first-order CS; this 'second-order' CS then elicits a CR from the animal despite never having been directly paired with the US (Rescorla, 1972; Rizley & Rescorla, 1972; Holland & Rescorla, 1975). Because the mechanisms of the theory are made explicit with a computational model, we can see that the R-W theory would instead predict the second-order CS to signal the absence of a US. An animal would have learned to expect the US in tandem with the presentation of the first-order CS, so when the first- and second-order CS are paired, the value $V^{CS}$ is positive, $\lambda_{US}$ is 0 because no US is present, and $\delta$ is therefore negative. Repeated pairings of the first- and second-order CS would be predicted by the R-W model to result in the second-order CS taking on negative value (i.e., becoming a conditioned inhibitor; Miller et al. (1995)).


V. Man and J. P. O’Doherty

A natural intuition at this point would be to question why a learning signal, in the form of prediction error, should only be specified to occur at the time of US presentation. In the case of a typical conditioning experiment, this would correspond to once at the end of each trial. Given that the first-order CS was initially paired with the US, one perspective may be to regard the first-order CS itself as a 'pseudo-US', in the sense of holding its own maximal value (e.g., $\lambda_{CS_1}$), thus resulting in a similar prediction error computation when the first- and second-order CS are paired. Indeed, this intuition is the predominant mechanism in a later refinement of R-W: the temporal-difference (TD) model (Sutton, 1988). The TD model extends R-W (Sutton & Barto, 1981, 1987) by describing a mechanism through which intermediate cues that hold reward-predictive information are able to broadcast that information to new cues, through an adapted form of prediction error:

$$\delta_t^{TD} = r_t + V(s_{t+1}) - V(s_t) \tag{4}$$

The extension of the TD prediction error over R-W to a finer level of temporal resolution is denoted by the subscript $t$, which refers to a particular time bin within a trial, making TD a 'real-time' model (Hull, 1939; Sutton, 1995).³ The TD formalism also differs from R-W by introducing the notion of a state $s$ that refers to intermediate situations encountered within the span of a trial, such as sequentially presented stimuli in the context of a simple Pavlovian task (Sutton & Barto, 1987; Daw & Tobler, 2014). The goal is thus to estimate the value of being in a state, $V(s_t)$ (e.g., the value of the second-order CS), and this estimate is updated after repeated experience in the same manner as Eq. (2), albeit with this more general prediction error. Like other variants of RL models, TD shares with R-W a core principle of error-based updating; however, TD fundamentally differs from R-W by describing mechanisms for predicting future expected reward across time intervals within a trial. The implication of this computational difference is that multiple possible error signals can be computed within the span of a single trial, which not only captures phenomena otherwise insufficiently explained by R-W, such as second-order conditioning, but also offers the experimenter a potential methodological tool to look at cognitive processes at a finer temporal scale. Nevertheless, the initial formalisation of a prediction error-based mechanism in R-W in turn inspired the TD model, providing an example of one advantage of the computational approach generally in refining and generating theory.
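The TD mechanism behind second-order conditioning can likewise be sketched. This is a minimal illustration assuming a tabular value function over three states and arbitrary parameters; for simplicity, extinction of the first-order CS during phase 2 is not modelled.

```python
# TD(0) sketch (Eq. (4)): value propagates back from a reward-predictive state
# to a preceding cue, the mechanism underlying second-order conditioning.
# States and parameters are illustrative assumptions.

alpha = 0.2
V = {"cs2": 0.0, "cs1": 0.0, "end": 0.0}

def td_step(s, s_next, r):
    """One time bin: delta_t = r_t + V(s_{t+1}) - V(s_t); update V(s_t)."""
    delta = r + V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# Phase 1: CS1 -> US (reward 1) until V(cs1) approaches asymptote.
for _ in range(100):
    td_step("cs1", "end", 1.0)

# Phase 2: CS2 -> CS1, with no US ever delivered after CS2.
# (Only the CS2 -> CS1 transition is updated; CS1 extinction is ignored.)
for _ in range(100):
    td_step("cs2", "cs1", 0.0)   # delta = V(cs1) - V(cs2) > 0 early on

print(round(V["cs2"], 2))
```

CS2 acquires positive value despite never being paired with the US, because $V(s_{t+1})$ in Eq. (4) lets CS1's learned value act as a surrogate reward, exactly what the R-W rule cannot do.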

³ Here we ignore some important aspects of the TD framework for simplicity, such as temporal discounting and eligibility traces. References to relevant papers on these aspects are presented in Sect. 7.


2.2 Instrumental Conditioning

Making good decisions in service of the goal of maximising reward depends in part on being able to learn which aspects of the world are predictive of the reward that we seek, as described in the previous section. However, a fuller description of the decision process requires not only a learning mechanism, but also an account of how we produce actions that can bring us to those desired rewards. Critically, good choices between potential actions are not made randomly but are informed by what we have learned. Instrumental conditioning is distinguished from Pavlovian conditioning in describing how we actively interact with the environment to elicit rewards and avoid punishments. It therefore bridges the processes involved in learning and decision-making. Consistent with the experimental framework we used above to describe Pavlovian conditioning, in which stimuli (S) are predictive of outcomes (O), instrumental conditioning includes an intermediate active step of responses (R) to stimuli that then may elicit outcomes (i.e., S-R-O associations). This incorporation of a response to a stimulus introduces the notion of control to the chain of processing, whereby the human or animal is not passively learning but needs to figure out what to do in order to gain more reward. This process of figuring out what to do can range from very simple trial-and-error learning mechanisms, in which various actions are tried and those that led to desired outcomes are strengthened and more likely to be repeated (Thorndike, 1898; Skinner, 1963), to more complex deliberative actions that might be planned according to a representation of the environment (Tolman, 1948). These differences in instrumental behaviour are well described, respectively, by the psychological literature on habits and goals and by the computational framework of model-free and model-based learning (O'Doherty et al., 2017; Dolan & Dayan, 2013).

2.2.1 Actor-Critic Model

Accordingly, to better capture the processes involved in decision-making, a computational approach needs to extend beyond the learning processes captured by models of Pavlovian conditioning and describe a mechanism for producing actions. Given the task of having to choose between alternatives, the intuitive link between these two mechanisms is that a good choice is the one that will result in the most total reward. Thus, RL models that bridge learning and making good choices critically specify the action mechanism to be a function of the predicted values of the alternatives, which might be learned through an error-based process as described above (see Eqs. (1) and (4)) (Lau & Glimcher, 2005). A first suggestion for a straightforward action rule would be to simply choose the alternative most predictive


of reward, known as a 'greedy' policy.⁴ However, the specification of the action mechanism carries its own considerations. For example, it is also reasonable to posit that a good way to act in the world would be to not exclusively pursue the most reward-predictive alternative, but sometimes to choose potentially less reward-predictive alternatives in order to avoid errors of omission and gain information; these modes are known respectively as exploitation and exploration (Gittins & Jones, 1979; Sutton et al., 1998). Several candidate action mechanisms have been introduced to deal with this exploration–exploitation trade-off. For example, a simple modification of the greedy policy to promote exploration would be to generally pick the alternative with the currently highest estimated value, but sometimes, with probability $\epsilon$, randomly choose something else: the $\epsilon$-greedy policy (Sutton et al., 1998). Another candidate mechanism, which we discuss in more depth here, is to make choices in proportion to their estimated value, formalised using a softmax⁵ function:

$$p(a = i) = \frac{\exp(\beta V(i))}{\sum_j \exp(\beta V(j))} \tag{5}$$

where the probability that action $a$ is to choose alternative $i$ is a function of its value $V(i)$ relative to the values of the other possible actions, indexed by $j$. Importantly, this function includes another free parameter, the inverse temperature $\beta$, which controls the trade-off between choosing the alternative with the highest value and choosing non-best alternatives (choice becomes increasingly random as $\beta \to 0$). We highlight this particular action mechanism because it has been shown to be more representative of how humans actually trade off exploration versus exploitation (Daw et al., 2006). The conjunction of a mechanism describing how values are learned through experience with a mechanism describing how that predictive information is used to inform good actions brings us closer to a fuller account of the decision process. Indeed, an architecture known as the actor-critic model, described initially in computational work but subsequently leveraged for neuroscientific insight, is captured by the combination of these two components of learning and choice (Witten, 1977; Barto et al., 1983; Sutton et al., 1998; Colas et al., 2017). One common version of the actor-critic model incorporates variants of the mechanisms we have already discussed throughout the chapter but integrates them into a unified framework that describes how an agent is able to simultaneously learn value estimates and improve its policy to select good actions (Fig. 1). The model comprises a critic that evaluates the quality of the new state brought forth by the previous action. This evaluation takes the form of the TD error (Eq. (4)), and value estimates are accordingly refined by an update rule similar to that described above (Eq. (2)). At the same time, the critic passes update information, in the form of

⁴ The term 'policy' refers to how an animal or human acts given the situation they face.
⁵ In the case of two alternatives, the softmax function reduces to a logistic sigmoid function of their difference.


Fig. 1 The actor-critic model. The TD error computed by the critic is sent to update both the value function and the policy, which the actor employs to select actions. Adapted from Sutton et al. (1998)

TD error, to the actor, which uses this information to refine preferences for actions according to a similar update rule:

$$p(s, a) \leftarrow p(s, a) + \eta \delta^{TD} \tag{6}$$

where $p(s, a)$ denotes the propensity to choose action $a$ in a given state $s$, and $\eta$ is another free parameter that controls the amount of update, just like the learning rate in Eq. (2). The model finally outputs concrete actions according to the same softmax rule (Eq. (5)), though here as a function of the action propensities $p(s, a)$ (Sutton et al. (1998); see Fig. 1). A compelling aspect of the actor-critic framework is that it accounts for both learning values and choosing good actions succinctly, in that both are updated by a common source of information (the divergent arrow in Fig. 1). In this way, it provides a more complete computational picture of the reinforcement learning problem by describing mechanisms through which animals and humans can both interact well with their environment to produce good actions that maximise reward, and use feedback from this interaction to learn. This example furthermore highlights the process through which computational approaches advance psychological theory: by expressing the mechanisms underlying relevant processes and integrating these mechanisms to better explain findings in the empirical literature. Finally, as we explore further in Sect. 5, the actor-critic framework serves as a useful example of leveraging the explicit theoretical structure afforded by computational models to disentangle separable psychological processes in the brain.
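The actor-critic loop can be sketched in a few lines. This is a minimal illustration under assumed settings (a single task state, two actions with deterministic rewards so the dynamics are easy to follow, and arbitrary parameter values), not the authors' implementation:

```python
import math, random

random.seed(0)
alpha, eta, beta = 0.1, 0.1, 2.0     # critic rate, actor rate (Eq. (6)), inverse temperature
reward = [1.0, 0.0]                  # action 0 is objectively better (deterministic for clarity)
V = 0.0                              # critic's value estimate for the single task state
p = [0.0, 0.0]                       # actor's action propensities p(s, a)

def softmax(prefs):
    """Eq. (5) over action propensities."""
    exps = [math.exp(beta * x) for x in prefs]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(2000):
    probs = softmax(p)
    a = 0 if random.random() < probs[0] else 1
    r = reward[a]
    delta = r - V                    # TD error (Eq. (4)); single terminal state, so no V(s') term
    V += alpha * delta               # critic update (Eq. (2)-style)
    p[a] += eta * delta              # actor update (Eq. (6))

print(round(softmax(p)[0], 2))       # preference for the better action
```

Note how one quantity, the TD error `delta`, drives both updates, mirroring the divergent arrow in Fig. 1: the critic refines its value estimate while the actor shifts its policy towards actions that outperform expectation.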


3 Model-Based fMRI

3.1 Univariate Approach

The spirit of a model-based approach to analysing and interpreting neural signal, such as the blood-oxygen-level-dependent (BOLD) signal measured with fMRI, is very similar to that of taking a model-based approach to making inferences from behaviour. Just as the computational formalisation of qualitative behavioural hypotheses affords greater insight into mechanism, applying the model-based approach to neuroimaging allows the experimenter to probe how regions of the brain might implement a particular process (O'Doherty et al., 2007; Gläscher & O'Doherty, 2010). This grants the experimenter additional insight compared to traditional approaches to fMRI analysis, in which neural signal is typically correlated with direct changes across an experimental manipulation. In fact, one of the principal goals of model-based fMRI is to leverage this advantage to ultimately link across levels of analysis (Marr & Poggio, 1976; Niv & Langdon, 2016). We begin this section by briefly reviewing the methodological approach and then discuss contemporary considerations for both computational and statistical modelling.

The model-based fMRI method includes as a first step the specification of computational models according to specific hypotheses about the process of interest. As such, it involves procedures of model fitting (i.e., estimating the free parameter(s) of the model with respect to behavioural data using maximum likelihood estimation [MLE]) and model selection (via quantitative and qualitative model comparison); these methods have been discussed at length in the literature (see Sect. 7) (O'Doherty et al., 2007; Gläscher & O'Doherty, 2010; Daw et al., 2011; Wilson & Collins, 2019; Palminteri et al., 2017). In the following Sect. 4, we discuss assessments of model quality relevant for model-based fMRI.
However, we begin here by assuming that reliable parameter estimates of a high-quality model of behaviour have already been derived, allowing us to introduce the method of integrating computational components into the statistical models used to analyse fMRI data. Critically, the first consideration concerns the level of temporal resolution of the process of interest, and consequently the component of the computational model that speaks to this process. Variables of interest can be extracted from RL and other computational models at multiple temporal scales: from within-trial dynamics, as in the case of TD error, to between-subject characteristics, such as varying learning rates. Statistical models employed in typical univariate fMRI analyses similarly model effects at different temporal scales in the data, from trial-by-trial fluctuations in the BOLD signal to subject-level averages (Worsley et al., 2002; Friston et al., 1999). Once the appropriate level of temporal analysis has been determined with respect to the computational variable of interest, it is relatively straightforward to incorporate the computational variable into the general linear model (GLM)


of the BOLD signal (Friston et al., 1994). For example, to examine neural correlates of prediction error, which is a between-trial continuous variable derived from the model, a regressor corresponding to the prediction error signal can be created by parametrically modulating a trial-wise stick (or block) function locked to the timing of feedback onset, when reward prediction errors (RPEs) are expected to be computed. Like regressors in typical fMRI analyses, this trial-wise time series is then convolved with a basis or hemodynamic response function to account for hemodynamic lag in the BOLD signal and entered into the GLM (Fig. 2a; O'Doherty et al. (2003, 2007); Büchel et al. (1998)). It is important to also include in the GLM a non-modulated regressor corresponding to the time point at which the computational variable is specified, such as feedback onset in the case of prediction error. This allows for a more precise interpretation of the degree to which the computational variable explains variance in the BOLD signal above and beyond brain responses to trial event onsets, such as changes in the visual input (Fig. 2b).⁶

It is also important to think carefully about which other regressors of interest should be included in the GLM. Just as when working with GLMs outside of the fMRI context, each regressor included in the statistical model competes to explain variance in the dependent measure. Thus the effect of each regressor should be carefully interpreted as the contribution of that regressor in explaining variance in the BOLD signal, above and beyond the variance explained by the other regressors in the set. In some cases, leveraging this competition between regressors can be useful for discriminating between model components that may not be distinguishable by their behavioural predictions alone (see Sect. 5; Hampton et al. (2008)).
Care should be taken to verify the degree of correlation (i.e., multicollinearity) between convolved regressors in the GLM.⁷ Increasing correlation between regressors in a model decreases the reliability of their respective coefficient estimates (Mumford et al., 2015). For example, a regressor modelling the RPE is typically highly correlated with one modelling the reward outcome, given that the RPE is computed as a linear function of the outcome and both variables correspond to the same timings within a trial, complicating the disentanglement of these variables in the brain when they are entered into the same model (Rutledge et al., 2010; Caplin & Dean, 2008).
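The construction of a parametric RPE regressor and the collinearity check described above can be sketched as follows. This is a toy illustration (a crude HRF, hypothetical onsets, and simulated trial-wise RPEs), not a replacement for the design-efficiency tools in standard fMRI packages:

```python
import numpy as np
from math import gamma as gamma_fn

rng = np.random.default_rng(0)
n_scans, tr = 200, 2.0
frame_times = np.arange(n_scans) * tr
onsets = np.arange(10.0, 380.0, 20.0)                  # feedback onsets in seconds (hypothetical)

outcome = rng.integers(0, 2, onsets.size).astype(float)  # binary reward outcome per trial
rpe = outcome - 0.5 + rng.normal(0, 0.1, onsets.size)    # toy RPEs, a linear function of outcome

def hrf(t):
    """Crude double-gamma HRF approximation, for illustration only."""
    t = np.maximum(t, 1e-9)
    return t**5 * np.exp(-t) / gamma_fn(6.0) - 0.35 * t**11 * np.exp(-t) / gamma_fn(12.0)

def convolved_regressor(amplitudes):
    """Sum of HRF responses at each onset, scaled by the trial-wise amplitude."""
    return sum(a * hrf(frame_times - o) for a, o in zip(amplitudes, onsets))

reg_rpe = convolved_regressor(rpe - rpe.mean())            # mean-centred parametric RPE regressor
reg_outcome = convolved_regressor(outcome - outcome.mean())

r = np.corrcoef(reg_rpe, reg_outcome)[0, 1]
print(round(r, 2))   # RPE and outcome regressors are highly collinear
```

Because the RPE here is constructed as a linear function of the outcome with matched timings, the two convolved regressors are nearly collinear, exactly the situation the efficiency checks in Fig. 2c,e are meant to flag.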

3.2 Multivariate Analyses

In many ways, the model-based fMRI approach is similar to traditional, computational model-independent fMRI analyses that specify continuous regressors in

⁶ It can be important to check whether the parametric regressor is normalised or mean-centred. If not, it can be artificially highly correlated with the corresponding onset regressor, particularly if the parametric variable has only positive values. Some, but not all, common fMRI software packages automatically scale this regressor.
⁷ This can be done easily with most standard fMRI software packages, which include functions to estimate the efficiency of the GLM design (Fig. 2c).


Fig. 2 Setting up a statistical model for model-based fMRI. (a) Example of the variability of a reward prediction error derived from a simple RL model at the trial level. A signal such as this would be convolved with the BOLD HRF (b). (c) Regressors at the temporal scale of the BOLD signal (TR). The top row depicts non-modulated regressors (before convolution) corresponding to stimulus (black) and feedback (grey) onset. The middle row shows an example of two corresponding computational variables respective to each trial event in a typical bandit task: the value of the chosen stimulus (black) and the RPE (grey). The bottom row shows the same two computational variables after convolution with the HRF; these finally comprise the regressors entered into the fMRI GLM design (d). (e) Design efficiency matrix depicting the correlation (multicollinearity) between the regressors depicted in (d). For (d) and (e) the regressor IDs are: (0) non-modulated stimulus onset; (1) non-modulated feedback onset; (2) value of the chosen stimulus; (3) RPE


the statistical design matrix (e.g., pain ratings (Büchel et al., 2002)). Analogously, multivariate extensions of the model-based fMRI approach retain many of the same methodological procedures and considerations as their model-independent counterparts (see Sect. 7 for a list of relevant review papers on the method). Rather than modelling the BOLD signal on a voxel-by-voxel basis, multivariate pattern analyses (MVPA) exploit the spatial pattern of brain activity related to a particular psychological process (Norman et al., 2006; Haynes & Rees, 2006), in a manner inspired by work in theoretical neuroscience on the structure of neural representation as distributed population codes (Pouget et al., 2000). Multiple lines of research across cognitive neuroscience have used MVPA to discriminate between hidden cognitive states (Haxby et al., 2001; Polyn et al., 2005), to characterise the structure of neural representation space (Edelman et al., 1998; Schuck et al., 2016), and even to reconstruct stimuli experienced by the participant from the fMRI signal (Naselaris et al., 2009; Schoenmakers et al., 2013). Critically, in the domain of value-based decisions, MVPA has revealed insights about the brain otherwise undetected by standard mass univariate approaches (Kahnt et al., 2011), though this is not unexpected given that a univariate approach targets the mean activation across a region of voxels, whereas MVPA targets the variable encoding of information across those voxels (Norman et al., 2006; Davis et al., 2014). Multivariate fMRI analyses commonly fall into two classes, decoding (e.g., classification) and representational similarity analysis (Haynes, 2015; Cohen et al., 2017), which are unified in the principle that information in the brain can be encoded across spatially distributed patterns. The general analytic approach in the classification domain is to reverse the direction of prediction exemplified in the GLM used with mass univariate analyses.
Instead of predicting the BOLD signal with a set of regressors, a decoding model predicts the experimental or computational variable (the target variable) using a set of regressors that correspond to the set of voxels in the brain region under examination (the feature set). This set of features derived from the BOLD signal thus represents in the model the contribution of a distribution of voxels in explaining the variance in the target variable, such as a discrete variable coding different experimental conditions or a continuous variable derived from a computational model. One model-based extension is simply to define the target variable in terms of a computational variable, in either the discrete (i.e., classification) or continuous (i.e., regression) case (e.g., Mack et al., 2013). For example, trials across the task could be labelled according to whether participants exploited or explored on that trial, which might be inferred by whether or not they chose the alternative with the highest subjective value. This classification would be model-based since inference of the alternatives' subjective values depends on a computational model capturing hypotheses about the participants' valuation process. Moreover, trial-by-trial continuously varying expected values, which similarly depend on a model of how values are estimated, could be specified as the target variable to be


predicted by a set of voxels in a region of interest (Kahnt et al., 2011).⁸ Recall the methodological step in mass univariate analyses of convolving a computational variable with the HRF to create regressors that account for the hemodynamic lag and are in the same temporal space as the BOLD activity (i.e., at the same sampling rate as the repetition time [TR]). Because the direction of inference is reversed in MVPA, the BOLD activity needs to be correspondingly transformed to the space of the computational variable (i.e., the experimental space, such as trials) and typically normalised before entering the feature set. Multiple methods have been used for this transformation, such as computing trial-by-trial estimates using a first-stage GLM (i.e., deconvolution; Mumford et al. (2012, 2014)), averaging over TRs that span a relevant time window within a trial corresponding to the target variable (Haynes, 2015), or even training decoders in a time-resolved manner (e.g., TR by TR; Kriegeskorte et al. (2006); Kahnt et al. (2011); Polyn et al. (2005)).

A similar multivariate approach, albeit with a different analytic objective, is embodied in the branch of MVPA known as representational similarity analysis (RSA; Kriegeskorte et al., 2008). Rather than aiming to classify internal cognitive processes, the principal objective of RSA is to characterise the representational space of these processes and the degree to which the structure of information encoded in distributed neural activity corresponds to a hypothesised abstract structure predicted by a theoretical model, thereby indicating whether and how information is encoded across voxels in a region (Diedrichsen & Kriegeskorte, 2017; Kriegeskorte et al., 2008). The RSA approach abstracts away from specific measurement, experimental, and computational spaces to a generalised space of (dis)similarity that then allows for explicit quantitative comparison (Kriegeskorte et al., 2008).
Because representational models correspond to hypotheses about the structure and distribution of neural activity across conditions, computational models can be intrinsically integrated into the RSA approach (Diedrichsen & Kriegeskorte, 2017). A stimulus (or condition in a task) is associated with a representational pattern defined by the voxel-wise BOLD activity it elicits across voxels in a region. The representational patterns across the stimulus set are abstracted by computing the dissimilarity between each pair of stimuli, resulting in a stimulus-by-stimulus dissimilarity matrix for that region. This matrix can consequently be compared against matrices in the same abstract space corresponding to predefined hypotheses (Kriegeskorte & Kievit, 2013; Kriegeskorte et al., 2008). In the domain of value-based decisions, such an approach has been used to dissociate attention and action selection processes in prefrontal firing rates with nonhuman primate neurophysiology (Hunt et al., 2018), in which the authors tested the relative explanatory power of dissimilarity matrices representing different hypothesised processes.

⁸ In the case of Kahnt et al. (2011), values were computed from a model based on the objective features of the choice alternatives, rather than a model of subjective valuation that depends on parameters fit to participants' data. One possibility is that in the latter case it is advantageous to binarise computationally derived continuous variables, thereby turning the problem from one of regression to classification, depending on the reliability of the parameter estimates. Further simulation work is required to test these different approaches.
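The decoding logic described above can be sketched with synthetic data. A hand-rolled nearest-centroid classifier with leave-one-out cross-validation stands in for the packaged classifiers typically used in practice, and the feature matrix stands in for trial-wise BOLD estimates (e.g., from a first-stage GLM):

```python
# Decoding sketch: predict a model-derived trial label (e.g., explore vs.
# exploit) from a voxel pattern. All data here are synthetic illustrations.
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 50
labels = rng.integers(0, 2, n_trials)              # 1 = exploit (model-inferred label)
signal = rng.normal(0, 1, n_voxels)                # voxel pattern carrying the label
X = rng.normal(0, 1, (n_trials, n_voxels)) + np.outer(labels, signal)

correct = 0
for i in range(n_trials):                          # leave-one-out cross-validation
    train = np.delete(np.arange(n_trials), i)
    c0 = X[train][labels[train] == 0].mean(axis=0) # class centroids from training trials
    c1 = X[train][labels[train] == 1].mean(axis=0)
    pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
    correct += pred == labels[i]

acc = correct / n_trials
print(acc)   # well above the 0.5 chance level for this synthetic signal
```

The direction of inference is reversed relative to the univariate GLM: the voxel pattern is the feature set, and the model-derived label is the target.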


Explicit quantities predicted by computational models can also be abstracted into a dissimilarity space for comparison against the BOLD data. For example, this approach was used to demonstrate that log-posterior beliefs about the latent state underlying observed stimuli in a categorisation task held a representational structure that corresponded to distributed BOLD activity in the suborbital sulcus (Chan et al., 2016).
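The core RSA computation can be sketched with synthetic data. Here a neural RDM is compared against a model RDM via Pearson correlation of their upper triangles for brevity (rank correlation is common in practice); all names, quantities, and settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_cond, n_voxels = 6, 40
model_values = np.arange(n_cond, dtype=float)        # hypothetical model quantity per condition

# Synthetic voxel patterns whose geometry tracks the model quantity, plus noise.
axis = rng.normal(0, 1, n_voxels)
patterns = np.outer(model_values, axis) + rng.normal(0, 0.5, (n_cond, n_voxels))

def rdm(X):
    """Pairwise Euclidean distances between condition patterns."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

neural_rdm = rdm(patterns)
model_rdm = np.abs(model_values[:, None] - model_values[None, :])   # hypothesised structure

iu = np.triu_indices(n_cond, k=1)                    # unique condition pairs
fit = np.corrcoef(neural_rdm[iu], model_rdm[iu])[0, 1]
print(round(fit, 2))   # strong correspondence between neural and model RDMs
```

Both matrices live in the same abstract dissimilarity space, so the comparison is agnostic to whether the model RDM came from behaviour, a computational model, or another measurement modality.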

4 Considerations When Linking RL and fMRI Models

4.1 Evaluating Model Quality

In the preceding Sects. 3.1 and 3.2, we assumed that computational models were well-selected and well-specified with respect to the behavioural data. We put aside issues related to estimating model parameters in order to describe the univariate and multivariate analytic approaches for linking model-derived variables to the fMRI data. We return now to various considerations in the realm of computational modelling relevant for linking model components to neural data.

We begin with a concern regarding the non-identifiability of model parameters, in which multiple values of a model parameter result in equivalent performance (Gershman, 2016; Wilson & Collins, 2019; Casella & Berger, 2021). In other words, different combinations of parameter values result in an equal ability of the model to capture variance in the observed data. Consequences of this problem include the inability to clearly ascribe changes in behaviour to unique parameter values, which in turn translates to a lack of inferential precision when computational variables derived from the model parameters are modelled against fMRI data (though see Wilson and Niv (2015)). Here we highlight this issue in the context of standard RL models, along with other methodological concerns when working with RL models that have implications for downstream model-based fMRI analyses. We also review diagnostic tools to examine the extent of these problems.

RL models of the form we discussed in Sect. 2 suffer from some degree of non-identifiability between their two free parameters: the learning rate $\alpha$ (Eq. (2)) and the inverse temperature $\beta$ (i.e., the logistic coefficient, Eq. (5); Daw et al., 2011; Gershman, 2016).
Simulation work characterising the nature of the correlation between the $\alpha$ and $\beta$ parameters has demonstrated that the degree to which $\alpha$ is positively predictive of behavioural improvement, the purported cognitive process meant to be captured by this mechanism, is moderated by the value of $\beta$ (Wilson & Collins, 2019). This issue can then propagate to fMRI analyses that rely on model parameter estimates, for example, when examining individual differences in learning rates (Lebreton et al., 2019). There are multiple ways to diagnose the extent of this problem in the model at hand, with respect to a particular cognitive task. For example, the distribution of the likelihood corresponding to combinations of the two parameters can be plotted


(Fig. 3a); the mode of the distribution (i.e., its highest value) corresponds to the MLE, and the shape of the distribution is informative of correlations between the parameters (Daw et al., 2011). As this method requires computing the joint likelihood for a large range of parameter values, it can be computationally expensive for models with more free parameters, or for models where the likelihood function requires Monte Carlo approximation (e.g., Milosavljevic et al. 2010; Hutcherson et al. 2015). Instead, summary quantities can be computed from this multivariate likelihood function. The inverse Hessian matrix evaluated at the MLE provides an estimate of the variance and covariance of the respective parameters; large covariance values are diagnostic of non-identifiable parameter estimates, as changes to the parameter estimates result in similar likelihood values (Daw et al., 2011). Going beyond the MLE problem, Bayesian parameter estimation techniques that calculate or approximate the posterior distribution over the model parameter(s) provide information on both the degree of correlation between parameters and the uncertainty around the parameter estimates (Gelman et al., 2013; Lee & Wagenmakers, 2014). Another simple approach is to examine the correlation between fitted parameter estimates across subjects, where strong correlations can be diagnostic of issues with parameter identifiability (Gershman, 2016; Wilson & Collins, 2019).

Another important consideration when designing and fitting computational models with the intent of translating the model to fMRI analyses is whether the particular model adequately captures the behavioural variability relevant to the cognitive process of interest. A common method for comparing between competing models, each representative of a particular psychological hypothesis, is to perform quantitative model comparison.
With various methods available to capture hierarchical structure in the data (Stephan et al., 2009; Piray et al., 2019), this procedure generally trades off the likelihood of the model against its complexity, operationalised as the number of free parameters to estimate (Akaike, 1974; Schwarz et al., 1978; Geman et al., 1992) (see Sect. 7). However, it is important to consider that common metrics used in quantitative model comparison, such as the Bayesian Information Criterion (Schwarz et al., 1978), are summary descriptions of model quality based on the total likelihood across the entire data set. The implication is that the total likelihood alone does not provide a complete picture of which aspects of the data the model is able to capture (Palminteri et al., 2017), and, importantly, whether the generative model is able to reproduce patterns observed in the empirical data that are critical for interpreting the underlying cognitive process of interest. If not, it would indeed be problematic to extract computational variables from this model for fMRI analyses with the goal of making inferences about how computational mechanisms are instantiated in the brain. For example, in the RL domain, observed behaviours (i.e., choices between alternatives) are interpreted to reflect the contribution of a particular variable, such as expected value, towards decisions. While this interpretation is usually reasonable, it is also plausible that other variables come into play during a subset of the experiment and might better explain the choice data on those trials. A model incorporating the expected value of the alternatives as the only choice variable may still fit the data relatively well overall, but the set of models tested could be inadequate in

Reinforcement Learning


capturing other choice variables that come into play. It is therefore important to also qualitatively verify the model before commencing with model-based fMRI analyses.
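The penalised-likelihood trade-off described above can be made concrete with the two most common criteria, computed directly from a model's fitted negative log-likelihood. The numbers below are hypothetical, chosen only to illustrate that AIC and BIC can disagree when an extra parameter buys a small improvement in fit:

```python
import numpy as np

def aic(nll, k):
    """Akaike information criterion: 2k + 2 * negative log-likelihood."""
    return 2 * k + 2 * nll

def bic(nll, k, n):
    """Bayesian information criterion: k * ln(n) + 2 * negative log-likelihood."""
    return k * np.log(n) + 2 * nll

# Hypothetical fits on 200 trials: model B fits slightly better than model A
# but uses one extra free parameter.
n_trials = 200
nll_a, k_a = 120.0, 2  # e.g., a basic RL model with alpha and beta
nll_b, k_b = 118.5, 3  # e.g., adding a separate learning rate for losses
print("AIC:", aic(nll_a, k_a), "vs", aic(nll_b, k_b))  # B preferred (243 < 244)
print("BIC:", bic(nll_a, k_a, n_trials), "vs",
      bic(nll_b, k_b, n_trials))                        # A preferred here
```

Because BIC's penalty grows with the log of the number of observations, it punishes the extra parameter more heavily than AIC does in this example, which is one reason the choice of criterion itself deserves scrutiny.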

4.2 Addressing Model Considerations

The degree to which the model captures observed behaviour can be assessed qualitatively through simulations, bringing the model selection procedure closer to a form of absolute, rather than relative, comparison (Palminteri et al., 2017). The related methodological procedure of parameter recovery not only helps to assess the behaviour of a model in more qualitative terms across a range of parameter regimes but also serves as a diagnostic tool for assessing the identifiability of the model parameters (Fig. 3b). In parameter recovery, a simulation step is followed by parameter estimation to assess the ability of the fitting procedure to recover the 'ground-truth' generative parameters driving the simulation (Wilson & Collins, 2019). Critically, informative simulation procedures need to integrate the structure of the experimental task: a simulation essentially runs the model, with specified parameter values, through the task and has the model produce data of the same form as would be observed with real data (e.g., choices, reaction times, etc.), for a range of generative parameter values. During the recovery phase of the procedure, these simulated data are treated as if they were from a real subject, in that the same model's free parameters are fit via optimisation procedures, without access to the ground-truth generative parameter values. Finally, the generative values are compared against the estimated values; the closer these values are to each other, the more successfully the parameters are 'recovered' (Sect. 7 provides references to papers that go into more detail on particular considerations of the parameter recovery procedure). The intuition is that the computational model is performing the particular task in the simulation exactly according to the mechanisms that have been specified by the experimenter.
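A minimal simulate-then-fit loop along these lines might look as follows. The bandit task, the fixed and known β, the trial count, and the grid-based estimator are all illustrative assumptions rather than the chapter's own procedure:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate(alpha, beta, n_trials=300, p_reward=(0.8, 0.2)):
    """Softmax Q-learner on a two-armed bandit; returns choices and rewards."""
    q = np.zeros(2)
    choices, rewards = np.empty(n_trials, int), np.empty(n_trials)
    for t in range(n_trials):
        p = np.exp(beta * q)
        p /= p.sum()
        c = rng.choice(2, p=p)
        r = float(rng.random() < p_reward[c])
        q[c] += alpha * (r - q[c])
        choices[t], rewards[t] = c, r
    return choices, rewards

def nll(alpha, beta, choices, rewards):
    """Negative log-likelihood of the observed choices under (alpha, beta)."""
    q, total = np.zeros(2), 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q)
        p /= p.sum()
        total -= np.log(p[c])
        q[c] += alpha * (r - q[c])
    return total

# Recover the learning rate (beta fixed and known here, for brevity) for a
# range of generative values, as in a parameter recovery exercise.
gen_alphas = np.linspace(0.1, 0.9, 9)
grid = np.linspace(0.01, 0.99, 99)
recovered = []
for a in gen_alphas:
    ch, rw = simulate(a, beta=5.0)
    recovered.append(grid[int(np.argmin([nll(g, 5.0, ch, rw) for g in grid]))])
r = np.corrcoef(gen_alphas, recovered)[0, 1]
print("generative:", gen_alphas.round(2))
print("recovered: ", np.round(recovered, 2))
print("correlation:", round(r, 3))
```

A high correlation between the generative and recovered values (as in Fig. 3c) is the signature of successful recovery; in a full exercise one would also vary β, the number of trials, and the task structure.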
If the generative parameters cannot be recovered even when the latent variables (e.g., free parameters) within the mechanisms are fully known, then fitting the model to real data, where the latent variables are inaccessible, becomes problematic: interpreting those parameter estimates or integrating them into fMRI analyses is no longer justified. Parameter recovery analyses should be done prior to analysing observed data, and ideally before data are collected, as part of the iterative procedure of refining the experimental design. Once the data have been collected, there are several ways to continue addressing the computational modelling concerns outlined in Sect. 4.1. For example, it has been demonstrated that informing the parameter estimation procedure with empirical priors, thereby optimising for the maximum a posteriori rather than maximum likelihood estimates, can improve the identifiability and reliability of the α and β parameter estimates in basic RL models (Fig. 3c; Gershman, 2016). Simulating the behaviour of a model after it has been fit to real data is similarly critical for assessing the quality of the model and is a very informative approach to model comparison. Here the 'posterior predictive check' procedure


V. Man and J. P. O’Doherty

(Gelman et al., 1996) is similar to the simulation component of parameter recovery exercises, except that instead of simulating the model across a range of plausible generative parameter values, the simulation is run with the parameter values estimated from the subject's data. The output of this simulation can then be plotted against real human data to assess whether or not the model is able to reproduce what the subject does during the task (Fig. 3d). A critical check at this stage is to assess whether the model is able to produce effects observed in the real data independently of the computational model, such as statistically significant differences between key experimental conditions.
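The check can be sketched in a few lines: simulate the task repeatedly from the fitted parameter values and ask whether a summary statistic of the real data falls within the simulated distribution. Everything below (task, parameter values, statistic) is an illustrative assumption, and one simulated data set stands in for the 'real' subject:

```python
import numpy as np

rng = np.random.default_rng(3)

def run_agent(alpha, beta, n_trials=200, p_reward=(0.75, 0.25)):
    """One pass of a softmax Q-learner through the task."""
    q = np.zeros(2)
    choices, rewards = np.empty(n_trials, int), np.empty(n_trials)
    for t in range(n_trials):
        p = np.exp(beta * q)
        p /= p.sum()
        c = rng.choice(2, p=p)
        r = float(rng.random() < p_reward[c])
        q[c] += alpha * (r - q[c])
        choices[t], rewards[t] = c, r
    return choices, rewards

# 'Observed' data stand in for a real subject here.
obs_choices, _ = run_agent(alpha=0.25, beta=4.0)
obs_acc = np.mean(obs_choices == 0)  # option 0 has the higher reward rate

# Posterior predictive check: simulate from hypothetical fitted parameter
# values and ask whether the observed accuracy is typical of the model.
fit_alpha, fit_beta = 0.28, 3.6
sim_acc = np.array([np.mean(run_agent(fit_alpha, fit_beta)[0] == 0)
                    for _ in range(200)])
lo, hi = np.quantile(sim_acc, [0.025, 0.975])
print(f"observed accuracy {obs_acc:.2f}, "
      f"simulated 95% interval [{lo:.2f}, {hi:.2f}]")
```

In a real analysis the statistic would be chosen to target the effect of interest, such as a condition difference or a trial-by-trial learning curve, rather than overall accuracy alone.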

5 Bridging Levels of Analyses

5.1 Neural Correlates of Computational Processes

We end the chapter with a review of classic and contemporary examples in the RL literature in which a model-based approach to fMRI analysis has led to a successful bridging between computational theory and neural data. In this section, we begin by discussing early fMRI work that sought to map the functional organisation of the brain, but to do so at a process level that characterises how computations are implemented in spatially separable regions. In Sect. 2.1.1, we discussed the extension of the early R-W model of Pavlovian conditioning to account for learning within the course of a trial and the contribution of the TD prediction error (Eq. (4)) to within-trial update signals. Early work in non-human primate neurophysiology documented that the phasic activity of dopamine neurons encodes signals reflecting the TD prediction error (Montague et al., 1996; Schultz et al., 1997). Motivated by this finding, an early model-based fMRI study in the context of a Pavlovian conditioning task specified regressors corresponding to the (HRF-convolved) model-generated RPE signal at multiple time points within the span of a trial (at the onsets of the CS and US) (O'Doherty et al., 2003). The authors found evidence corroborating the previous neurophysiological findings, in that model-derived RPE signals correlated with BOLD activity in the bilateral ventral striatum, a region rich in dopamine receptors. Over the course of learning, the left ventral striatum RPE signal propagated back in time from the US to CS onset, consistent with a TD prediction error. A similar analytic approach was subsequently employed to test the neural implementation of the actor-critic model described in Sect. 2.2.1 in the context of instrumental conditioning (O'Doherty et al., 2004). Recall that the actor-critic model delineates two components involved in the learning and action selection processes that underpin instrumental conditioning.
The critic computes a single TD prediction error that is shared in updating both state value estimates and action propensities (Fig. 1). Importantly, the state value estimation process is hypothesised to be involved under both Pavlovian and instrumental conditioning contexts, whereas updating action propensities is only relevant under instrumental


Fig. 3 Assessing computational model quality. (a) Joint likelihood of the α and β parameters of a simple RL model. The shape of the distribution is indicative of correlation between parameters. The MLE (square) deviates from the generative parameters (circle), but not the maximum a posteriori estimates (triangle, overlaid on circle). See Figures 1 and 3 in Daw et al. (2011). (b) An example of an empirical prior for the softmax β with hyper-parameters from Gershman (2016). (c) Parameter recovery of α and β improves as the number of experimental trials increases (panels from left to right). The red and blue lines depict the correlation between the generative and recovered parameters, and the black dashed lines depict identity. (d) Simulating the behaviour of a model after its parameters have been fit can illustrate where the model succeeds and fails in capturing important patterns in real data on a trial-by-trial basis. In a reversal learning task (reversals denoted by vertical red lines), a simple RL model (black line) is able to capture some, but not all, of the learning dynamics observed in the empirical data (grey line and blue dots). Dots depict whether or not participants made the correct choice on each trial (choosing the alternative with the highest value), and lines are moving averages over real and simulated choices


conditioning. The authors exploited this computational prediction and tested for BOLD correlates using a model-based approach across both instrumental and Pavlovian tasks. They found that TD prediction error correlated with the ventral striatum across both tasks but only with the dorsal striatum during the instrumental task, thereby dissociating neural implementations of the critic and actor components of the model, respectively.
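The construction of such a parametric RPE regressor can be sketched as follows: place the model-derived prediction errors in a stick function at the event onsets, convolve with a canonical haemodynamic response, and mean-centre before entering the GLM. The double-gamma shape parameters, onsets, and RPE values below are illustrative assumptions, not those of the studies described:

```python
import numpy as np
from math import gamma

def canonical_hrf(t):
    """Simplified double-gamma HRF: a peak near 5 s minus a later undershoot."""
    return (t ** 5 * np.exp(-t) / gamma(6)
            - (t ** 15 * np.exp(-t) / gamma(16)) / 6)

dt = 0.1                       # upsampled time grid (seconds)
t = np.arange(0, 32, dt)
hrf = canonical_hrf(t)

# Hypothetical trial onsets (s) and model-derived RPEs at those onsets.
onsets = np.array([2.0, 8.0, 14.0, 20.0, 26.0])
rpes = np.array([0.8, -0.3, 0.5, -0.6, 0.1])

stick = np.zeros_like(t)
stick[np.round(onsets / dt).astype(int)] = rpes   # parametric stick function
regressor = np.convolve(stick, hrf)[: len(t)]     # HRF-convolved RPE regressor
regressor -= regressor.mean()                     # mean-centre before the GLM
print("HRF peak (s):", t[np.argmax(hrf)])
```

In an actual analysis the upsampled regressor would then be downsampled to the scanner's TR and entered alongside onset regressors and nuisance covariates, following the conventions of the fMRI analysis package in use.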

5.2 Leveraging fMRI to Adjudicate Between Models

Since this early work mapping the correspondence between computations described within the RL framework and separable neural correlates, the model-based fMRI approach has been adapted to test innumerable computational hypotheses across cognitive neuroscience. Emerging in the literature, however, is an interesting complementary way to bridge computational modelling and fMRI analysis in which neural data are used to adjudicate between competing computational hypotheses (i.e., for model selection) (Mack et al., 2013). Despite lacking a unified approach across examples in the literature, the general underlying principle of fMRI-based model selection is that components of candidate computational models are tested in their ability to explain the BOLD activity of a particular region or across the brain, analogous to the procedure of model selection using behavioural data. Consistent with this perspective of fMRI analysis, studies in the univariate domain have extracted components of computational models in a model-based fashion and allowed regressors in the same GLM to compete for variance (Hampton et al., 2006, 2008). An alternative approach has instead compared the likelihoods of multiple GLMs, each of which is specified by the predictions of a different candidate computational model (Niv et al., 2015). In an example of the former approach, one study compared the predictions of a Bayesian model, positing that participants explicitly represented the anti-correlated structure of a two-armed bandit instrumental learning task with reversing reward contingencies, against an RL mechanism in which values were incrementally updated without encoding the hidden reversal structure (Hampton et al., 2006). Critically, the authors leveraged the BOLD activity in the medial prefrontal cortex (mPFC) to quantitatively and qualitatively compare between the two computational hypotheses.
When the predictions of each model were directly entered into the same GLM as model-based regressors, the Bayesian model better accounted for variance in mPFC BOLD activity. Moreover, in the spirit of posterior predictive checks as a method of assessing absolute model quality (see Sect. 4.2; Palminteri et al. (2017)), the authors further demonstrated that the mPFC BOLD activity at the time of choice reflected the pattern of change in model-derived expected value after a behavioural switch consistent with the Bayesian, but not the RL, model. While this last example of model selection using fMRI data strengthens the inference that the 'winning' model's mechanisms better describe the participants' internal processes during the task, it does so by reaffirming the results of model


comparison based on behavioural data. This adjudicative approach is thus particularly compelling when it can inform model selection where behavioural data alone cannot. This indeed was the case in a study testing between two competing theories of categorisation: exemplar versus prototype theory (Mack et al., 2013). Critically, in the context of a simple object categorisation task, both theoretical accounts provided equivalent behavioural predictions and so could not be discriminated using the behavioural data alone. However, computational formalisation of the hypotheses embedded in each theory made their different underlying representations explicit. The authors exploited this by testing whether multivariate signals across the brain could predict the computationally derived representations of each model, integrating the model-based approach into MVPA, and found that distributed BOLD activity corresponded better to the latent representations of the exemplar model. Other multivariate approaches, such as RSA, have also been used for neural model comparison (Chan et al., 2016).
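A toy version of the GLM-likelihood comparison might look like the following, where a simulated BOLD time course is generated from one of two correlated candidate regressors and each candidate GLM is scored by BIC. All of the regressors, effect sizes, and noise levels here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200  # volumes in the (simulated) scan

# Two candidate model-derived regressors, correlated as they often are
# in practice (e.g., value signals from two competing learning models).
x_true = rng.standard_normal(n)
x_alt = 0.5 * x_true + 0.5 * rng.standard_normal(n)

# Simulated BOLD time course generated from the first candidate plus noise.
y = 1.0 * x_true + rng.standard_normal(n)

def glm_bic(y, x):
    """BIC of a one-regressor GLM (plus intercept) under Gaussian noise."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid.var()
    k, n_obs = X.shape[1], len(y)
    log_lik = -0.5 * n_obs * (np.log(2 * np.pi * sigma2) + 1)
    return k * np.log(n_obs) - 2 * log_lik

bic_true, bic_alt = glm_bic(y, x_true), glm_bic(y, x_alt)
print("BIC generating model:  ", round(bic_true, 1))
print("BIC alternative model: ", round(bic_alt, 1))
```

The lower BIC for the generating model's GLM mirrors, in miniature, the logic of comparing the likelihoods of candidate GLMs voxel-wise or region-wise; real applications must additionally handle autocorrelated noise and collinearity between the candidate regressors.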

5.3 Future Directions

Adjudicating between models using fMRI data is an encouraging and increasingly used advance in bridging between computational, algorithmic, and implementational levels of analyses (Marr & Poggio, 1976), for example, by imposing biological constraints on computational hypotheses. Specific detailed issues related to this technique, however, require further investigation. Existing work has discussed how deviations in parameter estimates can impact inferences made from model-based fMRI analyses (Wilson & Niv, 2015); further simulation work is required to characterise the circumstances, such as model complexity or the type of model-based fMRI analysis, under which this impact is more or less consequential. Moreover, as mentioned above, the analytic strategy for model selection using fMRI data is not unified, and various approaches have not been directly quantitatively compared. Further work will, for example, need to elucidate the differential or complementary explanatory contributions of a multivariate versus mass univariate approach to model selection. As psychological questions in cognitive neuroscience increase in both breadth and depth, experimental paradigms have increased in complexity and ecological validity. Indeed, recent work has argued for experimental expansion to incorporate more naturalistic stimuli with increased dimensionality (Haxby et al., 2020; Sonkusare et al., 2019; Nastase et al., 2020). Computational approaches need to scale correspondingly, both in their ability to quantitatively capture hypotheses about richer psychological questions and in their ability to describe complex behaviours elicited by multidimensional tasks. A promising emerging avenue in model-based fMRI has been to employ deep neural network architectures as implementation-level models (Khaligh-Razavi & Kriegeskorte, 2014; Kriegeskorte, 2015; Güçlü & van Gerven, 2015).
Indeed, a recent study in the context of RL directly addressed how complex tasks are compressed into efficient representations


using a neural network model-based approach to fMRI data (Cross et al., 2020). The exciting promise of these developing advances in model-based fMRI is that they directly tackle some of the most persistent open questions in the field of RL, such as how RL algorithms can overcome the 'curse of dimensionality' (Sutton et al., 1998) to grapple with increasingly complex situations. These advances therefore have the potential to ultimately explain learning and decision-making in the real world.

6 Exercises

1. How do predictions from R-W diverge from findings about second-order conditioning?
2. Give an example of how shortcomings of a computational model advance theory.
3. How is the TD model able to address limitations of the R-W model in explaining second-order conditioning? Describe the mechanisms in the TD model that contribute to this.
4. Here we ignored the property of discounting in the TD PE. Why is it important to include a discount factor? (Hint: What happens if the task does not have a concrete end point? See Sutton et al. (1998).)
5. Can you think of some strategies, either in experimental design or in analysis, to separate out neural correlates of RPE from those of reward?
6. What unifying neural principle is shared across different multivariate techniques? How is the direction of inference different between the two multivariate techniques discussed in the chapter? (Hint: Consider differences between encoding and decoding models. See Haynes (2015).)
7. Implement the actor-critic model described in the chapter. Without collecting data, which procedures to evaluate the quality of the model are you able to do? Which are you not? For the ones you can do, implement them.

7 Further Reading

1. Successes and failures of the Rescorla–Wagner model: Miller et al. (1995)
2. A discussion of temporal-difference versus Rescorla–Wagner prediction errors: Niv and Schoenbaum (2008)
3. Eligibility traces with real-time models such as the temporal-difference model: Barto et al. (1983); Sutton et al. (1998)
4. A standard text in computational reinforcement learning: Sutton et al. (1998)
5. Sequential decisions and intermediate reward signals: Sutton et al. (1999); Botvinick et al. (2009); Parr and Russell (1998)
6. Multiple systems of behavioural control: O'Doherty et al. (2017, 2021)
7. Extensions of the actor-critic model in cognitive neuroscience: Collins and Frank (2014)


8. Further discussion of applications of model-based fMRI in reinforcement learning: O'Doherty et al. (2007); Gläscher and O'Doherty (2010)
9. Tutorials on multivariate approaches to fMRI: Norman et al. (2006); Haynes (2015); Kriegeskorte et al. (2008)
10. Issues related to model complexity, including the bias–variance trade-off: Geman et al. (1992); Glaser et al. (2020)
11. Estimating model parameters and model comparison: Daw et al. (2011); Stephan et al. (2009); Myung (2003)
12. Parameter recovery and simulations: Palminteri et al. (2017); Wilson and Collins (2019)
13. Using fMRI to adjudicate between computational models: Hampton et al. (2006, 2008); Niv et al. (2015); Mack et al. (2013); Turner et al. (2017); Wilson and Niv (2015)

Conflict of Interest The authors declare that they have no conflict of interest.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 5, 834–846.
Botvinick, M. M., Niv, Y., & Barto, A. G. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3), 262–280.
Büchel, C., Bornhövd, K., Quante, M., Glauche, V., Bromm, B., & Weiller, C. (2002). Dissociable neural responses related to pain intensity, stimulus intensity, and stimulus awareness within the anterior cingulate cortex: a parametric single-trial laser functional magnetic resonance imaging study. Journal of Neuroscience, 22(3), 970–976.
Büchel, C., Holmes, A., Rees, G., & Friston, K. (1998). Characterizing stimulus–response functions using nonlinear regressors in parametric fMRI experiments. Neuroimage, 8(2), 140–148.
Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58(5), 313.
Caplin, A., & Dean, M. (2008). Axiomatic methods, dopamine and reward prediction error. Current Opinion in Neurobiology, 18(2), 197–202.
Casella, G., & Berger, R. L. (2021). Statistical inference. Cengage Learning.
Chan, S. C., Niv, Y., & Norman, K. A. (2016). A probability distribution over latent causes, in the orbitofrontal cortex. Journal of Neuroscience, 36(30), 7817–7828.
Cohen, J. D., Daw, N., Engelhardt, B., Hasson, U., Li, K., Niv, Y., Norman, K. A., Pillow, J., Ramadge, P. J., Turk-Browne, N. B., et al. (2017). Computational approaches to fMRI analysis. Nature Neuroscience, 20(3), 304–313.
Colas, J. T., Pauli, W. M., Larsen, T., Tyszka, J. M., & O'Doherty, J. P. (2017). Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: Evidence from high-resolution fMRI. PLoS Computational Biology, 13(10), e1005810.


Collins, A. G., & Frank, M. J. (2014). Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychological Review, 121(3), 337.
Cross, L., Cockburn, J., Yue, Y., & O'Doherty, J. P. (2020). Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron, 109(4), 724–738.
Davis, T., LaRocque, K. F., Mumford, J. A., Norman, K. A., Wagner, A. D., & Poldrack, R. A. (2014). What do differences between multi-voxel and univariate analysis mean? How subject-, voxel-, and trial-level variance impact fMRI analysis. Neuroimage, 97, 271–283.
Daw, N. D. et al. (2011). Trial-by-trial data analysis using computational models. Decision Making, Affect, and Learning: Attention and Performance XXIII, 23(1), 3–38.
Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.
Daw, N. D., & Tobler, P. N. (2014). Value learning through reinforcement: the basics of dopamine and reinforcement learning. In Neuroeconomics (pp. 283–298). Elsevier.
Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.
Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron, 80(2), 312–325.
Edelman, S., Grill-Spector, K., Kushnir, T., & Malach, R. (1998). Toward direct visualization of the internal shape representation space by fMRI. Psychobiology, 26(4), 309–321.
Friston, K. J., Holmes, A. P., Price, C., Büchel, C., & Worsley, K. (1999). Multisubject fMRI studies and conjunction analyses. Neuroimage, 10(4), 385–396.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., & Frackowiak, R. S. (1994). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2(4), 189–210.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–760.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.
Gershman, S. J. (2016). Empirical priors for reinforcement learning models. Journal of Mathematical Psychology, 71, 1–6.
Gittins, J. C., & Jones, D. M. (1979). A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika, 66(3), 561–565.
Gläscher, J. P., & O'Doherty, J. P. (2010). Model-based approaches to neuroimaging: combining reinforcement learning theory with fMRI data. Wiley Interdisciplinary Reviews: Cognitive Science, 1(4), 501–510.
Glaser, J. I., Benjamin, A. S., Chowdhury, R. H., Perich, M. G., Miller, L. E., & Kording, K. P. (2020). Machine learning for neural decoding. eNeuro, 7(4), 1–16.
Güçlü, U., & van Gerven, M. A. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005–10014.
Hampton, A. N., Bossaerts, P., & O'Doherty, J. P. (2006). The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. Journal of Neuroscience, 26(32), 8360–8367.
Hampton, A. N., Bossaerts, P., & O'Doherty, J. P. (2008). Neural correlates of mentalizing-related computations during strategic interactions in humans. Proceedings of the National Academy of Sciences, 105(18), 6741–6746.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425–2430.


Haxby, J. V., Gobbini, M. I., & Nastase, S. A. (2020). Naturalistic stimuli reveal a dominant role for agentic action in visual representation. Neuroimage, 216, 116561.
Haynes, J.-D. (2015). A primer on pattern-based approaches to fMRI: principles, pitfalls, and perspectives. Neuron, 87(2), 257–270.
Haynes, J.-D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.
Holland, P. C., & Rescorla, R. A. (1975). Second-order conditioning with food unconditioned stimulus. Journal of Comparative and Physiological Psychology, 88(1), 459.
Hull, C. L. (1939). The problem of stimulus equivalence in behavior theory. Psychological Review, 46(1), 9.
Hunt, L. T., Malalasekera, W. N., de Berker, A. O., Miranda, B., Farmer, S. F., Behrens, T. E., & Kennerley, S. W. (2018). Triple dissociation of attention and decision computations across prefrontal cortex. Nature Neuroscience, 21(10), 1471–1481.
Hutcherson, C. A., Bushong, B., & Rangel, A. (2015). A neurocomputational model of altruistic choice and its implications. Neuron, 87(2), 451–462.
Kahnt, T., Heinzle, J., Park, S. Q., & Haynes, J.-D. (2011). Decoding different roles for VMPFC and DLPFC in multi-attribute decision making. Neuroimage, 56(2), 709–715.
Kamin, L. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.
Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915.
Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446.
Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences, 103(10), 3863–3868.
Kriegeskorte, N., & Kievit, R. A. (2013). Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8), 401–412.
Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4.
Lau, B., & Glimcher, P. W. (2005). Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 84(3), 555–579.
Lebreton, M., Bavard, S., Daunizeau, J., & Palminteri, S. (2019). Assessing inter-individual differences with task-related functional neuroimaging. Nature Human Behaviour, 3(9), 897–905.
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain's algorithm for categorization from its neural implementation. Current Biology, 23(20), 2023–2027.
Marr, D., & Poggio, T. (1976). From understanding computation to understanding neural circuitry.
Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117(3), 363.
Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., & Rangel, A. (2010). The drift diffusion model can account for value-based choice response times under high and low time pressure. Judgment and Decision Making, 5(6), 437–449.
Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936–1947.
Mumford, J. A., Davis, T., & Poldrack, R. A. (2014). The impact of study design on pattern estimation for single-trial multivariate pattern analysis. Neuroimage, 103, 130–138.
Mumford, J. A., Poline, J.-B., & Poldrack, R. A. (2015). Orthogonalization of regressors in fMRI models. PloS One, 10(4), e0126255.
Mumford, J. A., Turner, B. O., Ashby, F. G., & Poldrack, R. A. (2012). Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. Neuroimage, 59(3), 2636–2643.


Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47(1), 90–100.
Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., & Gallant, J. L. (2009). Bayesian reconstruction of natural images from human brain activity. Neuron, 63(6), 902–915.
Nastase, S. A., Goldstein, A., & Hasson, U. (2020). Keep it real: Rethinking the primacy of experimental control in cognitive neuroscience. NeuroImage, 222, 117254.
Niv, Y., Daniel, R., Geana, A., Gershman, S. J., Leong, Y. C., Radulescu, A., & Wilson, R. C. (2015). Reinforcement learning in multidimensional environments relies on attention mechanisms. Journal of Neuroscience, 35(21), 8145–8157.
Niv, Y., & Langdon, A. (2016). Reinforcement learning with Marr. Current Opinion in Behavioral Sciences, 11, 67–73.
Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences, 12(7), 265–272.
Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304(5669), 452–454.
O'Doherty, J. P., Cockburn, J., & Pauli, W. M. (2017). Learning, reward, and decision making. Annual Review of Psychology, 68, 73–100.
O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal difference models and reward-related learning in the human brain. Neuron, 38(2), 329–337.
O'Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104(1), 35–53.
O'Doherty, J. P., Lee, S., Tadayonnejad, R., Cockburn, J., Iigaya, K., & Charpentier, C. J. (2021). Why and how the brain weights contributions from a mixture of experts. Neuroscience & Biobehavioral Reviews, 123, 14–23.
Palminteri, S., Wyart, V., & Koechlin, E. (2017). The importance of falsification in computational cognitive modeling. Trends in Cognitive Sciences, 21(6), 425–433.
Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems, 10, 1043–1049.
Pavlov, I. P., & Anrep, G. V. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex (Vol. 3). London: Oxford University Press.
Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLoS Computational Biology, 15(6), e1007043.
Polyn, S. M., Natu, V. S., Cohen, J. D., & Norman, K. A. (2005). Category-specific cortical activity precedes retrieval during memory search. Science, 310(5756), 1963–1966.
Pouget, A., Dayan, P., & Zemel, R. (2000). Information processing with population codes. Nature Reviews Neuroscience, 1(2), 125–132.
Rescorla, R. A. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Current research and theory (pp. 64–99).
Rizley, R. C., & Rescorla, R. A. (1972). Associations in second-order conditioning and sensory preconditioning. Journal of Comparative and Physiological Psychology, 81(1), 1.
Rutledge, R. B., Dean, M., Caplin, A., & Glimcher, P. W. (2010). Testing the reward prediction error hypothesis with an axiomatic model. Journal of Neuroscience, 30(40), 13525–13536.
Schoenmakers, S., Barth, M., Heskes, T., & Van Gerven, M. (2013). Linear reconstruction of perceived images from human brain activity. NeuroImage, 83, 951–961.
Schuck, N. W., Cai, M. B., Wilson, R. C., & Niv, Y. (2016). Human orbitofrontal cortex represents a cognitive map of state space. Neuron, 91(6), 1402–1412.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
Schwarz, G., et al. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Skinner, B. F. (1963). Operant behavior. American Psychologist, 18(8), 503.


Sonkusare, S., Breakspear, M., & Guo, C. (2019). Naturalistic stimuli in neuroscience: critically acclaimed. Trends in Cognitive Sciences, 23(8), 699–714.
Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J., & Friston, K. J. (2009). Bayesian model selection for group studies. Neuroimage, 46(4), 1004–1017.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. In Machine Learning Proceedings 1995 (pp. 531–539). Elsevier.
Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88(2), 135.
Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 355–378). Seattle, WA.
Sutton, R. S., Barto, A. G., et al. (1998). Introduction to reinforcement learning (Vol. 135). Cambridge: MIT Press.
Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2), 181–211.
Thorndike, E. L. (1898). Animal intelligence: an experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements, 2(4), i.
Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189.
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.
Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. Elife, 8, e49547.
Wilson, R. C., & Niv, Y. (2015). Is model fitting necessary for model-based fMRI? PLoS Computational Biology, 11(6), e1004237.
Witten, I. H. (1977). An adaptive optimal controller for discrete-time Markov environments. Information and Control, 34(4), 286–295.
Worsley, K. J., Liao, C. H., Aston, J., Petre, V., Duncan, G., Morales, F., & Evans, A. (2002). A general statistical analysis for fMRI data. Neuroimage, 15(1), 1–15.

An Introduction to the Diffusion Model of Decision-Making

Philip L. Smith and Roger Ratcliff

Abstract The diffusion model assumes that two-choice decisions are made by accumulating successive samples of noisy evidence to a response criterion. The model has a pair of criteria that represent the amounts of evidence needed to make each response. The time taken to reach criterion determines the decision time and the criterion that is reached first determines the response. The model predicts choice probabilities and the distributions of response times for correct responses and errors as a function of experimental conditions such as stimulus discriminability, speed-accuracy instructions, and manipulations of relative stimulus frequency, which affect response bias. This chapter describes the main features of the model, including mathematical methods for obtaining response time predictions, methods for fitting it to experimental data, including alternative fitting criteria, and ways to represent the fit to multiple experimental conditions graphically in a compact way. The chapter concludes with a discussion of recent work in psychology that generalizes the diffusion model to continuous outcome decisions, in which responses are made on a continuous scale rather than categorically.

Keywords Diffusion process · Random walk · Decision-making · Response time · Choice probability

1 Historical Origins

The human ability to translate perception into action, which we share with nonhuman animals, relies on our ability to make rapid decisions about the contents of

P. L. Smith () Melbourne School of Psychological Sciences, The University of Melbourne, Melbourne, VIC, Australia e-mail: [email protected] R. Ratcliff Department of Psychology, The Ohio State University, Columbus, OH, USA e-mail: [email protected] © Springer Nature Switzerland AG 2024 B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_4


our environment. Any form of coordinated, goal-directed action requires that we be able to recognize things in the environment as belonging to particular cognitive categories or classes and to select the appropriate actions to perform in response. To a very significant extent, coordinated action depends on our ability to provide rapid answers to questions of the form: "What is it?" and "What should I do about it?" When viewed in this way, the ability to make rapid decisions—to distinguish predator from prey, or friend from foe—appears as one of the basic functions of the brain and central nervous system. The purpose of this chapter is to provide an introduction to the mathematical modeling of decisions of this kind.

Historically, the study of decision-making in psychology has been closely connected to the study of sensation and perception—an intellectual tradition with its origins in philosophy and extending back to the nineteenth century. Two strands of this tradition are relevant: psychophysics, defined as the study of the relationship between the physical magnitudes of stimuli and the sensations they produce, and the study of reaction time or response time (RT). Psychophysics, which had its origins in the work of Gustav Fechner in Germany in 1860 on "just noticeable differences," led to the systematic study of decisions about stimuli that are difficult to detect or to discriminate. The study of RT was initiated by Franciscus Donders in the Netherlands in 1868. Donders, inspired by the pioneering work of Hermann von Helmholtz on the speed of nerve conduction, sought to develop methods to measure the speed of mental processes. These two strands of inquiry were motivated by different theoretical concerns but led to a common realization, namely, that decision-making is inherently variable.
People do not always make the same response to repeated presentation of the same stimulus and the time they take to respond to it varies from one presentation to the next. Trial-to-trial variation in performance is a feature of an important class of models for speeded, two-choice decision-making developed in psychology, known as sequential-sampling models. These models regard variation in decision outcomes and decision times as the empirical signature of a noisy evidence accumulation process. They assume that, to make a decision, the decision-maker accumulates successive samples of noisy evidence over time, until sufficient evidence for a response is obtained. The samples represent the momentary evidence favoring particular decision alternatives at consecutive time points. The decision time is the time taken to accumulate a sufficient, or criterion, amount of evidence and the decision outcome depends on the alternative for which a criterion amount of evidence is first obtained. The idea that decision processes are noisy was first proposed on theoretical grounds, to explain the trial-to-trial variability in behavioral data, many decades before it was possible to use microelectrodes in awake, behaving animals to record this variability directly. The noise was assumed to reflect the moment-to-moment variability in the cognitive or neural processes that represent the stimulus (Link, 1992; Luce, 1986; Townsend & Ashby, 1983; Vickers, 1979). In this chapter, we describe one such sequential-sampling model, the diffusion model of Ratcliff (1978). Diffusion models, along with random walk models, comprise one of the two main subclasses of sequential-sampling models in psychology; the other subclass comprises accumulator and counter models. For space reasons,


we do not consider models of this latter class in this chapter. The interested reader is referred to Luce (1986), Townsend and Ashby (1983), Vickers (1979), and Ratcliff and Smith (2004) for discussions. To distinguish Ratcliff's model from other models that also represent evidence accumulation as a diffusion process, we refer to it as the standard diffusion model. Historically, this model was the first to represent evidence accumulation in two-choice decision-making as a diffusion process and it remains, conceptually and mathematically, the benchmark against which other models can be compared. It is also the model that has been most extensively and successfully applied to empirical data. We restrict our consideration here to two-alternative decision tasks, which historically and theoretically have been the most important class of tasks in psychology.

2 Diffusion Processes and Random Walks

Mathematically, diffusion processes are the continuous-time counterparts of random walks, which historically preceded them as models for decision-making. A random walk is defined as the running cumulative sum of a sequence of independent random variables, $Z_j$, $j = 1, 2, \ldots$. In models of decision-making, the values of these variables are interpreted as the evidence in a sequence of discrete observations of the stimulus. Typically, evidence is assumed to be sampled at a constant rate, which is determined by the minimum time needed to acquire a single sample of perceptual information, denoted $\Delta$. The random variables are assumed to take on positive and negative values, with positive values being evidence for one response, say $R_a$, and negative values evidence for the other response, $R_b$. For example, in a brightness discrimination task, $R_a$ might correspond to the response "bright" and $R_b$ correspond to the response "dim." The mean of the random variables is assumed to be positive or negative, depending on the stimulus presented. The cumulative sum of the random variables,

$$X_i = \sum_{j=1}^{i} Z_j,$$

is a random walk. If the $Z_j$ are real-valued, the domain of the walk is the positive integers and the range is the real numbers. To make a decision, the decision-maker sets a pair of evidence criteria, $a$ and $b$, with $b < 0 < a$, and accumulates evidence until the cumulative evidence total reaches or exceeds one of the criteria, that is, until $X_i \ge a$ or $X_i \le b$. The time taken for this to occur is the first passage time through one of the criteria, defined formally as

$$T_a = \min\{i\Delta : X_i \ge a \mid X_k > b;\ k < i\}$$
$$T_b = \min\{i\Delta : X_i \le b \mid X_k < a;\ k < i\},$$


and where $T_a = \infty$ and $T_b = \infty$ by definition if the associated criterion is never reached. If the first criterion reached is $a$, the decision-maker makes response $R_a$; if it is $b$, the decision-maker makes response $R_b$. The decision time, $T_D$, is the time for this to occur:

$$T_D = \min\{T_a, T_b\}.$$

If response $R_a$ is identified as the correct response for the stimulus presented, then the mean, or expected value, of $T_a$, denoted $E[T_a]$, is the mean decision time for correct responses; $E[T_b]$ is the mean decision time for errors, and the probability of a correct response, $P(C)$, is the first passage probability of the random walk through the criterion $a$,

$$P(C) = \Pr\{T_a < T_b\}.$$

Although either $T_a$ or $T_b$ may be infinite on a given realization of the process, $T_D$ will be finite with probability one; that is, the process will terminate with one or other response in finite time (Cox & Miller, 1965). This means that the probability of an error response, $P(E)$, will equal $1 - P(C)$. Random walk models of decision-making have been proposed by a variety of authors. The earliest of them were influenced by Wald's sequential probability ratio test (SPRT) in statistics (Wald, 1947) and assumed that the random variables $Z_j$ were the log-likelihood ratios that the evidence at each step came from one as opposed to the other stimulus. The most highly developed of the SPRT models was proposed by Laming (1968). The later relative judgment theory of Link and Heath (1975) assumed that the decision process accumulates the values of the noisy evidence samples directly rather than their log-likelihood ratios. Evaluation of these models focused primarily on the relationship between mean RT and accuracy and the ordering of mean RTs for correct responses and errors as a function of experimental manipulations (Luce, 1986; Townsend & Ashby, 1983; Vickers, 1979; Laming, 1968; Link & Heath, 1975).
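The random walk defined above is straightforward to simulate. The sketch below is ours, not the chapter's: it assumes Gaussian evidence samples, and the function name and parameter values (`mu`, `sigma`, `a`, `b`, `delta`) are illustrative choices. It estimates the choice probability and mean decision time by Monte Carlo.

```python
import numpy as np

def random_walk_trial(mu, sigma, a, b, delta=0.001, rng=None, max_steps=100_000):
    """Accumulate evidence samples Z_j ~ Normal(mu, sigma) until the running
    sum X_i reaches the upper criterion a (response R_a) or the lower
    criterion b (response R_b). Returns (response, first passage time)."""
    rng = np.random.default_rng() if rng is None else rng
    x = 0.0
    for i in range(1, max_steps + 1):
        x += rng.normal(mu, sigma)
        if x >= a:
            return "Ra", i * delta
        if x <= b:
            return "Rb", i * delta
    raise RuntimeError("no criterion reached within max_steps")

# Positive drift: stimulus s_a presented, so R_a is the correct response.
rng = np.random.default_rng(1)
trials = [random_walk_trial(mu=0.1, sigma=1.0, a=10.0, b=-10.0, rng=rng)
          for _ in range(2000)]
p_correct = np.mean([resp == "Ra" for resp, _ in trials])
mean_dt_correct = np.mean([t for resp, t in trials if resp == "Ra"])
```

Under the continuous (Brownian) approximation, the first passage probability for these values is $(1 - e^{-2\mu(0-b)/\sigma^2})/(1 - e^{-2\mu(a-b)/\sigma^2}) \approx 0.88$, so the simulated proportion of $R_a$ responses should land nearby, with some discrepancy due to the discrete walk overshooting the criteria.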

3 The Standard Diffusion Model

A diffusion process may be thought of as a random walk in continuous time. Instead of accumulating evidence at discrete time points, evidence is accumulated continuously. Such a process can be obtained mathematically via a limiting process, in which the sampling interval is allowed to go to zero while constraining the average size of the evidence at each step to ensure the variability of the process in a given, fixed time interval remains constant (Cox & Miller, 1965; Gardiner, 2004). The study of diffusion processes was initiated by Albert Einstein, who proposed a diffusion model for the movement of a pollen particle undergoing random Brownian motion (Gardiner, 2004). The rigorous study of such processes was initiated by


Norbert Wiener (1923). For this reason, the simplest diffusion process is known variously as the Wiener process or the Brownian motion process. In psychology, Ratcliff (1978) proposed a diffusion model of evidence accumulation in two-choice decision-making—in part because it seemed more natural to assume that the brain accumulates information continuously rather than at discrete time points. Ratcliff also emphasized the importance of studying RT distributions as a way to evaluate models. Sequential-sampling models not only predict choice probabilities and mean RTs but also predict entire distributions of RTs for correct responses and errors. This provides for very rich contact between theory and experimental data, allowing for strong empirical tests.

The main elements of the standard diffusion model are shown in Fig. 1. We shall denote the accumulating evidence state in the model as $X_t$, where $t$ denotes time. We should mention that there are two conventions used in psychology to characterize diffusion models. The convention used in the preceding section assumes the process starts at zero and that the criteria are located at $a$ and $b$, with $b < 0 < a$. The other is based on Feller's (1968) analysis of the so-called gambler's ruin problem and assumes that the process starts at $z$ and that the criteria are located at 0 and $a$, with $0 < z < a$. As the latter convention was used by Ratcliff in his original presentation of the model (Ratcliff, 1978) and in later work, this is the convention we shall adopt for the remainder of this chapter. The properties of the process are unaltered by translations of the starting point; such processes are called spatially homogeneous. For processes of this kind, a change in convention simply represents a relabeling of the y-axis that represents the accumulating evidence state. Other, more complex,

[Figure annotations: variability in starting point $s_z$; between-trial variability in drift rate $\eta$; sample paths over time; RT distributions for "Respond $R_a$ (bright)" at criterion $a$ and "Respond $R_b$ (dim)" at 0.]
Fig. 1 Diffusion model. The process starting at $z$ accumulates evidence between decision criteria at 0 and $a$. Moment-to-moment variability in the accumulation process means the process can terminate rapidly at the correct response criterion, slowly at the correct response criterion, or at the incorrect response criterion. There is between-trial variability in the drift rate, $\xi$, with standard deviation $\eta$, and between-trial variability in the starting point, $z$, with range $s_z$


diffusion processes, like the Ornstein–Uhlenbeck process (Busemeyer & Townsend, 1992, 1993; Smith, 2000), are not spatially homogeneous and their properties are altered by changes in the placement of the starting point.

As shown in the figure, the process, starting at $z$, begins accumulating evidence at time $t = 0$. The rate at which evidence accumulates, termed the drift of the process and denoted $\xi$, depends on the stimulus that is presented and its discriminability. The identity of the stimulus determines the direction of drift and the discriminability of the stimulus determines the magnitude. Our convention is that when stimulus $s_a$ is presented, the drift is positive and the value of $X_t$ tends to increase with time, making it more likely to terminate at the upper criterion and result in response $R_a$. When stimulus $s_b$ is presented, the drift is negative and the value of $X_t$ tends to decrease with time, making it more likely to terminate at the lower boundary with response $R_b$. In our example brightness discrimination task, bright stimuli lead to positive values of drift and dim stimuli lead to negative values of drift. Highly discriminable stimuli are associated with larger absolute values of drift, which lead to more rapid information accumulation and faster responding. Because of noise in the process, the accumulating evidence is subject to moment-to-moment perturbations. The time course of evidence accumulation on three different experimental trials, all with the same drift rate, is shown in the figure. These noisy trajectories are termed the sample paths of the process. A unique sample path describes the time course of evidence accumulation on a given experimental trial. The sample paths in the figure show some of the different outcomes that are possible for stimuli with the same drift rate.
The sample paths in the figure show: (a) a process terminating with a correct response made rapidly; (b) a process terminating with a correct response made slowly; and (c) a process terminating with an error response. In behavioral experiments, only the response and the RT are observables; the paths themselves are not. They are theoretical constructs used to explain the observed behavior. The noisiness, or variability, in the accumulating evidence is controlled by a second parameter, the infinitesimal standard deviation, denoted $s$. Its square, $s^2$, is termed the diffusion coefficient. The diffusion coefficient determines the variability in the sample paths of the process. Because the parameters of a diffusion model are only identified to the level of a ratio, all the parameters of the model can be multiplied by a constant without affecting any of the predictions. To make the parameters estimable, it is common practice to fix $s$ arbitrarily. The other parameters of the model are then expressed in units of infinitesimal standard deviation, or infinitesimal standard deviation per unit time.
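The mechanics described above can be made concrete with a minimal Euler–Maruyama simulation of a single Wiener diffusion trial in the 0-to-$a$ convention. This is our sketch, not code from the chapter: the function name and the parameter values ($\xi = 0.2$, $a = 0.12$, $z = a/2$) are illustrative, with $s$ fixed arbitrarily at 0.1.

```python
import numpy as np

def diffusion_trial(xi, a, z, s=0.1, dt=0.001, rng=None, max_t=10.0):
    """One sample path of the Wiener diffusion model (Euler-Maruyama).

    X_t starts at z and evolves as dX = xi*dt + s*dW until it crosses the
    upper criterion a (response R_a) or the lower criterion 0 (response
    R_b). Returns the response label and the decision time in seconds."""
    rng = np.random.default_rng() if rng is None else rng
    x, t = z, 0.0
    sqrt_dt = np.sqrt(dt)
    while t < max_t:
        x += xi * dt + s * sqrt_dt * rng.standard_normal()
        t += dt
        if x >= a:
            return "Ra", t
        if x <= 0.0:
            return "Rb", t
    return None, t  # rare: no criterion reached within max_t

rng = np.random.default_rng(7)
paths = [diffusion_trial(xi=0.2, a=0.12, z=0.06, rng=rng) for _ in range(1000)]
p_upper = np.mean([resp == "Ra" for resp, _ in paths])
mean_dt = float(np.mean([t for resp, t in paths if resp == "Ra"]))
```

With these values the process should terminate at the upper criterion on roughly nine trials in ten, in line with the closed-form first-passage probability for the Wiener process, and the simulated decision times show the characteristic positively skewed shape.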

4 Components of Processing

As shown in Fig. 1, the diffusion model predicts RT distributions for correct responses and errors. Moment-to-moment variability in the sample paths of the process, controlled by the diffusion coefficient, means that on some trials the process


will finish rapidly and on others it will finish slowly. The predicted RT distributions have a characteristic unimodal, positively skewed shape: More of the probability mass in the distribution is located below the mean than above it. As the drift of the process changes with changes in stimulus discriminability, the relative proportions of correct responses and errors change, and the means and standard deviations of the RT distributions also change. However, the shapes of the RT distributions change very little; to a good approximation, RT distributions for low discriminability stimuli are scaled copies of those for high discriminability stimuli (Wagenmakers & Brown, 2007). One of the main strengths of the diffusion model is that the shapes of the RT distributions it predicts are precisely those found in empirical data. Many experimental tasks, including low-level perceptual tasks like signal detection and higher-level cognitive tasks like lexical decision and recognition memory, yield families of RT distributions like those predicted by the model (Ratcliff & Smith, 2004). In contrast, other models, particularly those of the accumulator/counter model class, predict distribution shapes that become more symmetrical with reductions in discriminability (Ratcliff & Smith, 2004). Such distributions tend not to be found empirically, except in situations in which people are forced to respond to an external deadline. One of the problems with early random walk models of decision-making— which they shared with the simplest form of the diffusion model—is they predicted that mean RTs for correct responses and errors would be equal (Luce, 1986). 
Specifically, if $E[R_j \mid s_i]$ denotes the mean RT for response $R_j$ to stimulus $s_i$, with $i, j \in \{a, b\}$, then, if the drifts for the two stimuli are equal in magnitude and opposite in sign, as is natural to assume for many perceptual tasks, the models predicted that $E[R_a \mid s_a] = E[R_a \mid s_b]$ and $E[R_b \mid s_a] = E[R_b \mid s_b]$; that is, the mean time for a given response made correctly is the same as the mean time for that response made incorrectly. They also predicted, when the starting point is located equidistantly between the criteria, $z = a/2$, that $E[R_a \mid s_a] = E[R_b \mid s_a]$ and $E[R_a \mid s_b] = E[R_b \mid s_b]$; that is, the mean RT for correct responses to a given stimulus is the same as the mean error RT to that same stimulus. This prediction holds regardless of the relative magnitudes of the drifts. Indeed, a stronger prediction holds: the models predicted equality not only of mean RTs but also of the entire distributions of correct responses and errors. These predictions almost never hold empirically. Rather, the typical finding is that when discriminability is high and speed is stressed, error mean times are shorter than correct mean times. When discriminability is low and accuracy is stressed, error mean times are longer than correct mean times (Luce, 1986). Some studies show a crossover pattern, in which errors are faster than correct responses in some conditions and slower in others (Ratcliff & Smith, 2004). A number of modifications to random walk models were proposed to deal with the problem of the ordering of mean RTs for correct responses and errors, including asymmetry (non-normality) of the distributions of evidence that drive the walk (Link, 1992; Link & Heath, 1975), and biasing of an assumed log-likelihood computation on the stimulus information at each step (Ashby, 1983), but none of


them provided a completely satisfactory account of the full range of experimental findings. The diffusion model attributes inequality of the RTs for correct responses and errors to between-trial variability in the operating characteristics, or "components of processing," of the model. The diffusion model predicts equality of correct and error times only when the sole source of variability in the model is the moment-to-moment variation in the accumulation process. Given the complex interaction of perceptual and cognitive processes involved in decision-making, such an assumption is probably an oversimplification. A more realistic assumption is that there is trial-to-trial variability, both in the quality of information entering the decision process and in the decision-maker's setting of decision criteria or starting points. Trial-to-trial variability in the information entering the decision process would arise either from variability in the efficiency of the perceptual encoding of stimuli or from variation in the quality of the information provided by nominally equivalent stimuli. Experimental evidence for variability of the latter kind has been obtained using a double-pass procedure in which the same stimuli are presented on pairs of different trials (Ratcliff et al., 2018). The agreement in accuracy on repeated (double-pass) trials is greater than would be predicted if processing on these trials was independent. The level of agreement implies that some of the variability in performance is due to variability among stimuli, which the diffusion model characterizes as variability in drift rates across trials. Trial-to-trial variability in decision criteria or starting points would arise as the result of the decision-maker attempting to optimize the speed and accuracy of responding (Vickers, 1979).
Most RT tasks show sequential effects, in which the speed and accuracy of responding depends on the stimuli and/or the responses made on preceding trials, consistent with the idea that there is some kind of adaptive regulation of the settings of the decision process occurring across trials (Luce, 1986; Vickers, 1979). The diffusion model assumes that there is trial-to-trial variation in both drift rates and starting points. Ratcliff (1978) assumed that the drift rate on any trial, $\xi$, is drawn from a normal distribution with mean $\nu$ and standard deviation $\eta$. Subsequently, Ratcliff et al. (1999) assumed that there is also trial-to-trial variability in the starting point, $z$, which they modeled as a uniform distribution with range $s_z$. They chose a uniform distribution mainly on the grounds of convenience, because the predictions of the model are relatively insensitive to the distribution's form. The main requirement is that all of the probability mass of the distribution must lie between the decision criteria, which is satisfied by a uniform distribution with $s_z$ suitably constrained. The distributions of drift and starting point are shown in Fig. 1. Trial-to-trial variation in drift rates allows the model to predict slow errors; trial-to-trial variation in starting point allows it to predict fast errors. The combination of the two allows it to predict crossover interactions, in which there are fast errors for high discriminability stimuli and slow errors for low discriminability stimuli.

Figure 2a shows how trial-to-trial variability in drift results in slow errors. The assumption that drift rates vary across trials means that the predicted RT distributions are probability mixtures, made up of trials with different values of drift. When the drift is small (i.e., near zero), error rates will be high and RTs will be long. When the drift is large, error rates will be low and RTs will be short. Because


Fig. 2 Effects of trial-to-trial variability in drift rates and starting points. The predicted RT distributions are probability mixtures across processes with different drift rates (top) or different starting points (bottom). Variability in drift rates leads to slow errors; variability in starting points leads to fast errors

errors are more likely on trials on which the drift is small, a disproportionate number of the trials in the error distribution will be trials with small drifts and long RTs. Conversely, because errors are less likely on trials on which drift is large, a disproportionate number of the trials in the correct response distribution will be trials with large drifts and short RTs. In either instance, the predicted mean RT will be the weighted mean of the RTs on trials with small drifts and large drifts. Figure 2a illustrates how slow errors arise in a simplified case in which there are just two drifts, $\xi_1$ and $\xi_2$, with $\xi_1 > \xi_2$. When the drift is $\xi_1$, the mean RT is 400 ms and the probability of a correct response, $P(C)$, is 0.95. When the drift is $\xi_2$, the mean RT is 600 ms and $P(C) = 0.80$. The predicted mean RTs are the weighted means of large drift and small drift trials. The predicted mean RT for correct responses is $(0.95 \times 400 + 0.80 \times 600)/1.75 = 491$ ms. The predicted mean for error responses is $(0.05 \times 400 + 0.20 \times 600)/0.25 = 560$ ms. Rather than just two drifts, the diffusion model assumes that the predicted means for correct responses and errors are weighted means across an entire normal distribution of drift. However, the effect is the same: predicted mean RTs for errors are longer than those for correct responses.


Figure 2b illustrates how fast errors arise as the result of variation in starting point. Again, we have shown a simplified case, in which there are just two starting points, one of which is closer to the lower, error, response criterion and the other of which is closer to the upper, correct, response criterion. In this example, a single value of drift, $\xi$, has been assumed for all trials. The model predicts fast errors because the mean time for the process to reach criterion depends on the distance it has to travel and because it is more likely to terminate at a particular criterion if the criterion is near the starting point rather than far from it. When the starting point is close to the lower criterion, errors are faster and also more probable. When the starting point is close to the upper criterion, errors are slower, because the process has to travel further to reach the error criterion, and are less probable. Once again, the predicted distributions of correct responses and errors are probability mixtures across trials with different values of starting point. In the example shown in Fig. 2b, when the process starts near the upper criterion, the mean RT for correct responses is 350 ms and $P(C) = 0.95$. When it starts near the lower criterion, the mean RT for correct responses is 450 ms and $P(C) = 0.80$. The predicted mean RTs for correct responses and errors are again the weighted means across starting points. In this example, the mean RT for correct responses is $(0.95 \times 350 + 0.80 \times 450)/1.75 = 396$ ms; the mean RT for errors is $(0.20 \times 350 + 0.05 \times 450)/0.25 = 370$ ms. Again, the model assumes that the predicted mean times are weighted means across the entire distribution of starting points, but the effect is the same: predicted mean times for errors are faster than those for correct responses.
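The weighted-mean arithmetic in the two worked examples above can be checked in a few lines. The probabilities and RTs are the ones given in the text; the helper function is ours.

```python
def mixture_mean(weights, times):
    """Mean RT of a probability mixture: sum of w*t divided by sum of w."""
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)

# Two equally likely drifts (Fig. 2a): (P(C), mean RT) = (0.95, 400) and (0.80, 600).
correct_two_drifts = mixture_mean([0.95, 0.80], [400, 600])  # ~491 ms
error_two_drifts = mixture_mean([0.05, 0.20], [400, 600])    # ~560 ms

# Two equally likely starting points (Fig. 2b): correct RTs 350 and 450 ms
# with P(C) = 0.95 and 0.80; error RTs 350 and 450 ms with rates 0.20 and 0.05.
correct_two_starts = mixture_mean([0.95, 0.80], [350, 450])  # ~396 ms
error_two_starts = mixture_mean([0.20, 0.05], [350, 450])    # ~370 ms
```

As in the text, errors come out slower than correct responses under drift variability and faster than correct responses under starting-point variability.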
When equipped with both variability in drift and starting point, the model can predict both the fast errors and the slow errors that are found experimentally (Ratcliff & Smith, 2004). The final component of processing in the model is the nondecision time, denoted $T_{er}$. Like many other models in psychology, diffusion models assume that RT can be additively decomposed into the decision time, $T_D$, and the time for other processes, $T_{er}$:

$$RT = T_D + T_{er}.$$

The subscript in the notation means "encoding and responding." In many applications of the model, it suffices to treat $T_{er}$ as a constant. In practice, this is equivalent to assuming that it is an independent random variable whose variance is negligible compared to that of $T_D$. In other applications, particularly those in which discriminability is high and speed is emphasized and RT distributions have small variances, the data are better described by assuming that $T_{er}$ is uniformly distributed with range $s_t$. As with the distribution of starting point, the uniform distribution is used mainly as a convenience, because when the variance of $T_{er}$ is small compared to that of $T_D$, the form of the distribution of RT will be determined almost completely by the form of the distribution of decision times, regardless of the distribution of $T_{er}$ (Ratcliff, 2013). The advantage of assuming some variability in $T_{er}$ in these settings is that it allows the model to better capture the leading edge of the empirical RT


distributions, which characterizes the fastest 5–10% of responses and which tends to be slightly more variable than the model predicts.
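Putting the components together, the sketch below (ours; every name and parameter value is an illustrative assumption, not a fit to data) simulates a full trial of the model with across-trial variability in drift (normal, SD $\eta$), starting point (uniform, range $s_z$), and nondecision time (uniform, range $s_t$):

```python
import numpy as np

def ddm_trial(nu, eta, a, z_bar, sz, ter, st, s=0.1, dt=0.001,
              rng=None, max_t=10.0):
    """One diffusion-model trial with across-trial variability.

    Drift is drawn from Normal(nu, eta), the starting point from
    Uniform(z_bar +/- sz/2), and the nondecision time from
    Uniform(ter +/- st/2). Returns (response, RT) with RT = T_D + T_er."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.normal(nu, eta)                          # this trial's drift
    x = rng.uniform(z_bar - sz / 2, z_bar + sz / 2)   # this trial's start
    t_er = rng.uniform(ter - st / 2, ter + st / 2)    # nondecision time
    t, sqrt_dt = 0.0, np.sqrt(dt)
    while t < max_t:
        x += xi * dt + s * sqrt_dt * rng.standard_normal()
        t += dt
        if x >= a:
            return "Ra", t + t_er
        if x <= 0.0:
            return "Rb", t + t_er
    return None, max_t + t_er  # rare: no boundary reached

rng = np.random.default_rng(11)
sims = [ddm_trial(nu=0.25, eta=0.1, a=0.12, z_bar=0.06, sz=0.02,
                  ter=0.3, st=0.1, rng=rng) for _ in range(500)]
accuracy = np.mean([resp == "Ra" for resp, _ in sims])
```

Histogramming the simulated RTs separately for correct responses and errors reproduces, qualitatively, the mixture effects discussed above: drift variability drags the error distribution toward longer times, while starting-point variability produces fast errors.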

5 Bias and Speed-Accuracy Tradeoff Effects

Bias effects and speed-accuracy tradeoff effects are ubiquitous in experimental psychology. Bias effects typically arise when the two stimulus alternatives occur with unequal frequency or have unequal rewards attached to them. Speed-accuracy tradeoff effects arise as the result of explicit instructions emphasizing speed or accuracy or as the result of an implicit set on the part of the decision-maker. Such effects can be troublesome in studies that measure only accuracy or only RT, because of the asymmetrical way in which these variables can be traded off. Small changes in accuracy can be traded off against large changes in RT, which can sometimes make it difficult to interpret a single variable in isolation (Luce, 1986). One of the attractive features of sequential-sampling models like the diffusion model is that they provide a natural account of how speed-accuracy tradeoffs arise. As shown in Fig. 3, the models assume that criteria are under the decision-maker's control. Moving the criteria further from the starting point (i.e., increasing $a$ while keeping $z = a/2$) increases the distance the process must travel to reach a criterion and also reduces the probability that it will terminate at the wrong criterion because of the cumulative effects of noise. The effect of increasing criteria will thus be slower and more accurate responding. This is the speed-accuracy tradeoff.

The diffusion model with variation in drift and starting point can account for the interactions with experimental instructions emphasizing speed or accuracy that are found experimentally. When accuracy is emphasized and criteria are set far from the starting point, variations in drift have a greater effect on performance than do variations in starting point, and so slow errors are found. When speed is emphasized

[Figure panels: "Speed/Accuracy Tradeoff" (boundary separation changes) and "Response Bias" (bias toward the top boundary, blue lines, changes to bias toward the bottom boundary, red lines).]
Fig. 3 Speed-accuracy tradeoff and response bias. Reducing decision criteria leads to faster and less accurate responding. Shifting the starting point biases the process toward the response associated with the nearer criterion


and criteria are near the starting point, variations in starting point have a greater effect on performance than do variations in drift, and fast errors are found. A number of studies have also reported that, in addition to affecting criterion placement, speed versus accuracy instructions affect mean nondecision times and/or mean drift rates (Dutilh et al., 2019). How these latter effects should be interpreted theoretically is a matter of ongoing debate.

Like other sequential-sampling models, the diffusion model accounts for bias effects by assuming unequal criteria, represented by a shift in the starting point toward the upper or lower criterion, as shown in Fig. 3. Shifting the starting point toward a particular response criterion increases the probability of that response and reduces the average time taken to make it. The probability of making the other response is reduced and the average time to make it is correspondingly increased. The effect of changing the prior probabilities of the two responses, by manipulating the relative stimulus frequencies, is well described by a change in the starting point (unequal decision criteria). In contrast, unequal reward rates not only lead to a bias in decision criteria; they also lead to a bias in the way stimulus information is classified (Leite & Ratcliff, 2011). This can be captured in the idea of a drift criterion, which is a criterion on the stimulus information, like the criterion in signal detection theory. The effect of changing the drift criterion is to make the drift rates for the two stimuli unequal. Both kinds of bias effects appear to operate in tasks with unequal reward rates.

6 Mathematical Methods for Diffusion Models

Diffusion processes can be defined mathematically either via partial differential equations or by stochastic differential equations. If f(τ, y; t, x) is the transition density of the process X_t, that is, f(τ, y; t, x) dx is the probability that a process starting at time τ in state y will be found at time t in a small interval (x, x + dx), then the accumulation process X_t, with drift ξ and diffusion coefficient s², satisfies the partial differential equation

$$\frac{\partial f}{\partial \tau} = \frac{1}{2} s^2 \frac{\partial^2 f}{\partial y^2} + \xi \frac{\partial f}{\partial y}.$$

This equation is known in the probability literature as Kolmogorov's backward equation, so-called because its variables are the starting time τ and the initial state y. The process also satisfies a related equation known as Kolmogorov's forward equation, which is an equation in t and x (Cox & Miller, 1965; Gardiner, 2004). The backward equation is used to derive RT distributions; the forward equation is useful either for studying evidence accumulation by a process unconstrained by criteria (Ratcliff, 1978) or for deriving the distribution of nonterminated processes between criteria at specified times (Ratcliff, 2006). The latter distribution is important in

An Introduction to the Diffusion Model of Decision-Making


modeling performance in response-signal tasks in which decisions must be made to unpredictable, external deadlines (Reed, 1976). Alternatively, the process can be defined as satisfying the stochastic differential equation (Gardiner, 2004):

$$dX_t = \xi \, dt + s \, dW_t.$$

The latter equation is useful because it provides a more direct physical intuition about the properties of the accumulation process. Here dX_t is interpreted as the small, random change in the accumulated evidence occurring in a small time interval of duration dt. The equation says that the change in evidence is the sum of a deterministic and a random part. The deterministic part is proportional to the drift rate, ξ; the random part is proportional to the infinitesimal standard deviation, s. The term on the right, dW_t, is the differential of a Brownian motion or Wiener process, W_t. It can be thought of as the random change in the accumulation process during the interval dt when it is subject to the effects of many small, independent random perturbations, described mathematically as a white noise process. White noise is a mathematical abstraction, which cannot be realized physically, but it provides a useful approximation to characterize the properties of physical systems that are perturbed by broad-spectrum, Gaussian noise. Stochastic differential equations are usually written in the differential form given here, rather than in the more familiar form involving derivatives, because of the extreme irregularity of the sample paths of diffusion processes, which means that quantities of the form dX_t/dt are not well defined mathematically.

Solution of the backward equation leads to an infinite series expression for the predicted RT distributions and an associated expression for accuracy (Ratcliff, 1978; Cox & Miller, 1965; Buonocore et al., 1990). The stochastic differential equation approach leads to a class of integral equation methods that were developed in mathematical biology to study the properties of integrate-and-fire neurons. The interested reader is referred to Ratcliff and Smith (2004), Smith (2000), and Buonocore et al. (1990) for details.
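Because the stochastic differential equation has this simple form, the process is easy to simulate by the Euler-Maruyama method. The sketch below is a minimal illustration, not the authors' code, and all parameter values are arbitrary; it simulates the two-boundary process and reproduces the speed-accuracy tradeoff described in Sect. 5, in which widening the boundaries slows responding and raises accuracy.

```python
import numpy as np

def simulate_diffusion(xi, a, z, s=0.1, dt=0.001, n_trials=5000,
                       max_steps=50_000, seed=1):
    """Euler-Maruyama simulation of dX_t = xi*dt + s*dW_t between 0 and a.

    Returns (choices, rts): choice 1 = upper boundary a, 0 = lower boundary 0.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, float(z))        # evidence, started at z on every trial
    t = np.zeros(n_trials)                 # accumulated decision time
    done = np.zeros(n_trials, dtype=bool)
    choice = np.zeros(n_trials, dtype=int)
    noise_sd = s * np.sqrt(dt)             # sd of each Gaussian increment
    for _ in range(max_steps):
        active = ~done
        if not active.any():
            break
        x[active] += xi * dt + noise_sd * rng.standard_normal(int(active.sum()))
        t[active] += dt
        up = active & (x >= a)             # terminated at the upper criterion
        lo = active & (x <= 0.0)           # terminated at the lower criterion
        choice[up] = 1
        done |= up | lo
    return choice, t

# Widening the boundaries (larger a, with z = a/2) trades speed for accuracy.
for a in (0.08, 0.16):
    choice, rt = simulate_diffusion(xi=0.2, a=a, z=a / 2)
    print(f"a = {a:.2f}: P(upper) = {choice.mean():.3f}, "
          f"mean decision time = {rt.mean():.3f} s")
```

With positive drift, the wider boundary setting terminates more often at the upper criterion but takes longer to do so, which is the tradeoff shown in Fig. 3.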
For a two-boundary process with drift ξ, boundary separation a, starting point z, and infinitesimal standard deviation s, with no variability in any of its parameters, the probability of responding at the lower boundary, P(ξ, a, z), is

$$P(\xi, a, z) = \frac{\exp\left(-2\xi a/s^2\right) - \exp\left(-2\xi z/s^2\right)}{\exp\left(-2\xi a/s^2\right) - 1}.$$

The cumulative distribution of first passage times at the lower boundary is

$$G(t, \xi, a, z) = P(\xi, a, z) - \frac{\pi s^2}{a^2}\, e^{-\xi z/s^2} \sum_{k=1}^{\infty} \frac{2k \sin\left(k\pi z/a\right) \exp\left[-\frac{1}{2}\left(\frac{\xi^2}{s^2} + \frac{k^2 \pi^2 s^2}{a^2}\right) t\right]}{\frac{\xi^2}{s^2} + \frac{k^2 \pi^2 s^2}{a^2}}.$$

The probability of a response and the cumulative distribution of first passage times at the upper boundary are obtained by replacing ξ with −ξ and z with a − z in the preceding expressions. More details can be found in Ratcliff (1978).

In addition to the partial differential equation and integral equation methods, predictions for diffusion models can also be obtained using finite-state Markov chain methods or by Monte Carlo simulation (Tuerlinckx et al., 2001). The Markov chain approach, used to model diffusion decision processes by Diederich and Busemeyer (2003), approximates a continuous-time, continuous-state diffusion process by a discrete-time, discrete-state birth-death process (Bhattacharya & Waymire, 1990). A transition matrix is defined that specifies the probability of an increment or a decrement to the process, conditional on its current state. The entries in the transition matrix express the relationship between the drift and diffusion coefficients of the diffusion process and the transition probabilities of the approximating Markov chain. The transition matrix includes two special entries that represent criterion states, which are set equal to 1.0, expressing the fact that once the process has transitioned into a criterion state, it does not leave it. An initial state vector is defined, which represents the distribution of probability mass at the beginning of the trial, including the effects of any starting-point variation. First passage times and probabilities can then be obtained by repeatedly multiplying the state vector by the transition matrix. These alternative methods are useful for more complex models for which an infinite-series solution may not be available.

There are now software packages available for fitting the standard diffusion model that avoid the need to implement the model from first principles (Vandekerckhove & Tuerlinckx, 2008; Wiecki et al., 2013; Voss & Voss, 2008).
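The Markov chain approximation described above can be sketched in a few lines. The code below is a minimal illustration (the grid size and parameter values are arbitrary choices): it builds the birth-death transition matrix with absorbing criterion states, propagates the initial state vector by repeated multiplication, and compares the absorbed mass at the lower criterion with the closed-form absorption probability of the continuous process.

```python
import numpy as np

def markov_chain_diffusion(xi, a, z, s=0.1, n_states=101, n_steps=20_000):
    """Birth-death Markov chain approximation of a two-boundary diffusion.

    The state space is a grid on [0, a]; the two end states are absorbing
    criterion states.  Matching the drift and diffusion coefficients gives an
    up-step probability of 0.5*(1 + xi*delta/s^2) per time step tau.
    """
    delta = a / (n_states - 1)                 # spatial step size
    tau = delta ** 2 / s ** 2                  # matching time step
    p_up = 0.5 * (1.0 + xi * delta / s ** 2)   # increment probability
    assert 0.0 < p_up < 1.0, "grid too coarse for this drift rate"
    T = np.zeros((n_states, n_states))
    T[0, 0] = T[-1, -1] = 1.0                  # absorbing criterion states
    for j in range(1, n_states - 1):
        T[j, j + 1] = p_up                     # increment
        T[j, j - 1] = 1.0 - p_up               # decrement
    state = np.zeros(n_states)
    state[int(round(z / delta))] = 1.0         # initial state vector
    for _ in range(n_steps):                   # repeated multiplication
        state = state @ T
    return state[0], state[-1], tau

p_lower, p_upper, tau = markov_chain_diffusion(xi=0.2, a=0.1, z=0.05)
# Closed-form probability that the continuous process is absorbed at the
# lower boundary (the gambler's-ruin limit), for comparison:
s2 = 0.01
exact = (np.exp(-2 * 0.2 * 0.1 / s2) - np.exp(-2 * 0.2 * 0.05 / s2)) / \
        (np.exp(-2 * 0.2 * 0.1 / s2) - 1.0)
print(f"Markov chain: {p_lower:.4f}, closed form: {exact:.4f}")  # both ≈ 0.119
```

The absorbed mass in the two criterion states after each multiplication traces out the predicted cumulative first-passage-time distributions on the time grid of width tau.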

7 The Representation of Empirical Data

The diffusion model predicts accuracy and distributions of RT for correct responses and errors as a function of the experimental variables. In many experimental settings, the discriminability of the stimuli is manipulated as a within-block variable, while instructions, payoffs, or prior probabilities are manipulated as between-block variables. The model assumes that manipulations of discriminability affect drift rates, while manipulations of other variables affect criteria or starting points. Although criteria and starting points can vary from trial to trial, they are assumed to be independent of drift rates and to have the same average value for all stimuli in a block. This assumption provides an important constraint in model testing.

To show the effects of discriminability variations on accuracy and RT distributions, the data and the predictions of the model are represented in the form of a quantile probability plot, as shown in Fig. 4. To construct such a plot, each of the RT distributions is summarized by an equal-area histogram. Each RT distribution is represented by a set of rectangles, each representing 20% of the probability mass in the distribution, except for the two rectangles at the extremes of the distribution, which together represent the 20% of mass in the upper and lower tails. The time-


Fig. 4 Representing data in a quantile probability plot. Top panel: An empirical RT distribution is summarized using an equal-area histogram with bins bounded by the distribution quantiles. Middle panel: The quantiles of the RT distributions for correct responses and errors are plotted vertically against the probability of a correct response on the right and the probability of an error response on the left. Bottom panel: Example of an empirical quantile probability plot from a brightness discrimination experiment. The digits 1 through 5 are the empirical 0.1, 0.3, 0.5, 0.7, and 0.9 RT quantiles and the continuous curves are the quantiles predicted by the model

axis bounds of the rectangles are distribution quantiles, that is, those values of time that cut off specified proportions of the mass in the distribution. Formally, the pth quantile, Q_p, is defined to be the value of time such that the proportion of RTs in the distribution that are less than or equal to Q_p is equal to p. The distribution in the figure has been summarized using five quantiles: the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles. The 0.1 and 0.9 quantiles represent the lower and upper tails of the distribution, that is, the fastest and slowest responses, respectively. The 0.5 quantile is the median and represents the distribution's central tendency. As shown in the figure, the set of five quantiles provides a good summary of the location, variability, and shape of the distribution.


To construct a quantile probability plot, the quantile RTs for correct responses and errors are plotted on the y-axis against the choice probabilities (i.e., accuracy) on the x-axis for each stimulus condition, as shown in the middle panel of the figure. Specifically, if Q_{i,p}(C) and Q_{i,p}(E) are, respectively, the quantiles of the RT distributions for correct responses and errors in condition i of the experiment, and P_i(C) and P_i(E) are the probabilities of a correct response and an error in that condition, then the values of Q_{i,p}(C) are plotted vertically against P_i(C) for p = 0.1, 0.3, 0.5, 0.7, 0.9, and the values of Q_{i,p}(E) are similarly plotted against P_i(E). All of the distribution pairs and choice probabilities from each condition are plotted in a similar way.

The bottom panel of the figure shows data from a brightness discrimination experiment from Ratcliff and Smith (2010) in which four different levels of stimulus discriminability were used. Because of the way the plot is constructed, the two outermost distributions in the plot represent performance for the most discriminable stimuli and the two innermost distributions represent performance for the least discriminable stimuli. The value of the quantile probability plot is that it shows how performance varies parametrically as stimulus discriminability is altered, and how different parts of the RT distributions for correct responses and errors are affected differently. As shown in the figure, most of the change in the RT distribution with changing discriminability occurs in the upper tail of the distribution (e.g., the 0.7 and 0.9 quantiles); there is very little change in the leading edge (the 0.1 quantile). This pattern is found in many perceptual tasks and also in more cognitive tasks like recognition memory. The quantile probability plot also shows that errors were slower than correct responses in all conditions.
This appears as a left-right asymmetry in the plot; if the distributions for correct responses and errors were the same, the plot would be mirror-image symmetrical around its vertical midline. The predicted degree of asymmetry is a function of the standard deviation of the distribution of drift rates, η, and, when there are fast errors, of the range of starting points, s_z. The slow-error pattern of data in Fig. 4 is typical of difficult discrimination tasks in which accuracy is emphasized.

The pattern of data in Fig. 4 is rich and highly constrained and represents a challenge for any model. The success of the diffusion model is that it has shown repeatedly that it can account for data of this kind. Its ability to do so is not just a matter of model flexibility. It is not the case that the model is able to account for any pattern of data whatsoever (Ratcliff, 2002). Rather, as noted previously, the model predicts families of RT distributions that have a specific and quite restricted form. Distributions of this particular form are the ones most often found in experimental data.
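The quantile summaries underlying such a plot are simple to compute. The sketch below is illustrative code, with simulated data standing in for real RTs; it extracts the five quantiles for correct and error responses in one condition and assembles the (choice probability, quantile RT) coordinate pairs that a quantile probability plot displays.

```python
import numpy as np

def qpp_coordinates(rts, correct, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Quantile-probability-plot coordinates for one experimental condition.

    rts: array of response times; correct: boolean array of the same length.
    Returns (p, q) pairs: error quantiles plot against P(error) on the left
    of the plot, correct quantiles against P(correct) on the right.
    """
    rts = np.asarray(rts, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    p_c = correct.mean()                     # observed choice probability
    coords = []
    for resp_rts, p in ((rts[~correct], 1.0 - p_c), (rts[correct], p_c)):
        if resp_rts.size == 0:
            continue                         # no responses of this type
        for q in np.quantile(resp_rts, quantiles):
            coords.append((float(p), float(q)))
    return coords

# Illustrative fake data: one condition with ~70% accuracy and slower errors.
rng = np.random.default_rng(0)
n = 1000
correct = rng.random(n) < 0.7
rts = 0.3 + rng.gamma(2.0, 0.08, n) + 0.05 * (~correct)
for p, q in qpp_coordinates(rts, correct):
    print(f"x = {p:.2f}, y = {q:.3f}")
```

Repeating this for every stimulus condition and overlaying the model's predicted quantiles gives a plot of the kind shown in the bottom panel of Fig. 4.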

8 Fitting the Model to Experimental Data

Fitting the model to experimental data requires estimation of its parameters by iterative, nonlinear minimization. Many studies in the literature have used classical


measures of model fit based on chi-square statistics or maximum likelihood, but Bayesian methods have become increasingly influential (Vandekerckhove et al., 2011) and a number of recent articles have compared the performance of the two (Dutilh et al., 2019; Ratcliff & Childers, 2015).

Classically, the task of model fitting involves minimizing a fit statistic, or loss function, that characterizes the discrepancy between the model and the data. A variety of minimization algorithms have been used in the literature. The Nelder–Mead SIMPLEX algorithm has been popular because of its robustness (Nelder & Mead, 1965), although recently stochastic optimization methods like the Metropolis-Hastings algorithm (Usher & McClelland, 2001) and differential evolution (Turner et al., 2013) have also been used. A variety of fit statistics have been used in applications, but chi-square-type statistics, either the Pearson chi-square (χ²) or the likelihood-ratio chi-square (G²), are common. For an experiment with m stimulus conditions, these are defined as

$$\chi^2 = \sum_{i=1}^{m} n_i \sum_{j=1}^{12} \frac{(p_{ij} - \pi_{ij})^2}{\pi_{ij}}$$

and

$$G^2 = 2 \sum_{i=1}^{m} n_i \sum_{j=1}^{12} p_{ij} \ln\left(\frac{p_{ij}}{\pi_{ij}}\right),$$

respectively. In these equations, the outer summation over i indexes the m conditions in the experiment and the inner summation over j indexes the 12 bins defined by the quantiles of the RT distributions for correct responses and errors. (The use of five quantiles per distribution gives six bins per distribution, or 12 bins per correct and error distribution pair.) The quantities p_ij and π_ij are the observed and predicted proportions of probability mass in each bin, respectively, and n_i is the number of stimuli in the ith experimental condition. For bins defined by the quantile bounds, the values of p_ij will equal 0.2 or 0.1, depending on whether or not the bin is associated with a tail quantile, and the values of π_ij are the differences in the probability mass in the cumulative finishing time distributions, evaluated at adjacent quantiles, G(Q_{i,p}, ν, a, z) − G(Q_{i,p−1}, ν, a, z). Here we have written the cumulative distribution as a function of the mean drift, ν, rather than the trial-dependent drift, ξ, to emphasize that the cumulative distributions are probability mixtures across a normal distribution of drift values. Because the fit statistics keep track of the distribution of probability mass across the distributions of correct responses and errors, minimizing them fits both RT and accuracy simultaneously.

Fitting the model typically requires estimation of around 8–10 parameters. For an experiment with a single experimental condition and four different stimulus discriminabilities like the one shown in Fig. 4, a total of 10 parameters must be estimated to fit the full model. There are four values of the mean drift, ν_i, i = 1, …, 4, a boundary separation parameter, a, a starting point, z, a nondecision time, T_er, and variability parameters for the drift, starting point, and nondecision time, η, s_z, and s_t, respectively. As noted previously, to make the model estimable, the infinitesimal standard deviation is typically fixed to an arbitrary value (Ratcliff uses s = 0.1 in his work, but s = 1.0 has also been used). In experiments in which there is no evidence of response bias, the data can be pooled across the two responses to create one distribution of correct responses and one distribution of errors per stimulus condition. Under these conditions, a symmetrical decision process can be assumed (z = a/2) and the number of free parameters reduced by one.

Although the model has a reasonably large number of free parameters, it affords a high degree of data reduction, defined as the number of degrees of freedom in the data divided by the number of free parameters in the model. There are 11m degrees of freedom in a data set with m conditions and six bins per distribution. (One degree of freedom is lost for each correct-error distribution pair, because the expected and observed masses are constrained to be equal in each pair, giving 12 − 1 = 11 degrees of freedom per pair.) For the experiment in Fig. 4, there are 44 degrees of freedom in the data and the model had nine free parameters, which represents a data reduction ratio of almost 5:1. For larger data sets, data reduction ratios of more than 10:1 are common. This represents a high degree of parsimony and explanatory power.

It is possible to fit the diffusion model by maximum likelihood instead of by minimum chi-square. Maximum likelihood defines a fit statistic (a likelihood function) on the set of raw RTs rather than on the probability mass in the set of bins, and maximizes this (i.e., minimizes its negative). Despite the theoretical appeal of maximum likelihood, its disadvantage is that it is vulnerable to the effects of contaminants or outliers in a distribution.
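As a concrete illustration of the quantile-based binning behind the chi-square statistic defined above, the sketch below computes the Pearson statistic for a single RT distribution. The shifted-exponential "model" is a stand-in chosen purely for illustration, not a diffusion model prediction; in a real fit, the predicted bin masses would come from the cumulative first-passage-time distribution G.

```python
import numpy as np

def chi_square_stat(rts, model_cdf, n, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pearson chi-square contribution of one RT distribution.

    rts: observed RTs for one response in one condition; model_cdf(t): the
    model's predicted cumulative probability of that response by time t;
    n: number of trials contributing to the distribution.
    """
    q = np.quantile(np.asarray(rts, dtype=float), quantiles)
    # Observed bin masses: 0.1 in each tail bin, 0.2 in the interior bins.
    p_obs = np.array([0.1, 0.2, 0.2, 0.2, 0.2, 0.1])
    # Predicted masses: differences of the model CDF at adjacent quantiles.
    edges = np.concatenate(([0.0], model_cdf(q), [model_cdf(np.inf)]))
    p_pred = np.diff(edges)
    return n * np.sum((p_obs - p_pred) ** 2 / p_pred)

# Stand-in model: RTs simulated from, and scored against, a shifted
# exponential distribution, so the statistic should be small.
rng = np.random.default_rng(2)
rts = 0.3 + rng.exponential(0.2, size=2000)
model_cdf = lambda t: 1.0 - np.exp(-np.maximum(t - 0.3, 0.0) / 0.2)
print(f"chi-square = {chi_square_stat(rts, model_cdf, n=2000):.2f}")
```

Summing such contributions over the correct and error distributions in every condition gives the full statistic, which is then minimized over the model parameters.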
Almost all data sets have a small proportion of contaminant responses in them, whether from finger errors or from lapses in vigilance or attention, or other causes. RTs from such trials are not representative of the process of theoretical interest. Because maximum likelihood requires that all RTs be assigned a nonzero likelihood, outliers of this kind can disrupt fitting and estimation, whereas minimum chi-square is much less susceptible to such effects. Otherwise, the problem of outliers can be addressed by attempting to model them explicitly. One way to do this is to model the RT distributions as mixtures of valid responses and contaminant responses, where the contaminants are assumed to be uniformly distributed on the range zero to the maximum observed RT, with a mixing parameter that is estimated in the fit. This avoids the problem of zero likelihoods and effectively stabilizes maximum likelihood estimation (Ratcliff & Tuerlinckx, 2002).

Many applications of the diffusion model have fitted it to group data, obtained by quantile-averaging the RT distributions across participants. A group data set is created by averaging the corresponding quantiles, Q_{i,p}, for each distribution of correct responses and errors in each experimental condition across participants. The choice probabilities in each condition are also averaged across participants. The advantage of group data is that it is less noisy and variable than individual data. A potential concern when working with group data is that quantile averaging may distort the shapes of the individual distributions, but in practice, the model appears to be robust to averaging artifacts. Studies comparing fits of the model to group


and individual data have found that both methods lead to similar conclusions. In particular, the averages of the parameters estimated by fitting the model to individual data agree fairly well with the parameters estimated by fitting the model to quantile-averaged group data (Ratcliff et al., 2003, 2004). Although the effects of averaging have not been formally characterized, the robustness of the model to averaging may be a result of the relative invariance of its families of distribution shapes, discussed previously.

Bayesian methods of data analysis and model fitting are becoming increasingly widespread in cognitive psychology, including in applications of the diffusion model (Dutilh et al., 2019; Vandekerckhove et al., 2011). Classical statistics conceives of model fitting as the task of finding a set of model parameters that makes the model and the data as close to one another as possible, whereas the goal of Bayesian methods is to put a probability distribution on the model parameters, given the data and given the researcher's prior beliefs or other evidence about how parameters are distributed (e.g., Appendix A of Matzke and Wagenmakers (2009)). Generically, p(θ|D), the posterior probability of the model parameters, θ, conditional on the observed data, D, is given by Bayes' rule (Vandekerckhove et al., 2011),

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}.$$

The substance of Bayes' rule is that it allows one to reverse the order of conditioning in conditional probability statements. Classical fitting methods allow one to make statements about p(D|θ), the probability, or likelihood, of obtaining the observed data, given a set of model parameters, whereas Bayes' rule allows one instead to make statements about p(θ|D), the probability of the parameters given the observed data. Bayesian data analysts argue that this latter quantity is the more meaningful one scientifically.

The term in the denominator of Bayes' rule, p(D), is the probability of the observed data. It is obtained by integrating the product of the likelihood, p(D|θ), and the prior probability of the parameters, p(θ), in the numerator across the values of the parameters, θ, which requires evaluation of a multiple integral of dimension equal to the number of parameters in the model. This integral is prohibitively expensive to evaluate by conventional numerical methods for all but the simplest models, but it can be evaluated by Markov chain Monte Carlo methods. Instead of attempting to evaluate the integral exhaustively, Markov chain Monte Carlo tries to identify the region in the parameter space in which most of the probability mass is located and approximates the value of the integral in this region by simulation.

For moderate to large samples of data, the performance of classical and Bayesian methods appears to be fairly similar (Dutilh et al., 2019; Ratcliff & Childers, 2015). However, for small samples, Bayesian methods, combined with a hierarchical modeling approach (Vandekerckhove et al., 2011), appear to offer better parameter recovery properties. HDDM, a freely available Bayesian fitting package for the diffusion model, provides a hierarchical modeling capability (Wiecki et al., 2013).
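As a toy illustration of Bayes' rule in this setting (far simpler than the hierarchical machinery of HDDM), the posterior over a single drift-rate parameter can be computed on a grid. The likelihood below is binomial in the number of correct responses, using the standard closed-form absorption probability of a Wiener process; all numerical values are illustrative assumptions, not estimates from any data set.

```python
import numpy as np

# Illustrative values for the fixed design parameters (not estimates).
s2 = 0.1 ** 2          # infinitesimal variance, s = 0.1
a = 0.12               # boundary separation
z = a / 2.0            # unbiased starting point

def p_correct(nu):
    """Probability of an upper-boundary (correct) response of a Wiener process."""
    return (1.0 - np.exp(-2.0 * nu * z / s2)) / (1.0 - np.exp(-2.0 * nu * a / s2))

nu_grid = np.linspace(0.01, 0.6, 500)        # grid over the drift-rate parameter
prior = np.ones_like(nu_grid)                # flat prior p(theta)
k, n = 160, 200                              # hypothetical data: 160 correct of 200
p = p_correct(nu_grid)
likelihood = p ** k * (1.0 - p) ** (n - k)   # binomial kernel p(D | theta)
posterior = prior * likelihood
posterior /= posterior.sum()                 # discrete normalization (the p(D) term)
nu_map = nu_grid[np.argmax(posterior)]
print(f"posterior mode for the drift rate: {nu_map:.3f}")
```

For realistic diffusion models the parameter space is too large for grid evaluation, which is why sampling methods such as Markov chain Monte Carlo are used instead; the grid version simply makes the structure of Bayes' rule explicit.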


In hierarchical models, the parameters of the diffusion model for the individual participants are assumed to be drawn from some parent population distribution. Rather than fitting participants individually, the individual participants are fitted simultaneously, with parameter values constrained by the population distribution. Although hierarchical models can also be developed within a classical statistical framework, Bayesian methods make them considerably easier to work with. When samples are small (e.g., fewer than 40 observations per condition per participant), the population-level constraints imposed by hierarchical models can help stabilize fits and lead to better parameter recovery than is obtained from independent individual fits (Ratcliff & Childers, 2015). The hierarchical diffusion model implemented in HDDM also provides good recovery of the parameters of the diffusion model for participants from different populations. One shortcoming of the current HDDM implementation is that it assumes the across-trial variability parameters of the diffusion model are the same for all participants and uses this assumption to constrain the estimates of the other parameters. This constraint can result in incorrect estimates under some circumstances.

9 Diffusion Models of Continuous Outcome Decisions

Currently, an area of active research is to extend diffusion models from discrete-choice tasks to tasks in which there is a continuous range of decision outcomes. In the real world, many action-oriented decisions made while walking, cycling, or driving require the decision-maker to select one of a continuous range of possible actions or responses. These tasks are most naturally thought of as decision tasks with a continuous outcome or choice set. In the laboratory, continuous outcome decision tasks have been increasingly used in the study of perceptual and memory processes for stimulus attributes like color or orientation (Adam et al., 2017; Bae et al., 2015; Persaud & Hemmer, 2016). They have also been used for human analogues of the saccade-to-target decision task that is used in visual neuroscience (Hanes & Schall, 1996) to study identification of visual targets among distractors (Ratcliff, 2018).

Recently, two generalizations of the diffusion model have been proposed that extend it to continuous outcome tasks: one is the circular diffusion model of Smith (2016) and the other is the spatially continuous diffusion model (SCDM) of Ratcliff (2018). These models generalize the original diffusion model in different ways and, while there is overlap in their potential areas of application, the kinds of task that motivated their development were somewhat different. Reflecting this difference, the models make different assumptions about how evidence for a decision is accumulated. The circular diffusion model assumes that the dimensionality of the evidence process is equal to the dimensionality of the stimulus space. For stimuli that are represented perceptually in a two-dimensional space, like isoluminant colors, orientation, or direction of motion, the evidence process is two-dimensional. The SCDM assumes that the dimensionality of the evidence process is equal to the


dimensionality of the response space. For responses on a continuum, on which every point is a possible response, the dimensionality of the evidence process is infinite.

Much of the impetus for the development of the circular diffusion model came from the recent visual working memory literature, which has increasingly relied on continuous outcome decision tasks to characterize stimulus representations in memory. The first paper to do so was by Wilken and Ma (2004), who used a method that Prinzmetal et al. (1998) had developed to study attention and perceptual variability. In a typical visual working memory experiment, participants are required to encode a set of stimuli, often consisting of a set of colored patches or a set of oriented bars, into memory (Adam et al., 2017). In a continuous outcome version of the task, at the end of a retention interval the participant is asked to recall the color or orientation of a designated item in memory by clicking a mouse on a point on a circle in the display that corresponds to the remembered item. This yields a distribution of decision outcomes that characterizes the angular error between the true and the remembered value of the item. The empirical distribution of decision outcomes can be described mathematically by a finite or a continuous mixture of von Mises distributions (van den Berg et al., 2014; Zhang & Luck, 2008). The von Mises distribution is a circular analogue of the normal distribution and has probability density function

$$f(\theta; \varphi, \kappa) = \frac{e^{\kappa \cos(\theta - \varphi)}}{2\pi I_0(\kappa)}.$$

This equation describes a bell-shaped curve on a circle, [0, 2π] radians. The argument θ is the angular error of the response, ϕ is the center of the distribution, and κ is its precision. Loosely, precision is the reciprocal of variance and characterizes the spread of responses around the circle: High precision corresponds to low variance and vice versa. The quantity in the denominator, I₀(κ), is a modified Bessel function of the first kind of order zero and serves as a scale factor that normalizes the mass in the distribution to unity.

In visual working memory, the precision of the empirical distribution is interpreted as a reflection of the fidelity of the memory representations that give rise to the set of observed responses. How precision changes as a function of the number of items stored in memory has been a dominant focus of visual working memory research for more than a decade (Ma et al., 2014). The use of the continuous outcome task in these studies, and the associated use of precision as a measure of memory performance, reflects the belief of investigators that these tasks can provide more information about the underlying cognitive processes than can be obtained from traditional two-choice decision tasks.
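The von Mises density is straightforward to work with numerically: NumPy exposes the Bessel function I₀ as np.i0 and a von Mises sampler on its Generator objects. The sketch below, with arbitrary illustrative parameters, checks that the density normalizes to one on the circle and that a higher precision κ concentrates the responses.

```python
import numpy as np

def vonmises_pdf(theta, phi, kappa):
    """Von Mises density: exp(kappa*cos(theta - phi)) / (2*pi*I0(kappa))."""
    return np.exp(kappa * np.cos(theta - phi)) / (2.0 * np.pi * np.i0(kappa))

theta = np.linspace(-np.pi, np.pi, 10_001)
rng = np.random.default_rng(3)
for kappa in (1.0, 8.0):
    pdf = vonmises_pdf(theta, 0.0, kappa)
    mass = np.sum(pdf) * (theta[1] - theta[0])         # numerical integral ~ 1
    spread = np.std(rng.vonmises(0.0, kappa, 20_000))  # spread of angular errors
    print(f"kappa = {kappa}: mass = {mass:.4f}, spread = {spread:.3f} rad")
```

The printed spreads shrink as κ grows, which is the sense in which precision indexes the fidelity of the underlying memory representation.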


Fig. 5 Circular diffusion model. (a) Evidence is accumulated by a two-dimensional Wiener diffusion process, X_t, on the interior of a disk, whose bounding circle, of radius a, represents the decision criterion for the task. The drift rate is vector-valued with norm ‖μ‖ and phase angle θ_μ. Evidence accumulation begins at X_0 = 0 and continues until the process hits a point, X_θ, on the bounding circle at time T_θ. The hitting point is the decision outcome and the hitting time is the decision time. The irregular path shows the accumulating evidence on a single experimental trial. (b) and (c) Marginal and joint distributions of decision outcomes and decision times for one participant performing a saccade-to-target chromatic decision task, in which participants judged the dominant hue of a color patch that was perturbed by chromatic noise and moved their gaze to the corresponding point on a surrounding color wheel. The symmetrical distribution on the left in (b) is the distribution of angular errors of the decision outcomes, expressed in radians. The skewed distribution on the right is the distribution of RT, measured as the time to break fixation. (c) Joint distribution of decision outcomes and RT. The symbols are the quantiles of the empirical distribution and the continuous lines are the predictions of the model. The plot shows the continuous slow-error pattern predicted with across-trial variability in drift rate norm

the one-dimensional diffusion process of Sect. 6. For the circular model, evidence growth is described by the equation

$$d\mathbf{X}_t = \boldsymbol{\mu}\, dt + \sigma\, d\mathbf{W}_t.$$

In this equation, dX_t is a two-element column vector that characterizes the random change in the process in the vertical and horizontal directions during a small interval

An Introduction to the Diffusion Model of Decision-Making

89

of duration dt. The components of the vector .μ are the vertical and horizontal components of the drift rate and .σ is a .2×2 dispersion matrix, which is proportional to an identity matrix with nonzero entries .σ , which characterizes the noisiness of the process. The noise term, .dWt , is a two-element vector of independent Brownian motions that describes the random perturbations of the process in the vertical and horizontal directions. As can be seen from Fig. 5, the model assumes that evidence accumulation begins at the center of the disk, .X0 = 0, and continues until it hits a point on the bounding circle, .Xθ . The hitting point is the decision outcome, which is the point where the participant clicks the mouse, and the hitting time, .Tθ , is the decision time, where the capitalization in the notation indicates that the hitting time is a random variable. Because the drift rate, .μ, is vector-valued, it may be expressed in polar coordinates, as shown in Fig. 5. In polar coordinates, the drift rate has a length or norm, .‖μ‖ =

μ21 + μ22 , and a direction or phase angle, .μθ = arctan(μ2 /μ1 ), where .μ1 and .μ2 are the components of drift rate in the horizontal and vertical directions, respectively. Psychologically, the phase angle of the drift rate represents the encoded stimulus identity and the norm represents the quality of the encoded representation. The strong symmetry assumptions of the circular diffusion model means it is possible to derive an analytic expression for the predicted joint decision-time, decision-outcome distribution. Here we give an expression for the joint density function rather than the cumulative distribution function, because this is the function used to fit the model to data by maximum likelihood, which is how it has been evaluated empirically to date. Using similar notation to Sect. 6, we denote by .g(θ, t; a, μ, σ ) the probability density of response .θ at time t for a process with drift rate .μ, decision criterion a, and noise term .σ . This density has the form (Smith, 2016; Smith et al., 2020),

g(θ, t; a, μ, σ) = (σ²/(2πa²)) exp[(1/σ²)(aμ1 cos θ + aμ2 sin θ) − (‖μ‖²/(2σ²)) t] × Σ_{k=1}^∞ [j0,k / J1(j0,k)] exp(−j0,k² σ² t/(2a²)).

The noise term, σ, is the term on the main diagonal of the dispersion matrix. The function J1(x) in the denominator on the right is a first-order Bessel function of the first kind and the j0,k terms in the numerator are the zeros of a zero-order Bessel function of the first kind, J0(x), that is, the points at which the function crosses the x-axis (Smith, 2016). A striking property of the circular diffusion model is that the predicted distribution of decision outcomes, g(θ; a, μ, σ), obtained by marginalizing g(θ, t; a, μ, σ) with respect to time, has the form of a von Mises distribution, with center ϕ = θμ and precision

κ = a‖μ‖/σ².


That is, precision is equal to the quality of the evidence in the stimulus, times the amount of evidence needed to make a response, divided by the noisiness of the evidence accumulation process. In models of visual working memory, precision is usually treated as an irreducible theoretical quantity that varies with experimental conditions, but in the circular diffusion model it is decomposable into simpler components of processing that are the same as those used to characterize the speed and accuracy of decisions in two-choice tasks.

When there is no across-trial variability in the components of processing, the circular diffusion model, like the two-choice model, predicts that decision times will be the same for all decision outcomes. However, when there is across-trial variability in drift rates or decision criteria, the model predicts continuous forms of the slow-error and fast-error pattern predicted by the two-choice model. When there is across-trial variability in the drift rate norm, the model predicts that the most accurate responses will be the fastest responses; when there is across-trial variability in criterion, the model predicts that the least accurate responses will be the fastest responses (Smith, 2016).

The bottom panels of Fig. 5 show example data and model fits for one participant performing a color decision task, in which participants made decisions about the dominant hues of color patches that were perturbed by chromatic noise (Ratcliff, 2018; Smith et al., 2020) and expressed their decisions by moving their gaze from a central fixation point to a point on a surrounding color wheel. The RT was the time to break fixation and the decision outcome was the landing point of the first corrective eye movement after the initial saccade. The panels in the lower left of Fig. 5 are the marginal distributions of decision outcomes and RT, respectively.
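The decomposition of precision into simpler components can be checked numerically. The snippet below (with arbitrary illustrative parameter values) computes κ = a‖μ‖/σ² and verifies that the resulting von Mises density, normalized by 2πI0(κ), integrates to unity:

```python
import numpy as np

def vonmises_pdf(theta, phi, kappa):
    # von Mises density on the circle: exp(kappa*cos(theta - phi)) / (2*pi*I0(kappa))
    return np.exp(kappa * np.cos(theta - phi)) / (2 * np.pi * np.i0(kappa))

# Illustrative circular-diffusion parameters (not estimates from data)
a, mu_norm, sigma = 1.5, 2.0, 1.0
kappa = a * mu_norm / sigma**2        # predicted precision of decision outcomes

# Riemann sum over one full period: the total mass should be one
theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
mass = vonmises_pdf(theta, np.pi, kappa).sum() * (2.0 * np.pi / 4096)
```

Doubling either the criterion a or the drift norm ‖μ‖ doubles κ and so tightens the predicted distribution of decision outcomes, mirroring the verbal decomposition in the text.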
The distribution of decision outcomes is a symmetrical, high-tailed distribution, with a central bell-shaped region flanked by flat tails, similar to the distributions found in visual working memory studies with high memory loads (Zhang & Luck, 2008). The data in Fig. 5 were from a condition with a high level of added noise and the model used to fit them assumed that performance was a mixture of trials on which the stimulus was correctly encoded and trials on which encoding failed and the stimulus representations were randomly distributed around the circle. There was evidence of encoding failures only under high noise conditions; at low and intermediate levels of noise performance was well described by a model in which the drift rate norm varied across trials, but the drift rate phase angle, which describes the encoded stimulus identity, did not.

The RT distributions predicted by the circular diffusion model are unimodal and positively skewed, like those predicted by the two-choice model, and change as a function of drift rate and decision criterion in similar ways (Smith, 2016). The panel at the lower right shows the joint distribution of decision outcomes and RT in the form of a two-dimensional (Q × Q) quantile plot. The horizontal axis plots quantiles of the distribution of decision outcomes and the vertical axis plots quantiles of the distributions of RT. The dish-shaped form of the Q × Q plot is an expression of the model's slow-error property: The fastest responses are at the center of the plot where the response error is smallest and the slowest responses are


Fig. 6 Spatially continuous diffusion model. (a) Saccade-to-target decision task. Participants move their eyes from the central fixation square to the point of maximum brightness on the surrounding annulus. (b) Normal distribution of drift rates. (c) Six examples of random Gaussian process noise. (d) Five samples of accumulated evidence with the last reaching the decision criterion, a. The panels at the bottom show data and SCDM model predictions for a color decision task, in which color patches were perturbed with high levels of chromatic noise, similar to the task in Fig. 5. The panels on the left show distributions of decision outcomes for 16 subjects, and the panels on the right show mean RT as a function of angular error in degrees, where the correct response is centered on 180°, averaged across subjects.

at the edges where the response error is largest. The model predicts this pattern of RTs with across-trial variability in drift rate norm, ‖μ‖.

Ratcliff's SCDM (Ratcliff, 2018), shown in Fig. 6, is another way of generalizing the two-choice diffusion model to decisions on a continuous scale. One of the applications of the model, as noted above and as shown in Fig. 6a, is to the saccade-to-target decision task that is widely used in monkey neuroscience. In the form of the task shown in the figure, the targets are Gaussian luminance blobs presented in


random pixel noise and the display can contain additional distractor blobs of lower luminance as well as the target. The decision-maker's task is to make a saccade (or a hand movement or a mouse click) to the point of maximum luminance in the surrounding annulus. Because the targets and distractors can appear anywhere on the circle, the task is most naturally conceived of as a continuous outcome task rather than one with a discrete-choice set.

Rather than assuming a single bivariate evidence accumulator, as in the circular diffusion model, the SCDM instead assumes that there is a separate evidence accumulator associated with each point on the continuum. The model can therefore be viewed as a generalization of discrete-choice models like the dual diffusion model, in which there is a separate evidence accumulator for each response (Ratcliff & Smith, 2004). In the SCDM, like the dual diffusion model, the first accumulator to reach criterion determines the response. The dual diffusion model assumes that the evidence accumulators are independent, but this is not an appropriate assumption for continuous outcome decisions because it would lead to degenerate RT predictions. Independent race models assume that the RT survivor function, Prob{T > t}, is the product of the finishing time survivor functions of the individual accumulators. As the number of independent racing accumulators grows unboundedly, the predicted RT survivor function in such a model would approach a Dirac delta function, which represents a single spike of probability mass concentrated at t = 0. For this reason, the SCDM, instead of assuming independent accumulators, assumes that the accumulators are correlated. This ensures that the model predicts well-behaved RT distributions in the continuum limit.
In the circular diffusion model, the evidence accumulation process is two-dimensional, and the drift rate, which characterizes the encoded information in the stimulus, is represented by a two-dimensional vector. In the SCDM, the evidence accumulation process is infinite dimensional, and the drift rate is accordingly represented by a continuum of values, as are the noise perturbations at each time step. As shown in Fig. 6b, the model assumes that the drift rate function is a scaled copy of a normal distribution, whose height represents the quality of the encoded stimulus representation and whose variance represents its dispersion. Evidence growth in the model is represented by two equations, one that describes the growth of evidence at each time step and another that describes the correlation in the noise increments at adjacent spatial positions. In discrete time and spatial position, the evidence growth equation is

ΔXi = vi Δt + σ ηi √Δt,

where Δt is the time step, vi is the height of the drift rate function at spatial position i, and ηi is the random increment to the ith accumulator, which is a zero-mean Gaussian random variable with standard deviation 1.0. The quantity ηi √Δt may be viewed as a notational variant of the term dWt in the circular diffusion model (cf. Usher and McClelland 2001). It denotes a zero-mean, unit-variance, Gaussian random variable that scales with the square root of the time step. This scaling


ensures that the variance of the process remains finite as the time step goes to zero (Smith, 2000). The noise increments to the accumulators are described by a Gaussian process, ηi, examples of which are shown in Fig. 6c. Instead of being independent, the noise increments at adjacent locations are correlated, which results in smooth functions like those in the figure. The noise functions are generated by a Gaussian kernel function, K(x, x′), of the form

K(x, x′) = exp[−(x − x′)²/(2r²)],

where x and x′ are two points in space. The kernel function is an m × m square matrix, obtained by allowing the separation between the points x and x′ to vary across m different, equally spaced positions on the continuum. The approximation to a continuous process becomes better as m becomes larger and the spacing between adjacent points gets smaller. To simulate the Gaussian process, the square root, R, of the kernel matrix is multiplied by a vector of independent zero-mean, unit-variance, Gaussian random variables. In this representation, K = R′R, where R is an upper triangular matrix. At each time step, the increment to the evidence process Xi is the sum of the drift rate vector and an independent sample of the Gaussian noise process, ηi.

The predicted joint distribution of decision times and decision outcomes is the first-passage time distribution of the evidence process Xt through the decision criterion, a. Like the two-choice diffusion model and the circular diffusion model, the SCDM assumes that drift rate amplitude (the height of the normal curve in Fig. 6b) and decision criterion vary across trials. There appears to be no known analytic expression for the first-passage time distribution for processes of this kind, so the model must be evaluated by simulation, which increases the challenge of fitting it to data. Because its predictions must be generated by simulation, fitting methods that use binned statistics, like G², are more tractable and stable than maximum likelihood. The panels at the bottom of Fig. 6 show data and model predictions for a task similar to the one in Fig. 5, in which participants judged the dominant hues of color patches perturbed by high levels of chromatic noise and moved their eyes to the corresponding point on a surrounding color wheel.
The strong symmetry assumptions required to derive analytic predictions for the circular diffusion model mean that it most naturally applies to tasks in which responses are made on closed circular domains, such as decisions about color, orientation, and direction of motion. In contrast, the SCDM can also be applied to decisions in which the responses are mapped to open domains like lines or semicircles. Tasks like number-line judgments lead naturally to models of this kind (Ratcliff & McKoon, 2020). Ratcliff (2018) reported data from several experiments in which the response region was a semicircle, to which the circular diffusion model cannot be directly applied. Smith (2016) suggested a form of the model for this task in which a reflecting lower boundary is used to close the response space and constrain the process to the upper half-plane, but there appears


to be no simple analytic characterization for this form of the model, although predictions for it could be derived by simulation or via a finite-state Markov chain approximation (Diederich & Busemeyer, 2003). In Markov chain models, the transition probabilities of the diffusion process are represented as a matrix and its evolution over time is given by matrix multiplication (Bhattacharya & Waymire, 1990).

Despite the circular diffusion model's more restricted domain of application, several interesting generalizations of it are possible. Smith and Corbett (2019) showed that the analytic representation of the joint outcome-time distribution in the model carries over to higher dimensions and can be used to model decisions in which the response regions are spheres or hyperspheres. They used the resulting representation to develop a four-dimensional hyperspherical diffusion model for visual search in four-item arrays. Smith (2019) showed that the circular diffusion model could also be connected in an analytically tractable way to Ashby and Townsend's general recognition theory (Ashby & Townsend, 1986). General recognition theory is a theory of how people make decisions about stimuli composed of pairs of binary-valued features. It may be viewed as a generalization of the bivariate signal detection theory model in which the variability of the representations on the two stimulus dimensions is correlated across trials rather than independent. Like signal detection theory, general recognition theory is a theory of response accuracy only, although several attempts have been made to extend it to RT. Smith (2019) showed that a circular diffusion form of general recognition theory could be developed by allowing the drift rates in the horizontal and vertical directions to be correlated across trials and by partitioning the response circle into discrete regions with decision bounds.
Despite the added complexity of correlated across-trial variability in drift rates, he was able to obtain an analytic expression for the joint time-outcome density function of the resulting process. When there is correlated drift rate variability in the stimulus features, the model predicts asymmetries in the distributions of decision outcomes and decision times whose form depends on the sign of the correlation. To date, these effects have not been investigated in empirical data. Although continuous outcome decision tasks have been increasingly used in cognitive psychology for more than two decades, the study of RT in these tasks is comparatively new and we are still learning how best to implement them experimentally and how best to model them. The circular diffusion model and the SCDM both predict joint decision time-outcome distributions and so have the potential for very rich contact between theory and data. For this theoretical potential to be realized, however, large samples of data are needed in order to estimate the joint distributions. A further complication is that continuous stimulus spaces are typically not isotropic or homogeneous, but show effects of nameable stimulus categories or preferred directions on decision outcomes and RT (Bae et al., 2015; Persaud & Hemmer, 2016; Ratcliff, 2018; Smith et al., 2020; Hardman et al., 2017). These effects are obscured if data are aggregated across stimulus types to create a single distribution of decision outcomes and decision times, but become apparent if data are plotted as a function of the locations of the stimuli on the continuum. The evidence suggests that these kinds of categorical effects are an important source


of variability in performance and need to be represented in theoretical models. Stimulus bias (Smith et al., 2020) and response bias (Ratcliff, 2018) models of categorical effects are both plausible, and the question of which of them can provide the best account of performance remains an open empirical one. Investigation of these kinds of effects requires a characterization of performance as a function of the positions of stimuli and responses on the continuum, so again large samples are needed. One potential concern associated with the collection of RTs in continuous outcome tasks is that a substantial part of the variability in RT may be due to motor processes, because of the complexity of the associated movements. Although RT in continuous outcome tasks is measured from the time at which movement is initiated, and excludes any overt movement time, it might be argued that movement planning nevertheless contributes substantial variability to the distributions of RT. These concerns were at least partially allayed by Ratcliff’s study (Ratcliff, 2018), in which eye movement, mouse movement, and touch screen responses were compared. He found that performance was similar on all three tasks and that the parameters of the SCDM estimated from the three tasks were also similar. Kvam (2019) likewise obtained good fits of a version of the circular diffusion model to a color judgment task using mouse movements, in agreement with the results obtained by Smith et al. (2020) using eye movements, shown in Fig. 5. These results suggest that performance is not much changed by the kind of movement that is required and imply that most of the variability in RT is likely to be in the decision process itself. This in turn suggests that empirical data may be successfully modeled using a relatively simple model of nondecision time, as is the case with two-choice decisions.
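The finite-state Markov chain approximation mentioned earlier (Diederich & Busemeyer, 2003) is easy to sketch for the one-dimensional, two-boundary model. The code below (all parameter values illustrative) discretizes the diffusion on a grid, makes the two boundaries absorbing, and evolves the state distribution by repeated matrix multiplication; the mass absorbed at each step traces out the predicted RT distributions:

```python
import numpy as np

# Birth-death chain approximation to a diffusion with drift v, infinitesimal
# SD s, boundaries at 0 and a, and start point z = a/2 (illustrative values).
v, s, a, dt = 0.2, 0.1, 0.1, 0.0001
dx = s * np.sqrt(dt)                  # spatial step matched to the time step
n = int(round(a / dx)) + 1            # grid states 0 .. n-1
p_up = 0.5 * (1.0 + (v / s) * np.sqrt(dt))   # probability of an upward step

T = np.zeros((n, n))
T[0, 0] = T[-1, -1] = 1.0             # absorbing boundary states
for i in range(1, n - 1):
    T[i, i + 1] = p_up
    T[i, i - 1] = 1.0 - p_up

p = np.zeros(n)
p[n // 2] = 1.0                       # all mass starts midway between bounds
upper_mass = []                       # mass newly absorbed at the upper bound
for _ in range(30000):                # 30000 steps of dt = 3 s of process time
    new = p @ T
    upper_mass.append(new[-1] - p[-1])
    p = new
p_upper = p[-1]                       # predicted probability of an upper response
```

With these parameters the chain's upper-boundary probability agrees closely with the continuous diffusion value (1 − e^{−2vz/s²})/(1 − e^{−2va/s²}) ≈ 0.881, illustrating how the matrix representation approximates the diffusion process.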

10 Conclusion

Recently, there has been a burgeoning of interest in the diffusion model and related models in psychology and in neuroscience. In psychology, this has come from the realization that the model can provide an account of the effects of stimulus information, response bias, and response caution (speed-accuracy tradeoff) on performance in simple decision tasks, and a way to characterize these components of processing quantitatively in populations and in individuals. In neuroscience, it has come from studies recording from single cells in structures of the oculomotor systems of awake behaving monkeys performing saccade-to-target decision tasks. Neural firing rates in these structures are well-characterized by assuming that they provide an online readout of the process of accumulating evidence to a response criterion (Hanes & Schall, 1996; Smith & Ratcliff, 2004). This interpretation has been supported by the finding that the parameters of a diffusion model estimated from monkeys' RT distributions and choice probabilities can predict firing rates in the interval prior to the overt response (Ratcliff et al., 2003, 2007). These results linking behavioral and neural levels of analysis have been accompanied by


theoretical analyses showing how diffusive evidence accumulation at the behavioral level can arise by aggregating the information carried in individual neurons across the cells in a population (Smith, 2010; Smith & McKenzie, 2011). There has also been recent interest in investigating alternative models that exhibit diffusive, or diffusion-like, model properties. Some of these investigations have been motivated by a quest for increased neural realism, and the resulting models have included features like racing evidence totals, decay, and mutual inhibition (Usher & McClelland, 2001). Although arguments have been made for the importance of such features in a model, and although these models have had some successes, none has yet been applied as systematically and as successfully to as wide a range of experimental tasks as has the standard diffusion model.

11 Suggestions for Further Reading

Anyone wishing to properly understand the RT literature should begin with Luce's classic monograph, Response Times (Luce, 1986). Although the field has developed rapidly in the years since it was published, it remains unsurpassed in the depth and breadth of its analysis. Ratcliff's Psychological Review article (Ratcliff, 1978) is the fundamental reference for the diffusion model, while Ratcliff and Smith's Psychological Review article (Ratcliff & Smith, 2004) provides a detailed empirical comparison of the diffusion model and other sequential-sampling models. Smith and Ratcliff's Trends in Neurosciences article (Smith & Ratcliff, 2004) discusses the emerging link between psychological models of decision-making and neuroscience, while Ratcliff et al.'s Trends in Cognitive Sciences article (Ratcliff et al., 2016) reviews current issues.

12 Exercises

Simulate a random walk with normally distributed increments in Matlab, R, Python, or some other software package. Use your simulation to obtain predicted RT distributions and choice probabilities for a range of different accumulation rates (means of the random variables, Zi). Use a small time step of, say, 0.001 s to ensure you obtain a good approximation to a diffusion process and simulate 5000 trials or more for each condition. In most experiments to which the diffusion model is applied, decisions are usually made in around a second or less, so try to pick parameters for your simulation that generate RT distributions on the range 0–1.5 s.

1. The drift rate, ξ, and the infinitesimal standard deviation, s, of a diffusion process describe the change occurring in a unit time interval (e.g., during one second). If ξrw and srw denote, respectively, the mean and standard deviation of the distribution of increments, Zi, to the random walk, what values must they be set to in order to obtain a drift rate of ξ = 0.2 and an infinitesimal standard deviation of s = 0.1 in the diffusion process? (Hint: The increments to a random walk are independent and the means and variances of sums of independent random variables are both additive.)
2. Verify that your simulation yields unimodal, positively skewed RT distributions like those in Fig. 1. What is the relationship between the distribution of correct responses and the distribution of errors? What does this imply about the relationship between the mean RTs for correct responses and errors?
3. Obtain RT distributions for a range of different drift rates. Drift rates of ξ = {0.4, 0.3, 0.2, 0.1} with a boundary separation a = 0.1 are likely to be good choices with s = 0.1. Calculate the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles of the distributions of RT for each drift rate.
4. Construct a Q-Q (quantile-quantile) plot by plotting the quantiles of the RT distributions for each of the four drift conditions on the y-axis against the quantiles of the largest drift rate (e.g., ξ = 0.4) condition on the x-axis. What does a plot of this kind tell you about the families of RT distributions predicted by a model? Compare the Q-Q plot from your simulation to the empirical Q-Q plots reported by Ratcliff and Smith (2010) in their Figure 20. What do you conclude about the relationship?
5. Read Wagenmakers and Brown (2007). How does the relationship they identify between the mean and variance of empirical RT distributions follow from the properties of the model revealed in the Q-Q plot?

References

Adam, K. C. S., Vogel, E. K., & Awh, E. (2017). Clear evidence for item limits in visual working memory. Cognitive Psychology, 97, 79–97. Ashby, F. G. (1983). A biased random walk model for two choice reaction time. Journal of Mathematical Psychology, 27, 277–297. Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154–179. Bae, G.-Y., Olkkonen, M., Allred, S. R., & Flombaum, J. I. (2015). Why some colors appear more memorable than others: A model combining categories and particulars in color working memory. Journal of Experimental Psychology: General, 144, 744–763. Bhattacharya, R. B., & Waymire, E. C. (1990). Stochastic processes with applications. New York: Wiley. Buonocore, A., Giorno, V., Nobile, A. G., & Ricciardi, L. (1990). On the two-boundary first-crossing-time problem for diffusion processes. Journal of Applied Probability, 27, 102–114. Busemeyer, J., & Townsend, J. T. (1992). Fundamental derivations from decision field theory. Mathematical Social Sciences, 23, 255–282. Busemeyer, J., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432–459. Cox, D. R., & Miller, H. D. (1965). The theory of stochastic processes. London, UK: Chapman & Hall.


Diederich, A., & Busemeyer, J. R. (2003). Simple matrix methods for analyzing diffusion models of choice probability, choice response time, and simple response time. Journal of Mathematical Psychology, 47, 304–322. Dutilh, G., Annis, J., Brown, S. D., Cassey, P., Evans, N. J., Grasman, R. P. P. P., et al. (2019). The quality of response time data inference: A blinded, collaborative assessment of the validity of cognitive models. Psychonomic Bulletin & Review, 26, 1051–1069. Feller, W. (1968). An introduction to probability theory and its applications (3rd ed.). New York: Wiley. Gardiner, C. W. (2004). Handbook of stochastic methods. (3rd ed.). Berlin: Springer. Hanes, D. P., & Schall, J. D. (1996). Neural control of voluntary movement initiation. Science, 274, 427–430. Hardman, K. O., Vergauwe, E., & Ricker, T. J. (2017). Categorical working memory representations are used in delayed estimation of continuous colors. Journal of Experimental Psychology: Human Perception and Performance, 43, 30–54. Kvam, P. D. (2019). Modeling accuracy, response time, and bias in continuous outcome orientation judgments. Journal of Experimental Psychology: Human Perception and Performance, 45, 301–318. Laming, D. R. J. (1968). Information theory of choice reaction time. New York: Wiley. Leite, F. P. & Ratcliff, R. (2011). What cognitive processes drive response biases? A diffusion model analysis. Judgment and Decision Making, 6, 651–687. Link, S. W. (1992). The wave theory of difference and similarity. Englewood Cliffs, NJ.: Erlbaum. Link, S. W., & Heath, R. A. (1975). A sequential theory of psychological discrimination. Psychometrika, 40, 77–105. Luce, R. D. (1986). Response times. New York: Oxford University Press. Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory. Nature Neuroscience, 17, 347–356. Matzke, D., & Wagenmakers, E.-J. (2009). Psychological interpretation of the ex-Gaussian and shifted Wald parameters: A diffusion model analysis. 
Psychonomic Bulletin & Review, 16, 798–817. Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308–313. Persaud, K., & Hemmer, P. (2016). The dynamics of fidelity over the time course of long-term memory. Cognitive Psychology, 88, 1–21. Prinzmetal, W., Amiri, H., Allen, K., & Edwards, T. (1998). Phenomenology of attention: I. Color, location, orientation, and spatial frequency. Journal of Experimental Psychology: Human Perception and Performance, 24, 261–282. Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. Ratcliff, R. (2002). A diffusion model account of response time and accuracy in a brightness discrimination task: Fitting real data and failing to fit fake but plausible data. Psychonomic Bulletin & Review, 9, 278–291. Ratcliff, R. (2006). Modeling response signal and response time data. Cognitive Psychology, 53, 195–237. Ratcliff, R. (2013). Parameter variability and distributional assumptions in the diffusion model. Psychological Review, 120, 281–292. Ratcliff, R. (2018). Decision making on spatially continuous scales. Psychological Review, 125, 888–935. Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2, 237–279. Ratcliff, R., & McKoon, G. (2020). Decision making in numeracy tasks with spatially continuous scales. Cognitive Psychology, 116(Art. 101259), 1–21. Ratcliff, R., & Smith, P. L. (2004). A comparison of sequential-sampling models for two choice reaction time. Psychological Review, 111, 333–367.


Ratcliff, R., & Smith, P. L. (2010). Perceptual discrimination in static and dynamic noise: The temporal relationship between perceptual encoding and decision making. Journal of Experimental Psychology: General, 139, 70–94. Ratcliff, R., & Tuerlinckx, F. (2002). Estimating parameters of the diffusion model: Approaches to dealing with contaminant reaction times and parameter variability. Psychonomic Bulletin & Review, 9, 438–481. Ratcliff, R., Van Zandt, T., & McKoon, G. (1999). Connectionist and diffusion models of reaction time. Psychological Review, 106, 261–300. Ratcliff, R., Cherian, A., & Segraves, M. (2003). A comparison of macaque behavior and superior colliculus neuronal activity to predictions from models of simple two-choice decisions. Journal of Neurophysiology, 90, 1392–1407. Ratcliff, R., Thapar, A., & McKoon, G. (2003). A diffusion model analysis of the effects of aging on brightness discrimination. Perception & Psychophysics, 65, 523–535. Ratcliff, R., Thapar, A., & McKoon, G. (2004). A diffusion model analysis of the effects of aging on recognition memory. Journal of Memory and Language, 50, 408–424. Ratcliff, R., Hasegawa, Y., Hasegawa, R., Smith, P. L., & Segraves, M. (2007). A dual diffusion model for single cell recording data from the superior colliculus in a brightness discrimination task. Journal of Neurophysiology, 97, 1756–1797. Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Science, 20, 260–281. Ratcliff, R., Voskuilen, C., & McKoon, G. (2018). Internal and external sources of variability in perceptual decision processes. Psychological Review, 125, 33–46. Reed, A. V. (1976). List length and the time course of recognition in human memory. Memory and Cognition, 4, 16–30. Smith, P. L. (2000). Stochastic dynamic models of response time and accuracy: A foundational primer. Journal of Mathematical Psychology, 44, 408–463. Smith, P. L. (2010). 
From Poisson shot noise to the integrated Ornstein-Uhlenbeck process: Neurally-principled models of diffusive evidence accumulation in decision-making and response time. Journal of Mathematical Psychology, 54, 266–283. Smith, P. L. (2016). Diffusion theory of decision making in continuous report. Psychological Review, 123, 425–451. Smith, P. L. (2019). Linking the diffusion model and general recognition theory: Circular diffusion with bivariate-normally distributed drift rates. Journal of Mathematical Psychology, 91, 145– 168. Smith, P. L., & Corbett, E. A. (2019). Speeded multielement decision making as diffusion in a hypersphere: Theory and application to double-target detection. Psychonomic Bulletin & Review, 26, 127–162. Smith, P. L., & McKenzie, C. R. L. (2011). Diffusive information accumulation by minimal recurrent neural models of decision-making. Neural Computation, 23, 2000–2031. Smith, P. L. & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27, 161–168. Smith, P. L., Saber, S., Corbett, E. A., & Lilburn, S. D. (2020). Modeling continuous outcome color decisions with the circular diffusion model: Metric and categorical properties. Psychological Review, 127, 562–590. Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge, UK: Cambridge University Press. Tuerlinckx, F., Maris, E., Ratcliff, R., & De Boeck, P. (2001).A comparison of four methods for simulating the diffusion process. Behavior Research Methods, Instruments, & Computers, 33, 443–456. Turner, B. M., Sederberg, P. B., Brown, S. D., & Steyvers, M. (2013). A method for efficiently sampling from distributions with correlated dimensions. Psychological Methods, 18, 368–384. Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550–592.

100

P. L. Smith and R. Ratcliff

van den Berg, R., Awh, E., & Ma, W. J. (2014). Factorial comparison of working memory models. Psychological Review, 121, 124–149. Vandekerckhove, J., & Tuerlinckx, F. (2008). Diffusion model analysis with MATLAB: A DMAT primer. Behavior Research Methods, 40, 61–72. Vandekerckhove, J., Tuerlinckx, F., & Lee, M. D. (2011). Hierarchical diffusion models for twochoice response times. Psychological Research, 16, 44–62. Vickers, D. (1979). Decision processes in visual perception. London, UK: Academic Press. Voss, A., & Voss, J. (2008). A fast numerical algorithm for the estimation of diffusion model parameters. Journal of Mathematical Psychology, 52, 1–9. Wagenmakers, E.-J., & Brown, S. (2007). On the linear relationship between the mean and standard deviation of a response time distribution. Psychological Review, 114, 830–841. Wald, A. (1947). Sequential analysis. New York: Wiley. Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7(Art. 14), 1–10. Wiener, N. (1923). Differential space. Journal of Mathematical Physics, 2, 131–174. Wilken, P., & Ma, W. J. (2004). A detection theory account of change detection. Journal of Vision, 4, 1120–1135. Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233–235.

Discovering Cognitive Stages in M/EEG Data to Inform Cognitive Models

Jelmer P. Borst and John R. Anderson

Abstract Computational cognitive models aim to simulate the cognitive processes humans go through when performing a particular task. In this chapter, we discuss a machine learning approach that can discover such cognitive processes in M/EEG data. The method uses a combination of multivariate pattern analysis (MVPA) and hidden semi-Markov models (HsMMs) to take both the spatial extent and the temporal duration of cognitive processes into account. In the first part of this chapter, we will introduce the HsMM-MVPA method and demonstrate its application to an associative recognition dataset. Next, we will use the results of the analysis to inform a high-level cognitive model developed in the ACT-R (adaptive control of thought – rational) architecture. Finally, we will discuss how the HsMM-MVPA method can be extended and how it can inform other modeling paradigms.

Keywords ACT-R · M/EEG · Stage discovery · MVPA · HsMM

1 Introduction

The goal of developing computational models is to better understand the functioning of the human mind. One type of cognitive model, the symbolic process model, focuses on the sequence of cognitive processes the mind goes through when performing a particular task and how these processes are coordinated. Prime examples are models developed in the cognitive architecture ACT-R (Anderson, 2007; Anderson et al., 2004; Anderson & Lebiere, 1998; Borst & Anderson, 2015a). For example, an ACT-R model of driving describes all the perceptual actions, central cognitive decisions, and motor actions that a driver has to make to maintain a car on the road

J. P. Borst ()
University of Groningen, Groningen, The Netherlands
e-mail: [email protected]

J. R. Anderson
Carnegie Mellon University, Pittsburgh, PA, USA

© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_5

and navigate traffic (Salvucci, 2006). Or take solving equations like x + 4 = 12. An ACT-R model describes the steps necessary to solve this problem: attending the stimulus on the screen, encoding it, committing it to working memory, retrieving the necessary factual information from declarative memory (12 − 4 = 8), and finally issuing the motor response (e.g., Lebiere (1999), Stocco and Anderson (2008)). More complex models can then simulate what happens to these processes when one becomes proficient in such a task (e.g., Tenison and Anderson (2015)) or how people develop new solution strategies involving different sequences of cognitive processes when required (e.g., Anderson and Fincham (2014)).

To develop such models, we need to know what cognitive processes people go through when performing a certain task. For a long time, this knowledge was mostly based on behavioral data (e.g., Donders (1868), Sternberg (1969)), sometimes aided by eye-tracking data (e.g., Salvucci and Anderson (2001)). However, behavioral data typically yield only a single datapoint per trial – a response time – which makes it difficult to distinguish between different sequences of cognitive processes. Since the start of the current century, researchers have therefore turned to neuroimaging data, in particular functional magnetic resonance imaging (fMRI), for additional guidance in developing cognitive models (e.g., Anderson (2007), Borst and Anderson (2015a, 2017), Just and Varma (2007)). Although the high spatial resolution and whole-brain coverage of fMRI ensure that all different components of a task are taken into account, its temporal resolution is severely limited. In the couple of seconds that it takes to solve a simple mathematical problem, we can typically only collect one or two brain scans, limiting the information on the temporal sequence of cognitive processes considerably.
Recently, we have started to use EEG and MEG data for stage discovery, to profit from their high temporal resolution (Borst et al., 2013, 2016). According to two major theories on EEG, significant cognitive events result in EEG peaks that are added to the ongoing EEG oscillations (Makeig et al., 2002; Shah et al., 2004; Yeung et al., 2007). Thus, if one could identify these peaks, one would know when cognitive events happen. Unfortunately, it is difficult to identify these peaks in the ongoing oscillations. This problem is illustrated with synthetic data in Fig. 1, in which we added five such peaks (simulating cognitive events) to the ongoing oscillations of each simulated trial. The added peaks vary slightly in their temporal location, reflecting the variability in the length of cognitive processes between trials. The left column of Fig. 1 shows five random trials with the added peaks in orange and the resulting EEG data in blue. While the peaks affect the overall oscillations, it is impossible to identify them in the EEG data of single trials. As illustrated in the middle column of Fig. 1, one therefore normally averages across trials to generate event-related potentials (ERPs): the uncorrelated EEG signal will go to zero, and the activity due to the peaks remains. After about 100 trials, the ERP of the EEG data (blue line) looks close to the activity due to just the peaks (orange line) – which is the best possible solution using this method. However, this average activity really only succeeds in identifying peaks close to fixed time points, such as stimulus onset. In this example, we can see in the average signal the small dip corresponding to the first peak because it is close to stimulus onset.
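The synthetic data underlying Fig. 1 can be recreated in outline. The following is a minimal sketch, assuming illustrative oscillation frequencies, peak amplitudes, and jitter values (none of these numbers come from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100                                   # sampling rate (Hz)
n_trials, n_samples = 200, 150             # 200 trials of 1.5 s
bump = np.sin(np.linspace(0, np.pi, int(0.05 * fs)))   # 50-ms half-sine peak

t = np.arange(n_samples) / fs
trials = np.zeros((n_trials, n_samples))
for i in range(n_trials):
    # ongoing oscillations: theta/alpha/beta sinusoids with random phase and amplitude
    for f in (4, 10, 20):
        trials[i] += rng.normal(1.0, 0.2) * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    # five added peaks; latency jitter grows with distance from stimulus onset
    onsets = (np.array([10, 35, 60, 90, 120])
              + rng.normal(0, [1, 3, 5, 8, 10])).astype(int)
    onsets = np.clip(onsets, 0, n_samples - len(bump))
    for on in onsets:
        trials[i, on:on + len(bump)] += 2 * bump

erp = trials.mean(axis=0)   # averaging cancels the uncorrelated oscillations,
                            # but only peaks with little latency jitter survive
```

Plotting `erp` against `t` shows the pattern described in the text: the low-jitter peak near stimulus onset survives averaging, while later, more variable peaks are smeared out.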

Fig. 1 Illustration of ERPs and PRPs using synthetic data. Left column, random selection of trials: raw simulated EEG data of one channel (blue), cognitive events the raw data is based on (orange), and estimated peak locations using all channels (dashed green). Middle column: ERPs using the first 5, 20, 50, 100, and 200 trials (blue), ERPs of the underlying cognitive events (orange). Right column: PRPs using the first 5, 20, 50, 100, and 200 trials (blue), PRPs of the underlying cognitive events (orange)

However, because the variation in the timing of the peaks increases the further one moves away from a fixed point, the ERP becomes less clear further away from stimulus onset. For example, while the dip due to the fourth peak is the largest peak in the raw data, it is not visible in the ERPs, as it occurs at a different moment on each trial. This is unfortunate because if one could measure these peaks on single trials, one would be able to identify the onset of cognitive processing stages – which is exactly the information we need to develop cognitive models. To solve this problem, we
developed a powerful machine learning method that combines hidden semi-Markov models with multivariate pattern analysis (HsMM-MVPA analysis; Anderson et al., 2016). This method integrates all information present across all trials of all participants, to ultimately identify peaks on single trials. The vertical green lines in the left column of Fig. 1 show the results of such an analysis: while not always exactly correct, the locations of the peaks are generally well estimated (note that the analysis was done on 32 synthetic channels; only one channel is shown). Using these locations, one can line up the different trials by resampling the intervals between peaks and perform across-trial averaging. This results in “peak-related potentials” (PRPs) that precisely identify all peaks, as shown in the right column of Fig. 1. These PRPs provide a blueprint of a cognitive model: they indicate the start of each processing stage.

In this chapter, we will explain this method and show how it can be used to develop a cognitive model. In Sect. 2, we will explain the HsMM-MVPA analysis and apply it to an associative recognition experiment. In Sect. 3, we will use the results of the analysis to develop an ACT-R model of this task. Finally, we will discuss how the method can be extended and how it can be used to inform other modeling paradigms.
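Once per-trial peak locations have been estimated, the warping-and-averaging step that produces PRPs can be sketched as follows. The function name and the fixed segment length are our own; the published method resamples the intervals between peaks in the same spirit:

```python
import numpy as np

def peak_related_potential(trials, peak_locs, seg_len=20):
    """Warp each inter-peak interval to a fixed length, then average.

    trials:    (n_trials, n_samples) array, one EEG channel
    peak_locs: per-trial lists of estimated peak sample indices (sorted)
    seg_len:   number of samples each warped interval is resampled to
    """
    warped = []
    for x, locs in zip(trials, peak_locs):
        bounds = np.concatenate(([0], locs, [len(x) - 1]))
        segs = []
        for a, b in zip(bounds[:-1], bounds[1:]):
            seg = x[int(a):int(b) + 1]
            # linear interpolation resamples the interval to seg_len points
            segs.append(np.interp(np.linspace(0, len(seg) - 1, seg_len),
                                  np.arange(len(seg)), seg))
        warped.append(np.concatenate(segs))
    return np.mean(warped, axis=0)  # peaks now line up across trials
```

After warping, every peak sits at a fixed position in the averaged signal (a multiple of `seg_len`), regardless of its single-trial latency, so averaging no longer smears it out.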

2 Part 1: The Discovery of Processing Stages in M/EEG Data

2.1 The HsMM-MVPA Method

The goal of the HsMM-MVPA analysis is to identify sequences of multivariate peaks (i.e., occurring across the scalp) in the ongoing EEG data. The method assumes that participants go through the same cognitive processes on each trial and therefore looks for peaks with similar topologies on each trial. At the same time, the duration of the processing stages is expected to differ between trials and participants. To model such a process, the method applies hidden semi-Markov models. A hidden Markov model describes a process that is always in one of a set of discrete states, which cannot be observed directly. In the current analysis, the EEG data constitute the observations that are used to find these hidden states, which are assumed to reflect cognitive processing stages. Because the duration of cognitive processes varies per stage and trial, we applied a hidden semi-Markov model, which allows states to have a variable duration. The onset of each HsMM state is defined by a multivariate peak in the EEG data across all channels – the MVPA part of the analysis – modeled with a 50-ms half-sine window function, here referred to as a bump. These bumps are separated by flats, where the EEG signal due to cognitive processes is assumed to have zero mean amplitude.

Figure 2 illustrates the application of a five-bump HsMM to EEG data. In the middle of the figure, an idealized EEG signal is shown, with five peaks on each trial. This results in six cognitive stages: each bump (top) indicates the onset of a

Fig. 2 HsMM-MVPA method. The method discovers similar bumps in all trials in a bottom-up manner (top). These bumps delineate cognitive processes, of which the duration across trials is modeled using gamma distributions (bottom). (Reprinted with permission from Berberyan et al. (2021))

new cognitive processing stage, while the first stage is initiated by trial onset. The last stage terminates with the response. The HsMM-MVPA analysis integrates the information across all trials of all participants to identify the five bumps that best account for all data. For example, the topology of the second bump is based on the second bump of all trials, indicated by the dotted red line (top). To describe the variable duration of cognitive stages across trials, gamma distributions with the shape parameter fixed to 2 were used. For instance, the duration of the fourth stage varies considerably in the figure (cf. Fig. 1), which is reflected by the wide gamma distribution at the bottom (light blue line). To identify the bumps and gamma distributions that maximize the likelihood of the data from all trials simultaneously, the method uses a standard expectation maximization algorithm for HsMMs (Yu, 2010). For further technical details, we refer to Anderson et al. (2016).

To conduct an HsMM-MVPA analysis, several steps have to be carried out. The first step aims to identify the number of cognitive stages that participants go through when performing a task. To this end, HsMMs with an increasing number of states are fit to the data and subsequently compared. However, because HsMMs with more states have more free parameters, they typically fit the data better. To avoid overfitting, we first apply leave-one-out cross-validation to each HsMM. That is, we estimate the HsMM with a given number of states on n − 1 subjects and then calculate the log-likelihood of the nth subject using the resulting bump and gamma parameters. We then compare the resulting log-likelihoods between HsMMs with increasing numbers of states and only select a model when it is better for a significant number of subjects compared to a model with fewer states, to ensure that the identified stages generalize across subjects (Anderson & Fincham, 2014).
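The two building blocks described above, the 50-ms half-sine bump template and the gamma-distributed stage durations with the shape parameter fixed to 2, can be sketched as follows (the scale values below are purely illustrative, not estimates from the data):

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 100
bump = np.sin(np.linspace(0, np.pi, int(0.05 * fs)))   # the 50-ms half-sine bump template

# gamma-distributed stage durations with shape fixed to 2; only the scale
# differs per stage (and possibly per condition). Values are illustrative, in ms.
stage_scales = [20, 40, 30, 120, 50, 35]
stage_durs = np.column_stack([rng.gamma(shape=2.0, scale=s, size=10_000)
                              for s in stage_scales])
rts = stage_durs.sum(axis=1)   # a simulated RT is the sum of the six stage durations
# the mean RT approaches shape * sum(scales) = 2 * 295 = 590 ms
```

A stage with a large scale (here, the fourth) has both a longer mean duration and more trial-to-trial variability, which is exactly the pattern the wide gamma distribution in Fig. 2 depicts.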

In the first step of the analysis, we assumed that participants go through the same cognitive stages in the different conditions of an experiment and therefore fit a single model to determine the number of HsMM states and thereby the number of cognitive stages. That means that we are left with a single set of bump topologies and gamma parameters for all conditions. As the second step of the analysis, we inspect the resulting model and often see that one or more stages seem to vary in duration or bump topology between conditions (the analysis is sufficiently flexible to allow for that). To formally investigate whether there is indeed a difference between conditions, we then fit a model with – for example – different gamma parameters per condition for a particular stage. We compare the resulting model to the original model using the LOOCV procedure discussed above and select the winning model. Finally, we use the logic of the experiment in combination with the broader literature, the bump topologies, and the stage durations to interpret the meaning of the identified stages.
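The model-selection rule used in both steps can be made concrete with a sign test on the per-subject LOOCV log-likelihoods. This is a minimal sketch; the function name is ours:

```python
from math import comb

def sign_test_improved(ll_simple, ll_complex, alpha=0.05):
    """One-sided sign test on per-subject LOOCV log-likelihoods: is the more
    complex model better for significantly more than half of the subjects?"""
    n = len(ll_simple)
    better = sum(b > a for a, b in zip(ll_simple, ll_complex))
    # probability of >= `better` improvements if each subject improved by chance
    p = sum(comb(n, k) for k in range(better, n + 1)) / 2 ** n
    return better, p, p < alpha
```

With the numbers reported below: 18 of 20 subjects improving is significant (p < .001), whereas 10 of 20 is not (p ≈ .59), so in the latter case the simpler model is retained.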

2.2 Discovering Cognitive Processing Stages in Associative Recognition

To illustrate the HsMM-MVPA method, we will use it to analyze an associative recognition dataset collected by Borst et al. (2013), which was previously analyzed with several variants of the HsMM-MVPA method (Anderson et al., 2016; Borst & Anderson, 2015a; Portoles et al., 2018).1 In this experiment, 20 participants learned 32 word pairs in a study phase. In a subsequent test phase, during which EEG data were collected, the participants were again presented with word pairs. Now, they were asked to judge whether these pairs were target pairs that they had studied or re-paired foils – pairs consisting of the same words as the studied pairs but in new combinations. This means that participants not only had to remember the component words (item information) but also how they were combined (associative information).2 To affect a hypothesized associative retrieval stage, we manipulated the associative fan of the word pairs. Fan refers to the number of items in memory that a certain word is associated with and was operationalized by having words occur in only a single word pair (fan 1) or in two different word pairs (fan 2). Both words in a pair had a fan of either 1 or 2. From the literature, it is well known that fan 2 pairs result in more errors and longer RTs than fan 1 pairs (e.g., Anderson and Reder (1999), Schneider and Anderson (2012)). Using the HsMM-MVPA method, we will investigate in what cognitive stage(s) the RT effect originates.

1. The dataset and analysis scripts can be found on www.jelmerborst.nl/models.
2. In the original experiment, there were also new foils, foils consisting of entirely new words. Because such stimuli result in a different sequence of cognitive stages, we disregard them for this chapter.

First, the EEG data need to be preprocessed (see Portoles et al. (2018) for details). After standard artifact removal and filtering (e.g., Luck (2005)), we down-sampled the data to 100 Hz to reduce computational load. Next, the data were segmented into trials, retaining the data between stimulus onset and response for each trial. The data of each trial were detrended. Incomplete trials due to artifact removal, incorrect trials, trials with RTs longer than three SDs from the mean per subject and condition, and trials with extreme amplitudes (exceeding ±80 μV) were removed. To further reduce the amount of data for the analysis, we performed a principal component analysis and retained the first ten components (explaining more than 95% of the variance in the data). Finally, these components were normalized.

As the first step in the HsMM-MVPA analysis, we performed the LOOCV procedure to identify the optimal number of bumps and consequently the number of cognitive stages. To this end, we first fit HsMMs to the data of all subjects to identify good starting parameters (i.e., describing the bumps and the gammas) for the LOOCV procedure itself.3 To avoid local maxima – for example, finding a pretty good bump in the 1-bump HsMM, but not the best possible bump – we started by fitting an HsMM with as many bumps as fit in the data. Given that each bump constitutes 50 ms and the shortest trial in our dataset is 430 ms, we can fit at most 8 bumps. After fitting an 8-bump model, we proceeded to the 7-bump model. Here, we started with the parameters identified for the 8-bump model while leaving out one bump. We iteratively fit all models with 7 bumps and finally chose the model with the highest log-likelihood. We then proceeded to the 6-bump model, 5-bump model, etc. Afterwards, we used the identified parameters as starting parameters for the LOOCV method described above. Figure 3 shows the results.
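Before turning to the results, the preprocessing steps described above might look as follows in outline. This is a simplified sketch (naive decimation instead of proper filtering, and no artifact- or RT-based rejection), with all names and default values our own:

```python
import numpy as np

def preprocess(trials, fs_in=500, fs_out=100, n_comp=10, amp_max=80.0):
    """Illustrative preprocessing sketch: downsample, detrend, reject
    extreme-amplitude trials, reduce channels to principal components,
    and normalize. Assumes artifact removal and filtering happened earlier.

    trials: list of (n_samples, n_channels) arrays, stimulus onset to response.
    """
    step = fs_in // fs_out
    kept = []
    for x in trials:
        x = x[::step].copy()                  # naive down-sampling to ~100 Hz
        t = np.arange(len(x))
        for c in range(x.shape[1]):           # remove a linear trend per channel
            slope, intercept = np.polyfit(t, x[:, c], 1)
            x[:, c] -= slope * t + intercept
        if np.abs(x).max() <= amp_max:        # drop trials exceeding +/-80 uV
            kept.append(x)
    data = np.concatenate(kept)               # stack remaining trials for a joint PCA
    data = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    comps = data @ vt[:n_comp].T              # first ten principal components
    return (comps - comps.mean(axis=0)) / comps.std(axis=0)   # normalized
```

The PCA step is what turns 32 channels into the 10 normalized component time series that the HsMM is actually fit to.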
Panel (a) illustrates how additional bumps are placed when fitting HsMMs with increasing numbers of bumps. In the 1-bump HsMM, the most prominent bump is identified, and one by one, the weaker bumps are added. Here, especially in solutions with more than 6 bumps, very weak bumps are introduced. To identify the HsMM that best accounts for the data and also generalizes across subjects, we use a sign test, where we test the number of subjects that show an improved fit when we add more bumps (Anderson & Fincham, 2014). Figure 3b shows that while the average log-likelihood increases up to 6 bumps, only 10 out of 20 subjects fit better than with a 5-bump model – clearly not a significant number. Instead, we will choose the 5-bump model, as it is better for 18 out of 20 subjects than the 4-bump model and seems to generalize better than the 6-bump solution.

Inspecting the resulting stage durations revealed that stage 4 varied in duration between conditions, even though a single gamma distribution was used across conditions (the flexibility of the gamma distribution allows for that). To see if we can account for the data better, we fit a new model with a different gamma distribution

3. The starting parameters influence the EM algorithm used to find the optimal model. An alternative method that we have regularly applied is to use 100 sets of random starting parameters and select the best result, but the method described here is more robust.

Fig. 3 Identifying the number of bumps. Panel (a) shows the bump placement for models with increasing numbers of bumps. Panel (b) shows the log-likelihood of those models, with the number of subjects out of 20 that improve compared to a model with one bump less. The orange dot shows the log-likelihood of a model where stage 4 used separate gamma distributions per condition, for which 19/20 subjects improved from the basic model

Fig. 4 Final HsMM-MVPA model. Panel (a) shows how the cognitive stages add up to the reaction times for the four conditions, with the identified bump topologies at the border of the stages. Panel (b) shows the durations of the stages

for each condition in stage 4. This model fits better for 19 out of 20 subjects than the simpler model, making it preferred (Fig. 3b, orange dot). Moreover, there also seemed to be a small difference between conditions in the last stage of the model, but when allowing different gamma distributions per condition for this stage, it only improved the fit for 10 out of 20 subjects and was therefore disregarded. Figure 4 shows the final model. Panel (a) illustrates how the discovered stages add up to the total RT per condition, while panel (b) shows the duration of each stage more clearly. To interpret which cognitive processes occur in each stage, we previously looked at existing theories of associative recognition and known ERP components (Anderson et al., 2016; Borst & Anderson, 2015b), as well as at connectivity profiles in the stages (Portoles et al., 2018). The first three stages take about 300 ms in total and are highly similar between conditions. These stages were
interpreted as a pre-attention stage (before the stimuli are processed), an encoding stage where the words are read, and a familiarity stage in which the familiarity of the words is judged. Note that, given that we are looking at target and re-paired foil pairs, all words are familiar. Therefore, in the next stage, the word pair in memory that is closest to the stimuli on the screen is retrieved, which takes longer for fan 2 pairs. This is the associative retrieval stage that we were aiming to find. In the case of target pairs, the retrieved pair (e.g., METAL-SPARK) will be the same as the stimuli on the screen (METAL-SPARK). In the case of re-paired foils, it will be a pair with one word matching the stimuli and one word being different (METAL-TREE). Figure 4b clearly shows how this retrieval stage caused the difference in RT between the four conditions: fan 2 pairs are retrieved more slowly than fan 1 pairs, and foils more slowly than targets. In stage 5, the decision stage, the retrieved word pair is compared to the pair on the screen, and in the final stage, the corresponding response is issued. In the next section, we will use this interpretation to develop a cognitive model of this task.

3 Part 2: A Symbolic Process Model

3.1 The Cognitive Architecture ACT-R

To explain how people carry out associative recognition in more detail, we developed a model in the ACT-R cognitive architecture (Anderson, 2007).4 A cognitive architecture is first and foremost a psychological theory: it explains, for instance, how our memory system works. Instead of being limited to a single psychological construct, however, architectures typically account for complete tasks, from perception to response execution. In addition, a cognitive architecture is implemented as a computer simulation, which can be used to create cognitive models of specific tasks (e.g., the Stroop task, associative recognition, driving a car). To evaluate ACT-R models, one can use them to predict reaction times, errors, and fMRI data, which can then be compared to human data (Borst et al., 2015; Borst & Anderson, 2015a, 2017).

The ACT-R cognitive architecture consists of a set of independent cognitive modules that are coordinated by a central procedural module. There are modules for perception (visual and aural) and action (manual and vocal) and several central cognitive modules, of which declarative memory is the most important for the current task. The modules interact with the procedural module through buffers of limited size. The procedural module consists of rules that specify what cognitive action to take given the contents of the buffers. For instance, a rule might state: if METAL is encoded in the visual buffer, then declarative memory should try to

4. Available for download at www.jelmerborst.nl/models.
retrieve the meaning of METAL. An ACT-R model consists of such rules and of knowledge in declarative memory (e.g., the meaning of the word “metal”). Thus, ACT-R itself can be seen as the fixed hardware – the architecture – of the mind, while the models function as software that runs on this hardware.

In ACT-R, a new cognitive processing stage starts when the procedural module executes a rule. That is, if a known pattern is detected in the buffers, it will trigger a rule, which will alter the current cognitive processing in the various modules. After a while, this will result in a new pattern in the buffers, which might trigger the execution of another production rule. Even though this might sound like a highly mechanical way of describing cognitive processing, it has been linked to the biology of the brain. Based on fMRI data, the procedural module has been mapped onto the basal ganglia (e.g., Anderson (2007)). According to neural models of the basal ganglia-thalamus circuit, this circuit implements goal-directed cognitive processing (e.g., Hazy et al. (2007), Kriete et al. (2013), O’Reilly and Frank (2006), Redgrave et al. (1999), Stewart et al. (2012), Stocco (2017), Stocco et al. (2010)). Incoming connections from the cortex allow the basal ganglia to monitor cortical states of other areas, comparable to the buffers in ACT-R. The striatum tracks the similarity between these monitored states and a number of internal states that trigger particular actions. At any one time, one such action – a production rule in ACT-R’s terms – can be executed through the thalamus, which projects back to the cortex and can evoke a change in neural processing. Our assumption is that this causes a multivariate peak in the EEG signal, which will be detected as a bump by the HsMM-MVPA method.
This assumption is supported by intracranial recordings, which show that some cognitive events produce short changes of amplitude in the basal ganglia, temporally followed by local modulations of EEG amplitude (Rektor et al., 2003, 2004). These studies used traditional ERP methods which, as discussed above, might blur cortical peaks and affect the measurement of the timing of peaks. Nevertheless, we believe that each discovered bump is linked to a production rule in the ACT-R model.
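The production cycle described above can be illustrated with a toy production system. Only the fixed 50-ms cost per rule firing comes from ACT-R; the rule names, buffer keys, and dictionary representation are our own illustration, not ACT-R's actual implementation:

```python
RULE_TIME = 0.050  # seconds per production firing (ACT-R's fixed rule cost)

def run(buffers, rules, max_cycles=10):
    """Fire the first rule whose condition matches the buffers, repeatedly."""
    clock = 0.0
    for _ in range(max_cycles):
        for name, condition, action in rules:
            if condition(buffers):
                clock += RULE_TIME
                action(buffers)
                break                  # only one rule fires per cycle
        else:
            break                      # no rule matched: the model is done
    return clock, buffers

rules = [
    ("request-meaning",
     lambda b: b.get("visual") == "METAL" and "retrieval" not in b,
     lambda b: b.update(retrieval="meaning-of-METAL")),
    ("respond",
     lambda b: "retrieval" in b and "manual" not in b,
     lambda b: b.update(manual="press-key")),
]

clock, buffers = run({"visual": "METAL"}, rules)
```

Two rules fire in sequence, each adding 50 ms of procedural time; in the linking hypothesis above, each firing would correspond to one bump in the EEG signal.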

3.2 A Model of Associative Recognition

Figure 5 shows the ACT-R model of associative recognition that we developed based on the stages that were discovered with the HsMM-MVPA method (see, e.g., Anderson et al. (2016), Anderson and Reder (1999), and Schneider and Anderson (2012) for earlier versions of this model). Panel (a), top, shows the activity of five ACT-R modules for a target fan-1 trial. The bottom of panel (a) illustrates how this results in six stages for each condition, which closely match the stages discovered by the HsMM-MVPA analysis (cf. Fig. 5b to Fig. 4b).

Fig. 5 ACT-R model. Panel (a) shows the activity of five modules and how that results in cognitive stages adding up to the RT (bottom). Panel (b) shows the durations of these stages

First, it takes an estimated 60 ms for the stimuli to reach the cortex.5 This is followed by the execution of the first production rule, which starts the encoding process of the first word on the screen by the visual module. Production rules take a fixed 50 ms in ACT-R, giving a total duration of the first stage of 110 ms. Encoding the word takes 45 ms in this context (see Anderson et al. (2016)), which triggers the second rule, starting two processes: the second word is encoded by the visual module, and the meaning of the first word is retrieved from memory.6 The third rule starts the associative retrieval stage, in which the model retrieves the word pair from memory that is closest to the word pair encoded from the screen. This is the stage that causes a difference in reaction times between the four conditions and will be discussed in some detail below. The fourth rule stores the retrieved pair in ACT-R’s imaginal module, which maintains the current problem state (Borst et al., 2010; Nijboer et al., 2016), similar to the focus-of-attention concept in current working memory theories (e.g., McElree (2001), Oberauer (2009)). Finally, the fifth rule compares this stored representation to the encoded word pair and issues a corresponding manual motor action. Such a recall-to-reject model has been shown to account for many different associative recognition datasets (Anderson & Reder, 1999; Malmberg, 2008; Rotello et al., 2000; Rotello & Heit, 2000; Schneider & Anderson, 2012).

In the model, the only processing stage that varies by condition is stage 4, the associative retrieval stage, which matches the results of the HsMM-MVPA analysis (Fig. 4b). ACT-R assumes that information in the buffers can “spread activation” to information in declarative memory. In the case of a target pair, both encoded words will spread activation to the to-be-retrieved pair, resulting in the fastest retrievals.
In the case of a re-paired foil pair, only one of the encoded words can spread activation –

5. Note that our timing estimates are slightly different than in the original model due to our reanalysis of the data.
6. This retrieval constitutes a familiarity process; if this were a slow retrieval due to unfamiliar words in the experimental context, it would trigger a production that directly proceeds to a response stage indicating a completely new foil, as described in Borst and Anderson (2015b) and Anderson et al. (2016).


J. P. Borst and J. R. Anderson

the re-paired foil pair does not exist in memory – explaining the difference in retrieval time between targets and foils. However, the larger difference occurs between fan 1 and fan 2 pairs. ACT-R assumes that the strength of association between an item and a memory trace depends on the number of memory traces the item appears in: the fewer, the more predictive the item is, and the more activation it spreads. For example, if one has lived in Pittsburgh for a long time, the cue "Pittsburgh" will not activate any specific memories. On the other hand, when one has only visited the city once, "Pittsburgh" will probably activate specific memories of this one trip. Thus, fan 1 items are more predictive than fan 2 items for retrieving a word pair, therefore spread more activation, and thus cause the difference in retrieval time between the fan conditions. For mathematical details, please see Schneider and Anderson (2012) or Anderson et al. (2016).
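This fan-based spreading-activation account can be sketched numerically. The snippet below uses the standard ACT-R declarative memory equations (source strength S - ln(fan) and retrieval latency T = F*exp(-A)); the specific parameter values are illustrative assumptions, not the values fitted in the original model.

```python
import math

def retrieval_time(fans, n_matching, S=1.5, W=1.0, B=0.0, F=0.2):
    """Illustrative ACT-R retrieval latency for a word pair.

    fans:       fan of each cue word (number of pairs it occurs in).
    n_matching: how many cue words actually spread activation to the
                retrieved trace (2 for targets, 1 for re-paired foils,
                whose exact pairing does not exist in memory).
    Activation  A = B + sum_j (W / n) * (S - ln(fan_j))
    Latency     T = F * exp(-A), in seconds.
    """
    n = len(fans)
    A = B + sum((W / n) * (S - math.log(f)) for f in fans[:n_matching])
    return F * math.exp(-A)

t_target_fan1 = retrieval_time([1, 1], n_matching=2)
t_target_fan2 = retrieval_time([2, 2], n_matching=2)
t_foil_fan1 = retrieval_time([1, 1], n_matching=1)
t_foil_fan2 = retrieval_time([2, 2], n_matching=1)

# Reproduces the qualitative pattern: targets faster than re-paired
# foils, and fan 1 faster than fan 2 within each probe type.
assert t_target_fan1 < t_target_fan2 < t_foil_fan1 < t_foil_fan2
```

With these assumed parameters the four latencies differ by tens of milliseconds, in the same direction as the condition effects described above; the fitted model produces the same ordering from the same mechanism.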

4 General Discussion

In Sect. 2 of this chapter, we explained how the HsMM-MVPA method can be used to discover cognitive stages in EEG data. The method was demonstrated on an associative recognition dataset, which allowed us to identify the crucial processing stage – associative retrieval – that caused the difference in reaction time between conditions. In Sect. 3, we briefly introduced the ACT-R cognitive architecture and linked its production rules to the bumps in the EEG signal that form the basis of the HsMM-MVPA method. Based on the discovered stages and this connection, we developed an ACT-R model of associative recognition. This model has the same cognitive stages that were discovered by the HsMM-MVPA method and shows highly similar condition effects. Altogether, this illustrates how one can design a cognitive model based on the results of the HsMM-MVPA method.

The associative recognition task used in this chapter is a relatively simple example, in the sense that only one of the processing stages varied in duration between conditions. However, the HsMM-MVPA method can discover more complex patterns. For example, Zhang et al. (2017) analyzed a more involved associative recognition task and discovered that both a retrieval stage and a decision stage varied in length – but in opposite directions. This meant that although there was only a small overall effect on RT between some conditions, the underlying stages did vary considerably, which is crucial information if one wants to develop a model. In a different task, the same authors investigated the insertion of additional cognitive stages, thus comparing conditions with different numbers of stages, while some stages overlapped between conditions (Zhang et al., 2018b). One could also imagine more complex topologies, for example, extending the method to deal with recurring stages. In addition to different stage configurations, the method can also be applied to slightly different kinds of neural data.
For instance, we have applied the method successfully to intracranial EEG data and MEG data, in both cases to increase the spatial precision of the stages (Anderson et al., 2018; Zhang et al., 2018a).
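The core assumption of the HsMM-MVPA analysis, namely that an overall RT is the sum of a fixed sequence of stages whose durations vary from trial to trial, and that conditions may differ in only some of those stages, can be illustrated with a toy simulation. All stage means below are invented for illustration and are not the estimates from the actual analysis:

```python
import random

random.seed(1)

# Five hypothetical stages (mean durations in ms); stage 4, associative
# retrieval, is the only one whose mean differs between conditions.
FIXED_MEANS = [110, 45, 80, None, 60]
RETRIEVAL_MEAN = {"target/fan 1": 100, "target/fan 2": 160,
                  "foil/fan 1": 180, "foil/fan 2": 240}

def simulate_rt(condition, shape=2.0):
    """One trial: gamma-distributed stage durations summed into an RT."""
    means = [m if m is not None else RETRIEVAL_MEAN[condition]
             for m in FIXED_MEANS]
    return sum(random.gammavariate(shape, m / shape) for m in means)

def mean_rt(condition, n_trials=5000):
    return sum(simulate_rt(condition) for _ in range(n_trials)) / n_trials

# Condition differences in mean RT track the retrieval stage alone.
rts = {c: mean_rt(c) for c in RETRIEVAL_MEAN}
```

Because only the retrieval stage varies, the differences between condition means converge on the differences between the assumed retrieval means (e.g., roughly 60 ms between the two target conditions). The HsMM-MVPA analysis recovers exactly this kind of decomposition from the EEG data by locating the bumps that bound each stage on every trial.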

Discovering Cognitive Stages in M/EEG Data to Inform Cognitive Models


Finally, while we used the HsMM-MVPA method here to develop an ACT-R model, the results can naturally also be applied to other modeling paradigms. On the one hand, one could think of other models that make a strong link between cognitive processes and the basal ganglia, such as Nengo or Leabra (e.g., Eliasmith (2013), Kriete et al. (2013)). In those cases, one could again directly map HsMM stages onto model processes. On the other hand, the method can also inform models that do not make any claims about neural processes. The resulting stages (e.g., Fig. 4a) basically provide a blueprint for any cognitive or mathematical model. For instance, when focusing on decision processes by using evidence accumulation models (e.g., Forstmann et al. (2008), van Maanen et al. (2011)), one could imagine constraining nondecision processes based on HsMM-MVPA results, to be able to better estimate the decision process itself (cf. Berberyan et al., 2021).

In conclusion, we view the HsMM-MVPA method as a powerful way to divide overall reaction times into meaningful cognitive processing stages, which can inform a wide variety of modeling approaches.

Exercises

1. While the high temporal resolution of EEG is typically given as an advantage compared to, for example, fMRI, it does have its drawbacks.
   (a) Explain why the fourth peak in Fig. 1 disappears when doing a standard ERP analysis.
   (b) Would this also be a problem when measuring the same process with fMRI?
2. In the HsMM-MVPA analysis, bumps are assumed to be identical across different trials, while the stage lengths can vary. Explain the rationale behind this.
3. To determine the number of stages in the data, we use leave-one-out cross-validation, combined with a sign test. This test evaluates whether the log-likelihood improves for a significant number of subjects when we add a stage. Alternatively, one could simply take the highest log-likelihood, given that we already use an LOOCV procedure. What is the advantage of using the sign test, and what would be the advantage of using log-likelihood to determine the number of stages?
4. Imagine that participants solve a factorial like 4! = 4 × 3 × 2 × 1. What would the expected stage topology look like?
5. One limitation of using the HsMM-MVPA method with M/EEG is that we only measure the top levels of the cortex. What does this mean for the cognitive stages that we can discover?
6. The ACT-R model that we developed in Sect. 3 of this chapter follows the discovered stages very closely. One could almost say that this is "just a fit" of the discovered stages and does not explain the underlying process.



   (a) What speaks against such an interpretation?
   (b) How could one further evaluate this model?
7. Take your favorite modeling approach, and discuss how the results of an HsMM-MVPA analysis could inform or help to evaluate such models.

Answers

1. (a) Because the fourth peak appears at a different moment in each trial, it will disappear when averaging across trials or result in a very broad peak with a very low amplitude.
   (b) No. This brief peak would add to the slow hemodynamic response on each trial, which is more or less the summation of all processes during the last several seconds. However, the exact timing of the peak would be lost.
2. The bumps signify the cognitive processes that participants go through when performing a task, which are assumed to be the same on each trial (at least within a condition). However, the length of these processes can vary on a trial-by-trial basis, which is why the analysis allows the stage duration to differ between trials.
3. Sign test: this focuses on generalization across subjects. When applying the sign test, we make sure that this is the best solution for the majority of the subjects. Log-likelihood: this focuses on the best account of all data points. If we took the highest log-likelihood, we would choose the model that explains the EEG across all subjects best, which is not necessarily the best model for the majority of the subjects.
4. For solving n!: visual encoding – n times a retrieval process for the result followed by an update of working memory – entering the response.
5. We can only discover cognitive processes that cause bumps in the top levels of the cortex. It is certainly possible that this does not hold for all cognitive processes, which would then be missed by the current method.
6. (a) The mechanism that explains the length of the associative retrieval stage was not developed to account for the results of the HsMM-MVPA analysis. In addition, it is a single mechanism that accounts for the duration in four different conditions, suggesting it might be general.
   (b) The next step would be to use this model to predict data in a different experiment, for example, with higher-fan conditions, and to test this in an experiment.
7. –
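The sign-test procedure discussed in answer 3 can be sketched in a few lines. The per-subject log-likelihood gains below are hypothetical numbers invented for illustration:

```python
from math import comb

def sign_test_p(gains):
    """One-sided sign test: probability of observing at least this many
    improvements out of n subjects if improving were a coin flip."""
    n = len(gains)
    k = sum(1 for g in gains if g > 0)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical leave-one-out log-likelihood gains for 20 subjects when
# adding one more stage to the HsMM:
gains = [0.8, 1.2, -0.3, 2.1, 0.5, 0.9, 1.7, -0.1, 0.4, 1.1,
         0.6, 2.4, 0.2, -0.5, 1.3, 0.7, 1.9, 0.3, 1.0, 0.8]

p = sign_test_p(gains)   # 17 of 20 subjects improve
add_stage = p < 0.05     # accept the extra stage only if significant
```

Note how the decision depends only on how many subjects improve, not on how large the summed gain is: a single subject with an enormous gain cannot by itself push the group toward a more complex model, which is exactly the generalization-across-subjects logic described in answer 3.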



Further Reading

• The current HsMM-MVPA method for EEG data was first introduced by Anderson and colleagues in 2016. This paper has the most detailed description of the method, including extensive appendices in which the assumptions underlying the method are tested using synthetic data.
• In 2018, we applied the same method to MEG data (Anderson et al., 2018) and intracranial EEG data (Zhang et al., 2018a), allowing for greater spatial precision and better interpretation of the underlying cognitive processes.
• Chapter 1 of Anderson (2007) gives a very clear introduction to cognitive architectures and ACT-R. In case you do not have the book available, Anderson (2005) provides an introduction to ACT-R and its mapping onto brain regions.

References

Anderson, J. R. (2005). Human symbol manipulation within an integrated cognitive architecture. Cognitive Science, 29, 313–341.
Anderson, J. R. (2007). How can the human mind occur in the physical universe? Oxford University Press.
Anderson, J. R., & Fincham, J. M. (2014). Extending problem-solving procedures through reflection. Cognitive Psychology, 74, 1–34. https://doi.org/10.1016/j.cogpsych.2014.06.002
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Lawrence Erlbaum.
Anderson, J. R., & Reder, L. M. (1999). The fan effect: New results and new theories. Journal of Experimental Psychology: General, 128(2), 186–197. https://doi.org/10.1037/0096-3445.128.2.186
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036–1060. https://doi.org/10.1037/0033-295X.111.4.1036
Anderson, J. R., Zhang, Q., Borst, J. P., & Walsh, M. M. (2016). The discovery of processing stages: Extension of Sternberg's method. Psychological Review, 123(5), 481–509.
Anderson, J. R., Borst, J. P., Fincham, J. M., Ghuman, A. S., Tenison, C., & Zhang, Q. (2018). The common time course of memory processes revealed. Psychological Science, 32, 1463–1474. https://doi.org/10.1177/0956797618774526
Berberyan, H., Van Maanen, L., Van Rijn, H., & Borst, J. P. (2021). EEG-based identification of evidence accumulation stages in decision making. Journal of Cognitive Neuroscience, 33(3), 510–527.
Borst, J. P., & Anderson, J. R. (2015a). Using the cognitive architecture ACT-R in combination with fMRI data. In B. U. Forstmann & E.-J. Wagenmakers (Eds.), Model-based cognitive neuroscience. Springer.
Borst, J. P., & Anderson, J. R. (2015b). The discovery of processing stages: Analyzing EEG data with hidden semi-Markov models. NeuroImage, 108, 60–73.
Borst, J. P., & Anderson, J. R. (2017). A step-by-step tutorial on using the cognitive architecture ACT-R in combination with fMRI data. Journal of Mathematical Psychology, 76, 94–103.
Borst, J. P., Taatgen, N. A., & Van Rijn, H. (2010). The problem state: A cognitive bottleneck in multitasking. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(2), 363–382. https://doi.org/10.1037/a0018106



Borst, J. P., Schneider, D. W., Walsh, M. M., & Anderson, J. R. (2013). Stages of processing in associative recognition: Evidence from behavior, electroencephalography, and classification. Journal of Cognitive Neuroscience, 25(12), 2151–2166.
Borst, J. P., Nijboer, M., Taatgen, N. A., Van Rijn, H., & Anderson, J. R. (2015). Using data-driven model-brain mappings to constrain formal models of cognition. PLoS One, 10(3), e0119673. https://doi.org/10.1371/journal.pone.0119673
Borst, J. P., Ghuman, A. S., & Anderson, J. R. (2016). Tracking cognitive processing stages with MEG: A spatio-temporal model of associative recognition in the brain. NeuroImage, 141, 416–430. https://doi.org/10.1016/j.neuroimage.2016.08.002
Donders, F. C. (1868). De snelheid van psychische processen (On the speed of mental processes).
Eliasmith, C. (2013). How to build a brain: A neural architecture for biological cognition. Oxford University Press.
Forstmann, B. U., Dutilh, G., Brown, S., Neumann, J., von Cramon, D. Y., Ridderinkhof, K. R., & Wagenmakers, E.-J. (2008). Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences of the United States of America, 105(45), 17538–17542. https://doi.org/10.1073/pnas.0805903105
Hazy, T. E., Frank, M. J., & O'Reilly, R. C. (2007). Towards an executive without a homunculus: Computational models of the prefrontal cortex/basal ganglia system. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1485), 1601–1613.
Just, M. A., & Varma, S. (2007). The organization of thinking: What functional brain imaging reveals about the neuroarchitecture of complex cognition. Cognitive, Affective, & Behavioral Neuroscience, 7(3), 153–191.
Kriete, T., Noelle, D. C., Cohen, J. D., & O'Reilly, R. C. (2013). Indirection and symbol-like processing in the prefrontal cortex and basal ganglia. Proceedings of the National Academy of Sciences of the United States of America, 110(41), 16390–16395. https://doi.org/10.1073/pnas.1303547110
Lebiere, C. (1999). The dynamics of cognition: An ACT-R model of cognitive arithmetic. Kognitionswissenschaft, 8(1), 5–19.
Luck, S. J. (2005). An introduction to the event-related potential technique. MIT Press.
Makeig, S., Westerfield, M., Jung, T.-P., Enghoff, S., Townsend, J., Courchesne, E., & Sejnowski, T. J. (2002). Dynamic brain sources of visual evoked responses. Science, 295(5555), 690–694.
Malmberg, K. J. (2008). Recognition memory: A review of the critical findings and an integrated theory for relating them. Cognitive Psychology, 57(4), 335–384. https://doi.org/10.1016/j.cogpsych.2008.02.004
McElree, B. (2001). Working memory and focal attention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(3), 817–835.
Nijboer, M., Borst, J. P., Van Rijn, H., & Taatgen, N. A. (2016). Contrasting single and multicomponent working-memory systems in dual tasking. Cognitive Psychology, 86, 1–26. https://doi.org/10.1016/j.cogpsych.2016.01.003
O'Reilly, R. C., & Frank, M. J. (2006). Making working memory work: A computational model of learning in the prefrontal cortex and basal ganglia. Neural Computation, 18(2), 283–328.
Oberauer, K. (2009). Design for a working memory. In B. H. Ross (Ed.), Psychology of learning and motivation (Vol. 51, pp. 45–100). Academic Press.
Portoles, O., Borst, J. P., & van Vugt, M. K. (2018). Characterizing synchrony patterns across cognitive task stages of associative recognition memory. European Journal of Neuroscience, 48(8), 2759–2769.
Redgrave, P., Prescott, T. J., & Gurney, K. (1999). The basal ganglia: A vertebrate solution to the selection problem? Neuroscience, 89(4), 1009–1023.
Rektor, I., Kaňovský, P., Bareš, M., Brázdil, M., Streitová, H., Klajblová, H., et al. (2003). A SEEG study of ERP in motor and premotor cortices and in the basal ganglia. Clinical Neurophysiology, 114(3), 463–471. https://doi.org/10.1016/S1388-2457(02)00388-7
Rektor, I., Bareš, M., Kaňovský, P., Brázdil, M., Klajblová, I., Streitová, H., et al. (2004). Cognitive potentials in the basal ganglia—Frontocortical circuits. An intracerebral recording study. Experimental Brain Research, 158(3), 289–301. https://doi.org/10.1007/s00221-004-1901-6



Rotello, C. M., & Heit, E. (2000). Associative recognition: A case of recall-to-reject processing. Memory and Cognition, 28(6), 907–922. https://doi.org/10.3758/BF03209339
Rotello, C. M., MacMillan, N. A., & Van Tassel, G. (2000). Recall-to-reject in recognition: Evidence from ROC curves. Journal of Memory and Language, 43, 67–88. https://doi.org/10.1006/jmla.1999.2701
Salvucci, D. D. (2006). Modeling driver behavior in a cognitive architecture. Human Factors, 48(2), 362–380.
Salvucci, D. D., & Anderson, J. R. (2001). Automated eye-movement protocol analysis. Human-Computer Interaction, 16(1), 39–86.
Schneider, D. W., & Anderson, J. R. (2012). Modeling fan effects on the time course of associative recognition. Cognitive Psychology, 64(3), 127–160. https://doi.org/10.1016/j.cogpsych.2011.11.001
Shah, A. S., Bressler, S. L., Knuth, K. H., Ding, M., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2004). Neural dynamics and the fundamental mechanisms of event-related brain potentials. Cerebral Cortex, 14(5), 476–483. https://doi.org/10.1093/cercor/bhh009
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276–315. https://doi.org/10.1016/0001-6918(69)90055-9
Stewart, T. C., Bekolay, T., & Eliasmith, C. (2012). Learning to select actions with spiking neurons in the basal ganglia. Frontiers in Neuroscience, 6, 2. https://doi.org/10.3389/fnins.2012.00002
Stocco, A. (2017). A biologically plausible action selection system for cognitive architectures: Implications of basal ganglia anatomy for learning and decision-making models. Cognitive Science, 12(10), 366.
Stocco, A., & Anderson, J. R. (2008). Endogenous control and task representation: An fMRI study in algebraic problem-solving. Journal of Cognitive Neuroscience, 20(7), 1300–1314. https://doi.org/10.1162/jocn.2008.20089
Stocco, A., Lebiere, C., & Anderson, J. R. (2010). Conditional routing of information to the cortex: A model of the basal ganglia's role in cognitive coordination. Psychological Review, 117(2), 541–574. https://doi.org/10.1037/a0019077
Tenison, C., & Anderson, J. R. (2015). Modeling the distinct phases of skill acquisition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(5), 749–767.
van Maanen, L., Brown, S. D., Eichele, T., Wagenmakers, E.-J., Ho, T., Serences, J., & Forstmann, B. U. (2011). Neural correlates of trial-to-trial fluctuations in response caution. The Journal of Neuroscience, 31(48), 17488–17495. https://doi.org/10.1523/jneurosci.2924-11.2011
Yeung, N., Bogacz, R., Holroyd, C. B., Nieuwenhuis, S., & Cohen, J. D. (2007). Theta phase resetting and the error-related negativity. Psychophysiology, 44(1), 39–49. https://doi.org/10.1111/j.1469-8986.2006.00482.x
Yu, S. Z. (2010). Hidden semi-Markov models. Artificial Intelligence, 174, 215–243. https://doi.org/10.1016/j.artint.2009.11.011
Zhang, Q., Walsh, M. M., & Anderson, J. R. (2017). The effects of probe similarity on retrieval and comparison processes in associative recognition. Journal of Cognitive Neuroscience, 29(2), 352–367.
Zhang, Q., van Vugt, M., Borst, J. P., & Anderson, J. R. (2018a). Mapping working memory retrieval in space and in time: A combined electroencephalography and electrocorticography approach. NeuroImage, 174, 472–484. https://doi.org/10.1016/j.neuroimage.2018.03.039
Zhang, Q., Walsh, M. M., & Anderson, J. R. (2018b). The impact of inserting an additional mental process. Computational Brain & Behavior, 38(4), 1–14. https://doi.org/10.1007/s42113-018-0002-8

Spiking, Salience, and Saccades: Using Cognitive Models to Bridge the Gap Between "How" and "Why"

Gregory E. Cox, Thomas J. Palmeri, Gordon D. Logan, Philip L. Smith, and Jeffrey D. Schall

Abstract Cognitive models, which describe cognition in terms of processes and representations, are ideally suited to help build bridges between "how" cognition works at the level of individual neurons and "why" cognition occurs at the level of goal-directed whole-organism behavior. This chapter presents an illustrative example of such a model, Salience by Competitive and Recurrent Interaction (SCRI; Cox et al., Psychol Rev, 2022), a theory of how neurons in the Frontal Eye Fields (FEF) integrate localization and identification information over time to represent the relative salience of objects in visual search. SCRI is framed in cognitive terms but is able to explain the millisecond-by-millisecond spiking activity of individual FEF neurons. By accounting for their dynamics, SCRI helps identify differences between neurons in terms of the computational mechanisms they instantiate. Such neural data also provide valuable constraints on SCRI that illuminate the relative importance of different types of competitive and recurrent interactions. Simulated activity from SCRI, coupled with a Gated Accumulator Model (GAM) of FEF movement neurons, reproduces the details of response time distributions in visual search behavior. The chapter includes extensive discussion of the difficult choices and exciting prospects for developing joint neuro-cognitive models like SCRI, developments which are enabled by recent advances in dynamic cognitive models and neural recording technologies.

G. E. Cox (✉)
Department of Psychology, University at Albany, State University of New York, Albany, NY, USA

T. J. Palmeri · G. D. Logan
Department of Psychology, Vanderbilt University, Nashville, TN, USA
e-mail: [email protected]; [email protected]

P. L. Smith
School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia
e-mail: [email protected]

J. D. Schall
Department of Biology, York University, Toronto, ON, Canada
e-mail: [email protected]

© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_6



G. E. Cox et al.

Keywords Visual search · Salience · Saccades · Spiking activity · Cognitive models

1 Introduction

Explaining cognition is challenging. At least some of this challenge arises because it is not sufficient to explain how cognition occurs, but why. Consider an everyday example of visual search: You are standing in the produce section of an unfamiliar grocery store, trying to find the zucchini. Your visual environment contains objects that have some exogenous salience: the red of the radishes stands out from the background, as do some particularly vibrant carrots. Other objects may have some endogenous salience by virtue of their resemblance to the target of your search: the yellow squash has a similar shape to zucchini, while the cucumbers have both a similar shape and a similar color. Eventually—within perhaps a second or two—you make a saccadic eye movement toward one of these areas of the produce aisle, bringing the vegetables there into foveal vision. Success! Upon closer inspection, you have located the zucchini, despite the fact that they are embedded in an array of salient distractors.

It is possible to explain how your behavior was produced in biophysical terms—the activity of the various neurons involved in responding to visual stimuli and translating those responses into muscle contractions. It is also possible to explain why you produced that behavior in functional terms—your goals (to find the zucchini) and constraints (to locate the zucchini using only visual information). Between how and why lies an explanation in terms of information—the relative salience of objects across the visual field, the degree to which those objects conform to an internal representation of a search target, and the accumulation of evidence in favor of making a saccade to those different objects. Such an explanation—in terms of representations and processes applied to those representations—is provided by computational cognitive models. In this chapter, we focus on the role of computational cognitive modeling in building a bridge between how and why.
The distinction between “how” and “why,” and the bridge between the two served by cognitive models, aligns with the classic three “levels of description” posed by Marr (1982) (although it is also closely related to the “intentional stance” as described by Dennett, 1971). Marr suggested that any information processing system can be described at functional, algorithmic, and implementational levels. We identify the functional level with “why”; you looked at the zucchini because, given the information available in your visual environment, your goal was to identify and locate a search target. We identify the implementational level with “how”; your eyes moved due to a cascade of neural firing events culminating in the coordinated actions of a set of muscles. This places computational cognitive models comfortably between the other two levels, at the algorithmic level; you looked at the zucchini as the result of a set of processes that gave rise to a representation of the relative salience of objects across your visual

Spiking, Salience, and Saccades: Using Cognitive Models to Bridge the Gap. . .


field as well as processes that use that representation to select where to initiate a saccade. When these three levels of description are invoked, it is often to emphasize their separateness, in that different phenomena may be more naturally described at one level versus another. For example, “ideal observer” or “rational” models, posed at the functional level, characterize the performance of a system that could optimally accomplish a goal under a given set of constraints, thereby providing a framework for understanding how close to optimal a natural or artificial system might be (Anderson, 1990; Lieder & Griffiths, 2020). More generally, it is easy to appreciate that the same algorithm may be implemented on different hardware and different algorithms may ultimately serve the same goal, suggesting a degree of independence between levels of description. In this chapter, we take a different perspective and focus on the role that computational cognitive models can play in unifying levels of description. Computational cognitive models provide a common language for helping to understand a set of phenomena across all three levels simultaneously (Love, 2015). We use the example of a model called SCRI (pronounced “scry”), which stands for Salience by Competitive and Recurrent Interactions (Cox et al., 2022). SCRI bridges the gap between implementational (“how”) and functional (“why”) levels of description by jointly providing an account of the millisecond-bymillisecond firing rates of neurons involved in visual search as well as how activity from those neurons is used to drive goal-directed saccades. In this chapter, we focus on the challenges and choices involved in developing such a model. We also place SCRI within the broader context of how modeling helps us understand cognition and point out areas that are likely to benefit from similar approaches. 
The example of SCRI illustrates how accounting for neural data places important constraints on theories of cognition, as well as how neural activity can be better understood by positing an isomorphism between neural dynamics and the dynamics of information processing.

1.1 Dimensions of Constraint

Cognitive models aim to explain observable behavioral states in terms of cognitive states which are not directly observed. Consider a simple version of a visual search task: A stimulus array is presented with just two objects, one target and one distractor. As with the zucchini example, the participant knows ahead of time what the target is. The task of the participant is to direct their gaze toward the target object in the array by making a saccade. At a coarse grain, the resulting behavior has just two states: The participant directs gaze toward the target (state 1) or toward the distractor (state 2). Across trials, the participant might sometimes fall into either of these states. A cognitive model could explain the likelihood with which a participant falls into state 1 or state 2 on the basis of differences in the representations of how "target-like" the two objects appear. Sometimes, the target will correctly have a higher "target strength" than the distractor, but sometimes the opposite might occur.



The cognitive model could not go much deeper than this, because of the coarseness with which behavioral states are defined; there is simply not much for the model to explain. The more dimensions defining the behavioral state, the more strongly the data can constrain and inform a cognitive model, leading to more sophisticated models. For example, if the outcome is not measured in a binary way (gaze on target vs. distractor) but in terms of the pixel coordinates on the display corresponding to the center of the area falling on the participant's fovea, the space of potential behavioral outcomes becomes quite large (even a monitor with low resolution by modern standards would yield a space with 1024 × 768 = 786,432 states). A cognitive model applied to these data would then be forced to explain how far gaze fell from each option, potentially yielding greater insight into the nature of target and distractor representations and the degree to which they may interact with one another.

Even so, such an expansion of the behavioral state space does not yield much insight into the nature of the processes by which target and distractor representations are constructed and interact. Characterizing behavior only in terms of gaze position reflects the final outcome of these processes, not how they arrived at that outcome. For that reason, response times (RTs) have proven to be an especially informative way of expanding the observed behavioral state space to be explained by a model. Even if the final gaze position is defined in a binary manner (target vs. distractor), with both choice and response time, behavioral states consist of all possible combinations of gaze position (containing the target vs. distractor) and times needed to initiate the shift of gaze from fixation to that position. Although response time is a continuous quantity, in practice it is measured to limited resolution and only within a certain window of time after stimulus presentation.
If response times are measured to the nearest millisecond up to 2 seconds after stimulus onset, this results in a state space with 2 × 2000 = 4000 states. Although this is fewer than would be obtained by using pixel coordinates, the expansion of the behavioral state space by including response time makes it possible to distinguish between models of the final gaze position which would otherwise make identical predictions (Townsend, 1972; van Zandt & Ratcliff, 1995). This, in turn, enables more fine-grained inferences regarding the nature of the cognitive processes that lead to those observed behavioral states.

Response time is a dynamic quantity—it measures how long it took a set of processes to get from stimulus to response. The accumulated variability in process durations manifests in the shape of the response time distribution. The shape of this distribution can then be explained in terms of processing stages, rates, and interactions between processes which may operate in parallel or in series. Because the distribution across choice/RT reflects an amalgam of the variability in these various component processes and their interactions, there is still a limit to how well RT data can constrain model development. Just as choice reflects only the final outcome of a set of processes, RT reflects the cumulative variability in the dynamics of those processes which, in turn, may be consistent with many possible trajectories of internal states. Functional neurophysiology provides a glimpse of those trajectories. In so doing, neural data add new dimensions beyond those of the
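The claim that choice alone can leave two models indistinguishable while RT separates them can be demonstrated with a toy random-walk (sequential-sampling) simulation. The two parameterizations below are contrived so that the probability of reaching the target boundary, which for a ±1 walk with step-up probability p and boundary a equals 1/(1 + (q/p)^a), is identical (about .88) while mean RT is not. All numbers are invented for illustration:

```python
import random

random.seed(2)

def one_trial(p_up, bound):
    """±1 random walk from 0; returns (hit upper boundary?, n steps)."""
    x, steps = 0, 0
    while abs(x) < bound:
        x += 1 if random.random() < p_up else -1
        steps += 1
    return x >= bound, steps

def simulate(p_up, bound, n_trials=20000):
    hits = total_steps = 0
    for _ in range(n_trials):
        hit, steps = one_trial(p_up, bound)
        hits += hit
        total_steps += steps
    return hits / n_trials, total_steps / n_trials

# Model A: strong evidence, low threshold. Model B: weak evidence,
# high threshold, tuned so both predict the same choice probability.
acc_a, rt_a = simulate(p_up=0.6, bound=5)
acc_b, rt_b = simulate(p_up=0.5505, bound=10)
```

Both models land on the target on about 88% of trials, so choice data alone cannot tell them apart; but model B takes several times as many steps on average, so their RT distributions separate them immediately, which is the logic behind using choice/RT jointly (Townsend, 1972; van Zandt & Ratcliff, 1995).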



behavioral state space, informing and constraining model development to the extent that neural states can be related to cognitive states posited by models. A relationship between neural and cognitive states is often referred to as a “linking proposition” (Teller, 1984; Schall, 2004). The linking proposition makes it possible to identify the dynamic evolution of neural states prior to an overt behavior with the action of the cognitive processes and their interactions that produce that behavior. Trying to account for dynamics across the neural state space provides valuable constraints that make it possible to formulate and test more sophisticated models than could be developed from behavior alone, while simultaneously illuminating how cognitive processes are realized physiologically.

2 A Case Study: SCRI

As an example of a computational cognitive model that bridges the gap between levels of description, we discuss our recent SCRI model (Cox et al., 2022) along with, for purposes of this book, the considerations that went into its development and evaluation. SCRI was developed to understand how visually responsive neurons in the Frontal Eye Fields (FEF) give rise to a representation of the relative salience of objects across the visual field, which then serves as evidence regarding where a saccade should be directed. The formal computational components of SCRI were derived chiefly from a cognitive model of visual selection, the Competitive Interaction model (Smith & Sewell, 2013; Smith et al., 2015).

2.1 Phenomena to Be Explained

SCRI aims to explain target selection during a simple form of visual search, similar to the examples above. The subjects are rhesus macaques with implanted electrodes that permit recording of single-unit spiking activity in FEF. On each trial, a search array is displayed containing a single target object along with one or more distractor objects arrayed equidistant from each other and equidistant from a central fixation point. The targets and distractors can be defined based on different features; in the experiments to which we have applied SCRI, features have included form, color, and motion. In a form search, the targets and distractors are different shapes; for example, the target may be a rotated T and the distractors rotated L's (Cohen et al., 2009). In a color search, the targets and distractors are different colors. In a motion search, the targets and distractors are random-dot kinematograms and differ in terms of whether the dots in the kinematogram move predominantly to the left or to the right (Sato et al., 2001). The feature values that designate a target are consistent within a session but vary randomly between blocks. At the beginning of each trial, the subject fixates a point


in the center of the display and receives a juice reward for making a saccade to the location of the target object.

While there are many factors that affect the difficulty of visual search, two that have been extensively studied are set size—the number of distractors in the display (Atkinson et al., 1969; Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977; Treisman & Gelade, 1980)—and the similarity between the search target and distractors (Duncan & Humphreys, 1989). In terms of behavior, increasing set size or increasing target-distractor similarity is associated with longer times to initiate a saccade to the target as well as a higher rate of errors (i.e., making a saccade to a distractor instead of the target). These manipulations also have distinct effects on individual neurons that are part of the systems that give rise to that behavior.

We focus specifically on neurons in FEF. FEF is particularly important for visual search. FEF receives afferent connections from both dorsal and ventral stream visual areas, meaning that "what" and "where" information converge on FEF (Schall et al., 1995). FEF also contains visual neurons that have been found to be sensitive to the salience of objects in different parts of the visual field, as well as movement neurons that appear to be causally related to the initiation of saccades to specific locations in the visual field (Hanes & Schall, 1996; Hanes et al., 1998; Hauser et al., 2018; Woodman et al., 2008). Movement neurons in FEF are associated with "movement fields" (analogous to receptive fields for sensory neurons) associated with the location of a saccade. During visual search, movement neurons initially have low activity at the onset of a search array; for some movement neurons, their activity increases until a threshold is reached, at which point a saccade is initiated into the movement field associated with the neurons that reached threshold (Hanes & Schall, 1996).
This pattern of activity is consistent with FEF movement neurons acting like evidence accumulators, an important theoretical construct in cognitive models of decision making (Brown & Heathcote, 2008; Busemeyer & Townsend, 1993; Link & Heath, 1975; Pike, 1973; Ratcliff, 1978; Smith & Vickers, 1988; Smith & Van Zandt, 2000; Stone, 1960; Townsend & Ashby, 1983; Usher & McClelland, 2001; Vickers, 1970).

But what is the evidence that these FEF movement neurons accumulate? A likely possibility is that this evidence is derived from the activity of FEF visual neurons. Each FEF visual neuron has a receptive field (RF) that, in any given trial of visual search, could contain a target, a distractor, or nothing at all. FEF visual neurons have a characteristic three-phase response profile within each trial of visual search (Fig. 1). In the first phase, activity is relatively flat at a baseline level. In the second phase, activity increases if an object is present in its RF, regardless of whether that object is a target or a distractor. In the third phase, if there is an object in the neuron's RF, activity becomes higher if there is a target in its RF than if there is a distractor. The time at which a neuron transitions from the second to third phase is called "Target Selection Time" (TST, sometimes "Target Discrimination Time"; Thompson et al., 1996). Various aspects of FEF visual neuron activity are sensitive to set size and similarity: Increasing set size delays TST, reduces overall activity in phase 3, and


[Figure 1 appears here: spike rate (sp/s) plotted against time from array onset (ms), with phases 1-3 and separate target and distractor traces marked.]
Fig. 1 (a) Example of a visual search array along with dashed circles depicting receptive fields (RF’s) of two Frontal Eye Fields (FEF) visual neurons. (b) Examples of the canonical three-phase response profile for FEF visual neurons, depending on whether the object in their RF is a target (“T” shape) or distractor (rotated “L” shape)

reduces the difference between target and distractor activity in phase 3 (Cohen et al., 2009). Increasing target-distractor similarity also delays TST, reduces target activity in phase 3, and reduces the difference between target and distractor activity in phase 3 (Sato et al., 2001; Sato & Schall, 2003).

Although these differences in TST between conditions were correlated with differences in saccade RT, a causal link between FEF visual neuron activity and behavior was put to the test using a Gated Accumulator Model (GAM) of FEF movement neurons. Purcell et al. (2010, 2012) took spiking activity from FEF visual neurons recorded under manipulations of set size and target-distractor similarity and used it to create input signals to a network of accumulator units; each accumulator received input from FEF neurons with the same receptive fields. When an accumulator reached a threshold, the model made a saccade into the accumulator's movement field. Over many simulated trials, GAM closely fit the observed distributions of saccade RT. In addition, the dynamics of the GAM accumulators mirrored the dynamics of recorded FEF movement neurons.

Taken together, this provides evidence that FEF movement neurons perform a cognitive process of evidence accumulation and that this evidence is derived from the activity of FEF visual neurons. But what determines FEF visual neuron activity?
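To make the accumulate-to-threshold idea concrete, the following is a minimal illustrative sketch, not the actual GAM implementation of Purcell et al.; the constant-input simplification, the parameter values, and all function and variable names are our own assumptions.

```python
import random

def race_to_threshold(inputs, threshold=1.0, gate=0.1, leak=0.5,
                      noise_sd=0.02, dt=0.001, max_t=2.0, rng=None):
    """Race leaky, gated accumulators; return (winning location, time in s).

    `inputs` maps a location to a constant drive; drive below `gate` is
    blocked, mimicking the gating of visual inputs in GAM.  Returns
    (None, max_t) if no accumulator reaches threshold in time.
    """
    rng = rng or random.Random(0)              # fixed seed for reproducibility
    acc = {loc: 0.0 for loc in inputs}
    t = 0.0
    while t < max_t:
        for loc, drive in inputs.items():
            gated = max(drive - gate, 0.0)     # gated feedforward input
            dx = (gated - leak * acc[loc]) * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
            acc[loc] = max(acc[loc] + dx, 0.0) # activity cannot go negative
            if acc[loc] >= threshold:
                return loc, t
        t += dt
    return None, max_t

# A stronger drive at the target location should win the race:
winner, rt = race_to_threshold({"target": 1.2, "distractor": 0.8})
```

In the real model the inputs are recorded FEF visual neuron activity rather than constants, but the race logic is the same: whichever unit's gated, leaky integral reaches threshold first determines the saccade and its latency.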

2.2 The Model

The aim of SCRI was to explain the spiking activity of FEF visual neurons, including how that activity is modulated by manipulations of set size and target-distractor similarity, and to do so in such a way that the activity predicted by SCRI could serve the same evidentiary role in driving evidence accumulation (as in GAM) as the activity of recorded FEF visual neurons.

2.2.1 Motivating Principles

Several core principles guided the development of SCRI, driven ultimately by our goals to understand neural dynamics in terms of cognitive dynamics and to use neural dynamics as a source of constraints to differentiate between possible cognitive-level processes.

Linking Proposition: We hypothesized that the firing rate of an FEF visual neuron at a given time after the onset of the search array represented the relative salience of the object in that neuron's RF at that time. This hypothesis constitutes a linking proposition (Schall, 2004; Teller, 1984) connecting cognitive states (representations of salience across different regions of the visual field) with neural states (momentary firing rates of individual FEF visual neurons). The proposition was driven by the observation that FEF visual neuron activity itself was sufficient to reproduce FEF movement neuron dynamics and saccade behavior (Purcell et al., 2010, 2012). This proposition is what makes it possible for SCRI to jointly describe the cognitive processes involved in generating a representation of salience in visual search as well as the moment-by-moment spiking activity of FEF visual neurons: neural and cognitive dynamics are hypothesized to be isomorphic to one another. In other words, the linking proposition makes it possible for SCRI to bridge levels of description by explaining the dynamics of FEF visual neurons in terms of the cognitive processes and representations implemented by those neurons, rather than solely in terms of neurophysiology. It is important to lay out this linking proposition explicitly because, logically speaking, a failure of SCRI to adequately fit neural data could be due either to a problem in SCRI itself or to a problem in the linking function connecting its cognitive states to neural states, or both.

Recurrent Integration of "what" and "where": In order to guide saccades in visual search, it must be possible to know what kinds of objects are where. So from a functional perspective, the representation of salience maintained by FEF visual neurons must integrate information pertaining to localization and identification. From an anatomical perspective, FEF is ideally situated because, as noted above, it receives afferent connections that convey exactly these kinds of information. Moreover, FEF projects recurrently to those same dorsal and ventral visual areas. Therefore, SCRI should explicitly model these two kinds of information and how they jointly contribute to a representation of salience. In particular, SCRI must include a mechanism that enables localization and identification information to interact recurrently in a manner suggested by the physiology.

Competition: The effects of set size and target-distractor similarity on the third phase of activity of FEF visual neurons suggest that there is competition of some sort between FEF visual neurons with non-overlapping RF's. The presence of additional distractors suppresses the activity of FEF visual neurons with the target in their RF. And even if the number of distractors is unchanged, making the distractors more similar to the target suppresses activity of FEF visual neurons with the target in their RF. While it is clear that SCRI will have to embody competition in some way,


it is unclear a priori what forms this competition may take. For example, competition may arise from lateral inhibition between visual neurons themselves or between the neurons that excite FEF visual neurons. Competition may also arise as a function of feedforward inhibition among the inputs to FEF visual neurons.

A Family of Models: The potential ambiguity regarding the nature of competition and the possibility of recurrent interactions highlight another motivating principle: SCRI should be designed to incorporate multiple potential mechanisms so that they can be evaluated and compared in terms of their explanatory power. In this way, SCRI can be viewed as encompassing an entire "family" of potential models that lack some or all of the mechanisms present in the complete version of SCRI. This approach is valuable when developing a new model because qualitative features of the data are often insufficient to clearly distinguish between predictions from different plausible mechanisms. Comparison between members of the "SCRI family" also enables SCRI to go beyond providing a mere "proof of concept" that a set of mechanisms can reproduce a set of phenomena, and to dig into which of those mechanisms is most important for explaining those phenomena.

Dynamics: Although this final principle may seem obvious given how the phenomena have been described thus far, whatever account SCRI offers must be dynamic. The multiple apparent phases of FEF visual neuron responses suggest that these dynamics will not necessarily be simple.

2.2.2 Conceptual Outline

The principles above led us to consider as our starting point the Competitive Interaction (CI) model of visual selection (Smith & Sewell, 2013; Smith et al., 2015). The CI model contains dynamic competitive mechanisms that are sufficiently rich to allow us to explore their consequences for explaining FEF visual neuron activity. Moreover, CI explicitly distinguishes localization ("where") and identification ("what") information and enables these types of information to interact recurrently. Although the final version of SCRI is, in several ways, simpler than the full CI model, the CI model provided an expressive framework within which to build the SCRI model.

According to SCRI (Fig. 2), the appearance of an object in the RF of an FEF visual neuron results in transient excitation of that neuron. This transient excitation represents localization ("where") information, essentially saying that there is "something" in the neuron's receptive field. The transient localization excitation—which likely arises from areas like MT—causes the neuron to shift from phase 1 (baseline) to phase 2 (undifferentiated increase in firing). At the same time, greater activity of the FEF visual neuron "opens the gate" for identification information (from areas like V4 or temporal areas) to accrue about the object in the neuron's RF. In this way, the recurrent gating between FEF and identification acts like attention; the more active an FEF visual neuron is, the faster it is possible to tell whether the object in its RF is a target or distractor. To the extent that the features of

[Figure 2 appears here: network diagram with Localization (x_i), Identification (z_i), Salience (v_i), and Response (m_i) stages; bottom panels plot these dynamical quantities against time from array onset (ms).]
Fig. 2 Diagram of the Salience by Competitive and Recurrent Interaction (SCRI) model of Frontal Eye Field (FEF) visual neurons, as well as how it interfaces with the Gated Accumulator Model (GAM) of FEF movement neurons. The example trial shows a form-based visual search for a target “T” shape among rotated “L”-shaped distractors. Panels at the bottom show examples of the dynamical quantities in the model. See main text for details

the object match those of the search target, the FEF visual neuron receives additional sustained excitation; this leads to the transition from phase 2 to phase 3, in which the neuron fires at a higher rate if a target is in its RF than if a distractor is in its RF.

Within SCRI, there are many possible ways for competition to occur. Localization signals may compete with one another, such that what counts as excitation for one FEF visual neuron is inhibition for FEF visual neurons with non-overlapping RF's; this is a form of feedforward inhibition. Identification signals may also send feedforward inhibition to FEF visual neurons, such that a strong target match in one location sends inhibitory signals to other locations. There can also be lateral inhibition among FEF visual neurons as well as lateral inhibition between the units responsible for identifying whether objects are targets or distractors. One of the purposes of SCRI was to help understand which of these many possible forms of competition is necessary to explain FEF visual neuron responses in visual search.

2.2.3 Formal Description

The firing rate of an FEF visual neuron (v_i(t)) with RF centered on region i of the visual field at time t is described in terms of its time derivative:

dv_i/dt = [1 − v_i(t)] [b + x_i(t) + z_i(t)]
          − v_i(t) [λ_v + α_x Σ_{j≠i} x_j(t) + α_z Σ_{j≠i} z_j(t) + β_v Σ_{j≠i} σ_{v,ij} v_j(t)]    (1)

Equation (1) takes the form of a so-called "shunting" equation (Grossberg, 1980). Shunting equations describe systems where the degree of excitation is limited by how far the system is from a saturation point, and where the degree of inhibition is limited by how far the system is from zero activity. As a result, shunting equations describe systems with activation bounded between zero and a saturation point. In Eq. (1), the saturation point is chosen to be 1, since this merely sets the scale for the rest of the parameters of the model. The first line of Eq. (1) represents the three sources of excitation for FEF visual neurons: a baseline level of tonic excitation (b), the transient excitation from the localization signal (x_i(t)), and the sustained excitation from the identification signal (z_i(t)). The second line of Eq. (1) represents four sources of inhibition: leakage (λ_v), feedforward inhibition from localization signals at other RF's (α_x), feedforward inhibition from identification signals at other RF's (α_z), and lateral inhibition between FEF visual neurons centered on different RF's (β_v). Note that SCRI allows for lateral inhibition to be spatially graded, represented by σ_{v,ij}, such that there is stronger inhibition when RF's i and j are closer than when they are farther apart. This spatial distribution is described below (see Eq. (4)).

The transient localization signal (x_i(t)) for an FEF visual neuron centered on RF i at time t is

x_i(t) = χ_i × γ(t; s, r)    (2)

where γ(t; s, r) is a Gamma probability density function with shape s and rate r evaluated at time t, and χ_i governs the total amount of transient excitation at RF i. We assume that χ_i is the same for any RF that contains an object, regardless of whether it is a target or distractor, and that χ_i = 0 for any RF's that do not contain an object. Our choice of a Gamma density function follows Smith (1995) but is largely a choice of convenience, since many other unimodal functions would work as well.

The sustained identification signal (z_i(t)) for an FEF visual neuron centered on RF i at time t is described in terms of its time derivative

dz_i/dt = [η_i − z_i(t)] v_i(t)^R × Γ[t; (1 + κ)s, r]
          − z_i(t) [λ_z + β_z Σ_{j≠i} σ_{z,ij} z_j(t)]    (3)

Equation (3), like Eq. (1), is a "shunting" equation. The saturation point is η_i, which represents the degree to which an object in RF i matches the target of the search; η_i is largest for RF's that contain a target, smaller for RF's that contain a distractor, and zero for RF's that do not contain any objects. The rate at which z_i(t) approaches η_i is governed by two factors: the firing rate of the FEF visual neuron with the same receptive field (v_i(t)) and the availability of identification-relevant visual features (Γ[t; (1 + κ)s, r]). Γ is the cumulative distribution function of the γ density that appeared in Eq. (2), with the same shape s and rate r; the parameter κ allows for a delay in the availability of identification-relevant information relative to localization (if κ = 0, there is no delay). Similarly, we treat the exponent R as a way to turn on or off the recurrent multiplicative gating between FEF visual neurons and identification units. If R = 1, then identification units grow toward saturation faster the more active the corresponding FEF visual neuron is, because excitation is multiplied by v_i(t). If R = 0, FEF visual neuron activity has no effect on the rate at which identification units approach saturation (because excitation is multiplied by v_i(t)^0 = 1). Identification information can also decay due to leakage (λ_z) and is subject to lateral inhibition between different RF's (β_z) weighted by distance (σ_{z,ij}; see Eq. (5)).

We note that while Eq. (1) describes the evolution over time of the firing rate of an individual neuron, we are agnostic as to whether the localization or identification signals arise from single neurons or from populations of neurons. As such, we refer to the entities described by Eqs. (2) and (3) as "units" rather than neurons. As noted above, lateral inhibition is allowed to have a spatial distribution.
For simplicity, SCRI assumes that lateral inhibition falls off as a Gaussian function of the distance d_ij between RF's centered on locations i and j:

σ_{v,ij} = exp(−d_ij² / (2ρ_v²))    (4)

σ_{z,ij} = exp(−d_ij² / (2ρ_z²))    (5)

The rate at which lateral inhibition diminishes as a function of distance is described by two parameters, ρ_v for FEF visual neurons and ρ_z for units representing identification information. Distances d_ij are scaled such that a distance of 1 is equivalent to the radius of the search array.
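To make the formal description concrete, the following Python sketch Euler-integrates simplified versions of Eqs. (1)-(3) for a two-location display (one target, one distractor). It is illustrative only: every parameter value is invented rather than fitted, the spatial gradients of Eqs. (4)-(5) are fixed at σ = 1, and the function names are our own.

```python
import math

def gamma_pdf(t, s, r):
    """Gamma density with shape s and rate r, as in Eq. (2)."""
    return 0.0 if t <= 0 else (r ** s) * t ** (s - 1) * math.exp(-r * t) / math.gamma(s)

def gamma_cdf(t, s, r, dt=0.5):
    """Crude numeric CDF of the gamma density, adequate for a sketch."""
    steps = int(max(t, 0.0) / dt)
    return min(sum(gamma_pdf((k + 0.5) * dt, s, r) for k in range(steps)) * dt, 1.0)

def simulate_scri(eta, n_ms=300, b=0.01, chi=5.0, s=4.0, r=0.05, kappa=0.5,
                  lam_v=0.05, lam_z=0.01, alpha_x=0.002, alpha_z=0.02,
                  beta_v=0.02, beta_z=0.02, R=1, dt=1.0):
    """Euler-integrate Eqs. (1)-(3) for locations with match strengths `eta`.

    Returns the trajectory of predicted rates v_i(t).  All numeric values
    are illustrative assumptions, not the fitted SCRI parameters.
    """
    n = len(eta)
    v, z, traj = [0.0] * n, [0.0] * n, []
    for step in range(n_ms):
        t = step * dt
        x = [chi * gamma_pdf(t, s, r)] * n       # localization transient (same chi everywhere), Eq. (2)
        G = gamma_cdf(t, (1 + kappa) * s, r)     # delayed feature availability, Eq. (3)
        dv, dz = [], []
        for i in range(n):
            exc = b + x[i] + z[i]
            inh = (lam_v + alpha_x * sum(x[j] for j in range(n) if j != i)
                   + alpha_z * sum(z[j] for j in range(n) if j != i)
                   + beta_v * sum(v[j] for j in range(n) if j != i))
            dv.append(((1 - v[i]) * exc - v[i] * inh) * dt)
            zinh = lam_z + beta_z * sum(z[j] for j in range(n) if j != i)
            dz.append(((eta[i] - z[i]) * (v[i] ** R) * G - z[i] * zinh) * dt)
        v = [min(max(v[i] + dv[i], 0.0), 1.0) for i in range(n)]
        z = [max(z[i] + dz[i], 0.0) for i in range(n)]
        traj.append(list(v))
    return traj

# Location 0 holds the target (larger eta); early rates are undifferentiated
# (phase 2), but late in the trial the target rate exceeds the distractor's
# (the phase-3 difference).
traj = simulate_scri(eta=[0.05, 0.02])
```

Even with these made-up settings, the sketch reproduces the qualitative three-phase profile: identical transient-driven rises, followed by divergence once identification information accrues through the recurrent gate.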


Table 1 An example of how neural spiking data may be represented in a form that can be modeled in a transparent fashion

Monkey  Neuron ID  Trial ID  Condition   Object in RF  Time from array onset (ms)  Spike
Q       52         150       Set size 8  Target        191                         0
Q       52         150       Set size 8  Target        192                         1
Q       52         150       Set size 8  Target        193                         0
Q       52         151       Set size 4  Distractor    0                           0
Q       52         151       Set size 4  Distractor    1                           1

2.3 Applying the Model

SCRI was fit to spiking activity recorded from 94 FEF visual neurons of macaques while they were engaged in a visual search task as described above. Fitting involved using gradient descent to find SCRI parameters that maximized the likelihood of the observed spiking activity. In addition to fitting the full version of SCRI, a large number of restricted versions were fit to the spiking activity from each neuron. In this chapter, we focus on the comparisons involving different combinations of inhibitory mechanisms as well as versions that did or did not allow for recurrent multiplicative gating between FEF visual neurons and their corresponding identification units (i.e., R = 1 or 0).
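The family of restricted versions can be generated mechanically by toggling mechanisms on and off. A sketch (mechanism names mirror the parameters of Eqs. (1) and (3); the dictionary representation and function name are our own):

```python
from itertools import product

# Toggleable SCRI mechanisms (names follow the parameters in Eqs. (1) and (3)):
MECHANISMS = ["alpha_x",  # localization-based feedforward inhibition
              "alpha_z",  # identification-based feedforward inhibition
              "beta_v",   # lateral inhibition among FEF visual neurons
              "beta_z",   # lateral inhibition among identification units
              "R",        # recurrent multiplicative gating
              "kappa"]    # delayed identification features

def scri_variants():
    """Enumerate the on/off combinations of the six mechanisms.

    Each variant is a dict mapping mechanism name -> bool; a fitting routine
    would fix the corresponding parameter to zero whenever a flag is False.
    """
    return [dict(zip(MECHANISMS, flags))
            for flags in product([False, True], repeat=len(MECHANISMS))]

variants = scri_variants()
print(len(variants))  # 64
```

Fitting each of these restricted variants to each neuron is what makes the model-comparison analyses described below possible.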

2.3.1 Structure of the Data

Table 1 provides an example of how SCRI comes into contact with the spiking activity from FEF visual neurons. Each neuron is identified by the monkey and a neuron ID, and each neuron was recorded over many trials (identified by the trial ID). Neurons were recorded during manipulations either of set size or of similarity, reflected in the "condition" column. If set size was manipulated, each array contained either 2, 4, or 8 objects, one of which was the target while the rest were distractors. If similarity was manipulated, all arrays contained 8 objects, with one target and 7 distractors which were either similar or dissimilar to the target. On some trials, the object in the neuron's RF was the target, but on others it was a distractor. (SCRI was not fit to data in which no stimulus was present in the neuron's RF.) SCRI was fit to spiking activity between the onset of the search array (at time t = 0) and the time at which the monkey made a saccade to the target. For example, in Table 1, Monkey Q made a saccade to the target around 193 ms after the onset of the array in trial 150. SCRI was not fit to trials in which the monkey made an incorrect saccade. For each millisecond of time, the neuron either generated a spike (a "1" in the "spike" column) or not (a "0" in the "spike" column).

It is worth noting—especially for those with a background primarily in cognitive/behavioral work rather than neurophysiology—that pre-processing is typically required to bring data into the format above, which is most suitable for modeling.


Recordings typically yield a list of timecodes at which the recorded neuron produced a "spike." These timecodes are typically reported to the nearest millisecond and then need to be aligned with timecodes representing the times at which different stimuli were presented. These can then be re-coded in terms of trial numbers, condition labels, and the type of object in the neuron's RF.
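A sketch of this alignment-and-binning step, assuming integer-millisecond timecodes (the function and variable names, and the example numbers, are our own):

```python
def bin_spikes(spike_times_ms, array_onset_ms, saccade_ms):
    """Convert absolute integer-millisecond spike timecodes into a 0/1
    per-millisecond sequence aligned to array onset, as in Table 1.

    Bin t covers [onset + t, onset + t + 1); bins run from array onset up to,
    but not including, the saccade.
    """
    n_bins = saccade_ms - array_onset_ms
    spikes = [0] * n_bins
    for s in spike_times_ms:
        t = s - array_onset_ms            # re-align to array onset
        if 0 <= t < n_bins:
            spikes[t] = 1                 # at most one spike per 1-ms bin
    return spikes

# Hypothetical trial: array onset at 5000 ms, saccade at 5005 ms, spikes
# recorded at 5001 and 5003 ms (made-up numbers):
print(bin_spikes([5001, 5003], 5000, 5005))  # [0, 1, 0, 1, 0]
```

Joining the resulting rows with trial, condition, and object-in-RF labels then yields the long-format table that SCRI is fit to.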

2.3.2 Parameter Estimation

When fitting SCRI to FEF visual neuron i, we interpret the quantity v_i(t) as representing the probability that neuron i generates a spike during the next millisecond of time following time t. This quantity is essentially the rate/intensity of a time-inhomogeneous Poisson process, integrated over a one-millisecond window following time t (they are not strictly identical because a Poisson process could, in principle, generate multiple spikes during that window, whereas the data only indicate, and therefore the model can only predict, whether at least one spike occurred during that window; Smith and Van Zandt, 2000). The basic idea is to treat each millisecond time bin in each trial as a Bernoulli random event, with the model providing the probability that the event turns out 1 (spike) or 0 (no spike).

For this purpose, it is useful to introduce the subscript j, which indexes a particular trial recorded from neuron i. That way, v_{i,j}(t) represents SCRI's predicted probability that neuron i generates a spike in the next millisecond after time t in trial j. By introducing the trial-level index j, we allow v_{i,j}(t) to differ depending on whether a target or distractor is in neuron i's RF on trial j, as well as depending on the set size and target-distractor similarity condition in trial j. If we use X_{i,j}(t) to denote whether or not neuron i generated a spike during the millisecond of time following time t on trial j, the likelihood of each observed millisecond time bin is Bernoulli:



P[X_{i,j}(t)] = v_{i,j}(t)^{X_{i,j}(t)} [1 − v_{i,j}(t)]^{1 − X_{i,j}(t)}    (6)

Note that while Eq. (6) might suggest that each time bin is independent, in fact the bins are only conditionally independent, given the model. The dynamics of SCRI mean that v_{i,j}(t) already represents the temporal dependencies involved in time-varying spike rates.

SCRI makes several simplifying assumptions when fitting a particular FEF visual neuron, each of which is revisited later.

Model as many FEF visual neurons as there are possible stimulus locations: Although there are many neurons (visual and otherwise) in all of FEF, for simplicity, we only model eight neurons per session, with RF's centered on each of the eight possible stimulus locations in the search arrays we used. On trials where no stimulus appears in the neuron's RF, we assume that the neuron does not receive any localization- or identification-based excitation but


otherwise participates in the same competitive and recurrent interactions with other neurons.

Parameters are shared with other unrecorded neurons from the same session: Only a single FEF visual neuron is recorded in each session. However, the activity of that neuron is affected by competitive and recurrent interactions with other unrecorded neurons and neural populations, not just in FEF but throughout the visual system (e.g., localization and identification signals). To fit SCRI, we assume that the same parameters governing the recorded neuron in a session are shared with other unrecorded neurons from that same session. For example, the strength and dynamics of the localization signal are shared for all neurons with objects in their RF's; the parameters reflecting the strength of leakage and inhibition are the same across all neurons in that session; etc.

No spike history effects: We assume that spiking activity of FEF visual neurons is driven only by the latent dynamics described by SCRI. A neuron's own recent spiking history is assumed not to affect its momentary spiking. This simplification is justified in this case because FEF visual neurons have low firing rates; and, as we shall see, this simplification does not impair the model's ability to fit spiking activity.

No stochastic variability in parameters: We assume that any variability between trials is due to inherent Poisson spiking noise, and that there is no variability in model parameters between trials except that which results from differences in the search array (e.g., whether an object in a RF is a target or distractor). This assumption means that we can consider all trials from a particular neuron in a particular condition as being realizations of the same underlying spike rate dynamics, which is modeled by SCRI.

Parameters for SCRI were estimated using gradient descent to find the parameters that minimized the negative log-likelihood of the recorded spikes for each neuron.
Because of the conditional independence of the likelihoods for each time bin (Eq. (6)), the negative log-likelihood for activity from a neuron (NLL_i) is just the negative sum of the log-likelihoods for each recorded time bin:

NLL_i = − Σ_j Σ_t { X_{i,j}(t) log v_{i,j}(t) + [1 − X_{i,j}(t)] log[1 − v_{i,j}(t)] }    (7)
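Equations (6) and (7) amount to a short computation once the predicted per-bin probabilities are in hand. A sketch (the `eps` guard against log(0) and the toy numbers are our own additions):

```python
import math

def bernoulli_nll(spikes, probs, eps=1e-12):
    """Eqs. (6)-(7): negative log-likelihood of a 0/1 spike train under
    per-millisecond spike probabilities v_{i,j}(t)."""
    nll = 0.0
    for x, v in zip(spikes, probs):
        v = min(max(v, eps), 1.0 - eps)   # guard against log(0)
        nll -= x * math.log(v) + (1 - x) * math.log(1.0 - v)
    return nll

# Predicted probabilities that track the data yield a lower NLL than ones
# that do not (toy numbers, not model output):
good = bernoulli_nll([0, 1, 0, 0], [0.05, 0.60, 0.05, 0.05])
bad = bernoulli_nll([0, 1, 0, 0], [0.60, 0.05, 0.60, 0.60])
```

Gradient descent then adjusts the SCRI parameters that generate the probabilities so as to minimize this sum.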

2.3.3 Model Comparison

As noted above, one of the important motivating principles of SCRI was that it should not just fit data but dig deeper to provide information about which competitive and recurrent mechanisms were most important for explaining FEF visual neuron dynamics. To achieve this, we fit different versions of SCRI to each neuron, where each version was defined by fixing a subset of competitive/recurrent mechanisms to have parameters of zero, effectively taking those mechanisms out of the model, thereby forming a "family" of models embedded within the SCRI


framework. For example, if R = 0 in Eq. (3), the recurrent gating between FEF visual neurons and the identification signal is removed. Similarly, if α_x = 0 in Eq. (1), there is no localization-based feedforward inhibition, and if κ = 0 in Eq. (3) there is no added delay in the availability of identification-related features. In total, the presence/absence of four types of competitive interaction (localization-based feedforward inhibition [α_x], identification-based feedforward inhibition [α_z], FEF visual neuron lateral inhibition [β_v], identification lateral inhibition [β_z]), recurrence (R), and delay (κ) would result in 2^6 = 64 possible SCRI variants. As described by Cox et al. (2022), more variants are possible that either assume a spatial gradient for each form of lateral inhibition (ρ_v < ∞ and/or ρ_z < ∞) or allow for different values of μ depending on target-distractor similarity.

Moreover, it is important to note that not all model parameters are uniquely identifiable, depending on which conditions a neuron was recorded in. For example, without conditions that vary set size, it is not possible to uniquely identify the localization-based feedforward inhibition parameter α_x. This is because the strength of this inhibition would trade off with the overall strength of the localization signal χ; only conditions that vary set size would allow one to disentangle the independent contributions of those two factors (because then some SCRI neurons would not receive any localization signal).

Given that each variant of SCRI is fit to each neuron by minimizing negative log-likelihood (equivalently, by maximizing likelihood), several methods are available to compare variants. Perhaps the most well-known are the Akaike Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978).
AIC and BIC represent different ways of trading off model fit, measured by negative log-likelihood, against model complexity, measured by the number of free parameters (i.e., how many were not fixed at zero). Specifically, for neuron i fit by SCRI variant m, the AIC and BIC can be written

AIC_im = 2 × NLL_im + 2 × p_m    (8)

BIC_im = 2 × NLL_im + (log N_i) × p_m    (9)

where .NLLim is the negative log-likelihood of the observed spikes from neuron i given model m, .pm is the number of free parameters for model m, and .Ni is the number of time bins recorded from neuron i (across all conditions/trials). AIC focuses more on the predictive accuracy of the model and represents an approximation to leave-one-out cross-validation (Stone, 1977). Whereas AIC rewards a better fit as sample size .Ni increases, BIC counteracts this by imposing a stronger penalty for free parameters with more observations. For any given neuron i, it is straightforward to find the SCRI variant m that achieves the best AIC or BIC (where “best” is “lowest”). However, there are issues with aggregating AIC and BIC across neurons. One could simply add them up for each model variant m, but this would tend to give more weight to neurons with more observations. To adjust for this imbalance, AIC and BIC can be first converted

Spiking, Salience, and Saccades: Using Cognitive Models to Bridge the Gap. . .


into “weights” which are on a commensurate scale for all neurons (Wagenmakers & Farrell, 2004). These “weights” transform a set of AIC or BIC scores into a set of proportions. For example, AIC weights (wAIC) can be calculated like so:

ΔAIC_im = exp( −(1/2) [ AIC_im − min_m AIC_im ] ),    (10)

wAIC_im = ΔAIC_im / Σ_m ΔAIC_im,    (11)

where min_m AIC_im is the minimum AIC among all models m fit to neuron i. To find BIC weights, simply replace AIC with BIC in Eqs. (10) and (11).
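For concreteness, Eqs. (8)–(11) can be computed directly. The following Python sketch uses made-up negative log-likelihoods and parameter counts for three hypothetical variants fit to one neuron; only the formulas themselves come from the text:

```python
import numpy as np

def aic(nll, p):
    """Eq. (8): AIC from negative log-likelihood and free-parameter count."""
    return 2.0 * nll + 2.0 * p

def bic(nll, p, n):
    """Eq. (9): BIC additionally penalizes by the number of observations n."""
    return 2.0 * nll + np.log(n) * p

def akaike_weights(scores):
    """Eqs. (10)-(11): convert AIC (or BIC) scores into weights summing to 1."""
    scores = np.asarray(scores, dtype=float)
    delta = np.exp(-0.5 * (scores - scores.min()))
    return delta / delta.sum()

# Three hypothetical variants: (NLL, number of free parameters).
fits = [(500.0, 4), (495.0, 6), (520.0, 2)]
w = akaike_weights([aic(nll, p) for nll, p in fits])
```

Because the weights sum to one for every neuron, averaging them across neurons puts each neuron on a commensurate scale regardless of how many time bins it contributes.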

2.4 Insights

Armed with the full SCRI model as well as the ability to fit an entire family of SCRI variations that effectively excise different competitive and recurrent mechanisms, we are in a position to do two things. First, we can verify whether a model like SCRI, framed purely in terms of information, can nonetheless accurately fit the spiking activity of individual neurons. Second, to the extent that SCRI is capable of fitting spiking activity, we can use model comparison to draw insights about which competitive/recurrent mechanisms are most important for explaining how FEF visual neurons come to represent the relative salience of objects in a visual search display. The first task illustrates the value of a cognitive model for explaining why neural activity takes a particular form, in terms of the processes and representations a neuron instantiates. The second task illustrates how the constraints provided by fitting neural data enable inferences about representations and processes that would not be possible from behavioral data alone.

2.4.1 Quality of Fit

As illustrated in Fig. 3, SCRI is eminently capable of fitting spiking activity from FEF visual neurons during visual search. In the aggregate, SCRI captures the qualitative form of the canonical FEF visual neuron response as well as the qualitative effects of set size and target-distractor similarity. SCRI also fits at the level of individual FEF visual neurons, accounting for an average of 87% of the variance in spike density functions across 94 neurons. This illustrates that, at least in principle, a model built on cognitive principles can nonetheless explain at fine detail the spiking activity of individual neurons.


G. E. Cox et al.


Fig. 3 Fits of SCRI to FEF visual neuron spiking activity. Panels (a) and (b) give examples of the two types of manipulation recorded for different neurons: varying set size (a) and target-distractor similarity (b). Panels (c) and (d) show the average predicted and observed spike density functions (Thompson et al., 1996) across all neurons recorded under each manipulation. Panels (e) through (j) show examples of predicted and observed spike density functions for individual neurons. According to the clustering in Fig. 4, the neurons shown in panels (e) and (f) are in Group 1, those in panels (g) and (h) are in Group 2, and those in panels (i) and (j) are in Group 3

2.4.2 Most Important SCRI Mechanisms Across Neurons

Summed across neurons, AIC and BIC largely agree on which SCRI mechanisms are most important to include (Table 2): recurrence (R), a delay in the onset of


Table 2 Mechanisms included in the SCRI variant preferred by different model comparison metrics aggregated across neurons

SCRI Mechanism                                               AIC   BIC   wAIC   wBIC
Recurrence (R)                                                ✓     ✓     ✓      ✓
Delayed identification (κ)                                    ✓     ✓     ✓      –
Localization feedforward inhibition (αx)                      ✓     ✓     ✓      –
Identification feedforward inhibition (αz)                    ✓     ✓     ✓      –
FEF lateral inhibition (βv)                                   ✓     ✓     –      –
FEF lateral inhibition spatial distribution (ρv)              ✓     –     –      –
Identification lateral inhibition (βz)                        –     –     –      –
Identification lateral inhibition spatial distribution (ρz)   –     –     –      –

Check mark indicates the mechanism was included, dash indicates it was not

identification-related features (κ), and both forms of feedforward inhibition (αx and αz). Both AIC and BIC agree that lateral inhibition between FEF visual neurons (βv) should be included as well, but BIC’s stronger penalty for free parameters leads it to prefer a version of SCRI in which there is no spatial gradient to FEF visual neuron lateral inhibition. As noted above, summed AIC/BIC tends to give more weight to neurons with more observations. Average AIC and BIC weights across neurons are somewhat more conservative. In terms of average AIC weight, the preferred version of SCRI no longer includes FEF visual neuron lateral inhibition, while average BIC weight leaves recurrence as the most important SCRI mechanism. From these model comparisons, we can draw conclusions about the relative importance of different SCRI mechanisms based on which are included in the model preferred according to each metric. In essence, the more metrics prefer including a mechanism, the more important it probably is for explaining FEF visual neuron dynamics. In order, recurrence is most important (preferred by 4/4 metrics), the two forms of feedforward inhibition and the identification delay are next (3/4 metrics), FEF visual neuron lateral inhibition follows (2/4 metrics), and finally a spatial distribution of FEF visual neuron lateral inhibition (1/4 metrics). To be clear, we make no claim that any of these metrics enables us to select a “true” or even “most probable” model (though this issue is taken up later). Rather, we view these metrics as indicating the relative explanatory value of each SCRI mechanism. Interpreted this way, SCRI suggests that the ability of FEF visual neurons to act as recurrent gates is essential for explaining their spiking activity.
FEF visual neurons (or rather, enough of them; see below) perform an attention-like function by using increased firing rate to enhance “downstream” processing of objects within their receptive field. Nearly as important is feedforward inhibition from both the localization signal as well as identification signals. The importance of feedforward inhibition is interesting because this is the same form of inhibition that naturally leads to normalization (Smith et al., 2015), a “canonical neural computation” (Carandini & Heeger, 2012; Heeger, 1992; Reynolds & Heeger, 2009) that has




Fig. 4 Clustering of FEF visual neurons based on which combination of SCRI mechanisms yielded the best AIC for each neuron. Each column corresponds to a single neuron, with a filled bar indicating that the mechanism was present in the AIC-preferred SCRI variant for that neuron and an empty space indicating that it was not. Open squares are mechanisms that were either not applicable or could not be uniquely identified on the basis of the conditions in which the neuron was recorded. Clustering was based on hierarchical agglomerative clustering using the proportion of overlapping SCRI mechanisms as the similarity measure between neurons

been identified in other sensory areas. In particular, normalization means that the representation of salience maintained by FEF visual neurons is sensitive to contrast and therefore represents a relative rather than absolute quantity.

2.4.3 Most Important SCRI Mechanisms for Individual Neurons

Since SCRI fits spiking activity from individual neurons, not just aggregates, it is possible to use AIC or BIC to select a preferred model variant for each of the 94 neurons in our dataset. As illustrated in Fig. 4, this can be useful for clustering individual neurons. While neurons can be clustered based on properties of their observed spiking activity (Lowe & Schall, 2018), a model like SCRI enables neurons to be clustered in terms of the inferred mechanisms that produce that activity. The division shown in Fig. 4, based on AIC, suggests three clusters of neurons. The first cluster contains very few neurons that act as recurrent gates (i.e., for most, R = 0), whereas the second cluster consists entirely of neurons that act as recurrent gates (i.e., for all of them, R = 1). Finally, the third cluster consists of cells that exhibit a form of competition that was not preferred at the group level, namely, lateral inhibition between units that generate identification signals (βz). One thing to take away from this is that asking “which mechanisms best explain FEF visual neurons as a whole?” is a different question than “which mechanisms best explain each FEF visual neuron?” Predictions from a model like SCRI are not simple linear functions of the mechanisms it includes, so these two questions can, in principle, have very different answers. A second takeaway is that it may be possible


to infer from model-based clustering that apparently similar spiking dynamics can be produced by different mechanisms, hinting at the potential diversity of cell types in FEF even among those ostensibly performing the same function (Lowe & Schall, 2018). The distinction between recurrent (group 2) and non-recurrent (group 1) cells may reflect an anatomical difference between cells that do or do not have recurrent connections to identification-related neural populations in, e.g., V4 or IT cortex. Meanwhile, the third cluster of cells may represent neurons that would be engaged in visual short-term memory tasks and therefore make use of competition between higher-order representations (e.g., of the degree of target match) to adjudicate between objects that need to be maintained in working memory for an extended period (Dominey & Arbib, 1992; Mitchell & Zipser, 2003). Using SCRI to cluster neurons by mechanism suggests new avenues of research at the physiological level.
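The clustering behind Fig. 4 can be approximated in a few lines of code. The sketch below is a naive average-linkage agglomerative clustering run on randomly generated mechanism profiles; the data, group count, and exact distance are stand-ins (the text describes using the proportion of overlapping SCRI mechanisms as the similarity between neurons, which corresponds to one minus the disagreement distance used here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: rows = 94 neurons, columns = 8 SCRI mechanisms,
# 1 = mechanism included in that neuron's AIC-preferred variant.
prefs = rng.integers(0, 2, size=(94, 8))

# Pairwise distance = proportion of mechanisms on which two neurons disagree.
n = len(prefs)
dist = np.array([[np.mean(prefs[i] != prefs[j]) for j in range(n)]
                 for i in range(n)])

def agglomerate(dist, n_clusters):
    """Naive average-linkage agglomerative clustering from a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

groups = agglomerate(dist, 3)  # three groups, as in Fig. 4
```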

2.5 Closing the Loop

So far, the chapter has focused on using SCRI to explain FEF visual neuron activity in cognitive terms. This achievement is the foundation for a bridge between functional (why) and implementational (how) levels of description. But to complete the bridge, it is necessary to embed SCRI in the complete causal chain from stimulus (visual search array) to salience (as represented by FEF visual neurons) to behavior (making a saccade). Doing so establishes not just that SCRI can explain the dynamics of individual neurons, but that it can explain the role those neurons play in producing goal-directed visual search behavior. To enable SCRI to predict saccade behavior in addition to neural spiking, we made use of the Gated Accumulator Model (GAM) of FEF movement neurons by Purcell et al. (2010, 2012). According to GAM (shown along with SCRI in Fig. 2), each possible saccade target location, called a “movement field,” is associated with an accumulator. Accumulators compete with one another via lateral inhibition (parameter βm) and are “leaky” (parameter λm), just like FEF visual neurons and identification units in SCRI. Accumulators receive input from multiple FEF visual neurons in the form of spike trains convolved with a function representing a postsynaptic potential (Thompson et al., 1996). However, accumulators in GAM are only excited when this input is sufficiently large to exceed a “gate” (parameter g). This gate prevents normal baseline activity from eliciting a saccade. GAM accumulators continue to accumulate FEF visual neuron activity until one accumulator reaches a threshold (parameter θ), at which point a saccade is initiated into the movement field associated with that accumulator. When GAM was originally developed, the inputs to its accumulators came from the very FEF visual neurons to which SCRI was fit. These inputs enabled GAM to accurately reproduce distributions of saccade response times and errors.
At the same time, the dynamics of GAM’s accumulators closely resembled the firing rate dynamics of FEF movement neurons. If SCRI successfully explains the role that FEF visual neurons play in generating the salience evidence that gets accumulated


to make saccade decisions, then replacing the observed spike trains with spike trains simulated from SCRI should also enable GAM to fit the details of saccade behavior as well as FEF movement neuron dynamics. The structure of SCRI makes it simple to simulate spike trains: because vi(t) represents the probability of generating a spike in the next millisecond, we can treat spikes across time as a series of “coin flips,” with the probability of the coin coming up heads (i.e., a spike for neuron i in the millisecond following time t) equal to vi(t). In the original version of GAM, accumulator inputs were averages of spike trains randomly sampled (with replacement) from the set of observed FEF visual neuron spike trains. To mimic this structure, we simulated a large number of spike trains from the SCRI fit to each neuron in each condition in which it was recorded. This way, the inputs to each GAM accumulator in each simulated trial were not a random sample from the set of recorded spike trains but a random sample from the set of simulated spike trains. For additional details, the reader is referred to Cox et al. (2022). As illustrated in Fig. 5, when SCRI is used to generate the evidence that gets accumulated by GAM, the joint model is able to accurately reproduce saccade response time distributions. In addition, as reported by Cox et al. (2022), GAM accumulators using simulated SCRI input demonstrate the same correlations between their dynamics and response time as are observed in FEF movement neurons (and which GAM was able to reproduce using observed FEF visual neuron spike trains as input). This completes the bridge between neurons and behavior: SCRI explains not just neural activity, but how that activity is embedded in a system that produces goal-directed behavior.
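The coin-flip simulation and the hand-off to an accumulator race can be illustrated in miniature. In the sketch below everything is invented for illustration: the firing-rate trajectories stand in for SCRI’s vi(t), the averaged spike trains stand in for GAM’s convolved inputs, and the gate, leak, inhibition, and threshold values are arbitrary rather than fitted:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for v_i(t): per-millisecond spike probabilities for a unit
# representing the target location and one representing a distractor.
T = 400
t = np.arange(T)
v_target = 0.02 + 0.10 * (1.0 - np.exp(-t / 50.0))
v_distractor = 0.02 + 0.04 * (1.0 - np.exp(-t / 50.0))

def simulate_spikes(v, n_trains):
    """Spikes as coin flips: P(spike in the ms after t) = v(t)."""
    return rng.random((n_trains, v.size)) < v

def race(inputs, gate=0.05, leak=0.005, beta=0.005, theta=5.0):
    """Minimal gated-accumulator race: leaky, laterally inhibiting
    accumulators driven only by gated (supra-baseline) input."""
    a = np.zeros(inputs.shape[0])
    for step in range(inputs.shape[1]):
        drive = np.maximum(inputs[:, step] - gate, 0.0)
        inhibition = beta * (a.sum() - a)
        a = np.maximum(a + drive - leak * a - inhibition, 0.0)
        if a.max() >= theta:
            return step, int(a.argmax())  # saccade RT (ms) and winner
    return inputs.shape[1], int(a.argmax())

# Average many simulated spike trains to form each accumulator's input.
inputs = np.vstack([simulate_spikes(v_target, 50).mean(axis=0),
                    simulate_spikes(v_distractor, 50).mean(axis=0)])
rt, winner = race(inputs)
```

With these made-up settings, the target accumulator typically wins within a couple of hundred milliseconds; in the real pipeline the inputs are spike trains simulated from the fitted SCRI model for each recorded neuron and condition.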

3 Discussion

This chapter has described SCRI as an example of how cognitive models can act as a bridge between different levels of description. SCRI explains the dynamics of FEF visual neurons as they act to represent the relative salience of different parts of a visual search array for the purpose of directing a saccade to a target object. These dynamics involve a transient localization signal that initially excites FEF visual neurons whose RFs contain objects. In turn, recurrent gating enables FEF visual neurons to govern the rate at which information accrues about the identity of an object within their RF (i.e., how target-like the object is). Primarily via feedforward inhibition, but secondarily through lateral inhibition, FEF visual neurons compete with one another such that their asymptotic activity represents the relative salience of objects in their RFs. These recurrent and competitive interactions reproduce the effects of set size and target-distractor similarity on FEF visual neuron activity. In addition, when treated as evidence to be accumulated by the GAM model of FEF movement neurons, SCRI’s salience representation successfully reproduces saccade response time distributions and movement neuron dynamics.


Fig. 5 Observed (points) and predicted (lines) cumulative distribution functions for correct saccade response times (RT) from the six monkeys from which FEF visual neuron recordings were made. Points depict the 10%, 30%, 50%, 70%, and 90% quantiles of the observed correct RT distributions for each monkey in each condition (“SS” = “Set size”). Smooth lines show RTs simulated by GAM using simulated FEF visual neuron activity from SCRI as evidence, as depicted in Fig. 2

The success of SCRI is not merely in fitting data, although certainly this is no small feat. Indeed, previous models applied to FEF have focused on qualitative or average neural activity rather than explaining the diversity of activity across cells (Dominey & Arbib, 1992; Hamker, 2004, 2005; Heinzle et al., 2007; Mitchell & Zipser, 2003). Rather, by building a bridge between levels of description, SCRI contributes to both cognitive and neural theory. Without the constraints provided by neural data, it would not have been possible to ascertain the relative importance of different competitive and recurrent mechanisms. For example, feedforward inhibition enables a system described by “shunting” dynamics, like SCRI, to evolve to an asymptotic state that normalizes the total activity in the system. Since SCRI needed feedforward inhibition to account for neural spiking, this is


evidence that FEF visual neurons form a normalized representation of salience across the visual field, a core component of many theories of attention (Carandini & Heeger, 2012; Heeger, 1992; Reynolds & Heeger, 2009; Smith et al., 2015). Conversely, by identifying individual neurons that are better explained by different combinations of mechanisms, SCRI provides a new lens through which to view the diversity of cell types in FEF and other cortical areas. The importance of recurrence and feedforward inhibition in SCRI each make distinct testable predictions about patterns of anatomical connectivity between regions (Cox et al., 2022). Because SCRI is framed in cognitive terms, it is possible to use neural data to make direct inferences about the functions a neuron performs. At the same time, it is possible to go the other direction and make direct inferences about neurophysiology on the basis of the information processing mechanisms identified with a particular neuron.

3.1 Turning Points

As promised above, this part of the discussion focuses on alternative paths that could have been taken during the development of SCRI. The purpose of this discussion is both to de-mystify certain aspects of model development and to point out areas where future work might be directed by taking alternative paths.

3.1.1 Designing the Model

Perhaps the most daunting task facing a scientist when using a formal model to understand a set of phenomena is how to design the model in the first place. The design process is inherently iterative and idiosyncratic, but SCRI illustrates several ways a modeler can tackle the design phase without being overwhelmed with possibilities. A modeler must keep in mind the scientific objective of the model being designed. This objective defines how the model is meant to be interpreted and can help delineate the basic features the model must have. With SCRI, the aim was to explain the firing rates over time of FEF visual neurons during visual search in terms of constructs that had cognitive meaning. This implied some basic features the model would need: that it be dynamic and that it involve a time-varying quantity that can be directly related to firing rates. In addition, SCRI’s goal implied that the processes and representations embodied in the model would be interpreted in cognitive terms. Other objectives would lead to very different models. For example, a model with the objective of explaining FEF visual neurons in biophysical terms might embody processes and representations like membrane conductance and ion concentrations. Similarly, a model with the objective of describing FEF visual neuron spiking in statistical terms might not embody processes or representations of any kind; instead, a statistical model would describe higher-level properties of the data like means and covariances (e.g., a Gaussian process regression model might


be used to describe and compare functions describing average firing rates over time in different conditions). A modeler must also keep in mind the important qualitative phenomena a model is meant to explain. This helps identify the kinds of mechanisms that may need to be included in the model. The scientific objective of SCRI meant that SCRI had to be dynamic, have cognitive-level interpretations, and produce an output that could be related to FEF visual neuron activity. However, the qualitative phenomena of set size and similarity effects in visual search suggested two other features that SCRI would likely need: competition and recurrence. Once these features were identified, it was possible to find a starting point in the form of the CI model (Smith & Sewell, 2013; Smith et al., 2015) because it met many of our design criteria. Once a starting point was found, it was a matter of iterating to find which model mechanisms seemed important for explaining the qualitative phenomena of interest. These early iterations did not involve doing any quantitative fits to data, but rather many simulations of different possible model variants to see whether they would even succeed in reproducing the qualitative effects of set size and similarity on FEF visual neurons. Finally, a modeler must acknowledge that there are many design decisions that are under-constrained. This point is already apparent in how scientific objectives and qualitative phenomena guide model development: There is unlikely to be “one true model” for any system, particularly those as complex as neural and cognitive systems. Sometimes, a decision can be made for the sake of practicality, such as the choice in SCRI of a gamma function for describing the initial transient localization signal. Other times, however, a decision can be made on the basis of model comparisons. 
In designing SCRI, it was clear that there were many possible loci of competition but there was no basis for deciding between them in purely qualitative terms. Instead, we reasoned that quantitative model comparisons could help decide the issue. Rather than a shortcoming, SCRI granted additional insight by virtue of encompassing—and allowing us to decide between—different possible combinations of competitive mechanisms. As we saw, this leads to insights about the computational functions served by these neurons during visual search.

3.1.2 Fitting and Comparing Models

We were able to infer something about the relative importance of SCRI’s competitive and recurrent mechanisms by fitting different versions of SCRI that constrained the parameters associated with different mechanisms to effectively turn those mechanisms “on” or “off.” As described above, because we were able to find parameters for SCRI that maximize likelihood (i.e., minimize negative log-likelihood), model comparison criteria like AIC and BIC were available for us to compare these different variants of SCRI. Another approach would penalize model complexity by using cross-validation (Geisser & Eddy, 1979), selecting a subset of the data for fitting and another for evaluating goodness-of-fit. As noted above, AIC is a large-sample approximation


to leave-one-out cross-validation (Stone, 1977). Cross-validation is well suited to cases where models are “overdetermined,” that is, where multiple sets of model parameters would fit the same data about equally well. Cross-validation is also an explicitly predictive model comparison metric, in that it compares models on their ability to “predict” new data (the left-out test data) from the same conditions to which the model was originally fit. As a consequence, it is important in cross-validation to define the test data in a manner appropriate to the type of generalization desired of the model. For example, one could leave out a subset of trials from each condition for the test data; this would assess a model’s ability to generalize to new trials from the same neuron in the same condition. Alternatively, one could leave out the last 100 ms (or some other duration) of spiking activity on each trial for the test data; this would assess a model’s ability to predict future spiking from prior spiking. Finally, broader generalization can be assessed by, e.g., using a model fit to one neuron to predict spiking for another neuron, or for the same neuron recorded during a different task or set of conditions (Busemeyer & Wang, 2000). A third approach, available for models with closed-form likelihood expressions like SCRI, is Bayesian inference (Gelman et al., 2014; Kruschke, 2015). With Bayesian inference, instead of getting a single point estimate of the parameters that maximize the likelihood of the data conditioned on a model, one obtains a joint posterior distribution across all model parameters. Bayesian inference is particularly well suited to hierarchical situations, such as when multiple neurons are recorded from the same subject; in that case, it would make sense to treat the parameters for individual neurons as samples from a “hyper-distribution” that differs between subjects (Cao et al., 2021).
Inference could be performed for each model variant separately, allowing them to be compared using Bayes factors or information criteria like the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002) or the Widely Applicable Information Criterion (WAIC; Watanabe, 2013). However, Bayesian inference also offers a different perspective on model comparison, at least for nested models like the variants of SCRI: because a model variant is identified by which parameters are fixed (typically to zero), one could observe how much of the posterior distribution for a parameter falls within a small distance of zero. This would provide a continuous signal of how much support the mechanism receives, though it would also require a choice of threshold (how close to zero?). A downside to Bayesian inference is that approximating the joint posterior distribution requires drawing many samples, and each sample entails evaluating the model likelihood. In situations where evaluating that likelihood is difficult, Bayesian inference may not be feasible. In the end, we elected to fit multiple variants of SCRI using maximum likelihood and compare them using criteria like AIC. This choice was motivated by both practical and scientific concerns. From a practical standpoint, we found in initial work that Bayesian inference with SCRI was troublesome due to the need to solve many systems of differential equations to get the likelihood of the data across all neurons for each posterior sample. Moreover, because the scale of the parameters in SCRI is not always intuitive, it was hard to know how close to zero a parameter should be before it was considered evidence against the corresponding mechanism.
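The “posterior mass near zero” idea can be made concrete. In this sketch the posterior samples are drawn from a normal distribution purely for illustration; in practice they would come from an MCMC sampler run on a full SCRI variant:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in posterior samples for one mechanism's parameter (e.g., beta_v).
posterior = rng.normal(loc=0.02, scale=0.05, size=10_000)

# Fraction of posterior mass within epsilon of zero: a continuous signal of
# support *against* the mechanism. Choosing epsilon is the hard part noted
# in the text, since parameter scales are not always intuitive.
epsilon = 0.01
mass_near_zero = float(np.mean(np.abs(posterior) < epsilon))
```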


Similarly, cross-validation would have required committing to a particular way of forming the test set, which in turn would entail a commitment to a particular type of generalization. While this might be appropriate for future work, it did not address the most pertinent questions when introducing SCRI: can it work at all, and if so, how? These questions are transparently addressed by comparisons of maximum likelihood fits, since these fits represent the best each variant could possibly do and comparisons focus on which mechanisms are needed to achieve that level of performance. Finally, we note that the availability of closed-form likelihood expressions is unusual for neural models. Typically, such models are simulation-based, at which point they are a challenge to fit. One challenge is to come up with a way to evaluate quality of fit besides likelihood. Another challenge is to efficiently explore the parameter space of the model in the absence of any gradient information. One approach to the second challenge is to simulate enough from a model to form a stable estimate of its goodness-of-fit and then apply standard parameter search routines (this was the approach taken by Purcell et al., 2010, 2012). Recently, population Monte Carlo methods have been shown to be useful for doing both optimization and Bayesian posterior inference with simulation-based models (Turner & Van Zandt, 2018). In fact, population Monte Carlo methods were used by Cox et al. (2022) to fit the GAM model to response time data, because GAM has no closed-form likelihood expression. The first challenge—how to evaluate quality of fit without likelihood— requires bespoke solutions depending on the model’s goals (Palestro et al., 2018). As such, we hesitate to offer any specific recommendations, but caution that evaluating goodness-of-fit is particularly difficult for sparse data and for data that consist of functions or trajectories; neural spike trains fall into both of these categories.
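The “simulate enough for a stable estimate, then apply a standard search” strategy can be sketched with a deliberately trivial example: a one-parameter model fit by matching a simulated summary statistic to an observed one. Everything here (the Poisson model, the mean-count summary, the grid search) is invented for illustration; Purcell et al. applied general-purpose parameter search routines to stabilized fit estimates for the full GAM.

```python
import numpy as np

rng = np.random.default_rng(4)

# "Observed" data: spike counts per trial from a hypothetical neuron.
observed = rng.poisson(lam=7.0, size=200)

def discrepancy(rate, n_sims=2000):
    """Simulation-based goodness of fit: simulate enough synthetic data
    that the summary statistic (mean count) is stable, then compare it
    to the observed summary."""
    simulated = rng.poisson(lam=rate, size=n_sims)
    return (simulated.mean() - observed.mean()) ** 2

# Standard parameter search over the simulated objective (a grid here).
grid = np.linspace(1.0, 15.0, 141)
best_rate = float(grid[np.argmin([discrepancy(r) for r in grid])])
```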

3.1.3 History Effects in Neural Spiking

In developing SCRI, we made many simplifying assumptions. One was that the probability of a neuron generating a spike was dependent only on the neuron’s momentary firing rate and not on the neuron’s previous history of firing. Although this assumption can be sustained in the case of FEF visual neurons because of their relatively low firing rates, it is not always justifiable. There are, however, well-developed techniques for incorporating history into models of neural spiking (Truccolo et al., 2005). The basic idea is that whether a neuron generates a spike during a particular window of time is a function of three things: the latent firing rate (e.g., as predicted by SCRI), a refractory factor that inhibits spiking when a spike was made recently, and a self-excitatory factor that encourages more spiking if the neuron has been spiking a lot recently. The relative contributions of these factors need to be estimated from data and can differ between neurons (Weber & Pillow, 2017). Spike history effects are especially important to account for when modeling neurons with high firing rates and/or neurons where spontaneous activity—rather than input-driven activity, like with SCRI—is critical to the model.
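The three-factor idea can be sketched as a toy generative model: the latent rate supplies a baseline log-intensity, a refractory term suppresses spiking right after a spike, and a self-excitation term boosts it when recent activity is high. The windows and weights below are arbitrary illustrations, not the Truccolo et al. formulation:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_with_history(rate, w_refractory=-3.0, w_excite=0.2):
    """Spike probability per 1-ms bin from the latent rate plus two
    spike-history factors (refractory suppression and self-excitation)."""
    spikes = np.zeros(rate.size, dtype=int)
    for t in range(rate.size):
        recent = spikes[max(0, t - 2):t].sum()     # spikes in last 2 ms
        history = spikes[max(0, t - 20):t].sum()   # spikes in last 20 ms
        log_p = np.log(rate[t]) + w_refractory * recent + w_excite * history
        spikes[t] = rng.random() < min(np.exp(log_p), 1.0)
    return spikes

latent = np.full(500, 0.05)  # constant latent rate: 50 sp/s
train = simulate_with_history(latent)
```

Setting both history weights to zero recovers the history-free coin-flip scheme used for SCRI; in a real application the relative contributions of the factors would be estimated from data and can differ between neurons.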

G. E. Cox et al.

3.1.4 Parameters for Unrecorded Neurons

Although SCRI was fit to activity from a single FEF visual neuron at a time, that neuron's activity is assumed to be driven by interactions with other similar FEF neurons, as well as units that produce localization and identification signals. These other neurons or neural populations were not recorded simultaneously with the FEF visual neuron, so how should one account for them in a model? With SCRI, we assumed that the same parameters that governed the dynamics of the recorded neuron also governed the dynamics of the other unrecorded neurons operating at the same time. As a result, it is best to think of SCRI parameters as describing not just a single neuron, but rather a population of FEF visual neurons operating concurrently on a trial, of which we recorded only one. In principle, it would be possible to elaborate on this population approach to model unrecorded—but still causally related—neurons. For example, in a hierarchical model (see above), one estimates parameters simultaneously for each neuron as well as for the "hyper-distribution" describing the entire population of neurons. One could then model unseen neurons by sampling from this hyper-distribution. As attractive as this idea is, it has a problem rooted in the underlying data: the neurons that are recorded are not random samples but are selected by hand. As a result, it is not clear that this more sophisticated—if computationally intensive—approach would be preferable to the simplified route taken by SCRI. Nonetheless, as technology evolves, it will become easier and easier to record the joint activity of large representative samples of neurons, hopefully obviating the need for these assumptions.
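The hierarchical alternative sketched above can be illustrated in a few lines. The Gaussian hyper-distribution, the "gain" parameter name, and all values below are illustrative assumptions, not part of SCRI:

```python
import numpy as np

rng = np.random.default_rng(7)

# Suppose we had fit a gain parameter to each of a handful of recorded
# neurons (these values are made up for illustration).
recorded_gains = np.array([0.8, 1.1, 0.9, 1.3, 1.0])

# Estimate a simple Gaussian "hyper-distribution" over the population.
mu_hat = recorded_gains.mean()
sigma_hat = recorded_gains.std(ddof=1)

# Model unrecorded neurons by sampling parameters from that distribution.
unrecorded_gains = rng.normal(mu_hat, sigma_hat, size=1000)

print(round(mu_hat, 2), round(float(unrecorded_gains.mean()), 2))
```

The caveat in the text applies directly: if the recorded neurons are hand-selected rather than randomly sampled, the hyper-distribution estimated this way need not describe the population at large.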

3.1.5 Joint Modeling

Conceptually, SCRI and GAM form a closed loop, jointly explaining saccade performance in visual search: SCRI explains how FEF visual neurons generate salience evidence, and GAM explains how FEF movement neurons accumulate that evidence to decide where to make a saccade. In practice, however, the two models are treated separately, with SCRI fit to FEF visual neuron activity recorded from a monkey during visual search and GAM fit to the saccade response times produced by that monkey during those same visual search sessions. To link the two models more tightly, it would be ideal to have simultaneous recordings from many FEF visual and movement neurons, such that a specific search array stimulus, FEF visual neuron activity, FEF movement neuron activity, and behavior could all be coupled within the same trial. As noted above, technological advances make it likely that we will be in a position to obtain this kind of data in the near future. This is an exciting possibility because it will yield insights into, among other things, the sources of errors in visual search (Heitz et al., 2010; Thompson et al., 2005; Trageser et al., 2008) as well as possible interactions between visual and movement neurons.

Spiking, Salience, and Saccades: Using Cognitive Models to Bridge the Gap. . .


3.2 Prospects

Theories of cognition have historically developed by seeking out stronger and stronger constraints. Over the last several decades, response time has provided such a constraint, leading to the development of theories framed in terms of the dynamics of ongoing and interacting cognitive processes, not just in vision (Smith & Sewell, 2013), but in categorization (Lamberts, 2000; Nosofsky & Palmeri, 1997) and memory (Brockdorff & Lamberts, 2000; Cox & Criss, 2020; Cox & Shiffrin, 2017) as well. Just as SCRI was able to take a dynamic cognitive model (Smith & Sewell, 2013; Smith et al., 2015) as the starting point for a model that could explain neural activity in cognitive terms, these models may spark similar developments in other domains. Neural spiking activity lets us look "under the hood" to see how the representations and processes posited by cognitive models play out at the neural level. Just as response times can distinguish theories that would be indistinguishable on the basis of choice behavior alone, neural data enabled us to distinguish between different competitive and recurrent interactions in SCRI. While it is clear that neural data can be a boon to cognitive theory, we hope the example of SCRI also illustrates the value of cognitive models for understanding neurons. This mutual synergy is what makes model-based cognitive neuroscience so exciting, powerful, and intimidating. By laying bare the details and thought processes involved in developing a neurocognitive model like SCRI, we hope the reader is better equipped to start building bridges of their own.

A Exercises

1. In SCRI, FEF visual neurons act as recurrent multiplicative gates on the rate at which identification units accrue information about the target/distractor status of an object within their receptive field. Consider a location-based priming paradigm in which an FEF visual neuron with a particular receptive field has higher baseline activity at the start of a trial. What are the consequences of this higher initial activity for the eventual state across FEF visual neurons if the object that appears in its RF is a target? What about if it is a distractor? Compare and contrast these predictions with those of a model in which FEF visual neurons directly excite identification units.

2. There is a close formal relationship between normalization and the asymptotic state of a system, like SCRI, described by shunting equations with feedforward inhibition (Smith et al., 2015). Normalization is a key component of Bayesian inference, since the products of likelihoods and priors must be normalized to sum to one to form a posterior. Rao (2004) proposed a neural implementation of Bayesian inference that involved, in principle, both feedforward and lateral inhibition. Compare and contrast these two approaches. We suggest focusing on differences in the dynamics of the two systems, as well as differences in how the weights on inhibitory connections must be specified and/or learned.


3. Consider how to apply SCRI to forms of visual search in which participants make a sequence of eye movements rather than just one. Your considerations should touch on all three of Marr's levels. At the functional level, what principles might guide how multiple eye movements should be directed? At the implementational level, what additional neural mechanisms would need to be invoked? At the algorithmic level, what sorts of processes and representations might be necessary to maintain and accrue information across eye movements?

4. The form of visual search described in this chapter involves knowing what the target is prior to the onset of the visual search array on each trial. In popout search, the target is defined only by being different from the distractors and is not known a priori. FEF visual neuron activity is similar between these two forms of search. Would it be possible to explain the difference between these forms of search in terms of identification unit activity? If so, how would identification work in popout search? What kinds of neural data might help constrain a model of popout?

5. In the chapter, we argued that, like behavioral response times, neural spiking data place valuable constraints on theories of cognition. Consider how other methods of functional neurophysiology (e.g., EEG, MEG, fMRI) provide similar constraints on cognitive theories. How do these constraints depend on the temporal and spatial resolution (and signal-to-noise ratio) of the resulting data? How do these constraints depend on the form of the cognitive theory, specifically the extent to which it specifies the dynamics of the processes involved in transforming internal representations into actions?
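For exercise 2, the relationship between shunting dynamics and normalization can be checked numerically. The sketch below Euler-integrates a shunting system with feedforward inhibition and compares its steady state to the normalized prediction; the specific constants and inputs are arbitrary:

```python
import numpy as np

def shunting_steady_state(inputs, decay=1.0, ceiling=1.0, dt=0.001, n_steps=20000):
    """Euler-integrate shunting dynamics with feedforward inhibition:
        dx_i/dt = -decay*x_i + (ceiling - x_i)*I_i - x_i * sum_{j != i} I_j
    whose fixed point is the normalized value
        x_i* = ceiling * I_i / (decay + sum_j I_j).
    """
    I = np.asarray(inputs, dtype=float)
    x = np.zeros_like(I)
    total = I.sum()
    for _ in range(n_steps):
        dx = -decay * x + (ceiling - x) * I - x * (total - I)
        x += dt * dx
    return x

I = np.array([3.0, 1.0, 0.5])
x = shunting_steady_state(I)
predicted = 1.0 * I / (1.0 + I.sum())
print(np.round(x, 4), np.round(predicted, 4))
```

The simulated steady state matches the normalized prediction x_i* = I_i / (1 + Σ_j I_j), which is the relationship Smith et al. (2015) derive analytically.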

B Recommended Reading

• Cox et al. (2022) offer a more extensive exposition and exploration of SCRI, particularly with regard to its implications for neurophysiology.
• Purcell et al. (2010) illustrate how the GAM model of FEF movement neurons was used to predict saccade response time distributions in many of the same visual search conditions to which SCRI was later applied. As with SCRI, GAM was used as a diagnostic tool to discover which mechanisms FEF movement neurons would need to implement in order to account for these behavioral data.
• Schall (2004) provides a broad overview of the philosophy of connecting cognitive and neural dynamics with a focus on saccade decisions in visual search. This paper argues for the importance of "linking propositions" for building bridges between levels of description.
• Smith et al. (2015) provide an overview of the Competitive Interaction (CI) model which served as the basis for much of SCRI. In addition, this paper lays out the formal relationship between normalization and the asymptotic activity of a system of shunting equations with feedforward inhibition.
• Le Guin (1974) is strongly recommended for students young and old, as well as all others living in a society.


References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Atkinson, R. C., Holmgren, J. E., & Juola, J. F. (1969). Processing time as influenced by the number of elements in a visual display. Perception & Psychophysics, 6(6), 321–326. https://doi.org/10.3758/BF03212784
Brockdorff, N., & Lamberts, K. (2000). A feature-sampling account of the time course of old-new recognition judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(1), 77–102.
Brown, S., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57, 153–178.
Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100(3), 432–459. https://doi.org/10.1037/0033-295X.100.3.432
Busemeyer, J. R., & Wang, Y. (2000). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44, 171–189.
Cao, R., Bladon, J. H., Charczynski, S. J., Hasselmo, M. E., & Howard, M. W. (2021). Internally generated time in the rodent hippocampus is logarithmically compressed. bioRxiv. https://doi.org/10.1101/2021.10.25.465750
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62. https://doi.org/10.1038/nrn3136
Cohen, J. Y., Heitz, R. P., Woodman, G. F., & Schall, J. D. (2009). Neural basis of the set-size effect in frontal eye field: Timing of attention during visual search. Journal of Neurophysiology, 101(4), 1699–1704. https://doi.org/10.1152/jn.00035.2009
Cox, G. E., & Criss, A. H. (2020). Similarity leads to correlated processing: A dynamic model of encoding and recognition of episodic associations. Psychological Review, 127(5), 792–828.
Cox, G. E., Palmeri, T. J., Logan, G. D., Smith, P. L., & Schall, J. D. (2022). Salience by competitive and recurrent interactions: Bridging neural spiking and computation in visual attention. Psychological Review, 129(5), 1144–1182.
Cox, G. E., & Shiffrin, R. M. (2017). A dynamic approach to recognition memory. Psychological Review, 124(6), 795–860. https://doi.org/10.1037/rev0000076
Dennett, D. C. (1971). Intentional systems. The Journal of Philosophy, 68(4), 87–106.
Dominey, P. F., & Arbib, M. A. (1992). A cortico-subcortical model for generation of spatially accurate sequential saccades. Cerebral Cortex, 2(2), 153–175. https://doi.org/10.1093/cercor/2.2.153
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433–458. https://doi.org/10.1037/0033-295X.96.3.433
Geisser, S., & Eddy, W. F. (1979). A predictive approach to model selection. Journal of the American Statistical Association, 74(365), 153–160.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Boca Raton, FL: CRC Press.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87(1), 1–51. https://doi.org/10.1037/0033-295X.87.1.1
Hamker, F. H. (2004). A dynamic model of how feature cues guide spatial attention. Vision Research, 44(5), 501–521. https://doi.org/10.1016/j.visres.2003.09.033


Hamker, F. H. (2005). The reentry hypothesis: The putative interaction of the frontal eye field, ventrolateral prefrontal cortex, and areas V4, IT for attention and eye movement. Cerebral Cortex, 15(4), 431–447. https://doi.org/10.1093/cercor/bhh146
Hanes, D. P., Patterson, W. F., & Schall, J. D. (1998). Role of frontal eye fields in countermanding saccades: Visual, movement, and fixation activity. Journal of Neurophysiology, 79(2), 817–834. https://doi.org/10.1152/jn.1998.79.2.817
Hanes, D. P., & Schall, J. D. (1996). Neural control of voluntary movement initiation. Science, 274(5286), 427–430. https://doi.org/10.1126/science.274.5286.427
Hauser, C. K., Zhu, D., Stanford, T. R., & Salinas, E. (2018). Motor selection dynamics in FEF explain the reaction time variance of saccades to single targets. eLife, 7(e33456), 1–32.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9, 181–197.
Heinzle, J., Hepp, K., & Martin, K. A. C. (2007). A microcircuit model of the frontal eye fields. Journal of Neuroscience, 27(35), 9341–9353. https://doi.org/10.1523/JNEUROSCI.0974-07.2007
Heitz, R. P., Cohen, J. Y., Woodman, G. F., & Schall, J. D. (2010). Neural correlates of correct and errant attentional selection revealed through N2pc and frontal eye field activity. Journal of Neurophysiology, 104(5), 2433–2441. https://doi.org/10.1152/jn.00604.2010
Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). London: Academic Press.
Lamberts, K. (2000). Information-accumulation theory of speeded categorization. Psychological Review, 107(2), 227–260. https://doi.org/10.1037/0033-295X.107.2.227
Le Guin, U. K. (1974). The dispossessed. New York: Harper & Row.
Lieder, F., & Griffiths, T. L. (2020). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43(e1), 1–60.
Link, S. W., & Heath, R. A. (1975). A sequential theory of psychological discrimination. Psychometrika, 40, 77–105.
Love, B. C. (2015). The algorithmic level is the bridge between computation and brain. Topics in Cognitive Science, 7(2), 230–242. https://doi.org/10.1111/tops.12131
Lowe, K. A., & Schall, J. D. (2018). Functional categories of visuomotor neurons in macaque frontal eye field. eNeuro, 5(5), 1–21. https://doi.org/10.1523/ENEURO.0131-18.2018
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W.H. Freeman.
Mitchell, J. F., & Zipser, D. (2003). Sequential memory-guided saccades and target selection: A neural model of the frontal eye fields. Vision Research, 43(25), 2669–2695. https://doi.org/10.1016/S0042-6989(03)00468-1
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104(2), 266–300. https://doi.org/10.1037/0033-295X.104.2.266
Palestro, J. J., Sederberg, P. B., Osth, A. F., Van Zandt, T., & Turner, B. M. (2018). Likelihood-free methods for cognitive science. Cham, Switzerland: Springer.
Pike, R. (1973). Response latency models for signal detection. Psychological Review, 80(1), 53–68.
Purcell, B. A., Heitz, R. P., Cohen, J. Y., Schall, J. D., Logan, G. D., & Palmeri, T. J. (2010). Neurally constrained modeling of perceptual decision making. Psychological Review, 117(4), 1113–1143. https://doi.org/10.1037/a0020311
Purcell, B. A., Schall, J. D., Logan, G. D., & Palmeri, T. J. (2012). From salience to saccades: Multiple-alternative gated stochastic accumulator model of visual search. Journal of Neuroscience, 32(10), 3433–3446. https://doi.org/10.1523/JNEUROSCI.4622-11.2012
Rao, R. P. N. (2004). Bayesian computation in recurrent neural circuits. Neural Computation, 16, 1–38.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108. https://doi.org/10.1037/0033-295X.85.2.59


Reynolds, J. H., & Heeger, D. J. (2009). The normalization model of attention. Neuron, 61(2), 168–185. https://doi.org/10.1016/j.neuron.2009.01.002
Sato, T. R., Murthy, A., Thompson, K. G., & Schall, J. D. (2001). Search efficiency but not response interference affects visual selection in frontal eye field. Neuron, 30(2), 583–591. https://doi.org/10.1016/S0896-6273(01)00304-X
Sato, T. R., & Schall, J. D. (2003). Effects of stimulus-response compatibility on neural selection in frontal eye field. Neuron, 38(4), 637–648. https://doi.org/10.1016/S0896-6273(03)00237-X
Schall, J. D. (2004). On building a bridge between brain and behavior. Annual Review of Psychology, 55(1), 23–50. https://doi.org/10.1146/annurev.psych.55.090902.141907
Schall, J. D., Morel, A., King, D. J., & Bullier, J. (1995). Topography of visual cortex connections with frontal eye field in macaque: Convergence and segregation of processing streams. The Journal of Neuroscience, 15(6), 4464–4487.
Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84(1), 1–66.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84(2), 127–190.
Smith, P. L. (1995). Psychophysically principled models of visual simple reaction time. Psychological Review, 102(3), 567–593.
Smith, P. L., & Sewell, D. K. (2013). A competitive interaction theory of attentional selection and decision making in brief, multielement displays. Psychological Review, 120(3), 589–627. https://doi.org/10.1037/a0033140
Smith, P. L., Sewell, D. K., & Lilburn, S. D. (2015). From shunting inhibition to dynamic normalization: Attentional selection and decision-making in brief visual displays. Vision Research, 116, 219–240. https://doi.org/10.1016/j.visres.2014.11.001
Smith, P. L., & Van Zandt, T. (2000). Time-dependent Poisson counter models of response latency in simple judgment. British Journal of Mathematical and Statistical Psychology, 53, 293–315.
Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice decision. Journal of Mathematical Psychology, 32, 135–168.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society B, 64(4), 583–639.
Stone, M. (1960). Models for choice-reaction time. Psychometrika, 25, 251–260.
Stone, M. (1977). An asymptotic equivalence of choice of model by cross validation and Akaike's criterion. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 44–47.
Teller, D. Y. (1984). Linking propositions. Vision Research, 24(10), 1233–1246. https://doi.org/10.1016/0042-6989(84)90178-0
Thompson, K. G., Bichot, N. P., & Sato, T. R. (2005). Frontal eye field activity before visual search errors reveals the integration of bottom-up and top-down salience. Journal of Neurophysiology, 93(1), 337–351. https://doi.org/10.1152/jn.00330.2004
Thompson, K. G., Hanes, D. P., Bichot, N. P., & Schall, J. D. (1996). Perceptual and motor processing stages identified in the activity of macaque frontal eye field neurons during visual search. Journal of Neurophysiology, 76(6), 4040–4055. https://doi.org/10.1152/jn.1996.76.6.4040
Townsend, J. T. (1972). Some results concerning the identifiability of parallel and serial processes. British Journal of Mathematical and Statistical Psychology, 25(2), 168–199. https://doi.org/10.1111/j.2044-8317.1972.tb00490.x
Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University Press.
Trageser, J. C., Monosov, I. E., Zhou, Y., & Thompson, K. G. (2008). A perceptual representation in the frontal eye field during covert visual search that is more reliable than the behavioral report. European Journal of Neuroscience, 28, 2542–2549.


Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136. https://doi.org/10.1016/0010-0285(80)90005-5
Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93, 1074–1089.
Turner, B. M., & Van Zandt, T. (2018). Approximating Bayesian inference through model simulation. Trends in Cognitive Sciences, 22(9), 826–840.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3), 550–592. https://doi.org/10.1037/0033-295X.108.3.550
van Zandt, T., & Ratcliff, R. (1995). Statistical mimicking of reaction time data: Single-process models, parameter variability, and mixtures. Psychonomic Bulletin & Review, 2(1), 20–54.
Vickers, D. (1970). Evidence for an accumulator model of psychophysical discrimination. Ergonomics, 13(1), 37–58. https://doi.org/10.1080/00140137008931117
Wagenmakers, E.-J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1), 192–196.
Watanabe, S. (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14, 867–897.
Weber, A. I., & Pillow, J. W. (2017). Capturing the dynamical repertoire of single neurons with generalized linear models. Neural Computation, 29, 3260–3289.
Woodman, G. F., Kang, M.-S., Thompson, K., & Schall, J. D. (2008). The effect of visual search efficiency on response preparation. Psychological Science, 19(2), 128–136. https://doi.org/10.1111/j.1467-9280.2008.02058.x

Ultrahigh Field Magnetic Resonance Imaging for Model-Based Neuroscience Nikos Priovoulos, Ícaro Agenor Ferreira de Oliveira, Wietske van der Zwaag, and Pierre-Louis Bazin

Abstract Cognitive neuroscience has been very successful at employing functional magnetic resonance imaging (fMRI) for probing cognition, right from the start of fMRI with the discovery of the blood-oxygen-level-dependent (BOLD) effect in the early 1990s (Ogawa et al., Proc Natl Acad Sci U S A 87(24):9868–9872, 1990; Kwong et al., Proc Natl Acad Sci U S A 89(12):5675–5679, 1992). Numerous studies have mapped the functional architecture of the cerebral cortex at the macroscopic scale. This chapter will focus on the importance of MRI for building more detailed models of the entire human brain at the mesoscopic level. This specifically means the visualization, both structural and functional, of the small nuclei involved in decision-making, such as the subthalamic nucleus and locus coeruleus; of data that might indicate the directionality of information flow in the brain, as in laminar imaging; and of feedback loops, such as those found in the cerebellum. These investigations are only achievable with modern ultrahigh field (UHF) scanners; hence this chapter will focus on the use of 7 T MRI in cognitive neuroscience. After a brief general introduction to the concepts of the BOLD response and the MRI systems used to visualize those responses, the characteristics of UHF MRI will be discussed in Sect. 1, followed by Sect. 2. The benefits of quantitative imaging are discussed in Sect. 3, and finally, we will discuss connecting the structural information with the functional data in Sect. 4.

N. Priovoulos · W. van der Zwaag () Spinoza Centre for Neuroimaging, Amsterdam, The Netherlands e-mail: [email protected] Í. A. F. de Oliveira Spinoza Centre for Neuroimaging, Amsterdam, The Netherlands Experimental and Applied Psychology, VU University, Amsterdam, The Netherlands P.-L. Bazin Integrative Model-Based Cognitive Neuroscience Research Unit, University of Amsterdam, Amsterdam, The Netherlands Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany © Springer Nature Switzerland AG 2024 B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_7


N. Priovoulos et al.

Keywords Ultrahigh field · Magnetic resonance imaging · Mesoscopic level

1 Ultrahigh Field MRI

The blood-oxygen-level-dependent (BOLD) signal observed in fMRI is an endogenous contrast, which makes fMRI an entirely noninvasive method with unrivalled spatial specificity. The veins and venules in the brain naturally have a slightly different interaction with the main magnetic field of an MRI scanner than the surrounding tissue and arteries, due to the magnetic properties of hemoglobin. This magnetic susceptibility difference leads to local signal loss in so-called T2*-weighted images. As an extreme case, the larger veins are actually visible as black dots (no signal) in high-resolution functional images (see Fig. 1). Upon activation, the blood flow to an active region of the brain increases locally. This increased blood flow reduces the susceptibility difference, again locally, and results in a signal increase in T2*-weighted images. This signal increase is what is known as the BOLD response and is an undisputed marker of neural activity (Logothetis, 2008).

Fig. 1 Time-averaged EPI at 0.8 mm resolution, with activation map from an auditory task overlaid. Vessels are clearly visible as small black holes, especially around the insula. (Figure provided by authors)

Magnetic resonance imaging is a very versatile technique, with many applications beyond BOLD imaging. Its acquisitions can be tailored to reflect different contrasts, such as the longitudinal relaxation time T1 (correlated with the local concentration of myelin), the transverse relaxation time T2* (correlated with the distribution of iron), and local diffusion properties of the tissue of interest or even distributions of metabolites. The contrasts are generated by carefully manipulating the magnetization of the proton spins of the object or person in the scanner, first aligning them with the main magnetic field (B0) orientation and then, using RF coils placed at close proximity to the imaged regions, applying radio frequency


(RF) pulses that rotate the spins away from this equilibrium. As the proton spins relax back into their aligned state, they in turn generate magnetic field changes that can be read out by the same RF coils. Images are encoded with the help of linear gradients, which vary the magnetic field slightly in a linear fashion along the three cardinal axes: left-right, anterior-posterior, and inferior-superior. The MRI signal varies in frequency as a function of the magnetic field strength; hence, changing that field strength, even if only slightly, as a function of location allows the decoding of that location from the received signal. Imaging sequences (often referred to by acronyms such as MPRAGE, GRE, TSE, FLAIR, etc.) combine carefully timed successions of pulses, gradients, and readouts to generate images with different contrast weightings. A more in-depth explanation of MRI can be found in several standard works dedicated to magnetic resonance imaging, such as MRI from Picture to Proton (McRobbie et al., 2017), MRI in Practice (Westbrook & Talbot, 2018), and Magnetic Resonance Imaging: Physical Principles and Sequence Design (Brown et al., 2014).

A standard hospital MRI scanner has a main magnetic field strength of 1.5 or 3 Tesla. The magnetic field strength, or B0, determines the maximum achievable signal-to-noise ratio (SNR) of the images. SNR increases a little more than linearly with field strength: SNR ∝ B0^(1–1.5) (Pohmann et al., 2016). But there are many more parameters that change the obtained images, and many of those are also field strength dependent: for example, as the field strength goes up, the transverse relaxation times T2 and T2* become shorter (Peters et al., 2007) while the longitudinal relaxation time T1 becomes longer (Wright et al., 2008).
For neuroscientists, however, it is more important to know that the susceptibility-related contrast increases, because this is what drives the BOLD response that most functional MRI measurements are based on. The BOLD signal and its field strength dependence are discussed in more detail in the following paragraphs. As the SNR and other magnetic resonance parameters depend on B0, there is a drive toward higher and higher field strengths. In early 2020, there were approximately 90 scanners for human use installed with a field strength of 7 T (Fig. 2) and a handful of even higher field strengths (9.4 T, 10.7 T, and 11.7 T). Most of these are positioned at or near large hospital facilities, often in close proximity to an active neuroscience community.
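The practical impact of this SNR scaling can be illustrated with a quick calculation, taking the exponent range reported by Pohmann et al. (2016) as given:

```python
# Relative SNR gain when moving from 3 T to 7 T, assuming SNR ∝ B0**a
# with the exponent a somewhere between 1 and 1.5.
for a in (1.0, 1.25, 1.5):
    gain = (7.0 / 3.0) ** a
    print(f"a = {a}: 7 T gives {gain:.2f}x the SNR of 3 T")
```

Even the most conservative exponent more than doubles the available SNR, which is one reason fewer task repetitions are needed at UHF to detect a given BOLD response.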

1.1 How Functional Imaging Changes at UHF

At ultrahigh field, functional MRI benefits significantly from the increased SNR and BOLD contrast. The combination of the two means that a larger (BOLD) signal is detected with higher sensitivity (Fig. 3), which can dramatically decrease the amount of scan time required to detect a given response, as far fewer repetitions of a task are necessary to detect BOLD signal changes with confidence (van der Zwaag et al., 2009). Nevertheless, there are also disadvantages associated with higher field


Fig. 2 7 T scanners, 2023. Map courtesy of Dr. Huber, Maastricht University. For an up-to-date version, see https://layerfmri.page.link/UHFmap

Fig. 3 Time courses obtained from a single ROI in a single subject performing a motor task. The differences in BOLD amplitude are striking. (Figure adapted with permission from van der Zwaag et al. (2009))

strengths. The large majority of functional MRI scans are acquired with gradient-echo EPI (echo planar imaging), an efficient acquisition method in which all of a two-dimensional image slice is acquired after a single RF pulse. The readout time for


Fig. 4 EPI images before (middle) and after (right) distortion correction. The insets show a part of the musculature around the skull, which is significantly displaced. The leftmost panel shows the undistorted anatomical image. (Figure provided by authors)

this slice is necessarily long, up to 50 ms, compared to the 2–4 ms typically used for individual voxel readouts in structural imaging. Over the time of an EPI readout, inhomogeneities in the magnetic field can slightly modify the received signal. In the resulting images, these changes develop into distortions of the object being imaged. These spatial distortions can be especially destructive in areas close to an air-water boundary, such as the inferior frontal cortex above the sinuses, the amygdala, and the inferior temporal lobe near the auditory canals. These distortions become much more pronounced at UHF (ultrahigh field) because the differences in magnetic susceptibility of the tissues become larger, resulting in larger inhomogeneities of the local magnetic field. It is difficult to combat this issue, because the BOLD signal we are interested in is also driven by a susceptibility change, namely, that between de-oxygenated blood and its surroundings. Hence, a scan that is made insensitive to susceptibility differences will also score very poorly on BOLD sensitivity. On the positive side, if the magnetic field distortions are known or measured, which can be done in a few minutes, the distortions can largely be corrected (Andersson et al., 2003; Graham et al., 2017). Fig. 4 shows an EPI image corrected using the top-up method from FSL (the FMRIB Software Library; Andersson et al., 2003). Differences are especially noticeable for small details and at the surface of the brain.

A second important factor at UHF is the presence of physiological noise. Although the term suggests otherwise, it is not really measurement noise but rather signal fluctuations induced by processes in which the researcher is not directly interested, such as breathing and the heartbeat. During the cardiac cycle, the blood flow in the large cerebral vessels changes, which also changes the signal in and around those vessels.
The changes in the position of the thorax during the breathing cycle will cause changes in the magnetic field, and these changes can reach into the lower brain areas such as the cerebellum, brainstem, and even temporal lobes

158

N. Priovoulos et al.

(van der Zwaag et al., 2015). These signal changes become measurable when the thermal noise component of the images is relatively low. In the extreme case of an infinitely high image SNR, all that would be measured over time are system instabilities (Krüger & Glover, 2001). Because the instabilities in the scanner hardware are much smaller than those of the human being scanned, the measured instabilities typically originate from the physiology. In practice, that means that an increase in image SNR (obtained, for example, with better RF coils or larger voxels) does not always translate into higher temporal SNR (a measure of BOLD detection power). Physiological noise can be removed a posteriori from the images (Glover et al., 2000), but acquiring images with a higher spatial resolution and, hence, a higher thermal noise component may be preferable (Triantafyllou et al., 2005).

A third feature of UHF worth mentioning is the signal variation caused by B1 inhomogeneities. B1, or more precisely B1+, denotes the magnetic field transmitted by the RF coil into the subject being imaged. This radio frequency field is typically assumed to be homogeneous, so that an RF pulse has the same flip angle (amount of deviation of the spinning protons from the B0 orientation) throughout the brain. Because the wavelength at UHF is shorter than at 3 T or 1.5 T, there is more interference in these RF fields, which can lead to significant loss of signal in affected brain areas. Two solutions to this problem exist: the use of dielectric pads (Teeuwisse et al., 2012) and that of multichannel transmit coils. In these multichannel transmit systems, 8 or 16 coils transmit simultaneously, and their RF pulses can be adjusted with respect to each other to mitigate dropout due to interference.
There are different ways to approach these adjustments, such as RF shimming or advanced pulse design, which the interested reader can find clearly summarized in Padormo et al. (2016).

The distortions, physiological noise, and B1 inhomogeneities may lead to different choices of acquisition methods for UHF. For example, readouts in EPI tend to be kept as short as possible to reduce distortions. Still, for functional acquisitions, gradient-echo EPI remains the most widely used sequence, although other contrasts are also gaining popularity (see further Sect. 1.3). For anatomical images, the B1 inhomogeneities can be particularly detrimental, and the standard anatomical MPRAGE (magnetization-prepared rapid gradient echo) or T1-FFE (T1 fast field echo) sequence from 1.5 and 3 T imaging is often replaced with an MP2RAGE (Marques et al., 2010a) or PSIR (phase-sensitive inversion recovery) sequence (Mougin et al., 2016), which are inherently less sensitive to B1 inhomogeneities. Examples of the standard T1-weighted MPRAGE and a comparable MP2RAGE are shown in Fig. 5. Adjustments are also required at 7 T for other sequences, such as diffusion mapping, magnetization transfer contrast (Priovoulos et al., 2018), and turbo spin-echo (Eggenschwiler et al., 2014), including solutions to deal with the increased specific absorption rate (SAR), which reflects tissue heating and must be limited for safety. Many of these are discussed in Marques and Norris (2018).
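The ceiling that physiological noise places on temporal SNR, described earlier in this section (Krüger & Glover, 2001), can be illustrated numerically. This is a minimal sketch of the Krüger-Glover noise model; the signal-dependent noise fraction λ = 0.01 is an assumed, illustrative value:

```python
import math

def temporal_snr(image_snr, lam=0.01):
    """Krüger & Glover (2001) model: physiological noise scales with the signal,
    so tSNR = SNR0 / sqrt(1 + lam**2 * SNR0**2), saturating at 1/lam."""
    return image_snr / math.sqrt(1.0 + (lam * image_snr) ** 2)

# temporal SNR for increasing image SNR: gains flatten out
for snr0 in (20, 50, 100, 200, 1000):
    print(snr0, round(temporal_snr(snr0), 1))
```

Doubling the image SNR from 100 to 200 raises the tSNR only from about 71 to 89, and no increase in image SNR can push the tSNR above 1/λ.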

Ultrahigh Field Magnetic Resonance Imaging for Model-Based Neuroscience

Fig. 5 MPRAGE (left) and MP2RAGE (right) datasets for the same individual. While the MPRAGE has central brightening and marked signal loss in the cerebellum (white arrow), contrast in the MP2RAGE is much more homogeneous throughout, including in the cerebellum. (Figure provided by authors)

1.2 Analysis of UHF Data

Functional MRI experiments generally involve a task performed in the scanner while the BOLD activity is being recorded. By generating a forward model of the expected task-specific BOLD response, the involvement of different brain regions in the related cognitive process can be inferred (Fig. 6). In classical fMRI, these analyses are often done voxel-wise and combined using statistical models to correct for multiple comparisons across the whole brain or the cerebral cortical surface (Friston et al., 1994; Poldrack et al., 2011). In UHF-MRI, it is often beneficial to define instead a set of regions of interest (ROIs) that are hypothesized to be involved in the given task, because the increase in resolution is otherwise lost to the smoothing required by whole-brain analyses (Turner & Geyer, 2014). In this respect, UHF-MRI also agrees better with cognitive modeling, as it assumes a given model of the neural correlates for the task (or compares a small set of models) rather than blindly exploring every brain region.

In order to perform fMRI analyses, whether whole-brain or ROI (region-of-interest)-based, several preprocessing steps are required. Recent standardization efforts have produced a consensus preprocessing pipeline, fMRIPrep, which performs the most common steps in an optimized pipeline crafted by experts (Esteban et al., 2019). The needed steps include masking of the skull and extracranial tissues, correction of the motion between acquired EPI images of the fMRI time series, slice-time correction (adjusting the measurements to reflect the time it takes to acquire each slice), distortion correction (using either a separately acquired MRI (magnetic resonance imaging) field map image or an EPI image with reverse encoding

Fig. 6 Schematic fMRI dataset. The time series of brain images is taken while two sets of checkerboards are shown to the subject. When extracting the time courses for each voxel, some will show activation (left) while most will not (right). (Figure provided by authors)

direction), and co-registration to the subject's anatomical MRI (after preprocessing to mask extracranial tissues and parcellation of the brain to label cortex and other structures of interest). See Poldrack et al. (2011) for a thorough presentation of the main concepts and approaches. Additional preprocessing steps may include smoothing or nuisance component estimation.

For UHF fMRI, a few adjustments have to be made. On the anatomical side, the images acquired generally come from sequences built on the ratio between two images, such as MP2RAGE, which enhances the noise in the background. A first step of background masking (Bazin et al., 2014; O'Brien et al., 2014) is thus required before processing, to enable the pipeline to perform the anatomical co-registration. Many fMRI studies at 7 T focus on a small region of the brain and thus acquire small slabs of data that do not cover the entire brain. In such cases, it can become necessary to collect additional images, either imaging the functional slab with an anatomical contrast or collecting some additional whole-brain EPI. Other strategies include collecting anatomical data with an EPI-based sequence, which matches the distortions of the fMRI; see Polimeni et al. (2018) for a review of the different options.

Beyond the preprocessing stages, there are many different options for fMRI analysis, all highly dependent on the neuroscience question being investigated. Most of the approaches developed for 3 T imaging can be applied to UHF data without specific adaptation. The traditional GLM methods are generally less desirable due to their reliance on smoothing, but new variants can retain high spatial specificity (Lohmann et al., 2018). Region-of-interest analyses combine the advantages of
averaging across multiple voxels while maintaining sharp boundaries, but they require prior identification of the regions. In some cases, anatomical details are sufficient, such as the increased myelination of primary cortical areas (Dinse et al., 2015). Prior identification can also be done by performing an additional task to locate functionally relevant regions or by applying parcellations from relevant atlases (see Arslan et al. (2018) for a review).

Another modeling approach, which allows more degrees of freedom than traditional GLM fitting, is the population receptive field (pRF) method. Here, multiple parameters are fitted, rather than just the BOLD signal amplitude. In visual field mapping, for example, responses are fitted by varying the x and y coordinates of the visual angle map as well as the width of the distribution of neuronal responses in the fMRI voxel (Dumoulin & Wandell, 2008). Using this modeling approach, responses to visual stimuli have been characterized even in brain regions not typically expected to be involved in vision (van Es et al., 2019). Comparable approaches have been developed for auditory (Thomas et al., 2015) and more cognitive stimuli (Harvey et al., 2013). Other recent advances in analysis techniques of potential interest for UHF fMRI include connectopic mapping (Haak et al., 2018) and inter-subject correlation analysis (Chen et al., 2016, 2017).
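A toy one-dimensional version of the population receptive field idea makes the fitting procedure concrete: each voxel is assigned a Gaussian receptive field, its overlap with a sweeping bar stimulus predicts the time course, and the position and width are recovered by grid search. All stimulus details below are invented for illustration, and the HRF convolution used in real pRF analyses is omitted for brevity:

```python
import numpy as np

positions = np.linspace(-10, 10, 41)   # visual field positions (deg), 0.5 deg steps
stim = np.eye(41)                      # a bar visiting one position per time point

def prf_prediction(x0, sigma):
    rf = np.exp(-(positions - x0) ** 2 / (2 * sigma ** 2))  # Gaussian pRF
    return stim @ rf                   # stimulus-pRF overlap per time point

rng = np.random.default_rng(1)
data = prf_prediction(3.0, 2.0) + rng.normal(0, 0.02, 41)   # simulated voxel

# exhaustive grid search over pRF position and size
best = min(((np.sum((data - prf_prediction(x0, s)) ** 2), x0, s)
            for x0 in np.arange(-10, 10.5, 0.5)
            for s in np.arange(0.5, 5.5, 0.5)))
print(best[1], best[2])   # recovered position and width
```

Real pRF analyses fit a two-dimensional Gaussian (x, y, and width), convolve the prediction with an HRF, and often refine the grid solution with a gradient-based optimizer.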

1.3 Alternative Functional Contrast Sources

Although BOLD has proven itself highly useful, the draining veins cause a spatial spreading of the signal that is, in many cases, unwanted. Recent efforts have therefore produced alternative contrasts linked to neural activity in order to obtain a more specific picture. Here, we will highlight two of those: arterial spin labeling (ASL) and vascular space occupancy (VASO).

ASL is a noninvasive method that uses the water present in the arterial blood as a diffusible endogenous contrast. Unlike the majority of perfusion techniques, which are qualitative, showing relative changes in cerebral blood flow (CBF), cerebral blood volume (CBV), and mean transit time (MTT), ASL can provide a quantitative measure of CBF (Pollock et al., 2009). Moreover, CBF maps from ASL offer higher temporal and spatial resolution than those of other methods and have been validated extensively against methods that use exogenous contrast agents, such as 15O-PET (Alsop et al., 2015; Wintermark et al., 2005; Xu et al., 2009). The fundamental concept of ASL is the manipulation of the longitudinal magnetization of the water present in the arterial blood so that it differs from the magnetization of the tissue (Fig. 7). Protons are labeled in the supplying vessels outside the imaging plane and reach the parenchyma after a period of time known as the post-labeling delay (PLD), after which a labeled image is acquired. A control (unlabeled) image is also acquired (Pollock et al., 2009). The subtraction of the labeled and control images then eliminates the static tissue signal (Fig. 7). The remaining signal is a relative measure of the perfusion, resulting in a perfusion-weighted image that is proportional to the local CBF.
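The label-control subtraction and single-compartment quantification described above can be sketched in a few lines, following the consensus PCASL formula of Alsop et al. (2015). The parameter values (blood T1, partition coefficient, labeling efficiency, and the 1% signal difference) are illustrative assumptions:

```python
import math

def cbf_pcasl(ctrl, lbl, m0, pld, tau, t1_blood=2.1, lam=0.9, alpha=0.85):
    """Single-compartment CBF quantification for PCASL (Alsop et al., 2015).
    t1_blood ~2.1 s is an assumed value for 7 T; lam is the blood-brain
    partition coefficient (ml/g); alpha is the labeling efficiency.
    Returns CBF in ml/100g/min."""
    dm = ctrl - lbl                      # label-control difference signal
    return (6000.0 * lam * dm * math.exp(pld / t1_blood)
            / (2.0 * alpha * t1_blood * m0
               * (1.0 - math.exp(-tau / t1_blood))))

# illustrative (hypothetical) inputs: 1% signal difference, PLD = tau = 1.8 s
print(round(cbf_pcasl(ctrl=1.01, lbl=1.00, m0=1.0, pld=1.8, tau=1.8), 1))
```

With these inputs the result is on the order of 60 ml/100 g/min, within the range expected for gray matter.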

Fig. 7 General principle of arterial spin labeling. A control-label pair is acquired: in the label image, spins in the arterial blood are labeled, followed by a control condition in which there is ideally minimal alteration of the water spins. The difference between the two images is proportional to the blood flow supplying that voxel. (Figure provided by authors)

Unfortunately, the signal difference is just a fraction (1–2%) of the tissue signal and depends on many parameters, such as the flow rate, the T1 of blood and tissue, and the transit time for blood to travel from the labeling region to the imaging plane. Therefore, multiple control and label images are acquired to ensure sufficient signal-to-noise ratio, and absolute quantitative CBF maps can be obtained by varying the delay times and using the general kinetic model (Buxton et al., 1998). The perfusion contrast also depends on the labeling efficiency, which is related to the number of spins labeled in the labeling plane (Detre et al., 2012). For functional ASL experiments, a time series of tag-control images is acquired during the execution of a task, similar to BOLD fMRI acquisitions. The pairwise acquisition results in lower SNR, lower temporal resolution, and smaller brain coverage compared to BOLD fMRI (Borogovac & Asllani, 2012), although ASL remains very attractive for cognitive neuroscience because CBF is more closely coupled to neuronal activity than the vascularly biased BOLD signal. At UHF, in addition to the higher SNR, the longer blood T1 associated with higher field strength results in an amplified perfusion signal. To date, perfusion fMRI experiments with ASL at UHF have not been widely applied in humans, but recent studies have shown that they are feasible and that a spatial resolution sufficient for cortical depth-dependent studies can be achieved (Huber et al., 2017; Ivanov et al., 2017; Kashyap et al., 2019).

Like ASL, VASO is a technique aiming to quantify a specific component of the neurovascular activity, in this case changes in local cerebral blood volume (CBV). VASO was initially proposed as a method for obtaining CBV-weighted contrast noninvasively in vivo (Lu et al., 2013). The VASO sequence takes advantage of the T1 difference between blood and the surrounding tissues: it uses an inversion recovery pulse to null the blood signal while part of the surrounding tissue signal remains. The relative volume of the tissue compartment in a voxel is considered proportional to 1 – CBV (if no BOLD contamination is present). The main advantage of VASO is its higher microvascular specificity compared with the GE-BOLD acquisitions typically used for fMRI (Huber et al., 2015; Jin & Kim, 2008): it has been shown that most of the CBV change comes from the small arterioles located close to the activated neurons (Huber et al., 2017), whereas in BOLD-based fMRI, signal pooling in the draining veins causes BOLD responses to build up toward the cortical surface (Markuerkiaga et al., 2016). Because VASO was developed as an alternative to BOLD fMRI experiments, it is typically run as a succession of images, generating a time series similar to functional MRI experiments. As the CBV increases, a reduction in overall signal within the voxel is detected. Hence, CBV/VASO time courses show a signal reduction where the BOLD signal increases during a stimulus (Fig. 8). VASO does have a few drawbacks: in order to implement the VASO sequence and obtain the desired CBV signal, prior knowledge of the arterial transit time and of the longitudinal relaxation rates (gray matter, white matter, CSF, blood) is needed.
A second limitation is that the required inversion time restricts the possible length of the EPI readout; hence, similar to the limits encountered in ASL, smaller coverage is achieved in VASO than in BOLD fMRI experiments. The necessity for the signal to recover after the inversion pulse, and the acquisition of an extra image to remove residual BOLD weighting, also limit the temporal resolution. Third, the SNR of CBV-weighted VASO is inherently lower than that of BOLD fMRI, because the inversion pulse used to null the blood also removes a significant part of the tissue signal. Nevertheless, recent developments in VASO acquisition have enabled several laminar fMRI studies beyond simple validation of known systems, for example, Finn et al. (2019).
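Under the simple model in which the VASO signal is proportional to 1 – CBV, a measured signal decrease can be converted into a CBV increase. This is a sketch under that idealized assumption (no BOLD contamination), and the resting CBV value is an assumed gray-matter figure, not taken from the text:

```python
def delta_cbv(s_rest, s_act, cbv_rest=0.055):
    """Under the idealized VASO model S ∝ (1 - CBV), a signal change maps to a
    CBV change: dCBV = -(dS / S_rest) * (1 - CBV_rest).
    cbv_rest ~0.055 ml/ml is an assumed gray-matter resting CBV."""
    return -(s_act - s_rest) / s_rest * (1.0 - cbv_rest)

# a 1% VASO signal decrease during activation corresponds to a CBV increase
# of roughly 1 ml per 100 ml of tissue
print(delta_cbv(1.00, 0.99))
```

The sign convention makes the inverse relationship explicit: the VASO time course goes down exactly where CBV (and typically the BOLD signal) goes up.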

2 Pushing the Limits for Cerebellum, Subcortex, and Within-Cortex Imaging

While ultrahigh field can benefit many fMRI experiments because of the higher BOLD responses, it makes the biggest difference for small areas of the brain or for specific questions about functional subregions that cannot be visualized or functionally separated otherwise. Especially noteworthy in this case is the subcortex,

Fig. 8 Plots averaged over all significant voxels for BOLD (red) and VASO CBV (blue). Unilateral motor task (right hand), block design 12/24 seconds on/off. (Figure provided by authors)

where 455 distinct anatomical structures are tightly packed together (Forstmann et al., 2017). An example brain region, already mentioned in other chapters in this book, is the subthalamic nucleus or STN. This iron-rich area is about the size of a kidney bean, less than a centimeter wide in its short axis. With run-of-the-mill 3-mm resolution fMRI, only a very small number of voxels are actually inside the STN, significantly hampering the detectability of functional responses in this area. A second important region is the cerebellum, because of its thin layer of gray matter and tightly folded structure. Here, high spatial resolution can benefit functional MRI acquisitions very significantly. Finally, high spatial resolution is essential when studying BOLD responses within the cortex. Two subdivisions of the cortical sheet are relevant: first, a division in layers, as a function of cortical depth, because the neocortex consists of six different layers characterized by their cell-type distribution, which differentially reflect input and output, and second, a division in columns perpendicular to the cortical surface. These columns, sometimes grouped
into hypercolumns, are thought to be computational units of the cortex. The best-known examples are found in the primary visual areas, where input from the left and right eyes is organized in a columnar fashion. Aspects of functional MRI of the subcortical nuclei STN and LC, the cerebellum, and intracortical structures at ultrahigh field are discussed in more detail in the following paragraphs.

2.1 In the Subcortex: The Example of the Subthalamic Nucleus and the Locus Coeruleus

The subthalamic nucleus (STN) is a small but vitally important structure in the basal ganglia. It is widely recognized that the cortico-basal ganglia network controls motor performance (Bogacz et al., 2010; Mink, 1996; Nambu et al., 2002). Within this network, the STN is hypothesized to receive excitatory cortical input, which in turn leads to a slowing or termination of movements. Because of its small volume, central location in the basal ganglia, and high iron content, the STN is a challenging structure to image with MRI. With the commonly used 3 mm isotropic resolution of 3 T functional MRI studies, the entire STN is covered by 4–5 voxels (de Hollander et al., 2017). The common practice of smoothing in fMRI exacerbates this problem, mixing in signals from the neighboring substantia nigra, red nucleus, or thalamus (de Hollander et al., 2015). Subcortical structures such as the STN also contain high concentrations of iron (Deistung et al., 2013; Keuken et al., 2017), which leads to substantially shorter T2* relaxation times compared to cortical areas (Peters et al., 2007), reducing BOLD sensitivity in these regions (de Hollander et al., 2017). The use of UHF MRI is crucial for increasing anatomical specificity in locating the STN, both for functional studies and for deep brain stimulation surgery (Cho et al., 2010). Although "standard" anatomical images have T1 contrast, the STN shows very limited T1 contrast with neighboring white matter and other subcortical structures, and a susceptibility contrast based on T2* weighting is required for accurate delineation (Keuken et al., 2018). For BOLD functional MRI, two factors have to be combined: high spatial resolution (below 2 mm isotropic) and a reduced echo time (TE), as close as possible to the T2* of the STN (de Hollander et al., 2017).
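A back-of-the-envelope calculation illustrates why these acquisition choices matter. The STN volume used below (about 120 mm³) is an assumed order-of-magnitude figure for illustration only:

```python
def voxels_inside(structure_volume_mm3, voxel_size_mm):
    """Rough upper bound on the number of isotropic voxels fitting inside a
    structure of the given volume."""
    return structure_volume_mm3 / voxel_size_mm ** 3

stn = 120.0   # assumed STN volume in mm^3 (order of magnitude; varies per study)
for vs in (3.0, 1.5, 0.8):
    print(vs, round(voxels_inside(stn, vs), 1))
```

At 3 mm isotropic resolution only a handful of voxels fit inside the nucleus, consistent with the 4–5 voxels mentioned above, while sub-millimeter UHF protocols yield a couple of hundred.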
Given the size and location of the STN, functional analyses require a region-of-interest approach rather than the voxel-wise GLM favored in classical cortical studies; however, this more explicit approach is also more readily compatible with modeling (see, for example, Keuken et al., 2015).

The locus coeruleus (LC) is a small pontine nucleus (approximately 1–2 mm in diameter and 12 mm in length in humans) that is the main source of noradrenaline in the cortex. Noradrenaline uptake is crucial in adjusting arousal and attention in mammals (Seo & Bruchas, 2017). As such, LC activity has been hypothesized to relate to the prediction error within a predictive coding model of the brain (Sales et al., 2019), while noradrenaline uptake is thought to modulate decision-making behavior (Eckhoff et al., 2009).

Fig. 9 (a) Sagittal slice of the group z-map at brainstem level from a vagus nerve stimulation paradigm performed at 7 T (voxel size = 1.25 mm). A cluster at the level of the locus coeruleus can be observed. White lines indicate the axial slices presented in (b). (b) Successive axial brainstem slices of the group z-map. Bilateral locus coeruleus involvement can be observed. (c) Successive slices of a high-resolution LC template at approximately the same level. Hyperintensities (white arrows) indicate the LC position. (Figure adapted with permission from Priovoulos et al. (2019b))

Even though the LC is important from a model-based neuroscience perspective, the human LC is relatively understudied in vivo, since typical fMRI approaches at 3 T or lower fields cannot reach the necessary spatial resolution (approximately 1 mm or less). Resolution is particularly crucial for the LC because of its proximity to the fourth ventricle, which results in increased contamination from the cerebrospinal fluid signal. The narrower BOLD point spread function (Shmuel et al., 2007) and increased SNR afforded by UHF, in combination with parallel imaging, may allow us to achieve the necessary resolution to perform BOLD-weighted fMRI in the LC, as was recently demonstrated (Fig. 9; Sclocco et al., 2019; Jacobs et al., 2018; Priovoulos et al., 2019a, b). LC fMRI experiments at UHF are, however, associated with several challenges: the brainstem's proximity to the chest cavity results in increased sensitivity to the B0 field fluctuations related to the respiratory cycle. These B0 perturbations result in phase and frequency instabilities (which can cause, for example, variable ghosting and distortion in gradient echo planar imaging) that increase at UHF (van Gelderen et al., 2007). The physiological noise tends to dominate over the thermal noise in the brainstem and lowers the SNR gains. Furthermore, the increased B1 transmit
inhomogeneity at UHF can further reduce SNR in the brainstem, while the effect of susceptibility artifacts, including blurring and distortions, increases (e.g., due to the often-necessary inclusion of the skull base and frontal sinuses in the field of view). The above issues can be partially mitigated by real-time B0 shimming (van Gelderen et al., 2007), retrospective denoising algorithms (Salimi-Khorshidi et al., 2014), parallel transmit approaches (Katscher & Bornert, 2006), careful placement of the field of view, and the narrower imaging slices afforded by parallel imaging. For LC UHF imaging in particular, an additional challenge is that the structural imaging method typically used to localize the LC at lower fields (viz., a SAR-intense two-dimensional turbo spin-echo) is hard to apply at UHF. To tackle this, a simple imaging method based on magnetization transfer has recently been demonstrated at 7 T (Priovoulos et al., 2018). In sum, in vivo neuroscientific advances in the brainstem in general, and the LC in particular, are tightly coupled with our progress in mitigating the challenges associated with UHF acquisitions. These challenges also result in different noise and spatial autocorrelation profiles between the brainstem and the cortex; thus, combining brainstem and cortex signals in a single analysis may require more sophisticated approaches (e.g., typical GLM modeling approaches assume a constant spatial autocorrelation; Eklund et al., 2016).
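One family of retrospective denoising approaches builds nuisance regressors from the recorded cardiac and respiratory phases (RETROICOR; Glover et al., 2000, cited earlier in this chapter). A minimal sketch, with hypothetical phase values standing in for a recorded pulse trace:

```python
import numpy as np

def retroicor_regressors(phase, order=2):
    """RETROICOR-style nuisance regressors (Glover et al., 2000): a low-order
    Fourier expansion of the cardiac or respiratory phase at each volume."""
    return np.column_stack([f(m * phase)
                            for m in range(1, order + 1)
                            for f in (np.cos, np.sin)])

# hypothetical cardiac phases (radians) for 200 volumes
rng = np.random.default_rng(0)
cardiac_phase = np.cumsum(rng.uniform(0.5, 1.5, 200)) % (2 * np.pi)
X_phys = retroicor_regressors(cardiac_phase)
print(X_phys.shape)   # (200, 4): cos/sin at the first and second harmonics
```

The resulting columns can be added to the design matrix of a GLM so that cardiac-locked signal fluctuations are absorbed by the nuisance regressors rather than by the task regressors.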

2.2 In the Cerebellum: Highly Folded Lobules and Small Subcortical Nuclei

The cerebellum, or little brain, is an area of the brain positioned behind the brainstem, inferior to the occipital lobe. A widely cited fact is that although the cerebellum takes up only 10% of the total brain volume, it contains four times as many neurons as the rest of the brain (Herculano-Houzel, 2012). The cerebellum, like the forebrain, consists of a white matter scaffold covered with a gray matter cortex. Within this white matter structure, which is also referred to as the "arbor vitae," lie nuclei of iron-rich gray matter, of which the dentate nucleus is the largest and best known. Anatomically, the cerebellum is much more finely organized than the rest of the brain (Fig. 10), with relatively thin white matter zones in the gyri and a roughly 1-mm-thick cortex, considerably thinner than that of the occipital lobe (arrows in Fig. 10). The cerebellum is subdivided into ten lobules, visible as little tree-shaped structures, which further branch into folia. One of the lobules is outlined in black in Fig. 10. Functionally, the cerebellum has an important role in the motor system (Manto et al., 2012). For example, a task as simple as riding a bike would become impossible without proper cerebellar function. In addition to its involvement in motor control, the cerebellum is implicated in motor learning (Ernst et al., 2017), in working memory (Küper et al., 2016), and even in language and executive functions (Stoodley, 2012). The cerebellum is affected in a wide range of diseases, ranging from those affecting the cerebellum specifically, such as cerebellar ataxia (Deistung

Fig. 10 A high-resolution image from the cerebellum, outlined in white. A lobule is outlined in black. (The image is edited with permission from Marques et al. (2010b); Priovoulos et al. (2023))

et al., 2016), to brain-wide diseases such as multiple sclerosis (Fartaria et al., 2017). A recently published gross functional map of the cerebellum suggests that function does not follow the lobular structure (King et al., 2019). Cerebellar imaging generally suffers from a lack of spatial resolution because of the fine spatial structure. The increased SNR at 7 T can be traded for spatial resolution, hugely benefitting the depiction of the thin cerebellar cortex and small nuclei (Marques et al., 2010b). Functionally, the added benefit of an increased BOLD response (Gizewski et al., 2007) enables functional cerebellar imaging (Ernst et al., 2017; Küper et al., 2016). Two drawbacks of UHF potentially affect the cerebellum: the inhomogeneity of the radio frequency pulses and the presence of physiological noise. A loss of transmit RF power often occurs in the right cerebellar hemisphere. There are two technological solutions for this: the use of a dielectric pad (Teeuwisse et al., 2012; Vaidya et al., 2018) and the use of parallel transmit techniques (Padormo et al., 2016), an example of which is shown in Fig. 11.

2.3 In the Cerebral Cortex: Layers and Columns

The human cortex is organized at different spatial scales, ranging from the few micrometers of an individual neuron, to cortical columns and cortical layers at the millimeter scale, to the several centimeters covered by cortical areas and white matter tracts. For both laminar and columnar fMRI, spatial resolutions below 1 mm are essential, and, hence, almost all laminar and columnar fMRI experiments are performed at 7 T or higher (Dumoulin et al., 2018). Besides hardware considerations, the specific physiological and biophysical sources of the BOLD signal strongly determine the

Fig. 11 T1-weighted MPRAGE images acquired at 7 T with a standard system (left) and an eight-channel transmit system with advanced RF-pulse design (right) yield marked differences in the cerebellum (outlined in blue). (Figure provided by authors)

response sensitivity and specificity, not only in amplitude but also in temporal behavior. The larger venous vessels (intracortical, pial, and large draining veins) have a larger blood volume and are located further downstream, resulting in higher amplitudes and in more delayed and broader BOLD responses (blood pooling) that are less specific to neuronal activity (de Zwart et al., 2005; Menon, 2002; Siero et al., 2011; Turner, 2002). BOLD signals from the microvasculature (arterioles, capillaries, and small venules) have the highest spatial specificity to neuronal tissue and tend to show smaller amplitudes and faster and narrower responses, with notable heterogeneity across cortical depth (Jin & Kim, 2008; Ogawa et al., 1993; Siero et al., 2013; Uludağ et al., 2009; Yacoub et al., 2008). This BOLD signal heterogeneity across cortical depth becomes crucially important when pushing the spatial resolution to the domain of cortical laminae and columns: it contains depth-dependent variations in neuronal activity but also non-neuronal variations in vascular architecture, blood draining properties, and other physiological noise components (Koopmans et al., 2011). These effects should be taken into account when analyzing cortical-depth- or laminar-resolved fMRI (Goense et al., 2016; Markuerkiaga et al., 2016). Spin-echo (T2-weighted)-based acquisitions have an increased microvascular weighting at ultrahigh field, however at the price of reduced sensitivity, increased SAR, sensitivity to B1 inhomogeneity, and residual T2* weighting in the case of long EPI readouts. Several types of T2-weighted fMRI acquisitions have been explored (Barth et al., 2010; De Martino et al., 2013; Goa et al., 2014), but T2*-weighted gradient echo remains the most widely used method for BOLD

Fig. 12 Slice selection and functional domains in human visual cortex. (a) The optimal region of flat gray matter in primary visual cortex (parallel to the calcarine sulcus) in one subject, from which columnar-level fMRI maps of ocular dominance columns (ODC) (b) and orientation preference (c) are generated and characterized. (Reproduced with permission from Yacoub et al. (2008))

acquisitions with laminar or columnar resolution (Marques & Norris, 2018). VASO, as described in Sect. 1.3, is also gaining in popularity due to recent advances and successful applications to neuroscientific questions (Finn et al., 2019). Columnar imaging is even more challenging than laminar imaging. It is not possible to average over an extended region, as is done for the signal at different cortical depths, and the curvature of the gray matter sheet places very high demands on segmentation and analysis to keep the line of interest aligned with the underlying column. The first and best-known example of a human columnar experiment demonstrated the presence of ocular dominance columns in the primary visual cortex (Fig. 12; Yacoub et al., 2008). This particular study benefited from carefully placed slices, so that an anisotropic voxel could be used. Later columnar experiments did have isotropic voxels and could, at least in theory, allow analysis in both the columnar and laminar directions (Dumoulin et al., 2017; Zimmermann et al., 2011; Nasr et al., 2016). These studies remain highly challenging and are
still rather uncommon for neuroscience questions, though they may become more feasible as hardware and acquisition methods improve.

3 UHF Neuroanatomy with Quantitative MRI

Functional imaging is not the only field of neuroscience for which UHF MRI is useful: anatomical imaging enables the visualization of the organizational units of the brain related to the myelo- and cytoarchitecture. Standard MRI output values, however, typically represent a combination of tissue properties modulated in an arbitrary fashion by scanner parameters, such as the echo time, the repetition time, the flip angle, or the transmit and receive profiles. This hinders longitudinal or between-scanner comparisons. Quantitative MRI (qMRI) aims to quantify (in SI units) the physical properties of the tissue, such as the longitudinal and transverse relaxation times, the water proton density, the magnetization transfer, and the direction and strength of water diffusion (Weiskopf et al., 2013). Beyond increased reproducibility, quantifying these properties allows us to develop models that link the MRI-relevant spatial scale with several features of the underlying microstructure, such as the myelin, water, and iron concentration, or the axon density and orientation in vivo (Fig. 13; Dinse et al., 2015; Edwards et al., 2018). Until recently, such features were only accessible through ex vivo histology (Deistung et al., 2013; Tabelow et al., 2019). This has obvious value for clinical neuroscience in humans (e.g., it can potentially allow individual prognosis or diagnosis based on normative values) but may further allow us to better link changes in fMRI activity with differences in brain microstructure (see Sect. 4 for examples).
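The way scanner settings arbitrarily modulate tissue contrast can be made concrete with the steady-state spoiled gradient-echo (FLASH) signal equation, S = PD · sin α · (1 − E1)/(1 − cos α · E1) · e^(−TE/T2*), with E1 = e^(−TR/T1). The sketch below uses rough illustrative tissue values, not reference measurements:

```python
import numpy as np

def spgr_signal(pd, t1, t2s, tr, te, flip_deg):
    """Steady-state spoiled gradient-echo signal.

    The measured value mixes tissue properties (pd, t1, t2s, in seconds)
    with scanner settings (tr, te, flip angle), which is why ordinary
    weighted images are not quantitative.
    """
    e1 = np.exp(-tr / t1)
    a = np.deg2rad(flip_deg)
    return pd * np.sin(a) * (1 - e1) / (1 - np.cos(a) * e1) * np.exp(-te / t2s)

# Illustrative tissue values: white matter (short T1) vs. gray matter.
wm = dict(pd=0.70, t1=1.2, t2s=0.027)
gm = dict(pd=0.80, t1=1.9, t2s=0.033)

# Same two tissues, two protocols: the white/gray contrast flips sign,
# from T1-weighted (WM bright) to PD-weighted (GM bright).
for tr, te, fa in [(0.020, 0.005, 25.0), (6.0, 0.005, 90.0)]:
    s_wm = spgr_signal(**wm, tr=tr, te=te, flip_deg=fa)
    s_gm = spgr_signal(**gm, tr=tr, te=te, flip_deg=fa)
    print(f"TR={tr} s, flip={fa} deg: WM={s_wm:.3f}, GM={s_gm:.3f}")
```

The same voxel thus yields entirely different (even sign-reversed) contrast depending on TR and flip angle, which is the arbitrariness that qMRI removes by estimating PD, T1, and T2* themselves.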

3.1 Common qMRI Models

Several qMRI measures stem from measuring the relaxation times of the tissue. One of the most common is the longitudinal relaxation time (T1), which expresses the time over which the spins return to their equilibrium state. This return is facilitated by thermal molecular motion as well as by interactions of free water molecules with water molecules bound to macromolecules, such as the myelin lipids. As such, in the human brain, T1 mapping predominantly relates to water relaxation and myelin concentration, and more mildly to paramagnetic ions, such as those of iron. All of these processes also affect the transverse relaxation (T2), along with spin-spin interactions. The observed transverse relaxation (T2*) is typically faster than T2 due to proton dephasing caused by field distortions, arising either from magnet imperfections or from the susceptibility of the tissue itself. T2 and, in particular, T2* are therefore sensitive to diamagnetic and paramagnetic tissue components. For all of these measures, microstructure-inspired models have been developed that describe the T1, T2, and T2* decays as a multi-exponential combination of water and myelin components (with a possible


Fig. 13 Schematic of the relationship between common qMRI approaches and brain microstructure. (The image is copied with permission from Edwards et al. (2018))

correction for susceptibility-induced dephasing; Sati et al., 2013; van Gelderen et al., 2016). See also Cercignani et al. (2018) for an in-depth overview of quantitative MRI.

As mentioned, field distortions result in phase changes; the MR signal phase is therefore sensitive to the susceptibility induced by the presence of paramagnetic and diamagnetic objects, after accounting for their shape and orientation. A set of techniques dubbed quantitative susceptibility mapping (QSM) has been developed to resolve the susceptibility, typically by assuming that the local field disturbances are well described by a magnetic dipole and solving the resulting inverse problem (a challenging endeavor). QSM maps are therefore sensitive to the presence of myelin and of paramagnetic materials, such as iron (Wang & Liu, 2015).

Proton density (PD) mapping correlates with the visible water content of the voxel, that is, water whose transverse relaxation is long enough to allow imaging. Water molecules that are bound to macromolecules have a very short transverse relaxation and are effectively invisible. PD therefore relates to the unbound water concentration, which varies with the underlying brain tissue composition and provides another measure of microstructure (Mezer et al., 2013).

While water protons that are bound to macromolecules show a very short transverse relaxation, their longitudinal relaxation is longer. Under radiofrequency saturation, the macromolecule-bound water protons can exchange magnetization (cross-relax) with adjacent free water protons through dipole-dipole interactions. By selectively saturating the macromolecule-bound water protons and measuring the subsequent signal reduction of the unbound water pool, one can extract quantitative or semi-quantitative measures of this magnetization exchange. Since myelin lipids, such as cholesterol, bind water and are abundant in the brain, magnetization transfer (MT) measures relate strongly to myelin accumulation (Henkelman et al., 2001).

Finally, the diffusion of water molecules in the brain is restricted by the presence of neuronal cells and fibers. This can reduce diffusion and/or render it anisotropic: intuitively, if neuronal fibers are approximated as tubes, it is easier for water molecules to move along them than across them. Diffusion can be sampled with MRI by applying symmetric gradients on each side of the refocusing pulse of a spin echo, thus rephasing stationary spins but dephasing diffusing spins, resulting in signal loss in relation to diffusion. By repeating this measurement along several orientations, an orientation-specific measure of diffusion can be extracted. Multiple models have been suggested to extract measures of microstructure based on this basic idea, including measures of fiber orientation and dispersion as well as axonal density (Hagmann et al., 2006).
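The orientation-specific diffusion measurement just described underlies the diffusion tensor model, in which the signal along a gradient direction g attenuates as S = S0 · exp(−b · gᵀDg). The sketch below is illustrative only: a noise-free tensor for a fiber along x is simulated and refit with linear least squares; the diffusivities and b-value are textbook-style assumptions, not recommended acquisition parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth diffusion tensor for a fiber along x (units: mm^2/s).
# Illustrative values: axial diffusivity >> radial diffusivity.
D_true = np.diag([1.7e-3, 0.3e-3, 0.3e-3])

b = 1000.0  # s/mm^2
g = rng.normal(size=(30, 3))
g /= np.linalg.norm(g, axis=1, keepdims=True)  # unit gradient directions

s0 = 1.0
signal = s0 * np.exp(-b * np.einsum("ij,jk,ik->i", g, D_true, g))

# Linear least-squares tensor fit: -log(S/S0)/b = g^T D g.
# Design-matrix columns: Dxx, Dyy, Dzz, Dxy, Dxz, Dyz (off-diagonals doubled).
A = np.column_stack([g[:, 0] ** 2, g[:, 1] ** 2, g[:, 2] ** 2,
                     2 * g[:, 0] * g[:, 1], 2 * g[:, 0] * g[:, 2],
                     2 * g[:, 1] * g[:, 2]])
coeffs, *_ = np.linalg.lstsq(A, -np.log(signal / s0) / b, rcond=None)
D_fit = np.array([[coeffs[0], coeffs[3], coeffs[4]],
                  [coeffs[3], coeffs[1], coeffs[5]],
                  [coeffs[4], coeffs[5], coeffs[2]]])

# Summary measures: mean diffusivity and fractional anisotropy.
evals = np.linalg.eigvalsh(D_fit)
md = evals.mean()
fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
print(f"mean diffusivity {md:.2e} mm^2/s, fractional anisotropy {fa:.2f}")
```

With real data, noise, multiple b-values, and richer models (e.g., of fiber dispersion) complicate this picture, but the same forward model is the starting point.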

3.2 qMRI at UHF

From the above, it is evident that modeling approaches based on qMRI allow us to approximate multiple features of the brain's microstructure. Importantly, the recent popularization of 7 T can greatly facilitate this through SNR gains, as well as through possible gains in effective SNR from increased parallel transmission and reception capability due to the increased RF inhomogeneity (Pohmann et al., 2016). qMRI scans at UHF can reach resolutions as high as 200–400 μm in vivo, which puts us within reach of the resolution needed to image several of the building blocks of the brain (Trampel et al., 2019). For example, most of the cortex is 2–4 mm thick and consists of six layers. High-resolution T1 maps and phase images have been shown to differentiate layers based on the underlying myelination and iron deposition (Fig. 14; Duyn et al., 2007). Similar results have been reported for the cortex of the cerebellum, as well as for several subcortical nuclei. Beyond the SNR and parallel imaging gains, qMRI at UHF greatly benefits from the increased susceptibility sensitivity (Budde et al., 2011), which has increased interest in techniques like QSM. Furthermore, the increased chemical shift at UHF allows for more accurate off-resonance saturation, facilitating magnetization transfer (MT) imaging. MT imaging is further helped by the longer T1 at UHF, which allows the saturation effect to accumulate over a longer period and thus can increase the MT contrast (Jiang et al., 2017). However, the shorter T2 at UHF results in increased signal loss due to dephasing. This reduces the time over which we can detect diffusion and thus restricts diffusion-weighted imaging (DWI) approaches. Furthermore, SAR, which increases roughly quadratically with field strength, severely hinders RF-intensive approaches, including spin-echo-based approaches for DWI and the application of off-resonance or inversion pulses for MT and T1 mapping, respectively.
Fig. 14 Between-layer contrast in the phase image of a gradient echo in the neocortex (a) and the cerebellum (b). (a) A dark band, resembling the stria of Gennari, can be observed (yellow box, black arrows). (Reproduced with permission from Duyn et al. (2007); Copyright (2007) National Academy of Sciences, U.S.A.) (b) A black band, resembling the granular layer of the cerebellar cortex, can be observed (white arrow, 1) compared to the molecular layer (white arrow, 2) and white matter (white arrow, 3). (Reproduced with permission from Marques et al. (2010b); Priovoulos et al. (2023))

The SAR increase can be partially ameliorated by tailored acquisition schemes, including parallel imaging approaches, three-dimensional imaging, or the efficient use of multi-echo readouts. One such example from our work allows the acquisition of T1, T2*, and QSM information in a single scan, quickly covering a large part of the qMRI regime while minimizing inter-scan motion effects (Caan et al., 2019). UHF acquisitions suffer severely from RF inhomogeneities, which can be challenging to correct in post-processing. qMRI measures, however, are typically extracted from two or more images that share the same inhomogeneity profile, which allows the bias to be easily modeled out. This has contributed to the popularization of qMRI approaches at UHF, even where there is no interest in quantifying the underlying microstructure (Haast et al., 2016). One such example is the MP2RAGE, which is regularly used to produce a synthetic T1-weighted image with greatly reduced bias and optimal white/gray matter contrast from the combination of two images acquired at different inversion times (Marques et al., 2010a). Additionally, the fact that several qMRI techniques acquire multiple images (with partially redundant signal between them) can be leveraged to increase SNR, either by averaging or by estimating and removing noise (Bazin et al., 2019; Manjon et al., 2013). To successfully link qMRI measures with features like the layer structure, high-spatial-resolution acquisitions are required. In this case, the voxel size becomes comparable to even relatively small displacements, such as those from physiological processes, reducing the effective resolution that can be achieved. This is exacerbated by the longer time needed to acquire multiple high-resolution images, which increases the probability of motion due to noncompliance. Motion correction


techniques are therefore needed to reap the resolution benefits of UHF. These can require dedicated hardware (such as optical cameras) or acquisition modifications (e.g., the concomitant acquisition of a fat image to correct online for rigid head motion; Skare et al., 2015; Gallichan et al., 2016). High-resolution data can also be challenging to process: most processing pipelines, for example for skull-stripping, segmentation, or registration, were developed for lower-resolution data and do not efficiently handle the increased data size (Bazin et al., 2014). Furthermore, thin structures that are barely visible at low resolutions, such as veins, arteries, or the dura, can bias any of the above operations at higher resolution. Finally, the need to preserve the high spatial resolution may discourage explicit or implicit smoothing, including interpolations, shifting the analysis focus to the individual (instead of the group) level.
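The bias cancellation that makes MP2RAGE attractive, mentioned earlier in this section, can be sketched in a few lines: the "uniform" combination of the two inversion-time images is a ratio in which any multiplicative receive-field bias common to both images drops out (Marques et al., 2010a). The signal values below are illustrative placeholders, not data from a real acquisition.

```python
import numpy as np

rng = np.random.default_rng(1)

# Idealized real-valued signals at the two inversion times for two voxels
# (GM, WM; arbitrary units). Signs differ because TI1 lies near a null point.
s1 = np.array([-0.25, -0.05])  # first inversion time
s2 = np.array([0.60, 0.45])    # second inversion time

def mp2rage_uni(s1, s2):
    """MP2RAGE 'uniform' combination; bounded in [-0.5, 0.5]."""
    return (s1 * s2) / (s1**2 + s2**2)

# A multiplicative receive-field bias that scales both images identically.
bias = rng.uniform(0.5, 2.0, size=s1.shape)

uni_clean = mp2rage_uni(s1, s2)
uni_biased = mp2rage_uni(bias * s1, bias * s2)
print(uni_clean, uni_biased)  # the multiplicative bias cancels in the ratio
```

Because (b·s1)(b·s2)/((b·s1)² + (b·s2)²) = s1·s2/(s1² + s2²), the receive bias cancels exactly, which is why the combined image needs no inhomogeneity correction.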

4 Structure-Function Relationships for Modeling

Whether probing function or microstructure, UHF MRI brings anatomy back onto center stage. Other lines of research have recently outlined the strong relationships at play: cortical gradients of resting-state functional connectivity that index functional abstraction closely follow myelin patterns (Margulies et al., 2016). Further correspondences with gene expression, temporal hierarchies, and the timing of brain-area development strengthen this observation (for a review, see Huntenburg et al. (2018)). Myelination in particular is constantly changing as a result of neural experience and may provide an essential blueprint for cortical microcircuits (Turner, 2019). Examples of myelin-function relationships can be found all over the brain, from the deep nuclei (Alkemade et al., 2019) to the cerebellar (Boillat et al., 2018) and cerebral cortices (Tardif et al., 2015), where most success in demarcating regions has been achieved in the well-studied and heavily myelinated visual (Fracasso et al., 2016), auditory (De Martino et al., 2015), and somatomotor (Sánchez-Panchuelo et al., 2014) cortices. Connectivity, whether defined structurally with diffusion MRI or functionally with resting-state fMRI, provides direct information about the interactions between distant areas, and UHF resolutions may provide the level of detail required to image and experimentally test functional systems (see, for example, Steele et al. (2016) and Sitek et al. (2019)). Combined with laminar fMRI techniques, these may even provide empirical measures of directionality with which to build more informed computational models of the cortex (Stephan et al., 2019). In summary, UHF MRI and fMRI have many advantages for neuroscience. With increased SNR, more diverse imaging contrasts, and mesoscale resolution, they bring many new tools for scientists to map and interrogate functional systems in humans.
On the other hand, these new capabilities come with increased constraints and artifacts, including a relatively limited set of specially adapted computational analysis tools, which limits translation from demonstrating imaging feats to answering neuroscientific questions. The increasing availability of 7 T scanners


worldwide will likely close the gap in the coming years, and this current limitation may well be an opportunity to take a new look at neuroimaging practices and integrate more cognitive models into the analysis framework, whether by correlating interindividual differences between MRI and behavioral model parameters, building hierarchical joint models of imaged signals and cognitive models, or embedding cognitive architecture models into their individually derived anatomical substrate.
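As a minimal sketch of the first of these strategies, correlating interindividual differences between an MRI measure and a behavioral-model parameter, the snippet below uses entirely synthetic data; the variable names (a cortical R1 value as a myelin-sensitive measure, and a per-subject drift rate from a decision model) are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)
n_subjects = 40

# Synthetic per-subject measures (for illustration only):
# r1: a qMRI parameter (e.g., cortical R1 = 1/T1, myelin-sensitive), in s^-1.
# drift: a behavioral-model parameter fit separately per subject, here
# generated with a built-in linear dependence on r1 plus noise.
r1 = rng.normal(0.65, 0.05, n_subjects)
drift = 2.0 + 8.0 * (r1 - 0.65) + rng.normal(0.0, 0.15, n_subjects)

# Interindividual (across-subject) Pearson correlation.
r = np.corrcoef(r1, drift)[0, 1]
print(f"interindividual correlation r = {r:.2f}")
```

In practice, such correlations would be computed between independently estimated quantities (and ideally within a hierarchical joint model, as noted above), rather than on data generated with the relationship built in.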

References

Alkemade, A., de Hollander, G., Miletic, S., Keuken, M. C., Balesar, R., de Boer, O., Swaab, D. F., & Forstmann, B. U. (2019). The functional microscopic neuroanatomy of the human subthalamic nucleus. Brain Structure & Function, 224(9), 3213–3227. https://doi.org/10.1007/s00429-019-01960-3
Alsop, D. C., Detre, J. A., Golay, X., Günther, M., Hendrikse, J., Hernandez-Garcia, L., Lu, H., Macintosh, B. J., Parkes, L. M., Smits, M., Van Osch, M. J. P., Wang, D. J. J., Wong, E. C., & Zaharchuk, G. (2015). Recommended implementation of arterial spin-labeled perfusion MRI for clinical applications: A consensus of the ISMRM perfusion study group and the European consortium for ASL in dementia. Magnetic Resonance in Medicine, 73, 102–116. https://doi.org/10.1002/mrm.25197
Andersson, J. L. R., Skare, S., & Ashburner, J. (2003). How to correct susceptibility distortions in spin-echo echo-planar images: Application to diffusion tensor imaging. NeuroImage, 20(2), 870–888.
Arslan, S., Ktena, S. I., Makropoulos, A., Robinson, E. C., Rueckert, D., & Parisot, S. (2018). Human brain mapping: A systematic comparison of parcellation methods for the human cerebral cortex. NeuroImage, 170, 5–30. https://doi.org/10.1016/j.neuroimage.2017.04.014
Barth, M., Meyer, H., Kannengiesser, S. A. R., Polimeni, J. R., Wald, L. L., & Norris, D. G. (2010). T2-weighted 3D fMRI using S2-SSFP at 7 T. Magnetic Resonance in Medicine, 63, 1015–1020.
Bazin, P. L., Weiss, M., Dinse, J., Schafer, A., Trampel, R., & Turner, R. (2014). A computational framework for ultra-high resolution cortical segmentation at 7 Tesla. NeuroImage, 93(Pt 2), 201–209.
Bazin, P. L., Alkemade, A., van der Zwaag, W., Caan, M., Mulder, M., & Forstmann, B. U. (2019). Denoising high-field multi-dimensional MRI with local complex PCA. Frontiers in Neuroscience, 13, 1066.
Bogacz, R., Wagenmakers, E. J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed-accuracy tradeoff. Trends in Neurosciences, 33(1), 10–16. https://doi.org/10.1016/j.tins.2009.09.002
Boillat, Y., Bazin, P. L., O'Brien, K., Fartaria, M. J., Bonnier, G., Krueger, G., van der Zwaag, W., & Granziera, C. (2018). Surface-based characteristics of the cerebellar cortex visualized with ultra-high field MRI. NeuroImage, 172, 1–8. https://doi.org/10.1016/j.neuroimage.2018.01.016
Borogovac, A., & Asllani, I. (2012). Arterial spin labeling (ASL) fMRI: Advantages, theoretical constrains and experimental challenges in neurosciences. International Journal of Biomedical Imaging, 2012. https://doi.org/10.1155/2012/818456
Brown, R. W., Cheng, Y.-C. N., Haacke, E. M., Thompson, M. R., & Venkatesan, R. (2014). Magnetic resonance imaging: Physical principles and sequence design (2nd ed.). John Wiley & Sons.
Budde, J., Shajan, G., Hoffmann, J., Ugurbil, K., & Pohmann, R. (2011). Human imaging at 9.4 T using T2*-, phase-, and susceptibility-weighted contrast. Magnetic Resonance in Medicine, 65, 544–550.


Buxton, R. B., Frank, L. R., Wong, E. C., Siewert, B., Warach, S., & Edelman, R. R. (1998). A general kinetic model for quantitative perfusion imaging with arterial spin labeling. Magnetic Resonance in Medicine, 40, 383–396. https://doi.org/10.1002/mrm.1910400308
Caan, M. W. A., Bazin, P. L., Marques, J. P., de Hollander, G., Dumoulin, S. O., & van der Zwaag, W. (2019). MP2RAGEME: T1, T2*, and QSM mapping in one sequence at 7 tesla. Human Brain Mapping, 40, 1786–1798.
Cercignani, M., Dowell, N. G., & Tofts, P. (Eds.). (2018). Quantitative MRI of the brain: Principles of physical measurement (Series in medical physics and biomedical engineering) (2nd ed.). CRC Press, Taylor & Francis Group.
Chen, G., Shin, Y.-W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: Nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248–259. https://doi.org/10.1016/j.neuroimage.2016.05.023
Chen, G., Taylor, P. A., Shin, Y.-W., Reynolds, R. C., & Cox, R. W. (2017). Untangling the relatedness among correlations, part II: Inter-subject correlation group analysis through linear mixed-effects modeling. NeuroImage, 147, 825–840. https://doi.org/10.1016/j.neuroimage.2016.08.029
Cho, Z. H., Min, H. K., Oh, S. H., Han, J. Y., Park, C. W., Chi, J. G., Kim, Y. B., Paek, S. H., Lozano, A. M., & Lee, K. H. (2010). Direct visualization of deep brain stimulation targets in Parkinson disease with the use of 7-tesla magnetic resonance imaging. Journal of Neurosurgery, 113(3), 639–647. https://doi.org/10.3171/2010.3.JNS091385
De Martino, F., Zimmermann, J., Muckli, L., Ugurbil, K., Yacoub, E., & Goebel, R. (2013). Cortical depth dependent functional responses in humans at 7T: Improved specificity with 3D GRASE. PLoS One, 8, e60514.
De Martino, F., Moerel, M., Ugurbil, K., Formisano, E., & Yacoub, E. (2015). Less noise, more activation: Multiband acquisition schemes for auditory functional MRI. Magnetic Resonance in Medicine, 74(2), 462–467. https://doi.org/10.1002/mrm.25408
de Hollander, G., Keuken, M. C., & Forstmann, B. U. (2015). The subcortical cocktail problem; mixed signals from the subthalamic nucleus and substantia nigra. PLoS One, 10(3), e0120572. https://doi.org/10.1371/journal.pone.0120572
de Hollander, G., Keuken, M. C., van der Zwaag, W., Forstmann, B. U., & Trampel, R. (2017). Comparing functional MRI protocols for small, iron-rich basal ganglia nuclei such as the subthalamic nucleus at 7 T and 3 T. Human Brain Mapping, 38(6), 3226–3248. https://doi.org/10.1002/hbm.23586
de Zwart, J. A., Silva, A. C., van Gelderen, P., Kellman, P., Fukunaga, M., Chu, R., Koretsky, A. P., Frank, J. A., & Duyn, J. H. (2005). Temporal dynamics of the BOLD fMRI impulse response. NeuroImage, 24, 667–677.
Deistung, A., Schafer, A., Schweser, F., Biedermann, U., Turner, R., & Reichenbach, J. R. (2013). Toward in vivo histology: A comparison of quantitative susceptibility mapping (QSM) with magnitude-, phase-, and R2*-imaging at ultra-high magnetic field strength. NeuroImage, 65, 299–314.
Deistung, A., Stefanescu, M. R., Ernst, T. M., Schlamann, M., Ladd, M. E., Reichenbach, J. R., & Timmann, D. (2016). Structural and functional magnetic resonance imaging of the cerebellum: Considerations for assessing cerebellar ataxias. Cerebellum (London, England), 15, 21–25. https://doi.org/10.1007/s12311-015-0738-9
Detre, J. A., Rao, H., Wang, D. J. J., Chen, Y. F., & Wang, Z. (2012). Applications of arterial spin labeled MRI in the brain. Journal of Magnetic Resonance Imaging, 35, 1026–1037. https://doi.org/10.1002/jmri.23581
Dinse, J., Härtwich, N., Waehnert, M. D., Tardif, C. L., Schäfer, A., Geyer, S., Preim, B., Turner, R., & Bazin, P.-L. (2015). A cytoarchitecture-driven myelin model reveals area-specific signatures in human primary and secondary areas using ultra-high resolution in-vivo brain MRI. NeuroImage, 114, 71–87. https://doi.org/10.1016/j.neuroimage.2015.04.023
Dumoulin, S. O., & Wandell, B. A. (2008). Population receptive field estimates in human visual cortex. NeuroImage, 39(2), 647–660.


Dumoulin, S. O., Harvey, B. M., Fracasso, A., Zuiderbaan, W., Luijten, P. R., Wandell, B. A., & Petridou, N. (2017). In vivo evidence of functional and anatomical stripe-based subdivisions in human V2 and V3. Scientific Reports, 7(1), 733. https://doi.org/10.1038/s41598-017-00634-6
Dumoulin, S. O., Fracasso, A., van der Zwaag, W., Siero, J. C. W., & Petridou, N. (2018). Ultrahigh field MRI: Advancing systems neuroscience towards mesoscopic human brain function. NeuroImage, 168, 345–357. https://doi.org/10.1016/j.neuroimage.2017.01.028
Duyn, J. H., van Gelderen, P., Li, T. Q., de Zwart, J. A., Koretsky, A. P., & Fukunaga, M. (2007). High-field MRI of brain cortical substructure based on signal phase. Proceedings of the National Academy of Sciences of the United States of America, 104, 11796–11801.
Eckhoff, P., Wong-Lin, K. F., & Holmes, P. (2009). Optimality and robustness of a biophysical decision-making model under norepinephrine modulation. The Journal of Neuroscience, 29, 4301–4311.
Edwards, L. J., Kirilina, E., Mohammadi, S., & Weiskopf, N. (2018). Microstructural imaging of human neocortex in vivo. NeuroImage, 182, 184–206.
Eggenschwiler, F., O'Brien, K. R., Gruetter, R., & Marques, J. P. (2014). Improving T2-weighted imaging at high field through the use of kT-points. Magnetic Resonance in Medicine, 71(4), 1478–1488. https://doi.org/10.1002/mrm.24805
Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences of the United States of America, 113, 7900–7905.
Ernst, T. M., Thürling, M., Müller, S., Kahl, F., Maderwald, S., Schlamann, M., Boele, H. J., Koekkoek, S. K. E., Diedrichsen, J., De Zeeuw, C. I., Ladd, M. E., & Timmann, D. (2017). Modulation of 7 T fMRI signal in the cerebellar cortex and nuclei during acquisition, extinction, and reacquisition of conditioned eyeblink responses. Human Brain Mapping, 38, 3957–3974. https://doi.org/10.1002/hbm.23641
Esteban, O., Markiewicz, C. J., Blair, R. W., Moodie, C. A., Isik, A. I., Erramuzpe, A., Kent, J. D., Goncalves, M., DuPre, E., Snyder, M., Oya, H., Ghosh, S. S., Wright, J., Durnez, J., Poldrack, R. A., & Gorgolewski, K. J. (2019). fMRIPrep: A robust preprocessing pipeline for functional MRI. Nature Methods, 16, 111–116. https://doi.org/10.1038/s41592-018-0235-4
Fartaria, M. J., O'Brien, K., Şorega, A., Bonnier, G., Roche, A., Falkovskiy, P., Krueger, G., Kober, T., Bach Cuadra, M., & Granziera, C. (2017). An ultra-high field study of cerebellar pathology in early relapsing-remitting multiple sclerosis using MP2RAGE. Investigative Radiology, 52, 265–273. https://doi.org/10.1097/RLI.0000000000000338
Finn, E. S., Huber, L., Jangraw, D. C., Molfese, P. J., & Bandettini, P. A. (2019). Layer-dependent activity in human prefrontal cortex during working memory. Nature Neuroscience, 22, 1687–1695. https://doi.org/10.1038/s41593-019-0487-z
Forstmann, B. U., de Hollander, G., Van Maanen, L., Alkemade, A., & Keuken, M. C. (2017). Towards a mechanistic understanding of the human subcortex. Nature Reviews Neuroscience, 18, 57–65. https://doi.org/10.1038/nrn.2016.163
Fracasso, A., van Veluw, S. J., Visser, F., Luijten, P. R., Spliet, W., Zwanenburg, J. J. M., Dumoulin, S. O., & Petridou, N. (2016). Lines of Baillarger in vivo and ex vivo: Myelin contrast across lamina at 7T MRI and histology. NeuroImage, 133, 163–175. https://doi.org/10.1016/j.neuroimage.2016.02.072
Friston, K. J., Worsley, K. J., Frackowiak, R. S., Mazziotta, J. C., & Evans, A. C. (1994). Assessing the significance of focal activations using their spatial extent. Human Brain Mapping, 1(3), 210–220. https://doi.org/10.1002/hbm.460010306
Gallichan, D., Marques, J. P., & Gruetter, R. (2016). Retrospective correction of involuntary microscopic head movement using highly accelerated fat image navigators (3D FatNavs) at 7T. Magnetic Resonance in Medicine, 75, 1030–1039. https://doi.org/10.1002/mrm.25670
Gizewski, E. R., de Greiff, A., Maderwald, S., Timmann, D., Forsting, M., & Ladd, M. E. (2007). fMRI at 7 T: Whole-brain coverage and signal advantages even infratentorially? NeuroImage, 37(3), 761–768.


Glover, G. H., Li, T. Q., & Ress, D. (2000). Image-based method for retrospective correction of physiological motion effects in fMRI: RETROICOR. Magnetic Resonance in Medicine, 44(1), 162–167.
Goa, P. L. E., Koopmans, P. J., Poser, B. A., Barth, M., & Norris, D. G. (2014). BOLD fMRI signal characteristics of S1- and S2-SSFP at 7 T. Frontiers in Neuroscience, 8, 49.
Goense, J., Bohraus, Y., & Logothetis, N. K. (2016). fMRI at high spatial resolution: Implications for BOLD-models. Frontiers in Computational Neuroscience, 10, 66.
Graham, M. S., Drobnjak, I., & Zhang, H. (2017). Quantitative assessment of the susceptibility artefact and its interaction with motion in diffusion MRI. PLoS One, 12(10), e0185647.
Haak, K. V., Marquand, A. F., & Beckmann, C. F. (2018). Connectopic mapping with resting-state fMRI. NeuroImage, 170, 83–94. https://doi.org/10.1016/j.neuroimage.2017.06.075
Haast, R. A., Ivanov, D., Formisano, E., & Uludağ, K. (2016). Reproducibility and reliability of quantitative and weighted T1 and T2* mapping for myelin-based cortical parcellation at 7 tesla. Frontiers in Neuroanatomy, 10, 112.
Hagmann, P., Jonasson, L., Maeder, P., Thiran, J. P., Wedeen, V. J., & Meuli, R. (2006). Understanding diffusion MR imaging techniques: From scalar diffusion-weighted imaging to diffusion tensor imaging and beyond. Radiographics, 26(Suppl 1), S205–S223.
Harvey, B. M., Klein, B. P., Petridou, N., & Dumoulin, S. O. (2013). Topographic representation of numerosity in the human parietal cortex. Science, 341(6150), 1123–1126. https://doi.org/10.1126/science.1239052
Henkelman, R. M., Stanisz, G. J., & Graham, S. J. (2001). Magnetization transfer in MRI: A review. NMR in Biomedicine, 14, 57–64.
Herculano-Houzel, S. (2012). The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost. Proceedings of the National Academy of Sciences of the United States of America, 109, 10661–10668. https://doi.org/10.1073/pnas.1201895109
Huber, L., Goense, J., Kennerley, A. J., Trampel, R., Guidi, M., Reimer, E., Ivanov, D., Neef, N., Gauthier, C. J., Turner, R., & Möller, H. E. (2015). Cortical lamina-dependent blood volume changes in human brain at 7 T. NeuroImage, 107, 23–33. https://doi.org/10.1016/j.neuroimage.2014.11.046
Huber, L., Uludağ, K., & Möller, H. E. (2017). Non-BOLD contrast for laminar fMRI in humans: CBF, CBV, and CMRO2. NeuroImage, 1–19. https://doi.org/10.1016/j.neuroimage.2017.07.041
Huntenburg, J. M., Bazin, P. L., & Margulies, D. S. (2018). Large-scale gradients in human cortical organization. Trends in Cognitive Sciences, 22(1), 21–31. https://doi.org/10.1016/j.tics.2017.11.002
Ivanov, D., Gardumi, A., Haast, R. A. M., Pfeuffer, J., Poser, B. A., & Uludağ, K. (2017). Comparison of 3 T and 7 T ASL techniques for concurrent functional perfusion and BOLD studies. NeuroImage, 156, 363–376. https://doi.org/10.1016/j.neuroimage.2017.05.038
Jacobs, H. I. L., Müller-Ehrenberg, L., Priovoulos, N., & Roebroek, A. (2018). Curvilinear locus coeruleus functional connectivity trajectories over the adult lifespan: A 7T MRI study. Neurobiology of Aging, 69, 167–176.
Jiang, X., van Gelderen, P., & Duyn, J. H. (2017). Spectral characteristics of semisolid protons in human brain white matter at 7 T. Magnetic Resonance in Medicine, 78, 1950–1958.
Jin, T., & Kim, S.-G. (2008). Cortical layer-dependent dynamic blood oxygenation, cerebral blood flow and cerebral blood volume responses during visual stimulation. NeuroImage, 43, 1–9.
Kashyap, S., Ivanov, D., Havlicek, M., Poser, B., & Uludağ, K. (2019). Laminar CBF and BOLD fMRI in the human visual cortex using arterial spin labelling at 7T. In Proceedings of the 27th Scientific Meeting of ISMRM (p. 609).
Katscher, U., & Bornert, P. (2006). Parallel RF transmission in MRI. NMR in Biomedicine, 19, 393–400.
Keuken, M. C., Van Maanen, L., Bogacz, R., Schäfer, A., Neumann, J., Turner, R., & Forstmann, B. U. (2015). The subthalamic nucleus during decision-making with multiple alternatives. Human Brain Mapping, 36(10), 4041–4052. https://doi.org/10.1002/hbm.22896


Keuken, M. C., Bazin, P. L., Backhouse, K., Beekhuizen, S., Himmer, L., Kandola, A., Lafeber, J. J., Prochazkova, L., Trutti, A., Schäfer, A., Turner, R., & Forstmann, B. U. (2017). Effects of aging on T1, T2*, and QSM MRI values in the subcortex. Brain Structure and Function, 222(6), 2487–2505. https://doi.org/10.1007/s00429-016-1352-4
Keuken, M. C., Isaacs, B. R., Trampel, R., van der Zwaag, W., & Forstmann, B. U. (2018). Visualizing the human subcortex using ultra-high field magnetic resonance imaging. Brain Topography, 31(4), 513–545. https://doi.org/10.1007/s10548-018-0638-7
King, M., Hernandez-Castillo, C. R., Poldrack, R. A., Ivry, R. B., & Diedrichsen, J. (2019). Functional boundaries in the human cerebellum revealed by a multi-domain task battery. Nature Neuroscience, 22(8), 1371–1378. https://doi.org/10.1038/s41593-019-0436-x
Koopmans, P. J., Barth, M., Orzada, S., & Norris, D. G. (2011). Multi-echo fMRI of the cortical laminae in humans at 7 T. NeuroImage, 56(3), 1276–1285. https://doi.org/10.1016/j.neuroimage.2011.02.042
Krüger, G., & Glover, G. H. (2001). Physiological noise in oxygenation-sensitive magnetic resonance imaging. Magnetic Resonance in Medicine, 46(4), 631–637.
Küper, M., Kaschani, P., Thürling, M., Stefanescu, M. R., Burciu, R. G., Göricke, S., Maderwald, S., Ladd, M. E., Hautzel, H., & Timmann, D. (2016). Cerebellar fMRI activation increases with increasing working memory demands. Cerebellum (London, England), 15, 322–335. https://doi.org/10.1007/s12311-015-0703-7
Kwong, K. K., Belliveau, J. W., Chesler, D. A., Goldberg, I. E., Weisskoff, R. M., Poncelet, B. P., Kennedy, D. N., Hoppel, B. E., Cohen, M. S., Turner, R., et al. (1992). Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proceedings of the National Academy of Sciences of the United States of America, 89(12), 5675–5679.
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature, 453, 869–878.
Lohmann, G., Stelzer, J., Lacosse, E., Kumar, V. J., Mueller, K., Kuehn, E., Grodd, W., & Scheffler, K. (2018). LISA improves statistical analysis for fMRI. Nature Communications, 9. https://doi.org/10.1038/s41467-018-06304-z
Lu, H., Hua, J., & van Zijl, P. C. (2013). Noninvasive functional imaging of cerebral blood volume with vascular-space-occupancy (VASO) MRI. NMR in Biomedicine, 26(8), 932–948. https://doi.org/10.1002/nbm.2905
Manjon, J. V., Coupe, P., Concha, L., Buades, A., Collins, D. L., & Robles, M. (2013). Diffusion weighted image denoising using overcomplete local PCA. PLoS One, 8, e73021.
Manto, M., Bower, J. M., Conforto, A. B., Delgado-García, J. M., da Guarda, S. N. F., Gerwig, M., Tesche, C. D., Tilikete, C., & Timmann, D. (2012). Consensus paper: Roles of the cerebellum in motor control—The diversity of ideas on cerebellar involvement in movement. Cerebellum (London, England), 11, 457–487. https://doi.org/10.1007/s12311-011-0331-9
Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., Petrides, M., Jefferies, E., & Smallwood, J. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences of the United States of America, 113(44), 12574–12579. https://doi.org/10.1073/pnas.1608282113
Markuerkiaga, I., Barth, M., & Norris, D. G. (2016). A cortical vascular model for examining the specificity of the laminar BOLD signal. NeuroImage, 132, 491–498.
Marques, J. P., & Norris, D. G. (2018). How to choose the right MR sequence for your research question at 7T and above? NeuroImage, 168, 119–140. https://doi.org/10.1016/j.neuroimage.2017.04.044
Marques, J. P., Kober, T., Krueger, G., van der Zwaag, W., Van de Moortele, P. F., & Gruetter, R. (2010a). MP2RAGE, a self bias-field corrected sequence for improved segmentation and T1-mapping at high field. NeuroImage, 49, 1271–1281.
Marques, J. P., van der Zwaag, W., Granziera, C., Krueger, G., & Gruetter, R. (2010b). Cerebellar cortical layers: In vivo visualization with structural high-field-strength MR imaging. Radiology, 254, 942–948.

Ultrahigh Field Magnetic Resonance Imaging for Model-Based Neuroscience

181

McRobbie, D. W., Moore, E. A., & Graves, M. J. (2017). MRI from picture to proton (3rd ed.). University Printing House, Cambridge University Press. Menon, R. S. (2002). Postacquisition suppression of large-vessel BOLD signals in high-resolution fMRI. Magnetic Resonance in Medicine, 47, 1–9. Mezer, A., Yeatman, J. D., Stikov, N., Kay, K. N., Cho, N.-J., Dougherty, R. F., Perry, M. L., Parvizi, J., Hua, L. H., Butts-Pauly, K., & Wandell, B. A. (2013). Quantifying the local tissue volume and composition in individual brains with magnetic resonance imaging. Nature Medicine, 19, 1667–1672. https://doi.org/10.1038/nm.3390 Mink, J. W. (1996, Nov). The basal ganglia: focused selection and inhibition of competing motor programs. Progress in Neurobiology, 50(4), 381–425. https://doi.org/10.1016/s03010082(96)00042-1 Mougin, O., Abdel-Fahim, R., Dineen, R., Pitiot, A., Evangelou, N., & Gowland, P. (2016). Imaging gray matter with concomitant null point imaging from the phase sensitive inversion recovery sequence. Magnetic Resonance in Medicine, 76(5), 1512–1516. https://doi.org/ 10.1002/mrm.26061 Nambu, A., Tokuno, H., & Takada, M. (2002, Jun). Functional significance of the corticosubthalamo-pallidal ‘hyperdirect’ pathway. Neuroscience Research, 43(2), 111–117. https:// doi.org/10.1016/s0168-0102(02)00027-5 Nasr, S., Polimeni, J. R., & Tootell, R. B. (2016). Interdigitated color- and disparity-selective columns within human visual cortical areas V2 and V3. The Journal of Neuroscience, 36, 1841– 1857. O’Brien, K. R., Kober, T., Hagmann, P., Maeder, P., Marques, J., Lazeyras, F., Krueger, G., & Roche, A. (2014). Robust T1-weighted structural brain imaging and morphometry at 7T using MP2RAGE. PLoS One, 9, e99676. https://doi.org/10.1371/journal.pone.0099676 Ogawa, S., Lee, T. M., Kay, A. R., & Tank, D. W. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. 
Proceedings of the National Academy of Sciences of the United States of America, 87(24), 9868–9872. Ogawa, S., Menon, R. S., Tank, D. W., Kim, S. G., Merkle, H., Ellermann, J. M., & Ugurbil, K. (1993). Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging. A comparison of signal characteristics with a biophysical model. Biophysical Journal, 64(3), 803–812. Padormo, F., Beqiri, A., Hajnal, J. V., & Malik, S. J. (2016). Parallel transmission for ultrahigh-field imaging. NMR in Biomedicine, 29(9), 1145–1161. https://doi.org/10.1002/nbm.3313 Peters, A. M., Brookes, M. J., Hoogenraad, F. G., Gowland, P. A., Francis, S. T., Morris, P. G., & Bowtell, R. (2007). T2* measurements in human brain at 1.5, 3 and 7 T. Magnetic Resonance Imaging, 25(6), 748–753. Pohmann, R., Speck, O., & Scheffler, K. (2016). Signal-to-noise ratio and MR tissue parameters in human brain imaging at 3, 7, and 9.4 tesla using current receive coil arrays. Magnetic Resonance in Medicine, 75, 801–809. Poldrack, R. A., Nichols, T., & Mumford, J. (2011). Handbook of functional MRI data analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511895029 Polimeni, J. R., Renvall, V., Zaretskaya, N., & Fischl, B. (2018). Analysis strategies for high-resolution UHF-fMRI data. NeuroImage, 168, 296–320. https://doi.org/10.1016/ j.neuroimage.2017.04.053 Pollock, J. M., Tan, H., Kraft, R. A., Whitlow, C. T., Burdette, J. H., & Maldjian, J. A. (2009). Arterial spin-labeled MR perfusion imaging: Clinical applications. Magnetic Resonance Imaging Clinics of North America, 17, 315–338. https://doi.org/10.1016/j.mric.2009.01.008 Priovoulos, N., Jacobs, H. I. L., Ivanov, D., Uludag, K., Verhey, F. R. J., & Poser, B. A. (2018). High-resolution in vivo imaging of human locus coeruleus by magnetization transfer MRI at 3T and 7T. NeuroImage, 168, 427–436. Priovoulos, N., Verhey, F., Poser, B., Napadow, V., Sclocco, R., Ivanov, D., & Jacobs, H. I. L. (2019a). 
Respiratory-gated auricular vagal afferent nerve stimulation modulates the locus coeruleus in aged adults. In Proceedings of the organization for human brain mapping, Rome, p. Th069.

182

N. Priovoulos et al.

Priovoulos, N., Jacobs, P. B., Ivanov, D., Pagen, L., & Uludag, K. (2019b). Locus coeruleus and parasympathetic network interactions revealed with fMRI at 7T during memory. In Proceedings of the organization for human brain mapping, Rome, p. W673. Priovoulos, N., Andersen M., Dumoulin, S. O., Boer, V. O., & van der Zwaag, W. (2023). High-resolution motion-corrected 7.0-T MRI to derive morphologic measures from the human cerebellum in Vivo. Radiology 307, 200–205. Sales, A. C., Friston, K. J., Jones, M. W., Pickering, A. E., & Moran, R. J. (2019). Locus Coeruleus tracking of prediction errors optimises cognitive flexibility: An active inference model. PLoS Computational Biology, 15, e1006267. Salimi-Khorshidi, G., Douaud, G., Beckmann, C. F., Glasser, M. F., Griffanti, L., & Smith, S. M. (2014). Automatic denoising of functional MRI data: Combining independent component analysis and hierarchical fusion of classifiers. NeuroImage, 90, 449–468. Sánchez-Panchuelo, R. M., Besle, J., Mougin, O., Gowland, P., Bowtell, R., Schluppeck, D., & Francis, S. (2014). Regional structural differences across functionally parcellated Brodmann areas of human primary somatosensory cortex. NeuroImage, 93(Pt 2), 221–230. https://doi.org/ 10.1016/j.neuroimage.2013.03.044 Sati, P., van Gelderen, P., Silva, A. C., Reich, D. S., Merkle, H., de Zwart, J. A., & Duyn, J. H. (2013). Micro-compartment specific T2* relaxation in the brain. NeuroImage, 77, 268–278. Sclocco, R., Garcia, R. G., Kettner, N. W., Isenburg, K., Fisher, H. P., Hubbard, C. S., Ay, I., Polimeni, J. R., Goldstein, J., Makris, N., Toschi, N., Barbieri, R., & Napadow, V. (2019). The influence of respiration on brainstem and cardiovagal response to auricular vagus nerve stimulation: A multimodal ultrahigh-field (7T) fMRI study. Brain Stimulation, 12(4), 911–921. Seo, D. O., & Bruchas, M. R. (2017). Polymorphic computation in locus coeruleus networks. Nature Neuroscience, 20, 1517–1519. 
Shmuel, A., Yacoub, E., Chaimow, D., Logothetis, N. K., & Ugurbil, K. (2007). Spatio-temporal point-spread function of fMRI signal in human gray matter at 7 Tesla. NeuroImage, 35, 539– 552. Siero, J. C., Petridou, N., Hoogduin, H., Luijten, P. R., & Ramsey, N. F. (2011). Cortical depthdependent temporal dynamics of the BOLD response in the human brain. Journal of Cerebral Blood Flow and Metabolism, 31, 1999–2008. Siero, J. C., Ramsey, N. F., Hoogduin, H., Klomp, D. W., Luijten, P. R., & Petridou, N. (2013). BOLD specificity and dynamics evaluated in humans at 7 T: Comparing gradient-echo and spin-echo hemodynamic responses. PLoS One, 8, e54560. Sitek, K. R., Gulban, O. F., Calabrese, E., Johnson, G. A., Lage-Castellanos, A., Moerel, M., Ghosh, S. S., & De Martino, F. (2019). Mapping the human subcortical auditory system using histology, post mortem MRI and in vivo MRI at 7T. bioRxiv. https://doi.org/10.1101/568139 Skare, S., Hartwig, A., Martensson, M., Avventi, E., & Engstrom, M. (2015). Properties of a 2D fat navigator for prospective image domain correction of nodding motion in brain MRI. Magnetic Resonance in Medicine, 73, 1110–1119. Steele, C. J., Anwander, A., Bazin, P.-L., Trampel, R., Schaefer, A., Turner, R., Ramnani, N., & Villringer, A. (2016). Human cerebellar sub-millimeter diffusion imaging reveals the motor and non-motor topography of the dentate nucleus. Cerebral Cortex. https://doi.org/10.1093/cercor/ bhw258 Stephan, K. E., Petzschner, F. H., Kasper, L., Bayer, J., Wellstein, K. V., Stefanics, G., Pruessmann, K. P., & Heinzle, J. (2019). Laminar fMRI and computational theories of brain function. NeuroImage, 197, 699–706. https://doi.org/10.1016/j.neuroimage.2017.11.001 Stoodley, C. J. (2012). The cerebellum and cognition: Evidence from functional imaging studies. Cerebellum (London, England), 11, 352–365. https://doi.org/10.1007/s12311-011-0260-7 Tabelow, K., Balteau, E., Ashburner, J., Callaghan, M. 
F., Draganski, B., Helms, G., Kherif, F., Leutritz, T., Lutti, A., Phillips, C., Reimer, E., Ruthotto, L., Seif, M., Weiskopf, N., Ziegler, G., & Mohammadi, S. (2019). hMRI – A toolbox for quantitative MRI in neuroscience and clinical research. NeuroImage, 194, 191–210.

Ultrahigh Field Magnetic Resonance Imaging for Model-Based Neuroscience

183

Tardif, C. L., Schäfer, A., Waehnert, M., Dinse, J., Turner, R., & Bazin, P.-L. (2015). Multi-contrast multi-scale surface registration for improved alignment of cortical areas. NeuroImage, 111, 107–122. https://doi.org/10.1016/j.neuroimage.2015.02.005 Teeuwisse, W. M., Brink, W. M., & Webb, A. G. (2012). Quantitative assessment of the effects of high-permittivity pads in 7 Tesla MRI of the brain. Magnetic Resonance in Medicine, 67(5), 1285–1293. https://doi.org/10.1002/mrm.23108 Thomas, J. M., Huber, E., Stecker, G. C., Boynton, G. M., Saenz, M., & Fine, I. (2015). Population receptive field estimates of human auditory cortex. NeuroImage, 105, 428–439. https://doi.org/ 10.1016/j.neuroimage.2014.10.060 Trampel, R., Bazin, P. L., Pine, K., & Weiskopf, N. (2019). In-vivo magnetic resonance imaging (MRI) of laminae in the human cortex. NeuroImage, 197, 707–715. Triantafyllou, C., Hoge, R. D., Krueger, G., Wiggins, C. J., Potthast, A., Wiggins, G. C., & Wald, L. L. (2005, May 15). Comparison of physiological noise at 1.5 T, 3 T and 7 T and optimization of fMRI acquisition parameters. Neuroimage, 26(1), 243–250. https://doi.org/ 10.1016/j.neuroimage.2005.01.007 Turner, R. (2002). How much cortex can a vein drain? Downstream dilution of activation- related cerebral blood oxygenation changes. NeuroImage, 16, 1062–1067. Turner, R. (2019, May 8). Myelin and modeling: Bootstrapping cortical microcircuits. Front Neural Circuits, 13, 34. https://doi.org/10.3389/fncir.2019.00034. PMID: 31133821; PMCID: PMC6517540. Turner, R., & Geyer, S. (2014). Comparing like with like: The power of knowing where you are. Brain Connectivity, 4(7), 547–557. https://doi.org/10.1089/brain.2014.0261. Epub 2014 Aug. Uluda˘g, K., Müller-Bierl, B., & U˘gurbil, K. (2009). An integrative model for neuronal activityinduced signal changes for gradient and spin echo functional imaging. NeuroImage, 48, 150– 165. Vaidya, M. V., Lazar, M., Deniz, C. M., Haemer, G. G., Chen, G., Bruno, M., Sodickson, D. 
K., Lattanzi, R., & Collins, C. M. (2018). Improved detection of fMRI activation in the cerebellum at 7T with dielectric pads extending the imaging region of a commercial head coil. Journal of Magnetic Resonance Imaging, 48(2), 431–440. https://doi.org/10.1002/jmri.25936 van der Zwaag, W., Francis, S., Head, K., Peters, A., Gowland, P., Morris, P., & Bowtell, R. (2009). fMRI at 1.5, 3 and 7 T: Characterising BOLD signal changes. NeuroImage, 47(4), 1425–1434. https://doi.org/10.1016/j.neuroimage.2009.05.015 van der Zwaag, W., Jorge, J., Butticaz, D., & Gruetter, R. (2015). Physiological noise in human cerebellar fMRI. Magma, 28(5), 485–492. https://doi.org/10.1007/s10334-015-0483-6 van Gelderen, P., de Zwart, J. A., Starewicz, P., Hinks, R. S., & Duyn, J. H. (2007). Realtime shimming to compensate for respiration-induced B0 fluctuations. Magnetic Resonance in Medicine, 57, 362–368. van Gelderen, P., Jiang, X., & Duyn, J. H. (2016). Effects of magnetization transfer on T1 contrast in human brain white matter. NeuroImage, 128, 85–95. van Es, D. M., van der Zwaag, W., & Knapen, T. (2019, May 20). Topographic maps of visual space in the human cerebellum. Current Biology, 29(10), 1689–1694.e3. https://doi.org/10.1016/ j.cub.2019.04.012 Wang, Y., & Liu, T. (2015). Quantitative susceptibility mapping (QSM): Decoding MRI data for a tissue magnetic biomarker. Magnetic Resonance in Medicine, 73, 82–101. Weiskopf, N., Suckling, J., Williams, G., Correia, M. M., Inkster, B., Tait, R., Ooi, C., Bullmore, E. T., & Lutti, A. (2013). Quantitative multi-parameter mapping of R1, PD(*), MT, and R2(*) at 3T: A multi-center validation. Frontiers in Neuroscience, 7, 95. Westbrook, C., & Talbot, J. (2018). MRI in practice (5th ed.). Wiley. Wintermark, M., Sesay, M., Barbier, E., Borbély, K., Dillon, W. P. P., Eastwood, J. D. D., Glenn, T. C. C., Grandin, C. B. B., Pedraza, S., Soustiel, J. F. J.-F., Nariai, T., Zaharchuk, G., Caillé, J. M., Dousset, V., Yonas, H., Borbely, K., Dillon, W. 
P. P., Eastwood, J. D. D., Glenn, T. C. C., Grandin, C. B. B., Pedraza, S., Soustiel, J. F. J.-F., Nariai, T., Zaharchuk, G., Caille, J.-M., Dousset, V., & Yonas, H. (2005). Comparative overview of brain perfusion imaging techniques. Stroke, 36, e83–e99. https://doi.org/10.1161/01.STR.0000177884.72657.8b

184

N. Priovoulos et al.

Wright, P. J., Mougin, O. E., Totman, J. J., Peters, A. M., Brookes, M. J., Coxon, R., Morris, P. E., Clemence, M., Francis, S. T., Bowtell, R. W., & Gowland, P. A. (2008). Water proton T1 measurements in brain tissue at 7, 3, and 1.5 T using IR-EPI, IR-TSE, and MPRAGE: Results and optimization. Magma, 21(1–2), 121–130. https://doi.org/10.1007/s10334-008-0104-8 Xu, G., Rowley, H. A., Wu, G., Alsop, D. C., Shankaranarayanan, A., Dowling, M., Christian, B. T., Oakes, T. R., & Johnson, S. C. (2009). Reliability and precision of pseudo-continuous arterial spin labeling perfusion MRI on 3.0 T and comparison with 15 O-water PET in elderly subjects at risk for Alzheimer’s disease. NMR in Biomedicine, 23, n/a–n/a. https://doi.org/ 10.1002/nbm.1462 Yacoub, E., Harel, N., & Ugurbil, K. (2008). High-field fMRI unveils orientation columns in humans. Proceedings of the National Academy of Sciences of the United States of America, 105, 10607–10612. Zimmermann, J., Goebel, R., De Martino, F., van de Moortele, P.-F., Feinberg, D., Adriany, G., Chaimow, D., Shmuel, A., U˘gurbil, K., & Yacoub, E. (2011). Mapping the organization of axis of motion selective features in human area MT using high-field fMRI. PLoS One, 6, e28716.

An Introduction to EEG/MEG for Model-Based Cognitive Neuroscience Bernadette C. M. van Wijk

Abstract Electroencephalography (EEG) and magnetoencephalography (MEG) are closely related non-invasive recording techniques that are of great value in cognitive neuroscience studies with human participants. They are sensitive to, respectively, changes in electric and magnetic fields that are generated by postsynaptic currents on spatially aligned pyramidal cells in cortex. The superior time resolution of EEG/MEG compared to functional magnetic resonance imaging (fMRI) allows for tracking changes in neural activity during the subject's performance of a behavioural task with high temporal precision. This chapter introduces the biophysics behind EEG/MEG recordings, discusses practical issues when conducting an experiment, and highlights the most important signal features for the field of cognitive neuroscience. Furthermore, it illustrates how the combination of cognitive modelling and EEG/MEG may aid a meaningful interpretation of experimental findings.

Keywords Electroencephalography · Magnetoencephalography · Signal processing

1 Introduction

In 1780, Luigi Galvani (1737–1798) discovered that electrical stimulation could induce leg contractions in a dead frog. His observations were to be the start of an interest in bio-electricity that continues to the present day. The introduction of non-polarizable electrodes made it possible for pioneers like Richard Caton (1842–1926) and Vasili Danilevsky (1852–1939) to record electrical activity from the surface of exposed animal brains (Schomer & Lopes da Silva, 2017). Like EEG/MEG studies today (Fig. 1), these early studies tried to link observations

B. C. M. van Wijk ()
Department of Human Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
e-mail: [email protected]

© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_8


Fig. 1 Examples of modern experimental set-ups for EEG and MEG studies in cognitive neuroscience. Relatively simple paradigms are frequently used to link measured brain activity with parts of the experimental task. Many studies make use of visual stimuli that are presented on a monitor in front of the subject and/or audio stimuli via earphones or nearby loudspeakers. Left: EEG in combination with a remote eye tracking system. (Source: https://www.tobiipro.com/product-listing/StimTracker/). Right: a subject responds with a button press during an MEG experiment. (Source: National Institute of Mental Health Image library, http://infocenter.nimh.nih.gov/il/public_il/image_details.cfm?id=80)

of spontaneous and evoked electrical brain activity to mental states and external events. It was Adolf Beck (1863–1939) who first described the disappearance of rhythmic oscillations upon stimulating the eyes of dogs and rabbits with light. Due to technical improvements of the galvanometer, Hans Berger (1873–1941) succeeded in replicating this finding non-invasively in humans for the first time (Berger, 1929). Berger is therefore widely recognized as the inventor of human electroencephalography (EEG). Ever since his work, both hardware and signal processing techniques have advanced considerably, and although alternative functional neuroimaging techniques have emerged, EEG remains widely used by scientific researchers and clinicians today. The first recordings of magnetic fields outside the scalp induced by the alpha rhythm were performed by David Cohen in the 1960s (Cohen, 1968). These were obtained with a copper induction coil with one million turns around a ferrite rod core as the measurement device. Like present MEG experiments, the recordings were performed in a magnetically shielded room, but it was still necessary to perform elaborate additional measurements to remove background noise and to co-record EEG as a reference signal in order to detect traces of the alpha rhythm. Soon after, the development of superconducting quantum interference devices (SQUIDs) allowed for recordings with much higher sensitivity (Cohen, 1972). Even so, early


systems still consisted of only one sensor. Whole-head recordings were extremely cumbersome as they had to be obtained through repeated measurements with the sensor placed at different scalp locations. Systems with multiple sensors covering the entire scalp were introduced in the 1990s, thereby accelerating the discovery of various neural markers of cognition (Hari & Salmelin, 2012). There are currently about 200 whole-head SQUID-based MEG systems worldwide that are being used for scientific and/or clinical purposes (Hari et al., 2018). Several factors contribute to the popularity of EEG/MEG as recording techniques in cognitive neuroscience studies. Firstly, their sub-millisecond time resolution allows for studying cognitive processes as they unfold. Secondly, recorded signals are sensitive to ionic currents induced by post-synaptic potentials and are therefore a direct reflection of neural activity. Furthermore, EEG equipment is relatively cheap to purchase, use, and maintain. Last but not least, EEG and MEG are both non-invasive techniques that are safe to use in all human subjects. These advantages are often weighed against those of fMRI. Compared to EEG/MEG, fMRI has better spatial resolution but poorer temporal resolution, reflects neural activity only indirectly through changes in blood oxygenation, and relies on strong magnetic fields that may be unsafe for subjects with metal implants. Another comparison is often made with invasive techniques such as electrocorticography (ECoG) and local field potential (LFP) recordings. On the one hand, the larger distance to the brain tissue for EEG/MEG means that its signal-to-noise ratio is considerably lower and recorded signals contain a stronger mixture of source activities. On the other hand, EEG/MEG allows for wider coverage of the brain as invasive electrodes are typically implanted only at specific locations.
The advantages and disadvantages of EEG/MEG are summarized in Table 1 and will be further discussed in subsequent sections.

2 What Are We Measuring with EEG/MEG?

Researchers in cognitive neuroscience often use EEG/MEG with the aim of gaining insight into the neural mechanisms underlying cognitive processes. In order to translate experimental findings into theories of brain function, it is crucial to understand what we are actually measuring with these techniques and, equally important, what we are not measuring. For this we need to dive into the biophysics of electrical signalling in neurons. During rest, neurons maintain a resting membrane potential of about −70 mV, meaning that there is net accumulation of negatively charged ions inside the cell compared to outside. Communication between neurons occurs via action potentials, which are brief electrical impulses in the form of a rapid depolarization and repolarization of the membrane potential. Action potentials travel down the neuron's axon to its synaptic terminals where they trigger a release of neurotransmitters. These neurotransmitters bind to the receptors of specific ion channels on the membrane of a post-synaptic neuron. This binding opens the channel for ions to flow through. Depending on the type of neurotransmitter that is released, either


Table 1 Advantages and disadvantages of EEG/MEG

Advantages of EEG/MEG:
- High temporal resolution: it allows for studying neural activity as cognitive processes unfold.
- Direct marker of neural activity, as signals are mainly generated by post-synaptic currents.
- Electrode/sensor arrays cover the entire cortex.
- Non-invasive and therefore attractive for studying human cognition.
- EEG equipment is relatively cheap and easy to operate.
- Subject preparation time for MEG is short.
- Subjects who are not eligible to undergo magnetic resonance imaging can still safely be recorded with EEG (or MEG).
- Data are high-dimensional and rich in features that may be important markers of cognition in health and disease.

Disadvantages of EEG/MEG:
- Poor spatial resolution: signals contain a mixture of activity from different neural sources.
- Cannot detect firing rates, cannot distinguish between excitation and inhibition, and has low sensitivity to activity from most subcortical structures.
- Low signal-to-noise ratio: many trials are needed per experimental condition.
- Sensitive to artifacts including eye movements and blinks, swallowing and coughing, muscle activity, and head motion.
- It can be difficult to obtain artifact-free MEG recordings from subjects with ferromagnetic implants.
- Preparation time for EEG may be long.
- No off-the-shelf analysis pipeline exists; choices in processing steps need to be made for each study anew.
- Complex signal processing techniques contain many potential pitfalls that may lead to erroneous findings.

positively charged ions such as sodium or potassium or negatively charged ions like chloride pass through the membrane. Depending on the ion's equilibrium potential, the post-synaptic membrane potential either temporarily becomes more positive (depolarization) or more negative (hyperpolarization) until the neurotransmitter is broken down and the ion channel closes again. When multiple depolarizing synaptic inputs overlap in time, the post-synaptic membrane potential might increase sufficiently to cross the threshold for voltage-gated sodium channels to open and trigger an action potential. These inputs therefore have an excitatory effect, and the depolarization is also referred to as an excitatory post-synaptic potential (EPSP). By contrast, hyperpolarizing synaptic inputs bring the neuron further away from triggering an action potential and are therefore referred to as causing inhibitory post-synaptic potentials (IPSPs). Post-synaptic potentials lead to a spread of current both inside and outside the neuron (Fig. 2a). An EPSP creates a local increase in positively charged ions inside the cell (current source) that results in a current flow towards the more negatively charged neuron's soma (current sink). The sink/source pair forms a current dipole of a certain strength and spatial orientation. In the extracellular space, the local decrease in positively charged ions at the synapse creates a compensatory current in the opposite direction from the primary current in the intracellular space. The reverse situation occurs during IPSPs, where an increase in negatively charged ions inside the cell (current sink) results in a current flow away from the more positively charged neuron's soma (current source). Still, we are unable to distinguish EPSPs and IPSPs in EEG/MEG signals as the location of the synapse on the post-synaptic neuron determines whether a positive or negative deflection will be observed in the recorded signal. An EPSP close to the neuron's soma will result in an intracellular current flow towards the tip of the dendrites that is similar to the current flow induced by an IPSP at a more distant location. In other words, the polarity of EEG/MEG signals cannot be directly interpreted in terms of excitatory versus inhibitory synaptic activity.

The extracellular change in electric potential created by post-synaptic potentials is detectable by EEG as the electric field spreads through the different tissues between the neurons and the electrodes placed on the scalp (Fig. 2b). This is called volume conduction. Due to the resistive properties of these tissues to current and the fact that the electric field of a current dipole falls off with distance squared from its source (1/r²), the measured signals at the level of the scalp are relatively weak. Electrical activity from the brain can only be detected if many neurons are active at the same time, so the extracellular electric fields of individual neurons add up. In addition, these fields should be of the same polarity or they will cancel out when summed across a population of neurons. This implies that the contribution of action potentials to the measured signal is likely to be minimal as they have a very brief duration of 1–2 ms and consist of a biphasic curve with both a positive and a negative deflection. The slightest time-jitter at which they are triggered in individual neurons of a population makes their generated fields negligible. By contrast, post-synaptic potentials are ideal for summation as they have a relatively long duration between 10 and 100 ms and only consist of a positive or negative deflection. They are therefore seen as the main generators of EEG and MEG signals. Furthermore, only neurons with dendrites that are spatially aligned contribute to the measured signal as their induced extracellular fields can have the same polarity across neurons.

Fig. 2 Origin of EEG and MEG signals. (a) Excitatory (EPSP) and inhibitory (IPSP) post-synaptic potentials induce intracellular and compensatory extracellular currents (red and yellow arrows, respectively). Although EPSPs and IPSPs lead to current flow in opposite directions, the location of the synapse on the post-synaptic dendrite also influences the direction of current flow along the dendrite and therefore the polarity of the signal observed at the scalp. Consequently, recordings cannot be directly interpreted in terms of underlying excitatory and inhibitory synaptic activity. (b) The summation of EPSPs and IPSPs that occur synchronously in time on spatially aligned dendrites leads to measurable signal outside the head. Extracellular currents propagate through intermediate tissues (yellow lines) to EEG electrodes on the scalp that measure differences in electric potential with respect to a reference location. Intracellular currents generate a magnetic field (green lines) that can be picked up with MEG sensors without being disturbed by intermediate tissues. (Adapted from Hari & Parkkonen, 2015)
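This summation argument can be made concrete with a toy simulation. The sketch below (plain NumPy; the waveform shapes, neuron count, and jitter values are illustrative choices, not physiological measurements) sums the contributions of a population of neurons under three scenarios: synchronous same-polarity PSPs, PSPs whose polarity is randomly flipped (a crude stand-in for non-aligned dendrites), and jittered biphasic action potentials.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10_000                           # sampling rate (Hz)
t = np.arange(0.0, 0.3, 1.0 / fs)     # 300 ms window
n = 2000                              # number of active neurons

def psp(t0):
    # Monophasic post-synaptic potential: alpha function with a single
    # polarity and a duration of a few tens of milliseconds.
    s = np.clip(t - t0, 0.0, None)
    return (s / 0.01) * np.exp(1.0 - s / 0.01)

def spike(t0):
    # Biphasic action potential, ~2 ms wide: a positive and a negative lobe.
    s = t - t0
    return np.exp(-((s / 0.0005) ** 2)) * np.sin(2 * np.pi * s / 0.002)

jitter = rng.normal(0.0, 0.003, n)    # a few ms of timing jitter per neuron
signs = rng.choice([-1.0, 1.0], n)    # random polarity: non-aligned dendrites

sum_psp_aligned = sum(psp(0.1 + j) for j in jitter)
sum_psp_random = sum(sg * psp(0.1 + j) for sg, j in zip(signs, jitter))
sum_spikes = sum(spike(0.1 + j) for j in jitter)

print(np.abs(sum_psp_aligned).max())  # grows with n: slow, aligned PSPs add up
print(np.abs(sum_psp_random).max())   # ~sqrt(n): mixed polarities cancel
print(np.abs(sum_spikes).max())       # small: jittered biphasic spikes cancel
```

The same-polarity PSPs sum to a signal on the order of the population size, whereas the mixed-polarity and jittered-spike sums grow only on the order of the square root of the population size, which is why slow, aligned post-synaptic currents dominate the measured signal in this simple picture.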
Fortunately, pyramidal cells in cortex have relatively long dendrites with a perpendicular orientation to the cortical surface and are therefore well suited for generating measurable signal. It is due to this parallel organization near the surface of the brain that we are able to detect a clear signal non-invasively. Extracellular fields generated by synaptic activity on dendrites in a star-shaped configuration easily cancel out when summing over large numbers of neurons. Activity of inhibitory interneurons and spiny stellate cells therefore contributes little to the measured signal on the scalp. In addition, activity from subcortical structures is considered to be very difficult to detect as these structures are located at a large distance from the EEG electrodes or MEG sensors and often contain neurons with dendrites that are not as regularly aligned as cortical pyramidal cells. Using computer simulations, Attal et al. (2007) estimated that several thousand trials are necessary to be able to detect activity from structures like the external pallidum or thalamus within ongoing EEG/MEG time series. Every electric current generates a magnetic field with a perpendicular orientation. This also applies to the intracellular currents induced by EPSPs and IPSPs. These magnetic fields pass through biological tissues unaffected and can therefore be measured outside the head (Fig. 2b). Although the magnetic fields produced by brain activity are tiny (10⁻¹⁵ T) compared to the earth's magnetic field (0.5 × 10⁻⁴ T), they can be recorded with very sensitive SQUIDs. Changes in magnetic flux are picked up with a superconducting coil (magnetometer) located close to the subject's head. This generates a current that flows through an input coil coupled


with the SQUID. In order to eliminate magnetic interference from distant sources, MEG sensors may consist of two (or more) magnetometers with coils wound in opposite directions in either an axial or planar configuration (gradiometer). SQUIDs operate near absolute zero temperature through cooling by liquid helium that is contained within a large dewar located above the helmet (see Fig. 1). This is why MEG systems look so sizable. Theoretically, radially oriented current dipoles inside a spherically symmetric conductor do not produce magnetic fields outside the sphere (Hämäläinen et al., 1993), suggesting that MEG would only be sensitive to tangential sources. In practice, however, the head is not spherical and perfectly radially oriented sources are rare in the convoluted cortex, and their detection probability depends more on source depth than on orientation (Hillebrand & Barnes, 2002). As with EEG, a large number of post-synaptic currents needs to be induced simultaneously in order to be detectable. Modelling estimates by Murakami and Okada (2006) suggest that recruitment of around 50,000 pyramidal neurons is sufficient to yield measurable signal. To summarize, EEG and MEG are mostly sensitive to EPSPs and IPSPs that occur synchronously in time on a large number of spatially aligned dendrites of pyramidal cells in cortex. Post-synaptic currents on inhibitory interneurons or spiny stellate cells contribute little to the measured signal due to the radial configuration of their dendrites. The impact of action potentials is also considered to be small. The location of subcortical structures deep inside the brain and their mixed orientation of dendritic trees are suboptimal for EEG and MEG. Although it is theoretically possible to detect activity from these structures, the large number of trials that need to be collected for achieving an adequate signal-to-noise ratio is for many studies practically infeasible.
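The noise-cancelling effect of a gradiometer can be illustrated with a crude toy model. Assuming, for simplicity only, that field magnitude falls off as 1/r², a first-order axial gradiometer subtracts the signals of two coils separated by a short baseline: a nearby cortical source produces very different fields at the two coils, while a distant interference source produces nearly identical fields that cancel in the difference. The 5 cm baseline and the source distances below are illustrative values, not specifications of any real system.

```python
def B(r):
    # Toy field-magnitude model: falls off as 1/r^2 (all geometry ignored).
    return 1.0 / r**2

baseline = 0.05  # 5 cm separation between the two gradiometer coils (m)

def gradiometer(r):
    # First-order axial gradiometer: difference between the two coil signals.
    return B(r) - B(r + baseline)

near, far = 0.03, 5.0  # nearby cortical source vs distant noise source (m)
print(gradiometer(near) / B(near))  # retains most of the nearby signal (~0.86)
print(gradiometer(far) / B(far))    # rejects almost all distant interference (~0.02)
```

In this toy model the gradiometer keeps roughly 86% of the nearby source's signal while suppressing the distant source by about a factor of 50, which is the essence of common-mode rejection in gradiometer designs.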

3 Practical Considerations and Pre-processing Steps

Although EEG/MEG equipment is relatively simple for users to operate, there are a number of practical matters that strongly affect the quality of the collected data. Many of these hold for both EEG and MEG, but I will also point out a few modality-specific issues. First-time users will quickly realize that recordings are inherently noisy and easily influenced by external sources, movement of the subject, or the type of stimuli used in the experiment. Experimental effects of interest in cognitive neuroscience studies are typically expected to be small and may vary within and across subjects. They might emerge only after averaging the data over a large number of repetitions and/or by contrasting experimental conditions. It is therefore important to take great care in the experimental design (e.g. Luck, 2014) and to give appropriate subject instructions, in order to avoid ending up with a data set in which the effect of interest cannot be studied or that requires laborious artifact correction. Previous literature may guide the way, but extensive pilot testing is nevertheless recommended.


Various commercial EEG systems are available that differ in the number and type of electrodes used. Most conventional electrodes are made of silver and silver chloride (Ag/AgCl) and need the application of electrolyte gel to bridge the space between the skin and the surface of the electrode. The quality of the recording will depend on the ability to reduce the skin-electrode impedance, which is ideally kept below 5 kΩ. To achieve this, a syringe with a blunt needle is used to move any hair aside, to softly abrade the skin in order to remove dead cells, and to apply the gel. This takes considerable preparation time for systems with a large number of electrodes. Alternatively, 'dry' systems do not require the application of gel but rely on mechanical contact between the electrodes and the skin. Preparation time may therefore be considerably lower, although some systems still require the application of a saline solution to decrease the skin-electrode impedance. Electrodes are usually kept in place through prefabricated electrode nets or by inserting them into the holes of electrode caps that are produced in different sizes to fit a range of head sizes. Most caps follow a standardized layout based on the internationally recognized 10–20 system, originally introduced for 21 electrodes (Jasper, 1958) and later extended to the 10–10 (Chatrian et al., 1985) and 10–5 (Oostenveld & Praamstra, 2001) systems to accommodate more electrodes. The numbers refer to the inter-electrode distances as percentages of the length of the lines connecting the nasion and inion and the left and right pre-auricular points. Although electrode arrays may contain as many as 256 electrodes, most experimental studies in the field of cognitive neuroscience are performed with 64 or 128 electrodes.

Only a few types of commercial MEG systems are produced. They contain a fixed number of axial and/or planar gradiometers, between 150 and 306, that are embedded in the interior of the system's helmet.
Subjects are measured in either supine or upright position, and the bed or chair is adjusted until the subject's head is positioned straight inside the helmet and just touching the top. Hence, unlike EEG, the sensors are not directly attached to the scalp, and the position of the head relative to the sensors needs to be measured with separate coils that are placed on the nasion and left/right pre-auricular points or at a known distance to these landmarks. Experimenters can try to minimize head movement during the recording by placing small cushions between the subject's head and the helmet. Some labs even make use of three-dimensionally printed head casts that are custom-made for individual subjects in order to fill all residual spaces (Meyer et al., 2017). Keeping the subject's head still at a location that is accurately measured increases the recording's signal-to-noise ratio and yields better source localization estimates (Troebinger et al., 2014). Currently, rapid progress is being made in the development of a new MEG sensor technology, optically pumped magnetometers (OPMs; Tierney et al., 2019), that could overcome these limitations. OPMs operate at room temperature and can be mounted in a wearable cap placed directly on the head (Boto et al., 2018). The shorter distance to the brain results in higher sensitivity to cortical activity compared to SQUID-based systems (Sander et al., 2012), but extensive magnetic shielding is still required. Other advantages are that subject movements are less constrained and sensor positions can be adjusted to the individual head size.


The main task of the experimenter during data acquisition is to make sure that the subject is feeling comfortable and performing the experimental task correctly, and to minimize the occurrence of signal artifacts. The acquisition software allows for online inspection of the collected time series and for checking the presence of any synchronization pulses that might have been programmed. A major source of artifacts arises from the net difference in electrical charge between the cornea and retina of the subject's eyes. Essentially, the eyes form two large dipoles that lead to considerable changes in signal amplitude during eye movements and blinks across many frontal, temporal, central, and parietal recording sites. To a large extent, these can be avoided by adequate subject instruction and design of the experimental task. A fixation cross or dot can be included in the centre of the screen used to display the task, and short rest periods between trials can be included to allow for blinking. Other artifacts that can be minimized with subject instruction include those that arise from excessive activation of facial or head muscles, coughing and swallowing, and unnecessary movement. Slow drifts, for example, due to sweating or cable movements, can normally be easily filtered out, as can any 50 or 60 Hz line noise induced by the mains electricity network. Furthermore, MEG recordings are readily disturbed by movement of ferromagnetic metallic objects. Participants are therefore asked not to wear clothing with magnetic buttons or zippers and to take off any removable jewellery before the start of the recording. Implants, pins from surgery, or dental braces can cause distortions to the recorded time series that may be successfully removed offline through spatial filtering algorithms.
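The drift and line-noise removal mentioned above can be sketched as follows; the sampling rate and filter settings are illustrative assumptions, not prescriptions:

```python
import numpy as np
from scipy import signal

fs = 1000.0  # sampling rate in Hz (assumed for this example)
t = np.arange(0, 10, 1 / fs)
# Synthetic channel: 10 Hz brain rhythm + slow drift + 50 Hz line noise.
x = np.sin(2 * np.pi * 10 * t) + 0.5 * t + 0.8 * np.sin(2 * np.pi * 50 * t)

# Zero-phase high-pass filter at 0.5 Hz to remove the slow drift.
b_hp, a_hp = signal.butter(4, 0.5, btype="highpass", fs=fs)
x_hp = signal.filtfilt(b_hp, a_hp, x)

# Narrow notch filter at 50 Hz to suppress line noise.
b_n, a_n = signal.iirnotch(50.0, Q=30.0, fs=fs)
x_clean = signal.filtfilt(b_n, a_n, x_hp)
```

In practice, researchers typically rely on the filter routines of dedicated toolboxes, which take care of edge effects and sensible filter-order choices.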
Interference from external sources such as the earth's magnetic field, power lines, and nearby traffic is attenuated through a carefully designed shielded room in which the MEG system is housed.

Once the data set has been collected, the first analysis steps are to read the data files into a software programme of choice and to pre-process the time series in order to prepare them for further analyses that address the effects of interest more directly. This includes the application of filters, epoching of trials around the onset of events in the experimental task, baseline correction, and the rejection or correction of time segments or channels with artifacts. If needed, independent component analysis (ICA) can be used to remove eye movements and blinks as well as muscle and cardiac activity from the time series. Visual inspection of the time series is often unavoidable to make sure no artifacts are left. There is no general consensus on the minimum number of clean trials needed, as it depends on the signal-to-noise ratio of the data feature of interest. For most cognitive neuroscience experiments, around 100 trials per condition suffice.

An additional pre-processing step for EEG is the choice of reference electrode(s), also referred to as the montage. Because voltage measurements are always made with respect to a reference point, at least two electrodes are required for recording EEG. Ideally, the reference itself is not influenced by the effect of interest but does record the same background noise. One or more reference electrodes are used during recording, but the collected time series can still be re-referenced offline to other electrodes. With modern high-density electrode arrays, researchers are flexible in their choice of montage. Conventional reference electrodes for cognitive neuroscience experiments include electrode Cz, the average across the two electrodes placed on the left and right mastoids, the average across all electrodes, and a Laplacian reference where the time series from adjacent electrodes are subtracted. Every montage has its advantages and disadvantages, and which one is chosen often depends on what has been used in the previous literature. Re-referencing does not play a role in MEG recordings, as sensors detect absolute magnetic fields for which no separate reference channel is needed.

It is feasible for researchers with an affinity for programming to implement all EEG/MEG data analyses via custom-written code. Alternatively, there are a number of excellent open-source software packages available that aid the data analysis in Matlab and Python. The most commonly used Matlab toolboxes include SPM (https://www.fil.ion.ucl.ac.uk/spm/), Fieldtrip (http://www.fieldtriptoolbox.org), EEGLAB (https://sccn.ucsd.edu/eeglab/), and Brainstorm (https://neuroimage.usc.edu/brainstorm/). For Python, MNE (https://mne.tools/stable/index.html) has been developed. Most software packages contain extensive user documentation and links to freely available tutorial data sets for learning.
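Offline re-referencing is a simple linear operation on the channels-by-samples data matrix. A minimal sketch with synthetic data (the mastoid channel indices are placeholders for the actual electrode positions):

```python
import numpy as np

# data: channels x samples array recorded against some online reference.
rng = np.random.default_rng(1)
data = rng.normal(size=(64, 1000))

# Common average reference: subtract the mean across all electrodes.
data_car = data - data.mean(axis=0, keepdims=True)

# Linked-mastoids reference: subtract the average of the two mastoid
# channels (indices 0 and 1 here are placeholders).
mastoids = data[[0, 1]].mean(axis=0)
data_mastoid = data - mastoids
```

Toolboxes such as MNE or EEGLAB provide equivalent re-referencing functions that additionally handle channel bookkeeping.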

4 Event-Related Potentials

Typical EEG/MEG studies in cognitive neuroscience make use of experimental paradigms in which participants perform the same type of trials many times. Simply averaging the recorded time series across trials can reveal systematic fluctuations in signal amplitude that relate to performance of the task. These fluctuations might be difficult to observe in single trials due to the inherently low signal-to-noise ratio of EEG/MEG. The noise that is recorded, however, is likely to be unrelated to the task itself. Averaging across trials ensures that the noise cancels out and any stimulus- or response-locked brain activity becomes visible. In order to correct for slow drifts or other fluctuations during the course of the experiment, the time series of each epoch are high-pass filtered or baseline-corrected before averaging by subtracting the average signal in a time period of a few hundred milliseconds before the onset of the stimulus.

Various stereotypical task-related signal components have been identified. These are typically referred to according to standard terminology based on whether the amplitude shows a negative ('N') or positive ('P') deflection, the approximate latency of the component's peak in milliseconds, and the location on the scalp where it is observed. On the one hand, evoked potentials can be observed at very short latencies and in primary (cortical) areas. Examples are the N25, P60, and N80 in somatosensory-evoked potentials and similar components in visual-evoked, auditory-evoked, and motor-evoked potentials (as measured with electromyography). Evoked potentials reflect basic stimulus processing and show little alteration by cognitive processes such as attention or memory. They often serve an important role in making clinical diagnoses for disorders where nerve propagation is affected. On the other hand, event-related potentials (ERPs; event-related fields for MEG) occur at longer latencies and are modulated by cognitive factors. Examples are the parietal P300 related to cognitive information processing, the temporal N400 related to semantic processing, the frontocentral N100 elicited by unpredicted stimuli, the error-related negativity (ERN) in response to an erroneous event or response, and the Bereitschaftspotential over motor areas before the initiation of voluntary movement.

ERPs can be visualized as a topography at the time of their peak amplitude or as a time series for a selected (group of) electrode(s) (Fig. 3). When the experimental design allows for systematic comparisons between conditions, any effects of interest can be further highlighted by plotting the ERP as a difference between conditions. Many research questions can be addressed by statistically comparing onset latencies and/or peak amplitudes between conditions or patient groups. This can be approached with conventional parametric statistics such as the general linear model (Kilner & Friston, 2010; Litvak et al., 2011) or via permutation tests (Maris & Oostenveld, 2007). In model-based EEG/MEG, peak amplitudes can, for example, be correlated with estimated parameter values of a cognitive model.

Fig. 3 Event-related potentials can be modulated by cognitive processes. In the study by O'Connell et al. (2012), subjects were instructed to respond with a button press as soon as they detected a contrast change in the presented stimulus. Although the stimulus in fact faded gradually, the change was detected at variable moments in time. Event-related potentials for trials grouped by reaction time (RT) revealed a centro-parietal component resembling the time course of evidence accumulation, with a peak at the moment of the button press. (Adapted from O'Connell et al., 2012)

Several EEG studies on perceptual decision-making have focused on ERPs. O'Connell et al. (2012) identified a centro-parietal positivity (CPP) component, with characteristics similar to the P300, that reflects the build-up of sensory evidence in a change detection task (Fig. 3). Subjects were presented with a flickering annulus stimulus that gradually changed in contrast. Although the sensory information was exactly the same across trials, subjects detected the change at variable moments in
time. When analysing trials with short, medium, and long reaction times separately, the CPP traced the build-up of sensory evidence, culminating at a fixed peak amplitude just before the response. Crucially, the component was unaffected by the stimulus modality (visual or auditory) and by the presence or absence of an overt motor response, and occurred only when subjects were instructed to pay attention to the contrast change. This indicates that the CPP acts as an abstract, supramodal representation of sensory evidence. The work by Philiastides et al. (2014) explicitly linked the CPP to parameters of a drift-diffusion-like computational model. In their experiment, subjects performed a visual discrimination task. Per trial, a picture of a face or a car was presented at one of four different phase coherence levels, thereby manipulating the amount of sensory evidence. Participants indicated with a button press whether they perceived a face or a car. The build-up rate of the CPP positively correlated with the drift rate parameter. In addition, the onset latency of the component significantly correlated with the estimated non-decision time across individuals. Interestingly, however, the CPP did not reach a fixed peak amplitude at the time of the response and continued to increase for the trials with lower sensory evidence. Choice confidence might therefore have contributed to the ERP time courses. It is also possible that differences in task design compared to the study by O'Connell et al. (2012) led to the different observations.

Linking ERPs with parameters from cognitive models on a single-trial level is difficult but feasible if the data feature is strong enough. Boehm et al. (2014) demonstrated this for the contingent negative variation (CNV), a slow negative build-up over central electrodes in anticipation of an upcoming motor response.
Subjects performed a random dot motion task where each trial started with a task instruction to emphasize speed or accuracy, followed by a 2-s waiting period before the dots were displayed. Single-trial estimates of the response caution parameter of the linear ballistic accumulator model were predictive of the CNV amplitude at the end of the waiting period. This effect was only found for speed-emphasis trials and not for accuracy-emphasis trials, suggesting that the CNV reflects trial-by-trial adjustments of response caution to mediate quick decision-making. Response caution was even found to be a better predictor of CNV amplitude than raw response times, underscoring that the component does not merely represent the preparation of specific motor responses but reflects a more general cognitive preparedness to act. Other model-based ERP studies have focused on, for example, the distinction between early and late ERP components in the decision-making process (Philiastides et al., 2006; Ratcliff et al., 2009), the relation between visual potentials and diffusion decision parameters (Nunez et al., 2017), and evidence accumulation during sequential decision-making (de Lange et al., 2010).
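The basic ERP computation described at the start of this section — epoching, per-trial baseline correction, and averaging — can be sketched with synthetic data (the sampling rate, epoch window, and P300-like waveform are illustrative assumptions):

```python
import numpy as np

fs = 250.0                             # sampling rate in Hz (assumed)
times = np.arange(-0.2, 0.8, 1 / fs)   # epoch from -200 ms to 800 ms
n_trials = 100

rng = np.random.default_rng(2)
# Synthetic single trials: a P300-like positivity at ~300 ms buried in noise.
erp_true = np.exp(-0.5 * ((times - 0.3) / 0.05) ** 2)
epochs = erp_true + rng.normal(scale=2.0, size=(n_trials, times.size))

# Baseline correction: subtract the mean pre-stimulus amplitude per trial.
baseline = times < 0
epochs -= epochs[:, baseline].mean(axis=1, keepdims=True)

# Averaging across trials cancels the noise and reveals the ERP.
erp = epochs.mean(axis=0)
```

With 100 trials, the noise standard deviation in the average shrinks by a factor of ten, and the peak of `erp` emerges near 300 ms even though it is invisible in single trials.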

5 Time-Frequency Modulations

In his report from 1929, Berger presents the EEG of his 15-year-old son Klaus with clear rhythmic oscillations at a frequency of ~10 Hz. He referred to these oscillations as the alpha rhythm as it was the first rhythm he observed. The oscillations were strongest, that is, had the largest amplitude, when subjects were in a relaxed state with eyes closed (see also Fig. 4a). Later, another alpha rhythm was discovered close to the central sulcus that, together with a beta rhythm (~25 Hz), is suppressed upon tactile stimulation (Jasper & Andrews, 1938). Ever since, oscillatory activity in EEG/MEG time series at various frequencies and scalp locations has been found to modulate with behavioural state or to show deviations in neurological or psychiatric disorders. Accordingly, we typically divide the spectral profile of EEG/MEG time series into distinct frequency bands (Fig. 4b): delta (1–4 Hz) is associated with sleep, theta (4–8 Hz) with memory, alpha (8–12 Hz) with attention, beta (13–30 Hz) with movement, and gamma (~30–100 Hz) with sensory and cognitive processing. Clearly, this is a very general and coarse description of the frequency components of EEG/MEG signals and their associated functions. Any experimental study shows that task-related brain responses are more complex, with multiple modulations occurring simultaneously in separate frequency bands and at different scalp locations. Several of these modulations can be observed systematically across individuals during the performance of behavioural and cognitive tasks (Fig. 4c).

Fig. 4 Oscillations and time-frequency modulations in EEG/MEG recordings. (a) Excerpt from an MEG recording over the left parietal cortex. Time periods with strong alpha oscillations can often clearly be observed in raw recordings, especially when subjects have their eyes closed or become a little drowsy. (b) The spectral profile of EEG/MEG time series is traditionally divided into separate frequency bands based on the presence of spectral peaks and observed amplitude modulations with task performance or behavioural state. Data points around 50 Hz are omitted because of line noise. (c) Time-frequency spectra describe modulations in spectral power with respect to the onset of a particular stimulus or event. A pre-stimulus time window usually serves as a baseline. The examples shown are particularly relevant for the field of cognitive neuroscience. Left: Viewing a moving visual grating stimulus induces an increase in gamma power and a decrease in beta power in the visual cortex. (Adapted from Orekhova et al., 2018). Middle: Execution of unilateral movements is accompanied by a stereotypical decrease in beta power over bilateral sensorimotor cortices, followed by a unilateral rebound after movement termination. An increase in gamma power around movement onset is sometimes observed in MEG. (Data for time-frequency spectrum taken from Litvak et al. (2012); topographies below are adapted from Espenhahn et al. (2017)). Right: A brief increase in midfrontal theta power is associated with processing of conflict, novelty, errors, and punishment. (Adapted from Cavanagh & Frank, 2014). Note the different ways of visualizing the source of activity changes in the bottom row

Oscillations can be studied in the frequency domain through spectral decomposition. A popular method is Fourier analysis, which exploits the fact that any complex time series can be broken down into a sum of sinusoids with different frequencies, amplitudes, and initial phases. The contribution of each frequency to the overall signal can be presented as a power spectrum or power spectral density. To gain insight into changes of spectral power (squared amplitude) over time, one can compute a sequence of spectra using a sliding time window. Here, the chosen window length determines the trade-off between temporal and frequency resolution. The window should be long enough to contain a full cycle (but preferably more) of the lowest frequency of interest, while being short enough to capture fast fluctuations in oscillation amplitude. The obtained spectra can be averaged across the same time window for different trials in order to obtain a time-frequency spectrum that reveals task-related power modulations.
It is crucial to compute this average after applying the Fourier transform, as oscillations may easily cancel out in a time-domain average if they do not occur at a fixed phase in relation to the stimulus (such activity is referred to as induced, as opposed to evoked, activity). Other commonly used techniques such as Hilbert or wavelet analysis have been shown to be mathematically equivalent to Fourier analysis (Bruns, 2004) and often yield comparable results in practice.

Early stimulus-induced time-frequency modulations in cognitive paradigms like perceptual decision-making can be observed in primary sensory cortices. Attended/detected stimuli lead to an increase in gamma band power in visual (Hoogenboom et al., 2010; Wyart & Tallon-Baudry, 2008), auditory (Kaiser et al., 2006), or somatosensory (Bauer et al., 2006) cortex that is accompanied by a decrease in alpha/beta power. In their EEG study, Polanía et al. (2014) demonstrated that gamma band power in the parietal cortex is associated with evidence accumulation estimates from a sequential sampling model on a trial-by-trial basis. This relation was observed both when subjects made perceptual decisions on the relative size of presented food item images and during value-based decisions on which food item they liked most. The latter also induced an increase in gamma oscillations in frontal regions that were synchronized with those in the parietal cortex, hence revealing fronto-parietal network activation.

Motor responses such as button presses are accompanied by a stereotypical decrease in beta power before and during the movement, followed by a rebound after movement termination (Pfurtscheller & Lopes Da Silva, 1999; van Wijk et al.,
2012). Although initially occurring over bilateral sensorimotor cortices, the power suppression becomes stronger in the hemisphere contralateral to the moving hand just before movement onset. Donner et al. (2009) demonstrated that the degree of beta power lateralization during stimulus viewing can already predict which left/right choice subjects are going to make a few seconds later, as if it were actively involved in the decision-making process. Similarly, overall gamma power during this interval was also predictive of the upcoming response, reminiscent of the increase around movement onset that can often be observed with MEG.

Of particular interest to the field of cognitive neuroscience is the midfrontal increase in theta power that is associated with processing of novel stimuli, conflict, and errors (Cavanagh & Frank, 2014). Using a model-based approach, Cavanagh et al. (2011) found that a stronger midfrontal theta power increase during high-conflict trials was indicative of higher decision threshold values. This hints at a mechanism where the detection of conflict might serve to buy more time for evaluating the response options through an increase in decision threshold. The study was also performed in individuals with Parkinson's disease who were undergoing deep brain stimulation treatment. Switching on the stimulation broke the positive correlation between midfrontal theta power and decision threshold, resulting in more impulsive choice behaviour.

Oscillatory patterns in EEG/MEG signals arise when neuronal activity is rhythmically synchronized. Such activity could, for example, emerge from reciprocal coupling between excitatory and inhibitory cells within a cortical column or within wider networks involving multiple cortical or subcortical regions such as the thalamus. It is likely to be assisted by the intrinsic tendency of many neurons to fire action potentials at certain frequencies (Buzsáki, 2006).
An advantage of large-scale synchronized activity is that pre-synaptic neurons can have a larger impact on target neurons compared to when their input arrives asynchronously. Also, rhythmic fluctuations of the membrane potential provide predictable time windows of high excitability. This has been proposed to mediate the gating of information between neural populations (Fries, 2005).
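The sliding-window Fourier analysis described above can be sketched with a synthetic signal whose alpha power drops halfway through; the window length and overlap are illustrative choices, not recommendations:

```python
import numpy as np
from scipy import signal

fs = 200.0
t = np.arange(0, 8, 1 / fs)
# Synthetic trial: a 10 Hz rhythm whose amplitude drops after t = 4 s,
# mimicking stimulus-induced alpha suppression, plus a little noise.
amplitude = np.where(t < 4, 2.0, 0.5)
x = amplitude * np.sin(2 * np.pi * 10 * t)
x += 0.1 * np.random.default_rng(3).normal(size=t.size)

# Short-time Fourier transform: 1-s windows with 90% overlap.
freqs, win_times, Sxx = signal.spectrogram(
    x, fs=fs, nperseg=int(fs), noverlap=int(0.9 * fs)
)

i10 = np.argmin(np.abs(freqs - 10.0))
pre = Sxx[i10, win_times < 3.5].mean()    # alpha power before the change
post = Sxx[i10, win_times > 4.5].mean()   # alpha power after the change
```

Averaging such per-trial spectrograms, rather than the raw time series, preserves induced activity that is not phase-locked to the stimulus.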

6 Estimating Source Locations

Berger's recordings were performed with only two electrodes, one placed on the forehead and the other at the back of the head. Later studies exploring different electrode positions would reveal that the alpha rhythm that is suppressed when closing the eyes likely has its origin in the occipital lobes (Adrian & Matthews, 1934). It is very challenging, though, to pinpoint the exact origin of signals measured at the scalp, as electric fields spread considerably through volume conduction. The different intermediate tissues between the neural sources and electrodes (brain tissue, cerebrospinal fluid, skull, skin) each have conductive properties that smear and dampen electrical currents. Activity from a single source will therefore be picked up by multiple electrodes at the scalp. Even though MEG signals are
assumed to be unaffected by volume conduction currents (Hämäläinen et al., 1993), the relatively large distance between the brain and the sensors inside the helmet means that magnetic fields from different sources overlap. Substantial activity from a single cortical source may be detected in sensors up to 15 cm apart (Srinivasan et al., 2007).

The difficulty of identifying the source origins of scalp recordings is called the inverse problem. By definition, it does not have a unique solution, as many configurations of source activations give rise to the same activation pattern at the scalp level. However, it is possible to make use of plausible constraints to find an approximate solution. Firstly, one can utilize Maxwell's equations of electromagnetism to describe what the measured signal at the scalp level would look like if a single brain region of interest (usually referred to as a voxel) were active as a dipole with known orientation and unit strength. This is repeated for all voxels in the defined brain volume (often tens of thousands of voxels) and combined into a single lead field matrix or forward model. Constructing such a model is referred to as solving the forward problem. Over the years, and with increasing computer power, researchers have developed more detailed head models. Originally, the brain's volume was approximated with a single sphere for which the lead field can be computed analytically (Sarvas, 1987). Later models became more realistic, with multiple spheres or shells to approximate the shape of the brain and its surrounding tissues, and with advanced boundary and finite element models for numerical computation of current spread (for a comparison see Stenroos et al., 2014). Secondly, additional constraints are needed to solve the inverse problem. Again, the developed methods differ in assumptions and complexity, but they all use the forward model as a biophysical basis (for an overview see, for example, Michel et al., 2004).
For example, dipole fit methods aim to find a combination of dipole locations and dipole moments that explains as much of the measured scalp activity as possible. This can be performed with a limited number of dipoles assumed a priori or through brain-wide minimum norm solutions that minimize the overall activation. By contrast, beamforming methods do not aim to 'explain away' the measured data at the scalp level but seek to construct spatial filters that maximize the signal from the voxel of interest and minimize the interference of all other sources (Hillebrand & Barnes, 2005). To be able to do so, it is often assumed that activity from individual voxels is not (strongly) correlated. The result is a (smooth) volumetric or surface image of source activation strengths that may be visualized in a similar way as BOLD-fMRI data. Furthermore, one can extract the time series for the source location(s) with peak activations to obtain a higher signal-to-noise ratio for the data features of interest. The concepts underlying source analysis are summarized in Fig. 5.
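A minimum norm solution of the kind mentioned above can be sketched in a few lines, assuming the lead field matrix is already available from a forward model (here it is simply random for illustration, and the regularization value is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n_sensors, n_sources = 64, 500

# L: lead field matrix mapping source amplitudes to sensor measurements.
# In practice this comes from solving the forward problem.
L = rng.normal(size=(n_sensors, n_sources))

# Simulate a single active source plus a little sensor noise.
j_true = np.zeros(n_sources)
j_true[123] = 1.0
y = L @ j_true + 0.01 * rng.normal(size=n_sensors)

# Minimum norm estimate: Tikhonov-regularized pseudoinverse,
# j_hat = L.T @ inv(L @ L.T + lam * I) @ y.
lam = 1.0
j_hat = L.T @ np.linalg.solve(L @ L.T + lam * np.eye(n_sensors), y)
```

The estimate is spatially smeared over many voxels, but its maximum falls on the simulated source. Beamformers replace this global pseudoinverse with per-voxel spatial filters, but operate on the same lead field.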


Fig. 5 Source analysis of EEG/MEG data. (a) The inverse problem of locating the neural sources responsible for generating the observed EEG/MEG measurements by definition does not have a unique solution. (b) Both electric and magnetic fields from a single neural source are picked up by multiple sensors through volume conduction. Vice versa, recordings from single EEG or MEG sensors contain activity from a mixture of sources. (c) The forward problem refers to finding an accurate description of the activity patterns that would be recorded at the sensor level for each potential active source location; it can be solved through detailed modelling of electromagnetic field spread in combination with a (realistic) model of the head and its tissues. (d) Differences between source localization algorithms arise from the necessity to formulate assumptions in order to find a plausible solution to the inverse problem. (e) Estimated source activations can be visualized as a volumetric or surface image that can be subjected to statistical tests. (f) Time series at the source level can be extracted through a linear combination of sensor level time series, where the weights are determined by the spatial filter for the source location of interest

7 Advanced Signal Processing

EEG/MEG time series are rich spatiotemporal data that invite many more signal processing techniques than simple averages in the time or frequency domain. Closer inspection of temporal dynamics in different frequency bands reveals scale-free properties that are influenced by genetic factors and neuropsychiatric disorders (Hardstone et al., 2012). Cross-frequency coupling refers to the various ways in which signals at different frequencies can be interdependent, involving combinations of amplitude, phase, or instantaneous frequency as a function of time (Jirsa & Müller, 2013). In particular, phase-amplitude coupling between hippocampal theta and gamma oscillations plays a recognized role in memory processes and spatial navigation (Lisman & Jensen, 2013). The complexity of EEG/MEG data sets can be reduced through dimensionality reduction techniques like principal or independent component analysis to facilitate the visualization and interpretation of findings (Makeig et al., 1997). In turn, multivariate classification algorithms can help to encode and decode task-specific or disorder-specific spatiotemporal or spatiospectral patterns (Grootswagers et al., 2017; Lehmann et al., 2007), for example, to infer the (re-)activation of visual object and sequence representations (Liu et al., 2019).

A popular approach is to test for statistical dependencies between time series measured at different scalp (or source) locations as a sign of functional coupling between brain regions. Measures for this include the Pearson correlation coefficient, coherence (correlation in the frequency domain), the phase locking value, mutual information, and several others (for overviews, see Bastos & Schoffelen, 2015; Pereda et al., 2005). One of the greatest challenges is to separate true coupling from volume conduction effects, for which several methods have been developed (Nolte et al., 2004; Stam et al., 2007). Furthermore, measures like Granger causality and transfer entropy allow for inferring the directionality of coupling, that is, whether region A influences region B or vice versa. The technique of dynamic causal modelling uses biologically inspired computational models to infer synaptic coupling strengths within and between brain regions (Kiebel et al., 2008; van Wijk et al., 2018).

The complexity of EEG/MEG data analysis is fascinating but also easily prone to pitfalls that may not be directly obvious.
This applies to advanced methods (van Wijk et al., 2010) but also to more straightforward approaches like the computation of ERPs (Woodman, 2010) and functional connectivity measures (Bastos & Schoffelen, 2015). Guidelines have been formulated to improve the quality of experimental studies and to foster their reproducibility (Gross et al., 2013).
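The distinction between zero-lag (volume-conducted) and genuinely lagged coupling can be made concrete with a small simulation. The sketch below (plain NumPy/SciPy; the signals are synthetic and purely illustrative) computes the phase locking value and the phase lag index (Stam et al., 2007) for a sensor pair that instantaneously shares one common source versus a pair that is synchronized at a non-zero phase lag.

```python
import numpy as np
from scipy.signal import hilbert

def plv(x, y):
    """Phase locking value: consistency of the phase difference (0..1)."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

def pli(x, y):
    """Phase lag index: discards zero-lag (and anti-phase) synchrony,
    which is exactly where volume conduction shows up."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.sign(np.sin(dphi))))

fs = 500
t = np.arange(0, 5, 1 / fs)
rng = np.random.default_rng(0)
source = np.sin(2 * np.pi * 10 * t)  # a single 10 Hz neural source

# Two "sensors" picking up the same source instantaneously (volume conduction):
x = source + 0.1 * rng.standard_normal(t.size)
y = 0.8 * source + 0.1 * rng.standard_normal(t.size)

# A genuinely coupled pair with a consistent non-zero phase lag:
y_lagged = np.sin(2 * np.pi * 10 * t - np.pi / 4) + 0.1 * rng.standard_normal(t.size)

print(plv(x, y), pli(x, y))                # high PLV, but PLI near 0
print(plv(x, y_lagged), pli(x, y_lagged))  # high PLV and PLI near 1
```

Note how the PLV alone cannot tell the two cases apart: both pairs are strongly phase locked, but only the lagged pair survives the phase lag index.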

8 Concluding Remarks

EEG and MEG are principal research techniques for studying the neural underpinnings of human cognition. Decades of dedicated research have revealed several time- and frequency-domain components that reliably arise in relation to certain cognitive task aspects, sometimes with altered amplitudes in patient groups. Rapid developments in more advanced signal processing techniques are likely to uncover additional signal features. We are still trying to grasp what these observations truly mean for understanding the neural mechanisms that lead to cognition and behaviour. Here, computational or statistical models hold a powerful role in connecting observed brain activity with theories from psychology. Parameters of these models can significantly aid the interpretation of otherwise abstract EEG/MEG findings. A few examples have been highlighted in this chapter that may inspire researchers to further advance this promising field of research.

Exercises

1. What are the main differences between EEG and MEG?
2. The build-up of the centro-parietal positivity ERP component during evidence accumulation is reminiscent of the increase in firing rate of single neurons that can be observed in the posterior parietal cortex of rhesus monkeys (Shadlen & Newsome, 2001). Discuss to what extent these two phenomena might be related.
3. Explain why it is not possible to look at time-frequency modulations at the same time resolution as the original recording.
4. A researcher detects a correlation between the beta oscillation time series recorded over left and right motor cortex. Discuss whether this finding could be interpreted as an indication of functional coupling.
5. What are the merits of combining cognitive models of behaviour with EEG/MEG recordings?

Answers

1. The most important differences include the following:
   • They measure distinct signals: electric (EEG) versus magnetic (MEG) fields.
   • EEG recordings need a reference to measure electric potential differences, whereas MEG is reference-free.
   • EEG signals are generated by extracellular currents and MEG signals primarily by intracellular currents.
   • MEG signals are unaffected by intermediate tissues of the head and are therefore less prone to inaccuracies of applied forward models for source localization.
   • EEG typically requires more subject preparation time than MEG.
   • MEG requires measurement of the subject's head position as the sensors are physically detached from the head.
   • Subjects with ferromagnetic body implants may be difficult to measure with MEG but not with EEG.
   • MEG systems are much more expensive to purchase and maintain.
2. Given that action potentials contribute little to EEG signals, it is unlikely that the same neural activity pattern underlies both recordings. However, there are several indications to believe that the two findings are highly related:
   (a) The centro-parietal positivity has a topography that could match the source location of the single cell recordings.
   (b) Action potentials cause and are generated by ionic currents that could underlie the EEG observations.
   (c) Both signal features correlate with evidence accumulation and have been found to reach a fixed level before the response, and hence appear to be similarly involved in the decision-making process.
3. The computation of spectral power requires a time series with a minimum duration of a full oscillation period of the lowest frequency of interest. For shorter time windows, the oscillation's amplitude cannot be determined. A sliding time window centred around the time points of interest is used for the computation of time-frequency spectra. Although one could slide this time window by only one sample at a time, neighbouring power estimates will have been obtained from almost identical data and are therefore not independent.
4. Correlations between time series recorded at different points on the scalp are highly prone to volume conduction and might therefore merely reflect the fact that both sensors are picking up signal from the same neural source. Spurious correlations between oscillations due to volume conduction will always occur with either zero phase lag or in anti-phase, as signals propagate instantaneously from the neural sources to the measurement sensors. In this case, it is impossible to distinguish true connectivity from volume conduction effects. Only time series that are synchronized at a different phase lag can be directly interpreted in terms of functional coupling.
5. There is mutual benefit in the combination of cognitive modelling with EEG/MEG. On the one hand, EEG/MEG can be used to reveal whether and how theoretical ideas from cognitive psychology might be implemented by the brain. On the other hand, cognitive models help to meaningfully interpret EEG findings that otherwise remain descriptive.
Together, they strive to identify the neural mechanisms of cognition.
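The non-independence of neighbouring power estimates described in answer 3 is easy to demonstrate numerically. The sketch below (plain NumPy; the signal, frequency, and window length are arbitrary illustrations) estimates 10 Hz power with a sliding window of three oscillation cycles, slid one sample at a time, and shows that neighbouring estimates are almost perfectly correlated because they share nearly all of their data.

```python
import numpy as np

fs = 200
t = np.arange(0, 4, 1 / fs)
# A 10 Hz oscillation whose amplitude doubles halfway through the recording
sig = np.where(t < 2, 1.0, 2.0) * np.sin(2 * np.pi * 10 * t)

f0 = 10                      # frequency of interest
win = int(fs / f0) * 3       # window of 3 full cycles (the minimum is one)
centers = np.arange(win // 2, t.size - win // 2)

power = np.empty(centers.size)
for i, c in enumerate(centers):
    seg = sig[c - win // 2 : c + win // 2]
    seg = seg * np.hanning(seg.size)        # taper to reduce spectral leakage
    spec = np.fft.rfft(seg)
    freqs = np.fft.rfftfreq(seg.size, 1 / fs)
    power[i] = np.abs(spec[np.argmin(np.abs(freqs - f0))]) ** 2

# Estimates one sample apart share (win - 1) samples -> highly correlated
r = np.corrcoef(power[:-1], power[1:])[0, 1]
print(r)  # close to 1: neighbouring estimates are not independent
```

The amplitude step is still recovered (late-window power is several times the early-window power), but the effective temporal resolution is limited by the window, not by the sampling rate.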

Further Reading

• Niedermeyer's Electroencephalography: Basic Principles, Clinical Applications, and Related Fields (7th ed). Editors: Donald L. Schomer and Fernando H. Lopes da Silva. 2017, Oxford University Press. All-encompassing and detailed resource for both fundamental research and clinical EEG studies. https://doi.org/10.1093/med/9780190228484.001.0001
• Analyzing Neural Time Series Data: Theory and Practice. Author: Mike X Cohen. 2014, The MIT Press. This book explains in comprehensible language everything a novice EEG/MEG researcher needs to know about data analysis. Highly recommended for students without a strong mathematical background. https://doi.org/10.7551/mitpress/9609.001.0001


• MEG-EEG Primer. Authors: Riitta Hari & Aina Puce. 2017, Oxford University Press. Equally accessible resource providing an excellent overview of basic physics and physiology, practical information, and experimental findings. https://doi.org/10.1093/med/9780190497774.001.0001
• Rhythms of the Brain. Author: György Buzsáki. 2006, Oxford University Press. Fascinating findings from decades of experimental work on neural oscillations and synchronization are presented in this book. https://doi.org/10.1093/acprof:oso/9780195301069.001.0001
• http://www.fil.ion.ucl.ac.uk/spm/course/video/. Videos and slides from the lectures of the annual SPM course for MEG/EEG at University College London are freely available. These explain the various data analysis steps and demonstrate how to perform them in the SPM Matlab software.
• http://www.fieldtriptoolbox.org/faq/open_data/. Up-to-date list of open access EEG and MEG data sets.
• http://www.biomagcentral.org. Main website for the MEG research community. Here you can find more information on MEG basics, instrumentation, research, and clinical applications. There is also an overview of additional educational resources and analysis tools. Make sure to subscribe to the mailing list for job advertisements and announcements of news and events, including the biennial Biomag conference.

References

Adrian, E. D., & Matthews, B. H. C. (1934). The Berger rhythm: Potential changes from the occipital lobes in man. Brain, 57, 355–385. https://doi.org/10.1093/brain/57.4.355
Attal, Y., Bhattacharjee, M., Yelnik, J., Cottereau, B., Lefèvre, J., Okada, Y., Bardinet, E., Chupin, M., & Baillet, S. (2007). Modeling and detecting deep brain activity with MEG & EEG. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 30, 4937–4940. https://doi.org/10.1109/IEMBS.2007.4353448
Bastos, A. M., & Schoffelen, J.-M. (2015). A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Frontiers in Systems Neuroscience, 9, 175. https://doi.org/10.3389/fnsys.2015.00175
Bauer, M., Oostenveld, R., Peeters, M., & Fries, P. (2006). Tactile spatial attention enhances gamma-band activity in somatosensory cortex and reduces low-frequency activity in parieto-occipital areas. The Journal of Neuroscience, 26, 490–501. https://doi.org/10.1523/JNEUROSCI.5228-04.2006
Berger, H. (1929). Über das Elektroenkephalogramm des Menschen. Archiv für Psychiatrie und Nervenkrankheiten, 87, 527–570.
Boehm, U., van Maanen, L., Forstmann, B., & van Rijn, H. (2014). Trial-by-trial fluctuations in CNV amplitude reflect anticipatory adjustment of response caution. NeuroImage, 96, 95–105. https://doi.org/10.1016/j.neuroimage.2014.03.063
Boto, E., Holmes, N., Leggett, J., Roberts, G., Shah, V., Meyer, S. S., Muñoz, L. D., Mullinger, K. J., Tierney, T. M., Bestmann, S., Barnes, G. R., Bowtell, R., & Brookes, M. J. (2018). Moving magnetoencephalography towards real-world applications with a wearable system. Nature, 555, 657–661. https://doi.org/10.1038/nature26147


Bruns, A. (2004). Fourier-, Hilbert- and wavelet-based signal analysis: Are they really different approaches? Journal of Neuroscience Methods, 137, 321–332. https://doi.org/10.1016/j.jneumeth.2004.03.002
Buzsáki, G. (2006). Rhythms of the brain. Oxford University Press.
Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2014.04.012
Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nature Neuroscience, 14, 1462–1467. https://doi.org/10.1038/nn.2925
Chatrian, G. E., Lettich, E., & Nelson, P. L. (1985). Ten percent electrode system for topographic studies of spontaneous and evoked EEG activity. The American Journal of EEG Technology, 25, 83–92.
Cohen, D. (1968). Magnetoencephalography: Evidence of magnetic fields produced by alpha-rhythm currents. Science, 161, 784–786. https://doi.org/10.1126/science.161.3843.784
Cohen, D. (1972). Magnetoencephalography: Detection of the brain's electrical activity with a superconducting magnetometer. Science, 175, 664–666. https://doi.org/10.1126/science.175.4022.664
de Lange, F. P., Jensen, O., & Dehaene, S. (2010). Accumulation of evidence during sequential decision making: The importance of top-down factors. The Journal of Neuroscience, 30, 731–738. https://doi.org/10.1523/JNEUROSCI.4080-09.2010
Donner, T. H., Siegel, M., Fries, P., & Engel, A. K. (2009). Buildup of choice-predictive activity in human motor cortex during perceptual decision making. Current Biology, 19, 1581–1585. https://doi.org/10.1016/j.cub.2009.07.066
Espenhahn, S., de Berker, A. O., van Wijk, B. C. M., Rossiter, H. E., & Ward, N. S. (2017). Movement-related beta oscillations show high intra-individual reliability. NeuroImage, 147, 175–185. https://doi.org/10.1016/j.neuroimage.2016.12.025
Fries, P. (2005). A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence. Trends in Cognitive Sciences, 9, 474–480. https://doi.org/10.1016/j.tics.2005.08.011
Grootswagers, T., Wardle, S. G., & Carlson, T. A. (2017). Decoding dynamic brain patterns from evoked responses: A tutorial on multivariate pattern analysis applied to time series neuroimaging data. Journal of Cognitive Neuroscience, 29, 677–697. https://doi.org/10.1162/jocn_a_01068
Gross, J., Baillet, S., Barnes, G. R., Henson, R. N., Hillebrand, A., Jensen, O., Jerbi, K., Litvak, V., Maess, B., Oostenveld, R., Parkkonen, L., Taylor, J. R., van Wassenhove, V., Wibral, M., & Schoffelen, J.-M. (2013). Good practice for conducting and reporting MEG research. NeuroImage, 65, 349–363. https://doi.org/10.1016/j.neuroimage.2012.10.001
Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V. (1993). Magnetoencephalography – Theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65, 413–497. https://doi.org/10.1103/RevModPhys.65.413
Hardstone, R., Poil, S.-S., Schiavone, G., Jansen, R., Nikulin, V. V., Mansvelder, H. D., & Linkenkaer-Hansen, K. (2012). Detrended fluctuation analysis: A scale-free view on neuronal oscillations. Frontiers in Physiology, 3, 450. https://doi.org/10.3389/fphys.2012.00450
Hari, R., & Parkkonen, L. (2015). The brain timewise: How timing shapes and supports brain function. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 370, 20140170. https://doi.org/10.1098/rstb.2014.0170
Hari, R., & Salmelin, R. (2012). Magnetoencephalography: From SQUIDs to neuroscience. Neuroimage 20th anniversary special edition. NeuroImage, 61, 386–396. https://doi.org/10.1016/j.neuroimage.2011.11.074
Hari, R., Baillet, S., Barnes, G., Burgess, R., Forss, N., Gross, J., Hämäläinen, M., Jensen, O., Kakigi, R., Mauguière, F., Nakasato, N., Puce, A., Romani, G.-L., Schnitzler, A., & Taulu, S. (2018). IFCN-endorsed practical guidelines for clinical magnetoencephalography (MEG). Clinical Neurophysiology, 129, 1720–1747. https://doi.org/10.1016/j.clinph.2018.03.042


Hillebrand, A., & Barnes, G. R. (2002). A quantitative assessment of the sensitivity of whole-head MEG to activity in the adult human cortex. NeuroImage, 16, 638–650. https://doi.org/10.1006/nimg.2002.1102
Hillebrand, A., & Barnes, G. R. (2005). Beamformer analysis of MEG data. International Review of Neurobiology, 68, 149–171. https://doi.org/10.1016/S0074-7742(05)68006-3
Hoogenboom, N., Schoffelen, J.-M., Oostenveld, R., & Fries, P. (2010). Visually induced gamma-band activity predicts speed of change detection in humans. NeuroImage, 51, 1162–1167. https://doi.org/10.1016/j.neuroimage.2010.03.041
Jasper, H. H. (1958). The ten-twenty electrode system of the International Federation. Clinical Neurophysiology, 10, 371–375.
Jasper, H. H., & Andrews, H. L. (1938). Electro-encephalography. III. Normal differentiation of occipital and precentral regions in man. Archives of Neurology and Psychiatry, 39, 96–115.
Jirsa, V., & Müller, V. (2013). Cross-frequency coupling in real and virtual brain networks. Frontiers in Computational Neuroscience, 7, 78. https://doi.org/10.3389/fncom.2013.00078
Kaiser, J., Hertrich, I., Ackermann, H., & Lutzenberger, W. (2006). Gamma-band activity over early sensory areas predicts detection of changes in audiovisual speech stimuli. NeuroImage, 30, 1376–1382. https://doi.org/10.1016/j.neuroimage.2005.10.042
Kiebel, S. J., Garrido, M. I., Moran, R. J., & Friston, K. J. (2008). Dynamic causal modelling for EEG and MEG. Cognitive Neurodynamics, 2, 121–136. https://doi.org/10.1007/s11571-008-9038-0
Kilner, J. M., & Friston, K. J. (2010). Topological inference for EEG and MEG. The Annals of Applied Statistics, 4, 1272–1290. https://doi.org/10.1214/10-AOAS337
Lehmann, C., Koenig, T., Jelic, V., Prichep, L., John, R. E., Wahlund, L.-O., Dodge, Y., & Dierks, T. (2007). Application and comparison of classification algorithms for recognition of Alzheimer's disease in electrical brain activity (EEG). Journal of Neuroscience Methods, 161, 342–350. https://doi.org/10.1016/j.jneumeth.2006.10.023
Lisman, J. E., & Jensen, O. (2013). The θ-γ neural code. Neuron. https://doi.org/10.1016/j.neuron.2013.03.007
Litvak, V., Mattout, J., Kiebel, S., Phillips, C., Henson, R., Kilner, J., Barnes, G., Oostenveld, R., Daunizeau, J., Flandin, G., Penny, W., & Friston, K. (2011). EEG and MEG data analysis in SPM8. Computational Intelligence and Neuroscience, 2011, 1–32. https://doi.org/10.1155/2011/852961
Litvak, V., Eusebio, A., Jha, A., Oostenveld, R., Barnes, G., Foltynie, T., Limousin, P., Zrinzo, L., Hariz, M. I., Friston, K., & Brown, P. (2012). Movement-related changes in local and long-range synchronization in Parkinson's disease revealed by simultaneous magnetoencephalography and intracranial recordings. The Journal of Neuroscience, 32, 10541–10553. https://doi.org/10.1523/JNEUROSCI.0767-12.2012
Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human replay spontaneously reorganizes experience. Cell, 178, 640–652.e14. https://doi.org/10.1016/j.cell.2019.06.012
Luck, S. J. (2014). Chapter 4: The design of ERP experiments. In An introduction to the event-related potential technique. The MIT Press.
Makeig, S., Jung, T. P., Bell, A. J., Ghahremani, D., & Sejnowski, T. J. (1997). Blind separation of auditory event-related brain responses into independent components. Proceedings of the National Academy of Sciences of the United States of America, 94, 10979–10984. https://doi.org/10.1073/pnas.94.20.10979
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164, 177–190. https://doi.org/10.1016/j.jneumeth.2007.03.024
Meyer, S. S., Bonaiuto, J., Lim, M., Rossiter, H., Waters, S., Bradbury, D., Bestmann, S., Brookes, M., Callaghan, M. F., Weiskopf, N., & Barnes, G. R. (2017). Flexible head-casts for high spatial precision MEG. Journal of Neuroscience Methods, 276, 38–45. https://doi.org/10.1016/j.jneumeth.2016.11.009
Michel, C. M., Murray, M. M., Lantz, G., Gonzalez, S., Spinelli, L., & Grave de Peralta, R. (2004). EEG source imaging. Clinical Neurophysiology, 115, 2195–2222. https://doi.org/10.1016/j.clinph.2004.06.001


Murakami, S., & Okada, Y. (2006). Contributions of principal neocortical neurons to magnetoencephalography and electroencephalography signals. The Journal of Physiology, 575, 925–936. https://doi.org/10.1113/jphysiol.2006.105379
Nolte, G., Bai, O., Wheaton, L., Mari, Z., Vorbach, S., & Hallett, M. (2004). Identifying true brain interaction from EEG data using the imaginary part of coherency. Clinical Neurophysiology, 115, 2292–2307. https://doi.org/10.1016/j.clinph.2004.04.029
Nunez, M. D., Vandekerckhove, J., & Srinivasan, R. (2017). How attention influences perceptual decision making: Single-trial EEG correlates of drift-diffusion model parameters. Journal of Mathematical Psychology, 76, 117–130. https://doi.org/10.1016/j.jmp.2016.03.003
O'Connell, R. G., Dockree, P. M., & Kelly, S. P. (2012). A supramodal accumulation-to-bound signal that determines perceptual decisions in humans. Nature Neuroscience, 15, 1729–1735. https://doi.org/10.1038/nn.3248
Oostenveld, R., & Praamstra, P. (2001). The five percent electrode system for high-resolution EEG and ERP measurements. Clinical Neurophysiology, 112, 713–719. https://doi.org/10.1016/s1388-2457(00)00527-7
Orekhova, E. V., Sysoeva, O. V., Schneiderman, J. F., Lundström, S., Galuta, I. A., Goiaeva, D. E., Prokofyev, A. O., Riaz, B., Keeler, C., Hadjikhani, N., Gillberg, C., & Stroganova, T. A. (2018). Input-dependent modulation of MEG gamma oscillations reflects gain control in the visual cortex. Scientific Reports, 8, 8451. https://doi.org/10.1038/s41598-018-26779-6
Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005). Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology, 77, 1–37. https://doi.org/10.1016/j.pneurobio.2005.10.003
Pfurtscheller, G., & Lopes da Silva, F. H. (1999). Event-related EEG/MEG synchronization and desynchronization: Basic principles. Clinical Neurophysiology, 110, 1842–1857. https://doi.org/10.1016/S1388-2457(99)00141-8
Philiastides, M. G., Ratcliff, R., & Sajda, P. (2006). Neural representation of task difficulty and decision making during perceptual categorization: A timing diagram. The Journal of Neuroscience, 26, 8965–8975. https://doi.org/10.1523/JNEUROSCI.1655-06.2006
Philiastides, M. G., Heekeren, H. R., & Sajda, P. (2014). Human scalp potentials reflect a mixture of decision-related signals during perceptual choices. The Journal of Neuroscience, 34, 16877–16889. https://doi.org/10.1523/JNEUROSCI.3012-14.2014
Polanía, R., Krajbich, I., Grueschow, M., & Ruff, C. C. (2014). Neural oscillations and synchronization differentially support evidence accumulation in perceptual and value-based decision making. Neuron, 82, 709–720. https://doi.org/10.1016/j.neuron.2014.03.014
Ratcliff, R., Philiastides, M. G., & Sajda, P. (2009). Quality of evidence for perceptual decision making is indexed by trial-to-trial variability of the EEG. Proceedings of the National Academy of Sciences of the United States of America, 106, 6539–6544. https://doi.org/10.1073/pnas.0812589106
Sander, T. H., Preusser, J., Mhaskar, R., Kitching, J., Trahms, L., & Knappe, S. (2012). Magnetoencephalography with a chip-scale atomic magnetometer. Biomedical Optics Express, 3, 981–990. https://doi.org/10.1364/BOE.3.000981
Sarvas, J. (1987). Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Physics in Medicine and Biology, 32, 11–22. https://doi.org/10.1088/0031-9155/32/1/004
Schomer, D. L., & Lopes da Silva, F. H. (2017). Niedermeyer's electroencephalography: Basic principles, clinical applications, and related fields (7th ed.). Oxford University Press. https://doi.org/10.1093/med/9780190228484.001.0001
Shadlen, M. N., & Newsome, W. T. (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology, 86, 1916–1936. https://doi.org/10.1152/jn.2001.86.4.1916
Srinivasan, R., Winter, W. R., Ding, J., & Nunez, P. L. (2007). EEG and MEG coherence: Measures of functional connectivity at distinct spatial scales of neocortical dynamics. Journal of Neuroscience Methods, 166, 41–52. https://doi.org/10.1016/j.jneumeth.2007.06.026


Stam, C. J., Nolte, G., & Daffertshofer, A. (2007). Phase lag index: Assessment of functional connectivity from multichannel EEG and MEG with diminished bias from common sources. Human Brain Mapping, 28, 1178–1193. https://doi.org/10.1002/hbm.20346
Stenroos, M., Hunold, A., & Haueisen, J. (2014). Comparison of three-shell and simplified volume conductor models in magnetoencephalography. NeuroImage, 94, 337–348. https://doi.org/10.1016/j.neuroimage.2014.01.006
Tierney, T. M., Holmes, N., Mellor, S., López, J. D., Roberts, G., Hill, R. M., Boto, E., Leggett, J., Shah, V., Brookes, M. J., Bowtell, R., & Barnes, G. R. (2019). Optically pumped magnetometers: From quantum origins to multi-channel magnetoencephalography. NeuroImage. https://doi.org/10.1016/j.neuroimage.2019.05.063
Troebinger, L., López, J. D., Lutti, A., Bradbury, D., Bestmann, S., & Barnes, G. (2014). High precision anatomy for MEG. NeuroImage, 86, 583–591. https://doi.org/10.1016/j.neuroimage.2013.07.065
van Wijk, B. C. M., Stam, C. J., & Daffertshofer, A. (2010). Comparing brain networks of different size and connectivity density using graph theory. PLoS One, 5, e13701. https://doi.org/10.1371/journal.pone.0013701
van Wijk, B. C. M., Beek, P. J., & Daffertshofer, A. (2012). Neural synchrony within the motor system: What have we learned so far? Frontiers in Human Neuroscience, 6, 1–15. https://doi.org/10.3389/fnhum.2012.00252
van Wijk, B. C. M., Cagnan, H., Litvak, V., Kühn, A. A., & Friston, K. J. (2018). Generic dynamic causal modelling: An illustrative application to Parkinson's disease. NeuroImage, 181, 818–830. https://doi.org/10.1016/j.neuroimage.2018.08.039
Woodman, G. F. (2010). A brief introduction to the use of event-related potentials in studies of perception and attention. Attention, Perception, & Psychophysics, 72, 2031–2046. https://doi.org/10.3758/APP.72.8.2031
Wyart, V., & Tallon-Baudry, C. (2008). Neural dissociation between visual awareness and spatial attention. The Journal of Neuroscience, 28, 2667–2679. https://doi.org/10.1523/JNEUROSCI.4748-07.2008

Advancements in Joint Modeling of Neural and Behavioral Data

Brandon M. Turner, Giwon Bahg, Matthew Galdo, and Qingfang Liu

Abstract Since the first edition of this book, several new developments have made the joint modeling approach more attractive for researchers in model-based cognitive neuroscience. These developments span several dimensions, such as making joint models more accessible for use in any generic problem, increasing the scalability of the linking structures to prepare them for high-dimensional data, imposing constraints on the temporal dynamics of the model, and instantiating causal relations between neural dynamics and behavioral outcomes. In this chapter, we review many of the new advancements that are now making joint modeling a feasible alternative for relating brain to behavior in realistic settings. In an effort to maintain progress, we also provide an outlook on some of the key problems that remain unsolved.

Keywords Joint modeling · Cognitive neuroscience · Latent modeling

1 Introduction

The mind is a construct that was developed to explain how animals (e.g., humans) consider themselves and their environments. Because it is a construct, the mind is inherently unobservable, and so our only hope to glean insight into mental computations is to posit latent variables that can explain how observed interactions between animals and their environments come about. Clever experimental designs as well as technological advancements have enabled the collection of various manifest variables such as response time, the blood oxygenation level dependent (BOLD) response in functional magnetic resonance imaging (fMRI), and the electroencephalogram (EEG). Each of these measures ostensibly reflects signatures of

This work was supported by a CAREER award from the National Science Foundation (BMT).

B. M. Turner () · G. Bahg · M. Galdo · Q. Liu
The Ohio State University, Columbus, OH, USA

© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_9



mental operations, and so they can potentially be used to inform our understanding of the processes and dynamics that underlie cognition. If our goal is to use measures of brain activity to understand cognition, a key question becomes: how can measures of the brain reflect aspects of a psychological experience? As psychological experiences or representations ultimately guide our actions, choices, and behavior, this consideration conjures up the notion of a link between brain and behavior, which is often referred to as a set of linking propositions (Brindley, 1970; Teller, 1984; Schall, 2004; Turner et al., 2019). Linking propositions have been a productive route forward because they facilitate quantitative assessments of the contribution of physiological variables to psychological processes. For example, Teller (1984) devised a set of linking propositions by specifying logical relations among physiological variables and psychological states. Teller's families of linking propositions were specified to be axiomatic in the sense that they relied on strict equality statements, which are impossible to satisfy in the context of measurement noise (Schall, 2004). Recognizing these practical limitations, Schall (2004) developed the concept of statistical linking functions, where probability distributions could replace axiomatic statements to accommodate the uncertainty associated with manifest variables. Finally, Forstmann et al. (2008, 2010) concretized statistical linking functions by performing null hypothesis tests among patterns of neural data and the parameters of computational models. Later, Forstmann and colleagues further articulated the concept of reciprocity in linking functions, where latent representations of the mind (i.e., as instantiated by computational models) could inform the analysis of manifest variables of brain and behavior (Forstmann et al., 2011).
Of course, there are many different ways of creating a linking function between the two streams of data, and these linking functions have different advantages and disadvantages (see Chap. 2 on Linking functions). One aspect of the linking function that has become important in distinguishing among analytic approaches is the manner in which reciprocity is imposed. For example, links can be imposed that are partially reciprocal, where only one set of parameters is influenced by both streams of data. On the other hand, fully reciprocal links can be specified such that both sets of parameters are influenced by both streams of data. The definition is a technical one, but it distinguishes the types of reciprocity in terms of how the likelihood function relating model parameters to data is specified. If the likelihood of a stream of data can be written as a function of one (i.e., partial) or both (i.e., full) sets of parameters, it is what we refer to in this chapter as a joint model. Since the first edition of the Introduction to Model-Based Cognitive Neuroscience (Forstmann & Wagenmakers, 2014), there have been a number of interesting new developments and applications. Whereas some researchers have contributed to enhancing the methodological aspects of linking brain to behavior, others have used the joint modeling structure to underscore neural bases for components of theoretical models. In this chapter, we attempt to review all of the new developments since our own chapter on joint modeling in Forstmann and Wagenmakers' book (Turner, 2015). We close with a discussion centered on practical limitations of the joint modeling approach and discuss a few possible solutions.
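One way to see why the form of the link matters is a small simulation (hypothetical parameter values; plain NumPy). Here a "true" linking correlation between a neural parameter δ and a behavioral parameter θ is diluted when each parameter is first estimated separately from its own noisy data stream and the estimates are then correlated in a second stage, part of the motivation for letting both data streams inform the parameters jointly.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub = 2000  # hypothetical subjects (large, for a stable estimate)

# Generative joint model: per-subject neural parameter delta and behavioral
# parameter theta share a linking distribution with correlation rho = 0.6
rho, sd_d, sd_t = 0.6, 0.5, 0.4
cov = np.array([[sd_d**2, rho * sd_d * sd_t],
                [rho * sd_d * sd_t, sd_t**2]])
delta, theta = rng.multivariate_normal([1.0, 2.0], cov, n_sub).T

# Each parameter is observed only through its own noisy data stream
neural_est = delta + 0.2 * rng.standard_normal(n_sub)  # e.g. ROI activation
behav_est = theta + 0.2 * rng.standard_normal(n_sub)   # e.g. fitted drift rate

# Two-stage analysis: correlate the separate estimates. Measurement noise
# attenuates the estimated link well below the true rho of 0.6.
r_two_stage = np.corrcoef(neural_est, behav_est)[0, 1]
print(r_two_stage)  # noticeably smaller than 0.6
```

This attenuation is a known property of two-stage correlations; a joint likelihood over both streams can instead treat the linking correlation as an explicit parameter to be estimated.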


2 Overview of the Joint Modeling Approach

The basic strategy of a joint modeling approach is to exploit common statistical patterns that exist in both brain and behavior. One could—and indeed many researchers have—simply (1) analyze the brain data to extract important neural signals, (2) analyze the behavioral data by fitting a computational model to it, and (3) perform statistical tests to see which neural features are related to parameters of the behavioral model. This approach is perfectly suitable, especially when a researcher is interested in exploring potentially many hypotheses about the neural bases of their computational theory. However, using this two-stage approach necessarily compromises our ability to maximize the amount of constraint on our computational theory. Here, the word constraint refers to the range of predictions that a computational model can make, where the more constraint a model has, the narrower its predictions will be. If a computational model makes a set of narrow predictions that also happen to be in line with patterns in data, then classic theory suggests that we have garnered strong support for our theory (Roberts & Pashler, 2000; Myung & Pitt, 2002; Pitt et al., 2002). There are many ways to constrain a computational theory. One can design very specific types of experimental tasks meant to test the model's predictions, one can increase the amount of data, or one can subject the model to better descriptions of the observable variables of cognition. For example, at one time many computational models existed that explained, in a variety of ways, how choices were made as a function of time (Luce, 1986). However, the way models were evaluated was typically on the basis of fitting them to choice and mean or median response time.
Although a summary statistic of response time will technically constrain a model's parameters when it is fit to data, because means and medians are not sufficient characterizations of the full distribution of response times (Turner & Van Zandt, 2012; Turner & Sederberg, 2014), their ability to communicate information about how parameters in the model relate to the observed data is weaker than having the full set of response times. In the late 1970s and early 1980s, the importance of using the full response time distribution became clear to many researchers. Interestingly, once the extant models were rigorously subjected to the full distribution of response times, many models immediately fell out of favor because there existed no parameters that could capture all essential aspects of the data. The response time example serves as an important analogy for why we feel that joint modeling is a way forward for evaluating the plausibility of computational theories. If we take seriously that measures from neuroscience are reflections of underlying cognitive processes, then just as full distributions of response times placed stronger constraints on computational theories, so too can neuroscientific measures. However, most computational theories are not immediately ready to engage with neuroscientific measures, which often provide dense spatial or temporal information. The point of joint modeling is to supply a framework in which

214

B. M. Turner et al.

Fig. 1 Overview of a Joint Modeling Analysis. The figure shows a hypothetical example consisting of 30 s worth of an experiment involving a decision among three alternatives. For neural data, regions of interest are defined (left) and the blood oxygenation level dependent (BOLD) response can be extracted. Statistical models can be fit to the observed BOLD time course (middle), and parameters δ for, say, neural activation can be estimated. For behavioral data, a cognitive model is developed (left) with mechanisms that are cognitively meaningful. The model can then be fit to data (middle), and parameters θ for, say, drift rate can be estimated. Finally, joint models specify how the neural parameters δ are related to the cognitive model parameters θ through a linking function. In each model schematic, red triangles indicate stimulus presentations

computational models can agnostically be connected to neural measurements, allowing computational theories to be subjected to more rigorous constraints. As a concrete example, Fig. 1 provides an illustration of one type of joint modeling approach, applied to 30 s worth of an experiment involving a decision among three alternatives. Neural and behavioral data are separated into streams, and to accommodate computational theories that are not immediately set up to explain neural data, the burden of explaining each set of measures is placed on “submodels.” For neural data, important brain regions, called regions of interest (ROIs), are defined (top left panel) and the time course of their activations is extracted. The time series information ostensibly provides a record of how much an ROI was activated at each moment in time. Although there are some complications due to hemodynamic lag, one can specify how the presentation of a stimulus (illustrated by the red triangles) relates to the ROI’s activation by combining a function defining activation through time (called the hemodynamic response function; HRF) with a general linear model (GLM). This combination enables us to assess how experimentally controlled changes in the stimulus stream affect changes in specific ROI activations. The process of fitting the model to data yields estimates of how the experimental variables produce patterns of activation through the parameters δ for each stimulus presentation (Molloy et al., 2019; Palestro et al., 2018).
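The neural stream described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (not the authors' actual pipeline): stimulus onsets are convolved with a canonical double-gamma HRF (an assumed SPM-style shape) to form a GLM regressor, and an activation parameter δ is recovered by least squares from a simulated BOLD time course.

```python
import numpy as np
from math import gamma as gamma_fn

def hrf(t):
    """Canonical double-gamma hemodynamic response function (assumed SPM-style shape)."""
    def gpdf(x, a, scale=1.0):
        x = np.maximum(x, 1e-12)  # avoid 0**negative at t = 0
        return x ** (a - 1) * np.exp(-x / scale) / (gamma_fn(a) * scale ** a)
    return gpdf(t, 6) - gpdf(t, 16) / 6.0

# 30 s of data sampled at 1 Hz; hypothetical stimulus onsets at 2 s, 12 s, and 22 s
t = np.arange(0.0, 30.0, 1.0)
onsets = np.zeros_like(t)
onsets[[2, 12, 22]] = 1.0

# GLM regressor: the stimulus stream convolved with the HRF
regressor = np.convolve(onsets, hrf(t))[: len(t)]

# Simulate a noisy BOLD time course with a true activation of delta = 2.5
rng = np.random.default_rng(0)
bold = 2.5 * regressor + 0.1 * rng.standard_normal(len(t))

# Fit the GLM (intercept + regressor) by least squares to estimate delta
X = np.column_stack([np.ones_like(t), regressor])
beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
delta_hat = beta[1]  # should land close to the true value of 2.5
```

In a real analysis the regressor would be built per stimulus presentation, yielding one δ per trial, but the convolve-then-regress logic is the same.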

Advancements in Joint Modeling of Neural and Behavioral Data

Fig. 2 Types of Joint Modeling Implementations. Four example types of linking structures relating neural (N ) and behavioral (B) data are shown. In each panel, manifest variables are colored squares, experimental variables are gray squares, and latent parameters are represented as empty circles. Directed models (left) involve a transformation of neural data into behavioral data, whereas covariance models (middle left) capture the joint distribution of the two streams of data. Gaussian process models (middle right) exploit temporal dependencies in the manifest variables, and integrative models (right) rely on state equations to transform information in the stimulus stream to predictions for both aspects of the data

Per tradition in cognitive science, most cognitive theories have specific instantiations called cognitive models, which are designed to explain some aspect of the behavioral data. Hence, while we typically (but not necessarily) rely on statistical models to explain neural data, to explain the behavioral data, we instead rely on the cognitive model. Supposing the cognitive model has sets of parameters θ that express different dynamics associated with the process of interest, we can similarly obtain best estimates of θ by fitting the model to data. As in the neural stream, the parameter estimates θ quantify how changes in the experimentally controlled stimulus stream affect the psychological processes during the task. The final step that completes a joint model is to specify a link between the parameters δ and θ. We note here, and return to this discussion later, that a link can exist at different (or multiple) levels. It can exist at the level of subjects, individual trials, or even moment by moment. In Fig. 1, the link is illustrated as bidirectional, meaning that a distribution defines the shape of the relation between the two parameters. However, this is clearly not the only way. As an example of a more intuitive link motivated by the two-stage procedure described above, one could simply regress one (or more) of the parameters onto parameters in the other set. We refer to specifications of this sort as directed joint models. Figure 2 illustrates a few different ways in which neural (orange “N” node) and behavioral (blue “B” node) data may be connected within a joint model. The middle left panel is referred to as the covariance joint model and corresponds to the example in Fig. 1. The directed approach we just suggested is shown in the left panel. In both the directed and covariance joint models, the neural and behavioral streams are typically explained by separate submodels and then connected at the level of the latent variables, δ and θ in this illustration.
To be concrete, the directed joint model may make a specification of the following form

θi = β0 + β δi + ϵi,   (1)


where ϵi ∼ N(0, σ), a normal distribution with mean 0 and standard deviation σ. Here, each cognitive model parameter at point i (i.e., resolution, such as trial or moment) is a realization of the neural data, subject to some linear transformation through the parameters β0 and β (not depicted). By contrast, a covariance joint model might have the following form

(θi, δi) ∼ Np(μ, Σ),

where Np(μ, Σ) denotes a multivariate normal distribution of dimension p with mean vector μ and variance-covariance matrix Σ. Although the structures appear different, the basic idea is the same: changes in one parameter tend to produce changes in the other. However, to see how these changes manifest, one must consider how different the conditional distribution of one parameter, say θ given δ = δ*, is from the marginal distribution of θ with no information about δ. In the chapter from the first edition (Turner, 2015), we showed how these differences were affected by the relationship between θ and δ: when they are uncorrelated, the conditional and marginal distributions are equivalent, but as the correlation between the two parameters increases, proportionally larger differences are observed. This dynamic implies that the stronger the correlation between the neural and behavioral data, the more constraint the neural data will provide on the computational model. A third approach we will discuss in this chapter is a new approach called Gaussian process joint modeling (GPJM; Bahg et al. (2020); Fig. 2, middle right). Roughly interpreted, GPJMs can be thought of as an extension of the covariance approach, with a new innovation of moving the covariance structure away from the parameter space and onto the temporal axis. In this approach, instead of a linear linking function connecting two streams of submodels, a latent, nonparametric topology is defined, and this topology defines the particular pattern of data that can be observed in various streams of data. Because the topology is defined over time and not over sets of parameters (e.g., from one trial to the next), the linking function is considerably more flexible in explaining dramatic shifts in the brain-behavior dynamics. The final approach we will discuss here is the integrative model, shown in the right panel of Fig. 2.
These types of models are “fully specified” in the sense that a single set of parameters directly explains both streams of data. We refer to joint models of this type as integrative. They are typically the most difficult to specify because they rely on a full characterization of brain dynamics, and specifically how those brain dynamics result in a behavioral action. In the next three sections, we discuss progress within each of these types of linking structures. In each section, we discuss recent applications as well as important methodological advancements that enable future growth in each of these exciting new areas.
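The conditional-versus-marginal logic described above can be made concrete with the standard bivariate normal conditioning formulas. The numbers below (unit variances, a hypothetical δ* = 1.5) are illustrative assumptions, not values from the chapter; the point is only that as the correlation ρ grows, the conditional distribution of θ given δ shifts and narrows relative to the marginal.

```python
import numpy as np

# Hypothetical bivariate normal over (theta, delta) with unit variances
mu_theta, mu_delta = 0.0, 0.0
sd_theta, sd_delta = 1.0, 1.0

def conditional_theta(rho, delta_star):
    """Mean and s.d. of theta | delta = delta_star under a bivariate normal."""
    cond_mean = mu_theta + rho * (sd_theta / sd_delta) * (delta_star - mu_delta)
    cond_sd = sd_theta * np.sqrt(1.0 - rho ** 2)
    return cond_mean, cond_sd

# With no correlation, conditioning on delta tells us nothing about theta:
m0, s0 = conditional_theta(rho=0.0, delta_star=1.5)   # equals the marginal (0, 1)

# With a strong correlation, the conditional distribution shifts and narrows:
m9, s9 = conditional_theta(rho=0.9, delta_star=1.5)   # mean 1.35, s.d. ~0.436
```

The shrinking conditional standard deviation is exactly the "constraint" the neural data exert on the behavioral model: the stronger the brain-behavior correlation, the less room the cognitive parameters have to move.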


3 Directed

At this point, several applications of these directed models have been reported, and they have been particularly effective in perceptual decision making tasks (Cavanagh et al., 2011; Nunez et al., 2015, 2016; Frank et al., 2015; van Ravenzwaaij et al., 2017; Ratcliff et al., 2016; Herz et al., 2017; Hawkins et al., 2017). For example, Nunez et al. (2015) used EEG data from a perceptual decision making experiment as a proxy for attention. They controlled the rate of flickering stimuli presented to subjects and measured the power of the EEG signal at these frequencies, a measure known as the steady-state visual evoked potential. The power at these frequencies is known to be modulated by attention. Importantly, Nunez et al. showed that individual differences in attention or noise suppression were predictive of choice behavior; specifically, they resulted in faster responses with higher accuracy. In a particularly novel application, Frank et al. (2015) showed how models of reinforcement learning could be fused with the DDM to gain insight into activity in the subthalamic nucleus (STN). In their study, Frank et al. used simultaneous EEG and fMRI measures as covariates in the estimation of single-trial parameters. Specifically, they used pre-defined regions of interest including the presupplementary motor area (preSMA), the STN, and a general measure of mid-frontal EEG theta power to constrain trial-to-trial fluctuations in response threshold, and BOLD activity in the caudate to constrain trial-to-trial fluctuations in evidence accumulation. Their work is important because it establishes concrete links between STN and preSMA communication as a function of varying reward structure, as well as a model that uses fluctuations in decision conflict (as measured by differences in expected rewards) to adjust response threshold from trial to trial. While Fig.
2 illustrates how the parameters δ modulate the parameters θ, other models assume the reverse influence, where the behavioral parameters θ inform the neural parameters δ. As a concrete example, Cassey et al. (2023) extended the single-unit modeling work of Purcell et al. (2010) by linking firing parameters of single-unit recordings to the evidence accumulation dynamics of a decision model. Cassey et al. modeled data from a seminal experiment by Roitman and Shadlen (2002) containing behavioral recordings of two monkeys in a simple decision making task. The neural data consisted of single-cell recordings from the lateral intraparietal area of the cortex. On each trial, a random dot kinematogram appeared on the screen and the monkey indicated whether the coherently moving dots were drifting left or right. Response times and choices were recorded on each trial, as well as the timing of action potentials from a set of neurons in the lateral intraparietal area. The joint model builds on the work of Purcell et al. (2010, 2012) by assuming that an evidence accumulation model can provide a tight link between the observed neural firing rate and behavioral responses. In contrast to Purcell et al. (2010, 2012), where the neural data are used directly as input to an evidence accumulation model, the model also included an explicit statistical model of the single-unit spike trains. Given this implementation, descriptions of the neural data can be informed by the neuron’s properties, such as which neuron was being


recorded, and from which monkey. The joint model by Cassey et al. (2023) was hierarchical, and the parameters of the neural submodel were allowed to vary across neurons and monkeys. A similar style of model was used by van Ravenzwaaij et al. (2017) to relate drift rate parameters of the Linear Ballistic Accumulator (LBA) model (Brown & Heathcote, 2008) to changes in event-related potentials in a mental rotation task. Although they tested several different variants, as an example, one model configuration specified a structure in which the mean drift rate parameter of the LBA model was effectively regressed onto the temporal structure of the ERP. To do this, Van Ravenzwaaij et al. epoched the ERP data (e.g., from 200 ms to 1000 ms in 100 ms bins) and aggregated the EEG data for each subject within these bins, separating them into ERPs for each condition. Even though regressing the drift rate onto the condition-level ERPs might seem somewhat agnostic, because this regression is bound to the latent components of the model, it is quite constrained with respect to the theory of evidence accumulation. A direct consequence of this structure is that the predicted ERP must change when the drift rate changes. So, for example, when subjects must make longer rotations (i.e., in terms of rotation angle), one might expect the response time to increase proportionally. The pattern in the response time data will manifest in a pattern for the drift rate parameter, which in turn forces the model to adjust its predictions for the ERPs. Hence, fitting the model to both streams of data requires a delicate balance of predictions for ERPs as well as choice response times.
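The directed link of Eq. (1) can be sketched with a small simulation. All values here (β0 = 0.5, β = 0.8, σ = 0.2, 500 trials) are hypothetical: trial-level cognitive parameters θ are generated as noisy linear functions of trial-level neural parameters δ, and the link parameters are then recovered by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical single-trial neural parameters (e.g., trial-level activations)
n_trials = 500
delta = rng.normal(0.0, 1.0, n_trials)

# Directed link, Eq. (1): theta_i = beta0 + beta * delta_i + eps_i
beta0, beta, sigma = 0.5, 0.8, 0.2
theta = beta0 + beta * delta + rng.normal(0.0, sigma, n_trials)

# Recover the link parameters by ordinary least squares
X = np.column_stack([np.ones(n_trials), delta])
(beta0_hat, beta_hat), *_ = np.linalg.lstsq(X, theta, rcond=None)
```

In a full joint model the θi would not be observed directly; they would be constrained simultaneously by the behavioral likelihood and by this regression structure, which is what lets the neural data inform the cognitive parameters.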

4 Covariance

By contrast to the directed approach described above, one can also specify a linking function based on covariation. In the unidimensional case, the covariance and directed approaches will be quite similar, so long as the directed link has a probabilistic structure to it. To see this connection, one can rewrite the probabilistic directed link in Eq. (1) without the trial-level residual in the following form:

θi ∼ N(δi β + β0, σ).

Here, θi is a direct mapping from δi based on the linear transformation created by β and β0. If the parameters δi are describing some neural pattern of activity (e.g., through a general linear model), estimation of δi in this context is only determined by that neural data. By contrast, if the parameters θi describe some behavioral data through a cognitive model, they must explain the behavioral data subject to the constraint that they are bound by the parameters δi through β, β0, and σ. The constraint offered by the neural parameters δ is something like the influence of a prior in Bayesian statistics, and so θ will be affected by the values of δ, meaning that the neural data would constrain the behavioral model. The key difference though is that the parameters δ would not be directly affected by the values in θ, only



indirectly by the shape that the various θi values can take as they converge to their best fitting values. As a consequence, the flow of information in such a model is primarily from the neural data to the parameters of the behavioral model. To enable a bidirectional flow, one only needs to specify a linking function that allows both parameters to be expressed as a function of one another. One option is to use a probabilistic function, such as the multivariate normal distribution, such that

(θi, δi) ∼ Np(μ, Σ).


Here, the estimation of μ and Σ will directly impact both θ and δ because the parameters are specified hierarchically. This enables information in the behavioral data to be conveyed through θi, up to μ and Σ, which eventually will also affect δi. In one application, Turner et al. (2015) applied a covariance linking function to map fMRI data from 34 regions of interest into the popular diffusion decision model (DDM; Ratcliff, 1978). Off the shelf, the DDM assumes that there are trial-to-trial changes in parameters such as the rate of evidence accumulation, starting point, and nondecision time (see Chap. 3 for a review); however, these changes are agnostic about what causes the parameters to change, and by how much. Technically, the model assumes that these parameters are independent and identically distributed (IID), meaning that the distribution of performance on trial 1 will be identical to that on trial 1001. Although this is unlikely to be realistic, the model is typically fit to summary statistics of the choice response time distribution that ignore the structure of the time series information. Even when the likelihood function of the DDM is used to fit the model, the IID assumption enforces that all data have the same distribution, by definition. One way to embed structure into the temporal predictions of the model is to simply inform the DDM of trial-by-trial changes in brain activity. It is well known that our attention waxes and wanes throughout everyday tasks, a phenomenon often referred to as “mind wandering.” Supposing that aspects of mind wandering could be gleaned from neural data (Eichele et al., 2008; Wasserman, 2000; Mittner et al., 2014), changes in the neural data could be used to constrain the parameters of the DDM in ways that should increase the model’s ability to predict changes in the behavioral data, such as slower responses or upcoming errors. Turner et al. (2015) performed exactly this evaluation.
They fit the classic DDM to data alongside a neurally informed version of the DDM (the neural DDM; NDDM) by assuming a covariance linking function. They showed that the NDDM could capture complex patterns of coactivation in the brain’s functional network as well as patterns of choice response time across conditions. Perhaps the most convincing aspect of their work was a cross-validation study in which both models were trained on one fold of the data (i.e., training data) and then asked to predict trials in a different fold (i.e., test data). Here, the authors showed that the NDDM outperformed the DDM, suggesting that the neural data could be used to refine the distributional assumptions about the trial-to-trial parameters of the DDM.


Despite the differences between directed and covariance approaches, in the univariate case, both linking functions will perform similarly. However, as the number of neural features begins to increase, the impact of multiple correlated variables in the brain data will begin to take a toll on the estimation process of a directed model due to multicollinearity. For example, if brain region A is highly correlated with brain region B, it is possible that both brain regions could be correlated with a behavioral parameter θ due strictly to the correlation they share, rather than both being genuinely connected to the parameter itself. Of course, there are approaches for dealing with multicollinearity in multiple regression, but as the dimensionality increases, it becomes more difficult to apply these procedures to real data. By contrast, the covariance linking function directly measures each combination of bivariate correlations among all variables, which avoids the problem of multicollinearity entirely. However, modeling all possible correlations introduces another problem of scalability: we cannot hope to continue estimating all pairwise correlations in, say, an application involving a whole brain’s worth of voxels. The next section discusses new approaches for reducing the complexity of the covariance linking function.
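The region A/region B scenario above is easy to simulate. In this hypothetical setup (correlation 0.95 between regions, only region A in the generating model), both regions nonetheless show strong pairwise correlations with θ, which is exactly the ambiguity multicollinearity creates for a directed regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Region A drives the behavioral parameter; region B is merely correlated with A
region_a = rng.standard_normal(n)
region_b = 0.95 * region_a + np.sqrt(1 - 0.95 ** 2) * rng.standard_normal(n)
theta = 0.7 * region_a + 0.3 * rng.standard_normal(n)

# Both regions show a strong pairwise correlation with theta...
r_a = np.corrcoef(region_a, theta)[0, 1]
r_b = np.corrcoef(region_b, theta)[0, 1]
# ...even though only region A is genuinely connected in the generating model
```

A covariance linking function would simply report both bivariate correlations, whereas a directed regression with both regions as predictors must split credit between two nearly redundant regressors, which is where the estimation instability comes from.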

4.1 FA NDDM

One important issue in building connections between brain and behavior is computational complexity. For cognitive models, typically only a handful of parameters are used to detail the distribution of behavioral data, whereas in neuroscience one has access to a plethora of neurophysiological measures with both temporal and spatial precision. Due simply to the dimensionality of the neural measures, navigating the number of possibilities for mapping the neural measures to mechanisms in a cognitive model is an extremely difficult task. What is needed is a way to identify simple structures in the brain-behavior links. The multivariate normal linking function used within the NDDM discussed above has a few important advantages for describing brain-behavior dynamics. First, the multivariate normal provides a convenient measure for the central tendency of each model parameter, allowing for straightforward analysis of the hierarchical effects. Second, the multivariate normal provides a measure of the relationship between a cognitive model parameter and each neural measure in the form of a pairwise correlation parameter. Despite these conveniences, considering every possible pairwise relationship between the neural features and behavioral model parameters eventually creates a highly complex model, rendering more high-dimensional applications of the NDDM, such as to the voxel rather than ROI level, practically infeasible. The issue lies in how increases in the number of neural features increase the complexity of the multivariate normal linking function. As an illustration, let q represent the number of neural measures or “features” in the data (e.g., the number of regions of interest or the number of electrodes), and let k stand for the number


of behavioral model parameters we wish to relate to the neural data. If we specify a multivariate normal linking function, the dimensionality of the variance-covariance matrix Σ will be (p × p), where p = q + k, making the number of free parameters within Σ

nMVN = p(1 + p)/2 = (q + k)(1 + q + k)/2 = (q² + q + 2kq + k + k²)/2.   (2)

Equation (2) shows that increases in either the number of neural features or the number of behavioral model parameters increase the number of free parameters by a squared term. As one can imagine, this quadratic growth becomes prohibitively costly when using Monte Carlo methods to perform Bayesian estimation. To explore more parsimonious options, Turner et al. (2017) proposed a linking function based on factor analysis that decomposes the variance-covariance matrix Σ into three matrices:

Σ = ΛΦΛᵀ + Ψ.   (3)

First, Λ contains “factor loadings” that measure the degree to which a neural feature “loads” onto a latent model parameter. Second, the matrix Φ contains what are called “factor variances” that describe the pattern of variability among the factor loadings. The intuition behind Φ is that the factors themselves have variability in their representations of the data, and there may be relationships between the factors that partially explain the pattern of factor loadings in Λ. Third, the residual variance matrix Ψ captures patterns in the variance-covariance matrix that are not attributable to patterns of factor loadings. After a set of theoretically motivated constraints, Turner et al. (2017) showed that the number of free parameters in the factor analysis linking function became

nFA = qk + k + q = q(k + 1) + k.   (4)

Equation (4) shows that only linear terms in q and k appear, meaning that the complexity of Σ is only linearly affected by increases in either the number of neural features or the number of behavioral model parameters. To illustrate the magnitude of these effects, consider the following example. Suppose we have a cognitive model with k = 3 parameters, and we wish to relate the model parameters to the event-related potential (ERP) from an EEG experiment with q = 32 channels. Here, the number of free parameters we would need to estimate within Σ when using the multivariate normal linking function is 630,


whereas using the factor analysis linking function would reduce this number to 131. Supposing our department decided to spoil us with a set of new EEG headsets consisting of q = 128 electrodes per headset, a replication of our experiment would require estimation of 8646 parameters for a multivariate normal linking function, but only 515 parameters when using the factor analysis linking function. This striking difference is why we think factor analytic decomposition techniques are one feasible way forward for joint modeling applications.
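The parameter counts from Eqs. (2) and (4) can be checked directly, and Eq. (3) can be illustrated by constructing a small structured covariance matrix. The q = 8, k = 2 example at the bottom is a hypothetical toy case; the identity Φ and diagonal Ψ are simplifying assumptions, not the constraints used by Turner et al. (2017).

```python
import numpy as np

def n_mvn(q, k):
    """Free parameters in Sigma under the full multivariate normal link, Eq. (2)."""
    p = q + k
    return p * (1 + p) // 2

def n_fa(q, k):
    """Free parameters after the factor analysis decomposition, Eq. (4)."""
    return q * (k + 1) + k

counts = [(n_mvn(q, 3), n_fa(q, 3)) for q in (32, 128)]
# counts == [(630, 131), (8646, 515)], the numbers from the text

# A structured covariance built via Eq. (3): Sigma = Lambda Phi Lambda^T + Psi
rng = np.random.default_rng(0)
q, k = 8, 2                                   # small hypothetical example
p = q + k
Lambda = rng.normal(0.0, 0.5, (p, k))         # factor loadings
Phi = np.eye(k)                               # factor variances (identity here)
Psi = np.diag(rng.uniform(0.1, 0.5, p))       # diagonal residual variances
Sigma = Lambda @ Phi @ Lambda.T + Psi         # symmetric, positive definite
```

Because ΛΦΛᵀ is positive semidefinite and Ψ has strictly positive diagonal entries, Σ is guaranteed to be a valid covariance matrix, while carrying far fewer free parameters than an unstructured (p × p) matrix.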

4.2 Trivariate Modeling

Another problem for which the covariance approach has proven useful is data fusion, where multiple modalities of information, such as EEG and fMRI, are processed together. Although there are now many applications of simultaneous EEG and fMRI, the approach investigated in Turner et al. (2016) was to use a cognitive model to bridge different modalities across an intertemporal choice task with a repeated measures design. Here, subjects first performed an intertemporal choice task during which EEG data were collected and then returned a week later to do the same task within fMRI (modality order was counterbalanced across subjects). To explain the pattern of choice response time data in the two sets of behavioral data, the authors used the LBA model. One of the key parameters was the rate at which preference for either the delayed or immediate option accumulated. Within the task, the degree to which an individual would have preference for the delayed option was an independent variable, and the model was constrained to explain how choice behavior changed as a function of this manipulated preference. In both the fMRI and EEG data, the authors identified neural features that corresponded to information processing within the intertemporal choice task as reported by earlier studies (Harris et al., 2013; McClure et al., 2004; Hare et al., 2014; Rodriguez et al., 2015). After confirming that these neural features were also systematically related to the levels of the independent variable, they linked all features from the two modalities to the drift rate parameter of the LBA model. Across a number of cross-validation studies, the authors showed that having both EEG and fMRI data allowed for better predictions of behavioral data than either modality alone, suggesting that the features from the EEG and fMRI data provided slightly different insights into the decision making process.

4.3 Gaussian Process Joint Modeling

As discussed above, the strength of the covariance-based linking function comes from its ability to represent a systematic relationship between neural activations and latent cognitive mechanisms through a covariance matrix. So far, the only structure we have discussed is a constraint applied in the parameter space. When parameters


are allowed to vary at each moment in time, covariance and directed approaches can gain insight about interesting temporal dynamics, such as changes in attentional or emotional states, because these parameters change in response to the manifest variables at those points in time. However, this connection is indirect because it is based on the structural constraints in parameter space, and not based directly on points in time. To enforce a flexible temporal structure in joint models, Bahg and colleagues (Bahg et al., 2020) recently developed a nonparametric extension of the joint modeling concept through a system of Gaussian processes, motivated by Gaussian process latent variable modeling (Lawrence, 2007). Gaussian process (GP) modeling is a nonparametric approach for estimating a function from the given data. Unlike typical GLM approaches, which explain the observed responses with a linear combination of pre-determined regressors, a GP estimates a function by assuming that the target function is a sample from a distribution of functions: a multivariate normal distribution with a mean function and, importantly, a covariance function also known as a “kernel.” The kernel is the key feature of GP modeling because it determines how outputs covary according to the distance between two input points. As kernels lead neighboring inputs to have similar outputs, they play an important role in constraining the flexible shape of the function with only minimal assumptions (i.e., the shape of the kernel). For more details, we refer readers to introductory materials on GP modeling (e.g., Williams and Rasmussen 2006; Schulz et al. 2018). Maintaining that the mind is a latent construct, Bahg et al. (2020) specified a latent GP structure representing the mind’s state at given points in time. This latent structure is the hyperstructure of the model, and it serves as the linking function from which manifest variables can be understood.
Their approach, which they call Gaussian process joint modeling (GPJM; Fig. 3), is to specify a DX-dimensional function X with a GP prior structure over time:

X:,d ∼ N(0, k(t, t′) + σt² I)   (d = 1, · · · , DX),

where t is a vector of time points, k(t, t′) is the standard notation for a kernel function, and σt² is a noise term that scales an identity matrix I. In contrast to the typical covariance linking functions that specify the correlation of modality-specific temporal changes, the GP-based linking function specifies that the dynamics are shared by all submodels. Specifically, neural and behavioral submodels take the function X as their input to explain the observed data. In theory, we can use classic submodels (e.g., GLMs for neural data, computational cognitive models for behavioral data) jointly with the GP-based hypermodel. In their application, Bahg et al. specified separate GP regression models as submodels for both fMRI and behavioral data, which helped each submodel exploit the covariance structure developed by the underlying dynamics X. Such a specification was unique to the behavioral data they collected, which were continuous joystick movements from a motion-tracking task (e.g., Cox et al. 2012).
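The GP prior over the latent dynamics X can be sketched as follows. The squared-exponential (RBF) kernel and its length scale are illustrative assumptions (the chapter does not commit to a particular kernel); the sketch simply draws a two-dimensional latent trajectory over 30 s from the prior defined above.

```python
import numpy as np

rng = np.random.default_rng(4)

def rbf_kernel(t, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel k(t, t'): nearby time points covary strongly."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

# Latent dynamics X over 30 s at 2 Hz, drawn from the GP prior (two latent dimensions)
t = np.arange(0.0, 30.0, 0.5)
sigma_t2 = 1e-4                               # small noise term scaling I
K = rbf_kernel(t) + sigma_t2 * np.eye(len(t))
X = rng.multivariate_normal(np.zeros(len(t)), K, size=2).T   # shape (T, D_X)
```

In a GPJM, each column of X would then be passed through the neural and behavioral submodels, so that a single latent trajectory generates predictions for both data streams simultaneously.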

Fig. 3 A Visual Schematic of Gaussian Process Joint Models. Gaussian process joint models (GPJMs) assume a latent topology (top) from which manifest variables can be systematically derived (bottom). The latent topology is specified via a Gaussian process, and the position along this topology is used to make a prediction about brain (left) or behavioral (right) data. In this example, Gaussian processes are also used for each individual submodel, but other choices are possible. In the figure, each colored point in the latent topology (top) has a corresponding location in the temporal profiles of the manifest variables (bottom)

Generally speaking, the estimated dynamics are based on a nonlinear embedding of the data, and so the GPJM could serve as a method for dimensionality reduction. With appropriate settings, submodels can also quantify the relative contribution of each latent dimension in generating data (e.g., automatic relevance determination; Neal (1996)). Moreover, the latent dynamics derived by the GPJM can provide meaningful information about the temporal changes of cognitive activities because they are nonparametric and allow for potentially extreme changes across time that are not representative of the full time series. For example, Bahg et al. were able to separate the latent dynamics into patterns that were fairly typical across time and patterns that were considered large deviations from typicality. These atypical dynamics were found to be related to different mixtures of the behavioral response, functional coactivation changes, and some distortions of the fMRI data due to head movement. However, identification of the latent structure is not the only advantage of the GPJM. As a generative model, the GPJM can also make an “informed guess” about the cognitive states and their manifestations for the unobserved portion of the data based on the learned latent dynamics. To test this generalization accuracy, Bahg et al. provided an out-of-sample predictive analysis in which the model was trained with a complete set of neural data and partially observed behavioral data. The GPJM showed better predictive performance than Bayesian principal component regression models, which regressed the behavioral data on the first k principal components of the neural data. This result highlights the utility of using GP-based latent variable modeling not just for modeling neural data in a single modality (e.g.,

Advancements in Joint Modeling of Neural and Behavioral Data

225

Wu et al. 2017; Zhao and Park 2017) but also for complex dynamics across multiple measures of cognition.
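To make the "informed guess" idea concrete, here is a minimal numpy sketch of Gaussian process posterior prediction: a zero-mean GP with an RBF kernel is conditioned on the observed portion of a stand-in behavioral time series and queried at unobserved time points. The kernel choice, hyperparameter values, and all variable names are illustrative assumptions, not the actual GPJM specification.

```python
import numpy as np

def rbf_kernel(x1, x2, length=1.0, var=1.0):
    """Squared-exponential covariance between two sets of time points."""
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(t_obs, y_obs, t_new, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at unobserved time points."""
    K = rbf_kernel(t_obs, t_obs) + noise * np.eye(len(t_obs))
    K_s = rbf_kernel(t_new, t_obs)
    K_ss = rbf_kernel(t_new, t_new)
    alpha = np.linalg.solve(K, y_obs)
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Train on the observed portion of a behavioral trace, predict the held-out part
t_obs = np.linspace(0, 5, 20)
y_obs = np.sin(t_obs)               # stand-in for observed behavioral data
t_new = np.linspace(5.25, 6.0, 4)   # unobserved trials
mean, var = gp_predict(t_obs, y_obs, t_new)
```

Note that the posterior variance grows as the query points move away from the observed data, which is one way the GPJM can express uncertainty about its out-of-sample guesses.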

5 Integrative Modeling

Another approach to connecting brain and behavior is what we refer to as "integrative": as shown in the right panel of Fig. 2, integrative models connect to both (or multiple) streams of data with a single set of parameters, rather than simply connecting latent variables through a linking structure. In psychology, perhaps the most prominent examples of integrative models come from Anderson and colleagues (Anderson, 2007; Anderson et al., 2008, 2010; Anderson, 2012; Borst & Anderson, 2017). In this line of work, the ACT-R model is used as the generative structure that specifies how modules in the model are activated and deactivated over time to complete a given task. The model is simulation-based in the sense that well-defined mathematical equations have not been derived for relating the model parameters to the patterns of module activation. Once the pattern of module activations has been produced by simulating the model, these activations serve as the basis for generating predictions for neural data, such as the BOLD response or changes in an EEG signal. Once a set of neural data has been generated, it can be compared directly to data observed from the task to lend evidence to different mechanisms, as instantiated by different model configurations. In the case of ACT-R, intensive theoretical development has led to its success over many years. When such theory is out of scope, one can instead approach the problem of developing an integrative model structure by specifying a somewhat agnostic core model structure and estimating the best structure from the data. Such an approach has been undertaken in the literature by specifying one causal framework for modeling brain dynamics as a latent process underlying all neural measures. Figure 4 illustrates a pipeline of how an integrative model might be constructed to produce patterns of neural data.
The pipeline starts with seeing (or hearing) the exogenous stimulus input on the left side, which could range from dot patterns in a simple perceptual choice task to multidimensional features in a high-level conceptual learning task. For any given task, one can define a set of brain ROIs considered most relevant to the task process (middle panel, bottom). When a complete cognitive architecture is not available, the core component of an integrative model is a mathematical structure specifying the latent activation of neural populations for each ROI, which can be informed by assumptions derived from either functional connectivity or cognitive process models. One successful specification for neural dynamics is the Multivariate Dynamical Systems model (MDS; Liu et al. (2020); Ryali et al. (2011)):

S(t) = C(t)S(t − 1) + DU(t) + ω(t),

B. M. Turner et al.

Fig. 4 A Visual Schematic of an Integrative Modeling Approach. Information processing begins with exogenous changes in the environment (left), causing a dynamic change in the (latent) states of brain regions of interest (ROIs; middle). The baseline interaction of these regions can be established through functional connectivity (bottom) to guide the predictions for dynamic interactions within a task. The interpretation of dynamical interactions in the system can be facilitated by cognitive models (top). Once the details of the mathematical model defining state activations have been specified, forward equations can be used to systematically map the latent states onto predictions for either fMRI (top) or EEG (bottom)

where S(t) is an (M × 1) column vector representing neuronal activations at time t for each of a set of M ROIs. C(t) is an (M × M) matrix specifying the strengths of endogenous brain connectivity at time point t. The vector U(t) also has M components, each of which indicates the strength of external inputs to the corresponding ROI at time t. D is an (M × M) diagonal matrix, and D(i, i) weights the external inputs. The noise term ω(t) is a vector of length M sampled from a multivariate normal distribution, ω(t) ∼ N_M(0, Q(t)). Cognitive processing models, on the other hand, contain abstract cognitive operations necessary for performing cognitive tasks, such as the evidence accumulation processes depicted at the top of the middle panel. Cognitive models inform neuronal activation calculations by explicitly specifying that one or more brain ROIs are responsible for certain cognitive operations. For example, brain regions such as the frontal eye field (FEF) and the lateral intraparietal cortex (LIP) are assumed to accumulate evidence for initiating a choice, and so within the integrative modeling framework, we have treated latent neural activation (e.g., S(t)) and accumulated evidence as equal (see Liu et al. 2020). As another example, theories about stopping rules can be used to specify how the model generates predictions for behavioral responses such as choice response time distributions. With latent neuronal activation in hand, one can obtain neural measures of different modalities (fMRI and EEG) by using a forward mapping process, as

illustrated by the branching arrows connecting the middle panel to the right panel in Fig. 4. A conventional way to produce predictions for the fMRI BOLD signal is to linearly convolve the state activations with a hemodynamic response function with a double-gamma shape (Liu et al., 2020; Ryali et al., 2011). Whereas this approach is somewhat generic, considerably more complicated approaches, such as the Balloon model used in Dynamic Causal Modeling (Friston et al., 2003), consider many details of the vascular system. To produce predictions for scalp EEG measures, a conventional approach is to multiply the state equations by a lead field matrix (e.g. David et al. 2005; Bojak et al. 2010), which functions as a weight matrix mapping brain coordinates to EEG scalp locations. One of the unique properties of integrative models is that they posit complex latent system dynamics from which all other measures can be derived. Having a common structure that connects to each aspect of brain-behavior measurement allows a likelihood function to be derived, or at least approximated (Turner & Van Zandt, 2018). In theory, this implies that one could learn aspects of the state dynamics that are only appreciable in certain types of measurements, similar to the motivation of the data fusion and trivariate modeling approaches above (Turner et al., 2016). However, to gain this considerable advantage, we must be able to solve the inverse problem; in other words, we must be able to estimate parameters accurately so that we can provide evidence toward one type of system dynamic while ruling out others. Liu et al. (2020) performed a parameter recovery study using an MDS model to generate synthetic behavior and BOLD responses. In this case, they found that the posterior distributions of constrained versions of MDS could be accurately estimated. There are many aspects of integrative models that are advantageous for the field of model-based cognitive neuroscience.
Primarily, integrative modeling provides a theoretical framework using causal latent system dynamics that can be used to generate predictions for EEG, fMRI, and behavioral data. Integrative modeling is similar in spirit to some fMRI/EEG data fusion approaches in terms of basing the inference procedure on a generative model for both modalities (e.g. Daunizeau et al. 2007; Deneux and Faugeras 2010). From our perspective, integrative models also offer a window into decision dynamics, but such transformations of the internal state equations at present seem to require underlying assumptions from cognitive mechanisms. There are other methods that could be considered an integrative approach such as the behavioral DCM (Rigoux & Daunizeau, 2015). In this approach, a set of state equations are used to generate neural dynamics, and transformations of these dynamics are used to map them to probabilities of making a choice. In other words, the behavioral data can be viewed as a byproduct of modeling latent neuronal activation. In the absence of good theories to motivate the mapping of neural dynamics to choice, we see few other options. However, we advocate for the use of cognitive models (Forstmann et al., 2011) as a means to construct this transformation. We hope that in time, better theories will be constructed based on their ability to transform neural state equations into predictions for behavior, similar

in spirit to the raw transformations of spike train data considered in Cassey et al. (2023) and Purcell et al. (2010). As a final thought, many of the dynamical aspects of the model are difficult to assess without some baseline knowledge about the brain's functional connectivity profile. Without such knowledge, it is difficult to detect whether the interaction between two ROIs within a task differs significantly from what would be predicted simply by their structural properties. Future work will need to consider not only the dynamical aspects of the brain but also its structural properties.
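The pipeline sketched in this section — an MDS state equation driven by exogenous input, followed by forward maps to fMRI and EEG — can be illustrated in a few lines. This is a toy simulation under assumed values: the connectivity matrix C, input weights D, noise covariance Q, the HRF parameters, and the lead field L are all invented for illustration, not estimated from data.

```python
import numpy as np
from math import gamma as gamma_fn

rng = np.random.default_rng(0)

# --- MDS state equation: S(t) = C S(t-1) + D U(t) + omega(t) ---
M, T, dt = 2, 120, 0.5                 # ROIs, time steps, seconds per step
C = np.array([[0.9, 0.0],
              [0.2, 0.9]])             # endogenous connectivity (stable)
D = np.diag([1.0, 0.0])                # only ROI 0 receives external input
Q = 0.005 * np.eye(M)                  # noise covariance Q(t), constant here

U = np.zeros((T, M))
U[20:30, 0] = 1.0                      # brief exogenous stimulus input

S = np.zeros((T, M))
for t in range(1, T):
    omega = rng.multivariate_normal(np.zeros(M), Q)
    S[t] = C @ S[t - 1] + D @ U[t] + omega

# --- fMRI forward map: convolve states with a double-gamma HRF ---
def gamma_pdf(t, shape, scale=1.0):
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = (t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
                / (gamma_fn(shape) * scale ** shape))
    return out

t_hrf = np.arange(0, 30, dt)
hrf = gamma_pdf(t_hrf, 6.0) - gamma_pdf(t_hrf, 16.0) / 6.0  # peak minus undershoot

bold = np.array([np.convolve(S[:, m], hrf)[:T] * dt for m in range(M)])

# --- EEG forward map: multiply source states by a lead field matrix ---
L = np.array([[0.8, 0.1],
              [0.3, 0.7],
              [0.1, 0.9]])             # hypothetical 3-channel x 2-source lead field
eeg = L @ S.T                          # predicted scalp signals (channels x time)
```

In this sketch, the same latent states S drive both modality predictions, which is the defining property of the integrative approach: BOLD inherits the HRF's sluggish delay while the EEG map is instantaneous.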

6 Practical Concerns

Although there continues to be substantial progress in developing and applying joint models to various cognitive problems, several practical issues must be addressed before further advancements can be made. In this section, we discuss some practical limitations of applying joint models to data and, where possible, speculate about how these limitations may be resolved.

6.1 Accessibility

Perhaps the most practical limitation is that there is some cost to entering into a joint modeling analysis. For example, the covariance approach requires that a hierarchical model be fit to data. For both frequentists and Bayesians, this is no trivial task. Although we recommend Bayesian estimation when fitting the joint model to data, we also acknowledge that performing Bayesian estimation typically requires complex algorithms to estimate the model parameters accurately. For example, one standard method for estimating the joint posterior distribution is Markov chain Monte Carlo (MCMC). For the typical user, learning about MCMC requires some knowledge of statistics, and implementing MCMC for a particular problem requires knowing how to program functions for likelihoods, priors, and perturbation kernels. Fortunately, there are now many software programs available for performing Bayesian estimation, such as JAGS (Plummer, 2003) and Stan (Carpenter et al., 2017). Using these programs, one can specify even a complex hierarchical model with only a few lines of code and quickly fit a first joint model to data. Recognizing this accessibility limitation, our group has done its best to facilitate joint modeling within JAGS in both a book (Turner et al., 2018) and a tutorial paper (Palestro et al., 2018). For example, in the tutorial paper we provide code for a very simple example as well as a more complex one, showing the reader how to estimate the parameters of a general linear model with a hemodynamic response function and connect them to a simple diffusion model for choice response time data, an

example that is similar to the NDDM discussed above (Turner et al., 2015, 2017). Although more complex models will require alternative strategies beyond what JAGS or Stan can facilitate, we feel that most problems can be addressed with these software programs. For example, even complex GP models can now be fit to data using libraries such as Stan, and so we hope that further developments of such software will make more complex joint models, such as the GPJM, readily accessible to non-experts.
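To give a flavor of what the estimation machinery does under the hood (without JAGS or Stan), the following is a bare-bones random-walk Metropolis sketch for a toy directed joint model: single-trial neural amplitudes linearly predict a behavioral parameter, and we sample the posterior over the two link coefficients. All data and settings are synthetic and illustrative; a real application would involve a full neural submodel and a cognitive model likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: single-trial neural amplitudes predict a behavioral measure
n = 100
beta = rng.normal(0, 1, n)                # hypothetical single-trial neural betas
true_a, true_b, sigma = 1.0, 0.6, 0.3
y = true_a + true_b * beta + rng.normal(0, sigma, n)

def log_post(a, b):
    """Gaussian likelihood with directed link delta_i = a + b * beta_i,
    plus weak Normal(0, 10) priors on a and b."""
    resid = y - (a + b * beta)
    loglik = -0.5 * np.sum(resid ** 2) / sigma ** 2
    logprior = -0.5 * (a ** 2 + b ** 2) / 10 ** 2
    return loglik + logprior

# Random-walk Metropolis over (a, b)
theta = np.zeros(2)
lp = log_post(*theta)
chain = []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.05, 2)     # perturbation kernel
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
        theta, lp = prop, lp_prop
    chain.append(theta)
chain = np.array(chain)[1000:]                # discard burn-in
a_hat, b_hat = chain.mean(axis=0)
```

The three ingredients named in the text — likelihood, prior, and perturbation kernel — each appear as one or two lines here; software like JAGS and Stan automates exactly this bookkeeping for much larger models.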

6.2 Adaptability

Another concern is whether joint modeling can be applied to your particular problem of interest. For example, one may wonder whether the type of linking used to connect structural information about the brain to performance variables across subjects (Turner et al., 2013) is also appropriate for linking individual trial-level activations from fMRI to the parameters of a decision model (Turner et al., 2015). As discussed in the chapter from the first edition of this book (Turner, 2015), joint models can be specified in a variety of ways, can be applied to different streams of neural data, and can immediately accommodate different decision models. This affords enormous flexibility for a modeler interested in exploring many different questions about the most plausible computations that underlie brain dynamics. By contrast, the more recently developed GPJM is currently structured only to link moment-to-moment changes in the neural and behavioral data, limiting its practicality to block-type designs. Because most cognitive models and cognitive tasks are designed to be event-related, this makes the GPJM difficult to apply to settings that are perhaps of primary interest to the field of cognitive neuroscience. Although this is a practical limitation, we do not view it as a technical one. For example, one can imagine that the latent dynamics of the GPJM could be used to specify trial-to-trial changes in cognitive model parameters, just as in the NDDM. Such a model would impose temporal structure on the parameters via the Gaussian process assumption, rather than the parametric structure of the NDDM, where cognitive model parameters are defined conditional on the parameters of the neural submodel. Future work will need to identify which type of constraint is best suited for modeling brain-behavior dynamics.

6.3 Computation

Perhaps the biggest hurdle, central to all (especially generative) methods in neuroscience, is that of computation. Neural recordings provide a wealth of data, commonly consisting of thousands of data points at high temporal or spatial

resolutions. Although there are common functions for imposing structure across neural features, such as relating neural activation to stimulus presentation via the shape of the hemodynamic response function, in some cases it may be desirable to avoid such specifications and instead treat every neural feature as an independent entity. In such cases, estimating the degree to which each neural feature is related to each model component can be prohibitively costly. We are currently pursuing several different approaches for managing the computational demands of high-dimensional data. In addition to the previously reviewed factor analytic approach (Turner et al., 2017), current work has successfully incorporated regularization techniques such as the LASSO and ridge methods (Kang et al., 2020). In this approach, we use a specific prior to impose a type of regularization that exploits the idea of shrinkage (Polson & Scott, 2011, 2012; Park & Casella, 2008) when fitting a model to data in the Bayesian framework. Effectively, the shrinkage prior enforces that neural features exhibiting zero or small correlations (or factor loadings) with the behavioral model are pushed downward toward a zero-loading structure, whereas features with high or moderate loadings are not penalized. Regularizing the correlation parameters in this way allows a parsimonious structure of the brain-behavior link to emerge naturally, which facilitates clearer explanations of how cognitive mechanisms play out in mental computations. Furthermore, as we have observed in simulation studies, allowing features with low correlations to be "zeroed out" reduces certain biases in some model parameters. Regularization techniques mainly facilitate an understanding of complex brain-behavior relationships. That is, there is nothing about the regularization itself that reduces the sheer number of parameters, or the number of brain-behavior links to test.
However, complexity metrics such as the effective number of model parameters are reduced, which can accelerate parameter estimation conducted by posterior sampling algorithms such as Markov chain Monte Carlo (MCMC). In addition, one can envision an algorithm that would discontinue posterior sampling for parameters deemed to have zero correlation. Although future work will need to verify that such procedures do not adversely affect the estimation of other parameters in the model, such a procedure would dramatically improve our ability to scale joint models to more complex data structures. In addition to regularization methods, new methods are being developed to accelerate the efficiency of parameter estimation, especially in the Bayesian context. For example, we have found that variational inference (Galdo et al., 2020) holds great promise for establishing accurate posterior estimates for models used in cognitive science in a small fraction of the time that it would take to use even the most efficient Bayesian sampling algorithms, such as differential evolution MCMC (Turner et al., 2013). Even in the context of hierarchical models, variational inference may prove critical for fitting complex joint models to data.
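The effect of shrinkage described above can be illustrated with a deliberately crude stand-in: soft-thresholding the estimated feature-parameter correlations, which mimics what a LASSO-type shrinkage prior does to small loadings. This is not the Bayesian implementation of Kang et al. (2020); the data, threshold, and feature counts below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 50 neural features, but only the first 3 truly load on the
# behavioral parameter; the remaining 47 are pure noise.
n_trials, n_feat = 200, 50
delta = rng.normal(0, 1, n_trials)            # behavioral parameter per trial
X = rng.normal(0, 1, (n_trials, n_feat))      # neural features
X[:, :3] += 0.8 * delta[:, None]              # informative features

# Raw correlation of each feature with the behavioral parameter
r = np.array([np.corrcoef(X[:, j], delta)[0, 1] for j in range(n_feat)])

def soft_threshold(r, lam):
    """LASSO-style shrinkage: small loadings are zeroed, large ones survive."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

r_shrunk = soft_threshold(r, lam=0.25)
kept = np.flatnonzero(r_shrunk != 0)          # surviving brain-behavior links
```

The noise features have sample correlations on the order of 1/sqrt(n_trials) and are zeroed out, while the three informative features survive, yielding the parsimonious link structure the text describes.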

6.4 Utility

Although we have discussed a number of important conceptual and statistical advantages of the joint modeling approach, it is also important to consider how joint models may be of practical interest to the typical practitioner in cognitive neuroscience. As one example, efficient data collection is one of the most important considerations in many neuroimaging experiments. Online design optimization methods have been developed and applied to many neuroimaging and behavioral experiments, such as The Automatic Neuroscientist and Adaptive Design Optimization (ADO; Lorenz et al., 2016; Cavagnaro et al., 2010). These methods take the history of experimental events (i.e., stimuli and associated responses) into account and propose a stimulus for the next trial that maximizes the expected information of interest (e.g., regional activation, estimates of the cognitive model parameters, comparative evidence for competing candidate models). Despite these advantages, these methods have not fully exploited the constraints provided by using data from both modalities. When a research question is based on computational cognitive models, joint modeling can be used to guide design optimization methods and enhance the efficiency of neuroimaging data collection. In one application (Bahg et al., 2020), an fMRI-based Bayesian design optimization pipeline was developed using a directed joint model of a two-alternative contrast discrimination task. Figure 5a illustrates the basic concept of this approach. At the algorithm's core, a joint model enables the evaluation of the likelihood of observing the joint set of neural and behavioral data. As data are collected during the experiment, we gain more and more information about how a subject's brain enables choices through cognition.
As we continue to gain information about our subjects, ADO evaluates the utility of candidate stimuli and chooses the stimulus with the highest expected utility, which is then presented to the subject on the next trial. In both simulation and fMRI experiments, our approach identified the most informative set of stimuli and obtained more accurate parameter estimates with fewer trials. For example, Fig. 5c shows that metrics of posterior accuracy, such as the root mean squared error and the posterior standard deviation, are consistently better (i.e., smaller) for the ADO-based approach compared to a standard, randomly selected stimulus design. Over trials, the gains made by using ADO accumulate, as shown in the far right panel of Fig. 5c. In this plot, we evaluated how many more experimental trials a random design would need to match the information that an ADO-based procedure would gain, as a function of trial number (x-axis). The figure shows that by 20 trials, one would need an additional 12 trials (i.e., a total of 32 trials) in a randomized design to learn the same amount of information about a subject as an ADO-based search. The use of a directed linking function might give the impression that this method can be used only when one has a very specific hypothesis about the brain-behavior relationship. Admittedly, directed joint models, which tend to have fewer parameters than covariance joint models, might be more suitable

Fig. 5 Adaptive Design Optimization for Joint Modeling Applications. (a) A visual illustration of Adaptive Design Optimization (ADO) for model-based functional magnetic resonance imaging experiments. Neural and behavioral data obtained from participants are used for evaluating the utility of stimuli given the previous sequence of the experiment. ADO chooses the stimulus with the highest expected utility and uses it for the next trial. Periodically, we estimate the full posterior distribution of the model to improve the quality of grid-based approximation of the utility. (b) Experimental designs selected for a simulation experiment compared between ADO (top) and random search (RS; bottom). Black dots represent the candidate stimuli in the simulation experiment. The density of the colored shades increases as a stimulus has been selected more frequently. The results are aggregated over the first (left) and second (middle) five trials, and the entire 20-trial sequence (right). (c) Performance of ADO compared with RS. Accuracy (left) and precision (middle) of the posterior estimates from ADO and RS are compared in terms of root mean squared error and standard deviation of the posterior distributions. The right panel shows the number of additional trials required for RS to attain the same mean performance of ADO as a function of trial number

in experiments using online design optimization, due to limited computational resources and time pressure. However, as we have reviewed above, many directed models can be used effectively to gain insights similar to those from covariance models. So long as the model of interest is appropriately scaled, optimal experimental design methods can still guide experiments in either an exploratory or confirmatory way (Turner et al., 2017).
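A stripped-down version of the grid-based utility computation at ADO's core can be written directly from its definition: for each candidate design, the expected utility is the mutual information between the model parameter (weighted by the current posterior) and the predicted binary outcome. The psychometric link, grids, and design space below are hypothetical stand-ins, not the actual model of Bahg et al. (2020).

```python
import numpy as np

# Grid over a single model parameter (e.g., a contrast-sensitivity parameter)
theta = np.linspace(0.1, 5.0, 100)
post = np.ones_like(theta) / len(theta)        # current posterior (uniform here)

def p_correct(theta, contrast):
    """Toy psychometric link: accuracy as a function of stimulus contrast."""
    return 1.0 / (1.0 + np.exp(-theta[:, None] * contrast[None, :]))

designs = np.linspace(0.0, 3.0, 25)            # candidate stimulus contrasts
p1 = p_correct(theta, designs)                 # P(correct | theta, design)

def entropy(p):
    """Binary entropy, elementwise."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Mutual information between parameter and outcome for each candidate design:
# I(theta; Y | d) = H(Y | d) - E_theta[ H(Y | theta, d) ]
p_marg = post @ p1                             # P(correct | design)
utility = entropy(p_marg) - post @ entropy(p1)
best = designs[np.argmax(utility)]             # stimulus for the next trial
```

A zero-contrast stimulus is useless here (every parameter value predicts chance performance, so the outcome carries no information), and a very high contrast is nearly as uninformative; the maximizer lands at an intermediate contrast, which is the qualitative behavior ADO exploits.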

6.5 Constraint

In recent years, neuroscientists have spent considerable effort studying structural connectivity. While the joint modeling framework has succeeded in fusing state-of-the-art neural measures such as EEG and fMRI with a variety of measurable behaviors, it has yet to fully exploit the constraining information of structural connectivity.

The primary reason is that there is not yet a general framework for relating structural connectivity to other manifest variables, though we discuss some promising avenues below. Structural connectivity refers to the white matter tracts connecting regions of the brain and is typically inferred from diffusion-weighted magnetic resonance imaging (DWI) data (Le Bihan et al., 2001). Roughly speaking, DWI uses the diffusion of water molecules to infer the structural architecture of white matter tracts in the brain. White matter tracts are primarily composed of myelinated axons, and a tract is therefore considered a marker of a physical communicative connection between a pair of ROIs. From DWI data, a structural connectivity statistic is computed. A variety of structural connectivity statistics are used across the field, including fractional anisotropy, edge weights, fiber counts, and more. More simply, one can construct a binary undirected graph based on these measures. Raw structural data have been used in joint modeling applications, such as the connection between the striatum and presupplementary motor areas in Turner et al. (2013), building on earlier work (Forstmann et al., 2011). More recently, structural information has been used within a covariance joint model to explain risky decision making (D'Alessandro et al., 2020). Intermediate between structural connectivity and neural time series data is functional connectivity. Functional connectivity is a broad term that can refer to a connection as strict as a causal relationship or as nonspecific as the statistical covariance between ROIs' time series. A functional connection is a statistic computable directly from the neural time series. By contrast, structural connectivity has an indirect but consistently documented relationship to functional connectivity (Damoiseaux & Greicius, 2009). However, the exact mathematical relationship between functional connectivity and structural connections remains an open question.
Because it is an intermediate measure, functional connectivity is a promising avenue for relating structural connectivity to neural time series data. Functional connectivity is canonically measured as the Pearson correlation between ROIs' fMRI time series. These correlations are thought to give insight into direct (via axonal pathways) or indirect (via intermediate regions) communication between ROIs. The Pearson correlation measure assumes the dependence is linear. Furthermore, correlations can emerge spuriously or from a tertiary common-cause ROI. To control for a confounding common-cause ROI, it is often better to examine a partial correlation rather than the Pearson correlation. In this context, the partial correlation is the correlation between two ROIs conditional on all other ROIs. If functional connectivity is measured with Pearson correlations, the partial correlations can be obtained from the inverse of the correlation matrix (the precision matrix) by normalizing and sign-flipping its off-diagonal elements. Zero elements of the partial correlation matrix imply an absence of direct functional connectivity; more precisely, the ROIs are conditionally independent, assuming the distribution of all ROI observations is multivariate Gaussian (Baba et al., 2004). Hinne et al. argue that for two ROIs to be directly functionally connected (non-zero partial correlation), the ROIs must be structurally connected. Therefore, binary structural connectivity (connected vs. not connected) can inform us about the sparsity of the partial correlation matrix. To implement this constraint, Hinne et al. (2014) suggest using structural information to specify a

G-Wishart prior (Roverato, 2002) on the partial correlation matrix, where the G-Wishart distribution is the conjugate prior over partial correlation matrices with the same sparsity pattern as G, the undirected graph of connections formed from the binary structural connectivity data. Hinne et al. (2014) found that using this prior led to better estimates of resting-state functional connectivity in simulated data and improved detection of previously established functional connections in empirical data. Functional connectivity is most frequently calculated for resting-state data, but task-based functional connectivity is most relevant to model-based cognitive neuroscientists. Task-based functional connectivity has also been associated with structure (Hermundstad et al., 2013). Task-based functional connectivity is analogous to the covariance between the single-trial parameters of a set of ROIs when assuming a GLM for the neural submodel in a directed or covariance approach. Therefore, using a G-Wishart prior with a structurally informed graph appears directly applicable to covariance joint models. Beyond covariance joint models, the undirected graph G could also be used to help specify the directed graph of the integrative MDS joint model, or potentially the particular shape of the topology used in the GPJM.
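The precision-matrix route from correlations to partial correlations, and the conditional-independence reading of its zeros, can be checked numerically. The three-ROI "chain" correlation matrix below is constructed by hand so that ROIs 0 and 2 are correlated only through ROI 1.

```python
import numpy as np

def partial_correlations(R):
    """Partial correlations from a correlation matrix via its inverse
    (the precision matrix): rho_ij = -P_ij / sqrt(P_ii * P_jj)."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    rho = -P / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Three ROIs forming a chain 0-1-2: the marginal 0-2 correlation (0.36)
# is exactly the product of the 0-1 and 1-2 correlations (0.6 * 0.6).
R = np.array([[1.0, 0.6, 0.36],
              [0.6, 1.0, 0.6],
              [0.36, 0.6, 1.0]])
rho = partial_correlations(R)
# Conditioning on ROI 1 removes the 0-2 dependence: rho[0, 2] is ~0,
# while the direct 0-1 and 1-2 partial correlations remain nonzero.
```

Under the Hinne et al. logic, the zero at rho[0, 2] is exactly the kind of entry that a structurally informed G-Wishart prior would fix at zero a priori when DWI shows no tract between ROIs 0 and 2.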

7 Conclusions

At the highest level of interpretation, joint modeling is effectively a strategy for handling complex covariates with different temporal resolutions. Joint modeling takes seriously the notion that cognition is a construct developed to explain how the brain produces thoughts and actions. As such, cognition cannot be directly measured; it can, however, be indirectly quantified through latent variable modeling. The difficult part is specifying good theories for the types of computations that a brain can perform, theories that would ultimately also mimic patterns of actions such as those in choice response time distributions. Unfortunately, we are far from knowing all of these computations, and so we must engage in an iterative process of developing, testing, rejecting, and validating different theories until we converge on a set of established interpretations of the brain's computations. Until we have a grand theory of how the brain produces behavior that we can all agree on, joint modeling is just one way to investigate the mysteries of how cognition can be realized in a biological system.

8 Suggested Readings

There are many great articles that consider the problem of linking from a much more theoretical perspective, such as Teller (1984) and Schall (2004). Forstmann's review on reciprocity (Forstmann et al., 2011) has had a profound effect on the many different ways in which cognitive models can be used to interpret brain data, which

are reviewed in two articles (Turner et al., 2017; de Hollander et al., 2016). For more information on joint modeling in particular, we recommend the book (Turner et al., 2018), a more recent review (Turner et al., 2019), or the tutorial article (Palestro et al., 2018), which features code for fitting different models (i.e., directed and covariance) to data. There are also interesting new developments in the machine learning literature featuring recurrent neural networks (Dezfouli et al., 2018). Although we discussed integrative modeling, there are many more sophisticated modeling structures that are integrative, such as the approaches led by Friston and colleagues (Friston et al., 2000, 2003; Friston, 2002) and others (Rigoux & Daunizeau, 2015). It is also important to highlight that there are conditions in which directed and covariance joint models may not recover the ground truth (Hawkins et al., 2017).

9 Thought-Provoking Questions

– What are some advantages and disadvantages of using the linear regression linking equation (i.e., Eq. (1)) when constructing directed joint models? In what cases would it be better to use a multivariate linking function? In what cases would it be better to use a nonparametric approach?
– We discussed Gaussian process and integrative modeling as two ways to account for temporal dynamics in the data. One is directly connected to the time dimension, whereas the other is connected to the stimulus information. The connection of stimulus to brain is perhaps more obvious, but can you explain why temporal dynamics may contain information that elucidates cognitive phenomena? Try to think of some concrete cognitive tasks to articulate your response.
– In Chap. 10, the authors described how the ACT-R architecture can be used to model both neural and behavioral data. Can you compare the ACT-R approach with the integrative approach discussed in this chapter? Are there any practical limitations that might prohibit the use of the ACT-R approach for problems you are interested in?
– Given the mismatch in the time dimension between neural and behavioral data, current joint models either compress the neural data (e.g., extract single-trial betas in fMRI, or select a time window of interest in EEG) to obtain trial-level information, or exploit moment-by-moment dynamics by developing real-time models (e.g., GPJM, integrative), sacrificing trial-level interpretability to some degree. Can you think of a better way to connect multi-modal data consisting of different time scales?
– Most joint modeling practices have focused on linking brain to behavior through a computational model. As such, these approaches inherently assume that pre-defined mechanisms might explain both streams of data.
If a significant relationship is found, the interpretation is that the discovered brain region is doing something similar to the psychological construct from which the mechanism is derived. Is this reasonable? How can we protect ourselves from the obvious counterfactual that other mechanisms might have correlated equally well with this particular brain area?
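The first question above asks about the linear regression linking equation and when a multivariate linking function would be preferable. As a concrete object to reason about, here is a minimal simulate-and-recover sketch in plain NumPy; all names and parameter values are invented for illustration, and a real directed joint model would treat the drift rates as latent quantities estimated from behavior rather than observing them directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500

# Two hypothetical trial-level neural covariates (e.g., single-trial
# betas from two regions of interest), standardized.
neural = rng.standard_normal((n_trials, 2))

# A multivariate linear linking function: one behavioral parameter
# (here a trial-level drift rate) is a weighted sum of both signals.
true_beta = np.array([0.4, -0.25])
drift = 1.0 + neural @ true_beta + 0.1 * rng.standard_normal(n_trials)

# Recover the intercept and linking weights by least squares.
X = np.column_stack([np.ones(n_trials), neural])
est, *_ = np.linalg.lstsq(X, drift, rcond=None)
print(est)  # approximately [1.0, 0.4, -0.25]
```

Collapsing the two covariates into one (e.g., their average) and refitting shows what a univariate link would miss when the two signals pull the parameter in opposite directions.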

References

Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.
Anderson, J. R. (2012). Neuropsychologia, 50, 487.
Anderson, J. R., Carter, C. S., Fincham, J. M., Qin, Y., Ravizza, S. M., & Rosenberg-Lee, M. (2008). Cognitive Science, 32, 1323.
Anderson, J. R., Betts, S., Ferris, J. L., & Fincham, J. M. (2010). Proceedings of the National Academy of Sciences of the United States, 107, 7018.
Baba, K., Shibata, R., & Sibuya, M. (2004). Australian & New Zealand Journal of Statistics, 46(4), 657.
Bahg, G., Evans, D. G., Galdo, M., & Turner, B. M. (2020). Gaussian process linking functions for brain, behavior, and mind. In press at Proceedings of the National Academy of Sciences of the United States of America.
Bahg, G., Sederberg, P. B., Myung, J. I., Li, X., Pitt, M. A., Lu, Z. L., & Turner, B. M. (2020). Real-time adaptive design optimization within functional MRI experiments. In press at Computational Brain & Behavior.
Bojak, I., Oostendorp, T. F., Reid, A. T., & Kötter, R. (2010). Brain Topography, 23(2), 139.
Borst, J. P., & Anderson, J. R. (2017). Journal of Mathematical Psychology, 76, 94.
Brindley, G. S. (1970). Physiology of retina and visual pathways (2nd ed.). Baltimore, MD: Williams & Wilkins.
Brown, S., & Heathcote, A. (2008). Cognitive Psychology, 57, 153.
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., & Riddell, A. (2017). Journal of Statistical Software, 76(1), 1–32. https://doi.org/10.18637/jss.v076.i01
Cassey, P., Gaut, G., Steyvers, M., & Brown, S. D. (2023). Psychonomic Bulletin and Review, 30(4), 1171–1186.
Cavagnaro, D. R., Myung, J. I., Pitt, M. A., & Kujala, J. V. (2010). Neural Computation, 22(4), 887.
Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Nature Neuroscience, 14, 1462.
Cox, G., Kachergis, G., & Shiffrin, R. (2012). In Proceedings of the annual meeting of the cognitive science society (Vol. 34).
D’Alessandro, M., Gallitto, G., Greco, A., & Lombardi, L. (2020). Brain Sciences, 10(3), 138.
Damoiseaux, J. S., & Greicius, M. D. (2009). Brain Structure and Function, 213(6), 525.
Daunizeau, J., Grova, C., Marrelec, G., Mattout, J., Jbabdi, S., Pélégrini-Issac, M., Lina, J. M., & Benali, H. (2007). Neuroimage, 36(1), 69.
David, O., Harrison, L., & Friston, K. J. (2005). NeuroImage, 25(3), 756.
Deneux, T., & Faugeras, O. (2010). Neural Computation, 22(4), 906.
Dezfouli, A., Morris, R., Ramos, F. T., Dayan, P., & Balleine, B. (2018). In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 31, pp. 4228–4237). New York: Curran Associates, Inc. http://papers.nips.cc/paper/7677-integrated-accounts-of-behavioral-and-neuroimaging-data-using-flexible-recurrent-neural-network-models.pdf
de Hollander, G., Forstmann, B. U., & Brown, S. D. (2016). Cognitive Neuroscience and Neuroimaging, 1, 101.

Advancements in Joint Modeling of Neural and Behavioral Data


Eichele, T., Debener, S., Calhoun, V. D., Specht, K., Engel, A. K., Hugdahl, K., von Cramon, D. Y., & Ullsperger, M. (2008). Proceedings of the National Academy of Sciences of the United States, 16, 6173.
Forstmann, B. U., & Wagenmakers, E. J. (2014). An introduction to model-based cognitive neuroscience. New York: Springer.
Forstmann, B. U., Dutilh, G., Brown, S., Neumann, J., von Cramon, D. Y., Ridderinkhof, K. R., & Wagenmakers, E. J. (2008). Proceedings of the National Academy of Sciences, 105, 17538.
Forstmann, B. U., Anwander, A., Schäfer, A., Neumann, J., Brown, S., Wagenmakers, E. J., Bogacz, R., & Turner, R. (2010). Proceedings of the National Academy of Sciences, 107, 15916.
Forstmann, B. U., Tittgemeyer, M., Wagenmakers, E. J., Derrfuss, J., Imperati, D., & Brown, S. (2011). Journal of Neuroscience, 31, 17242.
Forstmann, B. U., Wagenmakers, E. J., Eichele, T., Brown, S., & Serences, J. T. (2011). Trends in Cognitive Sciences, 15, 272.
Frank, M., Gagne, C., Nyhus, E., Masters, S., Wiecki, T. V., Cavanagh, J. F., & Badre, D. (2015). Journal of Neuroscience, 35(2), 485.
Friston, K. (2002). NeuroImage, 16, 513.
Friston, K. J., Mechelli, A., Turner, R., & Price, C. J. (2000). NeuroImage, 12(4), 466.
Friston, K. J., Harrison, L., & Penny, W. (2003). NeuroImage, 19(4), 1273.
Galdo, M., Bahg, G., & Turner, B. M. (2020). Variational Bayesian methods for cognitive science. Psychological Methods, 25(5), 535.
Hare, T. A., Hakimi, S., & Rangel, A. (2014). Frontiers in Neuroscience, 8, 50.
Harris, A., Hare, T., & Rangel, A. (2013). Journal of Neuroscience, 33(48), 18917.
Hawkins, G., Mittner, M., Forstmann, B. U., & Heathcote, A. (2017). Journal of Mathematical Psychology, 76, 142.
Hermundstad, A. M., Bassett, D. S., Brown, K. S., Aminoff, E. M., Clewett, D., Freeman, S., Frithsen, A., Johnson, A., Tipper, C. M., Miller, M. B., et al. (2013). Proceedings of the National Academy of Sciences, 110(15), 6169.
Herz, D. M., Tan, H., Brittain, J. S., Fischer, P., Cheeran, B., Green, A. L., FitzGerald, J., Aziz, T. Z., Ashkan, K., Little, S., Foltynie, T., Limousin, P., Zrinzo, L., Bogacz, R., & Brown, P. (2017). eLife, 6, e21481. https://doi.org/10.7554/eLife.21481
Hinne, M., Ambrogioni, L., Janssen, R. J., Heskes, T., & van Gerven, M. A. (2014). NeuroImage, 86, 294–305.
Kang, I., Yi, W., & Turner, B. M. (2020). Regularization methods for linking brain and behavior. Under review.
Lawrence, N. (2007). In G. Ghahramani (Ed.), Proceedings of the international conference in machine learning (pp. 481–488). New York: ACM.
Le Bihan, D., Mangin, J. F., Poupon, C., Clark, C. A., Pappata, S., Molko, N., & Chabriat, H. (2001). Journal of Magnetic Resonance Imaging, 13(4), 534.
Liu, Q., Petrov, A. A., Lu, Z. L., & Turner, B. M. (2020). Computational Brain & Behavior, 3, 1–28.
Lorenz, R., Monti, R. P., Violante, I. R., Anagnostopoulos, C., Faisal, A. A., Montana, G., & Leech, R. (2016). NeuroImage, 129, 320.
Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press.
McClure, S. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2004). Science, 306(5695), 503.
Mittner, M., Boekel, W., Tucker, A. M., Turner, B. M., Heathcote, A., & Forstmann, B. U. (2014). Journal of Neuroscience, 34, 16286.
Molloy, M. F., Bahg, G., Lu, Z. L., & Turner, B. M. (2019). Journal of Cognitive Neuroscience, 31(12), 1976–1996.
Myung, I. J., & Pitt, M. (2002). In H. Pashler & J. Wixted (Eds.), Stevens’ Handbook of Experimental Psychology (3rd ed., pp. 429–460). New York: Wiley.


B. M. Turner et al.

Neal, R. M. (1996). Bayesian learning for neural networks. New York: Springer Science & Business Media.
Nunez, M. D., Srinivasan, R., & Vandekerckhove, J. (2015). Frontiers in Psychology, 8(18), 1.
Nunez, M. D., Vandekerckhove, J., & Srinivasan, R. (2016). How attention influences perceptual decision making: Single-trial EEG correlates of drift-diffusion model parameters. In press.
Palestro, J. J., Bahg, G., Sederberg, P. B., Lu, Z. L., Steyvers, M., & Turner, B. M. (2018). Journal of Mathematical Psychology, 84, 20.
Park, T., & Casella, G. (2008). Journal of the American Statistical Association, 103(482), 681. https://doi.org/10.1198/016214508000000337
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Psychological Review, 109, 472.
Plummer, M. (2003). In Proceedings of the 3rd international workshop on distributed statistical computing.
Polson, N. G., & Scott, J. G. (2011). Shrink globally, act locally: Sparse Bayesian regularization and prediction. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199694587.003.0017
Polson, N. G., & Scott, J. G. (2012). Bayesian Analysis, 7(4), 887.
Purcell, B., Heitz, R., Cohen, J., Schall, J., Logan, G., & Palmeri, T. (2010). Psychological Review, 117, 1113.
Purcell, B., Schall, J., Logan, G., & Palmeri, T. (2012). Journal of Neuroscience, 32, 3433.
Ratcliff, R. (1978). Psychological Review, 85, 59.
Ratcliff, R., Sederberg, P. B., Smith, T. A., & Childers, R. (2016). Neuropsychologia, 93, 128.
Rigoux, L., & Daunizeau, J. (2015). NeuroImage, 117, 202.
Roberts, S., & Pashler, H. (2000). Psychological Review, 107, 358.
Rodriguez, C. A., Turner, B. M., Van Zandt, T., & McClure, S. M. (2015). European Journal of Neuroscience, 42, 1–11.
Roitman, J. D., & Shadlen, M. N. (2002). Journal of Neuroscience, 22, 9475.
Roverato, A. (2002). Scandinavian Journal of Statistics, 29(3), 391.
Ryali, S., Supekar, K., Chen, T., & Menon, V. (2011). Neuroimage, 54(2), 807.
Schall, J. D. (2004). Annual Review of Psychology, 55, 23.
Schulz, E., Speekenbrink, M., & Krause, A. (2018). Journal of Mathematical Psychology, 85, 1.
Teller, D. Y. (1984). Vision Research, 24, 1233.
Turner, B. M. (2015). In B. U. Forstmann & E. Wagenmakers (Eds.), An introduction to model-based cognitive neuroscience (pp. 199–220). New York: Springer.
Turner, B. M., & Sederberg, P. B. (2014). Psychonomic Bulletin and Review, 21, 227.
Turner, B. M., & Van Zandt, T. (2012). Journal of Mathematical Psychology, 56, 69.
Turner, B. M., & Van Zandt, T. (2018). Trends in Cognitive Sciences, 22, 826.
Turner, B. M., Sederberg, P. B., Brown, S., & Steyvers, M. (2013). Psychological Methods, 18, 368.
Turner, B. M., Forstmann, B. U., Wagenmakers, E. J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). NeuroImage, 72, 193.
Turner, B. M., Van Maanen, L., & Forstmann, B. U. (2015). Psychological Review, 122, 312.
Turner, B. M., Rodriguez, C. A., Norcia, T., Steyvers, M., & McClure, S. M. (2016). NeuroImage, 128, 96.
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Journal of Mathematical Psychology, 76, 65.
Turner, B. M., Wang, T., & Merkel, E. (2017). NeuroImage, 153, 28.
Turner, B. M., Forstmann, B. U., & Steyvers, M. (2018). In A. H. Criss (Ed.), Simultaneous modeling of neural and behavioral data. Switzerland: Springer International Publishing.
Turner, B. M., Palestro, J. J., Miletic, S., & Forstmann, B. U. (2019). Neuroscience and Biobehavioral Reviews, 102, 327.
van Ravenzwaaij, D., Provost, A., & Brown, S. D. (2017). Journal of Mathematical Psychology, 76, 131.
Wasserman, L. (2000). Journal of Mathematical Psychology, 44, 92–107.


Williams, C. K. I., & Rasmussen, C. E. (2006). Gaussian processes for machine learning. Cambridge, MA: MIT Press.
Wu, A., Roy, N. A., Keeley, S., & Pillow, J. W. (2017). In Advances in neural information processing systems (pp. 3496–3505).
Zhao, Y., & Park, I. M. (2017). Neural Computation, 29(5), 1293.

Cognitive Models as a Tool to Link Decision Behavior with EEG Signals Guy E. Hawkins, James F. Cavanagh, Scott D. Brown, and Mark Steyvers

Abstract EEG is a direct measure of cortical neuronal activities, making it particularly well-suited to identify latent computations that underlie learning and decision making. This chapter takes a targeted look at how EEG signals and cognitive models of decision making can be linked in a mutually beneficial way. The first section begins with a tutorial on commonly used “linking” approaches, with a focus on sequential sampling models of decision making. This includes a depiction of increasingly sophisticated studies: from condition-level summaries of EEG signals with independently estimated model parameters to regressing trial-level EEG signals with trial-level estimates of model parameters. We then draw attention to joint modeling approaches that assess the bidirectional relationship between EEG signals and parameters of cognitive models. These approaches provide integrated, confirmatory frameworks that propose a common latent source that generates predictions for multiple outputs, such as behavior and neural data. The second section is a review of linking approaches in reinforcement learning models of decision making. This includes a brief history of formal linking approaches between EEG and reinforcement learning, from origins in qualitative comparison of prediction errors and aggregate EEG summaries through to regression of individual-trial data. The chapter concludes with some caveats to current linking approaches and a discussion of potential future directions for advancing the methods of linking EEG signals with cognitive models.

G. E. Hawkins () · S. D. Brown School of Psychological Sciences, University of Newcastle, Callaghan, NSW, Australia e-mail: [email protected]; [email protected] J. F. Cavanagh Department of Psychology, University of New Mexico, Albuquerque, NM, USA e-mail: [email protected] M. Steyvers Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, USA e-mail: [email protected] © Springer Nature Switzerland AG 2024 B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_10


Keywords EEG · Joint model · Response time · Cognitive model

1 Introduction

EEG is particularly well-suited for identification of latent mechanisms that underlie complicated learning and decision processes. As a direct measure of cortical neuronal activities, EEG is uniquely sensitive to canonical neural operations which underlie emergent latent constructs (Cavanagh, 2019; Fries, 2009; Siegel et al., 2012). Event-related EEG responses follow stereotyped patterns that temporally evolve from specific, low-level, sensory responses toward generic, high-level, modality-independent phenomena (Luck, 2014). These patterns appear to represent overlapping circuit motifs that can be separately modulated within spatio-temporal windows (Fries, 2009; Siegel et al., 2012). These event-related EEG motifs are specifically sensitive to the temporal and hierarchical evolution of surprise signals (Friston, 2003, 2005, 2010), which are used to learn in an ever-changing world.

One of the definitive challenges of cognitive neuroscience is to understand what latent processes are reflected by specific neural signals. A rich history of physiological and psychological interpretation has shaped our understanding of event-related EEG responses (see Luck, 2014). Yet the cognitive constructs used to understand neural signals are not likely to map onto the cortical computations that are indirectly observed from neuroimaging. For example, any number of distinct computations may underlie a process described as “working memory” or “attention.” Our most widely used manifest indicators of these cognitive processes remain behavioral choices and response time. But of course there are no dedicated accuracy or response time areas in the brain; even these indicators reflect combinations of latent processes. The joint modeling approach discussed here aims to leverage explanatory power from both behavioral performance and EEG signals in order to identify true latent computations in the brain.
The first section of this chapter provides a tutorial-style overview of various approaches for simultaneously linking EEG-derived measures of neural activity to the latent components of processing in cognitive models. This section uses case studies to illustrate the evolution of methodological complexity in a field of decision making research that has seen considerable advances in joint brain-behavior modeling: evidence accumulation models of speeded decision making. The second section of the chapter turns to another popular field of decision making research where the evolution toward joint modeling is less developed: reinforcement learning models of decision making. The chapter concludes with future directions for enhancing the discovery of cortical computations from multiple sources of evidence.

Linking Behavior and EEG via cognitive models


2 A Linking Tutorial: EEG and Accumulator Models of Decision Making

The rapid time scale of EEG-derived measures of neural activity is particularly well-suited for linking with cognitive models that operate over short time scales, like diffusion and accumulator models of speeded decision making. This field has been characterized by a progressive increase in the sophistication of the linking approach over the years. Here, we explain the most widely used approaches that researchers have taken to link decision-making models with EEG signals. We first provide a brief overview of two established approaches: correlation- and regression-based linking. We then provide a detailed overview of a confirmatory linking approach that we believe holds great promise for hypothesis-driven model-based cognitive neuroscience. We note that this tutorial-style overview is not intended to provide exhaustive coverage of the excellent work conducted in this domain. For a sample of that work, see the Further Reading at the end of the chapter.

2.1 Established Linking Approaches

Throughout, we use the term correlation-based linking to refer to the linking of model parameters and EEG signals at the condition level, and regression-based linking to refer to the linking of model parameters and EEG signals at the trial level. We chose these terms as they are conceptually close to the statistical methods typically used to assess the link, though we note that some have used terms such as between vs. within condition, or across condition vs. across trials, respectively, to refer to the same concepts.

2.1.1 Correlation-Based Linking

The correlation-based method is one of the earliest methods used in quantitative model-based cognitive neuroscience. It is widely used to summarize the association between latent components of cognitive models and neural activity as a between-subjects effect. The method typically involves two independent steps, conducted one after the other. The first step involves independent analysis of the behavioral data and the neural data. This requires summarizing each participant’s behavioral data into an estimate of a parameter from a cognitive model and summarizing their neural data into an effect measure that represents a relevant form of average activation. The second step is to correlate the two summary measures across participants. The resulting correlation coefficient estimates the degree of across-participant covariation in the model parameter and average neural activity.
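The two-step logic can be made concrete in a few lines. The sketch below uses entirely simulated numbers in place of real model fits and ERP pipelines; the point is only the structure of the method: one behavioral summary and one neural summary per participant, then a single across-participant correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj = 40

# Step 1a (behavioral): a per-participant point estimate of a cognitive
# model parameter (e.g., drift rate). Here we simulate the estimates;
# in practice each comes from fitting the model to that person's data.
true_drift = rng.normal(1.5, 0.3, n_subj)
drift_hat = true_drift + rng.normal(0.0, 0.05, n_subj)  # estimation noise

# Step 1b (neural): a per-participant summary of average activation
# (e.g., mean ERP amplitude in a window), simulated as related to drift.
erp = 2.0 * true_drift + rng.normal(0.0, 0.3, n_subj)

# Step 2: correlate the two point-estimate summaries across participants.
r = np.corrcoef(drift_hat, erp)[0, 1]
print(f"across-participant r = {r:.2f}")
```

Note that the second step treats the point estimates as if they were noiseless, a simplification with consequences discussed later in this section.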


Fig. 1 The task-switching paradigm and illustrative results of Karayanidis et al. (2009). (a) Each trial comprised a cue-target pair where the cue is indicated by two adjoining segments of the circle. The cues are associated with one of the three classification tasks (color, letter, and digit). (b) Four types of task-switching trials are possible on trial N depending on the cue presented on trial N-1. (c) Cue-locked ERP and difference waveforms for each trial type at POz. (d), (e), and (f) EZ2 Diffusion model estimates for non-decision time, drift rate, and response caution (criterion), respectively

To illustrate the correlation-based linking approach we use the task-switching study of Karayanidis et al. (2009) as an example (see Fig. 1). In this experiment, participants were required to switch between three visually presented tasks across trials (digit, letter, color identification). In the “switch-to” condition participants received a pre-trial cue that indicated with 100% validity which of the three tasks would be tested on the upcoming trial. This allowed the participant to prepare their response for exactly the task on which they would be tested. This is in contrast to the “switch-away” condition, where participants received a cue indicating that the upcoming trial would not be the task they completed on the previous trial but it did not inform which of the remaining two tasks would be tested. This cue allowed partial response preparation: optimal performance is to partially activate response preparation procedures for both possible tasks. Then, at stimulus onset once the identity of the relevant task is known, the participant must complete their response preparation for the appropriate task. Since attentional resources are split across two potential tasks in the preparation phase, less precise mechanisms can be used to optimize performance in the upcoming decision. Karayanidis et al. (2009) analyzed the behavioral data with the EZ2 diffusion model (Wagenmakers et al., 2008),
which generated estimates for three parameters of interest: the speed of evidence accumulation (drift rate), the cautiousness of decisions (response threshold), and the time taken for processes unrelated to the decision itself (non-decision time). For illustrative purposes, we focus on one of Karayanidis et al.’s event-related potential (ERP) summary measures: a cue-locked positivity quantified at POz for each participant. Karayanidis et al. (2009) correlated the cue-locked positivity amplitude with separate parameter estimates of the evidence accumulation model (drift rate, response threshold, non-decision time). This revealed that participants with larger cue-locked positivity tended to have higher drift rates, made less cautious decisions, and had shorter non-decision times. This suggests that participants with stronger neural responses during the cuing window tended to show better performance in the behavioral task, where this interpretation of the cognitive model comes from the combination of parameter effects. Interestingly, these associations were only observed in the switch-to cuing condition; no associations were statistically significant in the switch-away condition.

Both the virtues and the deficits of correlation-based linking lie in the simplicity of analysis and interpretation. Independent analysis of the two data sources (behavioral and EEG) means that classic methods from mathematical psychology and cognitive neuroscience can be directly applied to summarize the data into the relevant measures, with no need to integrate the two analytic approaches. This facilitates a straightforward interpretation with simple statistics. However, there are two major downsides to the method. As it is most commonly applied, correlation-based linking does not provide a direct test of within-subject effects; these can only be inferred indirectly and often without substantial statistical justification.
As an example, in Karayanidis et al.’s data there were statistically significant correlations between diffusion model parameters and the early positivity component in the switch-to condition but not in the switch-away condition. This might tempt the conclusion that a stronger relationship holds in the switch-to condition than the switch-away, but this hypothesis was never tested in the analysis. A second shortcoming of the correlation-based method is undeserved confidence in conclusions due to the independence of the two steps. All summary measures, including parameter estimates from cognitive models and aggregated neural data, contain measurement error. The magnitude of the measurement error can be quantified via standard methods in the first step. However, the second step of the analysis implicitly assumes there is no measurement error in either summary measure: the correlation is typically computed on the point estimates of the summary measures generated in the first step. Since the correlation assumes noiseless data at the participant level, which never occurs in behavioral or neural data, it overestimates the magnitude of the correlation coefficient. New statistical approaches have recently been developed which provide alternative ways of addressing these questions, and directly testing hypotheses about the nature of inter-individual variation (Ly et al., 2018; Rouder and Haaf, 2019). Such approaches may prove important in future joint modeling efforts. This dismissal of intrinsic measurement error has real consequences for the conclusions we draw from the correlation-based linking method. For example,
Forstmann et al. (2008) showed that people who more effectively switch between making fast decisions and making careful decisions have a greater change in activation of the pre-supplementary motor area (pre-SMA; r = −0.71, p < 0.01) and the striatum (r = −0.52, p < 0.01). Ly et al. (2018) reanalyzed the same data with a hierarchical Bayesian version of the correlation-based method that accounts for uncertainty in the estimation of the cognitive model parameters. This method attenuated the correlations substantially: pre-SMA to r = −0.48 and striatum to r = −0.34, with corresponding Bayesian p-values of 0.012 and 0.06. This is a considerable change in both magnitude of the correlation coefficient and the statistical significance; following conventional criteria, the latter correlation would no longer be considered statistically significant. Ly et al.’s approach demonstrates the attenuation that occurs when accounting for uncertainty in just one of the summary measures (the cognitive model parameter). When accounting for uncertainty in both summary measures one would expect further attenuation.
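One way to see what the second step ignores is to propagate each participant's estimation uncertainty through the correlation. The following sketch uses made-up numbers and a crude perturbation scheme purely for illustration; principled treatments are hierarchical, as in Ly et al. (2018). The point is that the naive correlation is a single number drawn from what is, once measurement error is acknowledged, a rather wide distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj = 20

# Hypothetical per-participant point estimates: a cognitive model
# parameter and a neural summary, plus a standard error for the former.
param_hat = rng.normal(0.0, 1.0, n_subj)
neural_hat = 0.6 * param_hat + rng.normal(0.0, 0.8, n_subj)
param_se = np.full(n_subj, 0.5)  # ignored by the naive second step

# Naive second-step correlation: point estimates treated as noiseless.
r_naive = np.corrcoef(param_hat, neural_hat)[0, 1]

# Crudely acknowledge estimation noise by perturbing each point estimate
# by its standard error and recomputing the correlation many times.
r_draws = np.array([
    np.corrcoef(param_hat + rng.normal(0.0, param_se), neural_hat)[0, 1]
    for _ in range(2000)
])
lo, hi = np.percentile(r_draws, [2.5, 97.5])
print(r_naive, (lo, hi))
```

The interval printed at the end is not a proper posterior, only a reminder that per-participant uncertainty translates into non-trivial uncertainty about the group-level correlation.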

2.1.2 Regression-Based Linking

A shortcoming of the correlation-based linking approach is the restriction to condition-level or individual-difference effects. This does not permit investigation of trial-to-trial variability in performance. This is not a trivial omission: most modern evidence accumulation models assume trial-to-trial variability in key components of cognitive processing including the drift rate, starting point of accumulation, and/or non-decision time. Regression-based linking approaches do not have this shortcoming. They allow trial-wise covariates, such as a recording of neural activity of interest, to be regressed onto parameters of a cognitive model. This provides unique parameter estimates for individual trials and opens a new avenue of investigation into the moment-to-moment dynamics of the brain-behavior relationship. For instance, with this approach researchers can investigate which cognitive processes are most likely to covary with which neural signatures. This is a considerable advantage as it permits an examination of within-subject effects, which are closer to the core questions asked by many cognitive neuroscientists.

We illustrate the regression-based linking approach with an example from Cavanagh et al. (2011). In this study, the authors investigated the role of the subthalamic nucleus (STN) in adaptively controlling responses to decision conflict. Participants completed a two-alternative forced choice task based on learned action values (see reinforcement learning below), with easy choices (a high value option and a low value option) or difficult “conflict” choices (two high value or two low value options). These participants had a deep brain stimulation (DBS) device implanted in the STN, which is an increasingly common treatment for symptoms of Parkinson’s disease.
An advantage of DBS intervention is that switching the device on and off allows for causal tests of the role of the STN between upstream neural activity (EEG theta power over medial prefrontal cortex; mPFC) and downstream behavior (choices and response times). As discussed later, this
frontal midline theta power (FMT) is sensitive to punishment, error, and conflict, making it particularly well-suited for examinations of cognitive control during decision making and learning. Cavanagh et al. (2011) investigated the trial-by-trial association between mPFC activity (indexed by FMT) and decision caution (indexed by response threshold) and whether this was modulated by the status of STN-DBS in patients. If theta power is responsive to conflict, and conflict is associated with more cautious decisions, then high conflict vs. low conflict trials ought to be associated with both increased theta power and elevated response thresholds. Formally, the predicted response threshold, a, on trial i of conflict condition j ∈ {high, low} and DBS state k ∈ {on, off}, a_{ijk}, was

a_{ijk} = α + β_{jk} θ_{ijk},    (1)

where .θij k was the theta power on trial i of conflict condition j and DBS state k. The estimated regression coefficients were the intercept of the response threshold (.α) and the effect of single-trial theta power in each conflict condition and DBS state (.βj k ) on the predicted response threshold for individual trials. When applied to data, the estimated .β coefficients indicate the magnitude and direction of the association between the predictor variables (theta power, DBS state) and the outcome (singletrial response threshold).1 Although theta power was the same between DBS on and off conditions, only when STN-DBS was off in Parkinson’s patients was there a relationship between theta power and response threshold. Figure 2 shows the posterior distribution of the coefficient as the solid blue density curve. If we estimate the posterior mean as .βhigh,off ≈ 0.06 in the figure, then we can interpret this as every 1 unit increase in (standardized) theta power leads to a predicted 0.06 unit increase in the response threshold. Interestingly, the direction of effect reversed for low conflict trials: as theta power increased the response threshold decreased (i.e., .βlow,off < 0). In contrast, when the DBS was switched on, the conflict-related relationship to response threshold was absent (.βhigh,on < 0 and .βlow,on < 0). The strength of the conclusions in regression-based linking comes from the within-subject interpretation: as theta power changed within individuals the response threshold increased or decreased in the different conditions. Furthermore, when the DBS was switched from off to on the association between theta power and decision cautiousness changed direction. Despite the clear analytic benefits to this approach, one downside is the unidirectional inference. To use Cavanagh et al.’s approach as an example, the neural data provided additional constraint on the behavioral model but there was no capacity for the behavioral model to constrain the neural model. 
In some contexts the relationship will be the reverse; for instance, estimates of cognitive model

1 To simplify the tutorial, Eq. 1 is a simplified form of the model examined in Cavanagh et al. (2011). We refer the reader to the primary source for the complete model.

248

G. E. Hawkins et al.

Posterior probability density

High conflict Low conflict

DBS OFF DBS ON

20 15 10 5 0 –0.2

–0.1

0

0.1

0.2

Regression coefficient for theta Theta

Threshold

Theta

Threshold

Fig. 2 Posterior distributions for coefficients estimated from the regression-based linking approach in Cavanagh et al. (2011). Participants first learned which item in a pair was reinforced more often. Pair A/B was reinforced at the rate 100%/0% and C/D reinforced at the rate 75%/25%. In the test phase reported here, participants were presented with two items and asked to choose the better item (i.e., more likely to be rewarded). These were analyzed in two sets of trials: high conflict trials composed of “win-win” (A/C) and “lose-lose” (B/D) choices, and low conflict trials composed of “win-lose” (A/D) and “lose-win” (B/C). Reproduced and modified with permission from Cavanagh et al. (2011)

parameters may be input as regressors to a whole-brain GLM on EEG or fMRI data. The lack of a bidirectional relationship is sometimes tangential to the research question. In other contexts, however, the additional constraint that arises from a bidirectional flow of information between the neural and behavioral models can substantially increase the interpretability of the models from both streams of data. This is also a step closer to the scientific goal of developing a single, integrative model of observable performance (i.e., a model that generates both neural and behavioral data).

Linking Behavior and EEG via cognitive models

2.2 Modern Linking Approaches: Joint Models

The bidirectional relationship between behavioral and neural data is increasingly examined with so-called joint models (Turner et al., 2017, 2019). As the name suggests, joint models aim to simultaneously—jointly—model behavioral and neural data in a unified, single-step model. A number of variations on the joint modeling theme are covered in other chapters in this book. Here, we restrict our focus to a particular type of joint model that has been studied in the domain of evidence accumulation models of decision making and scalp-recorded EEG data. It is a confirmatory approach in the same sense as Cavanagh et al.'s model was confirmatory: there was a hypothesized link that was targeted in the function linking neural and behavioral data. The extension we review here is that the model was also integrative in the sense that it proposes a common latent source which generates predictions for multiple outputs, such as behavioral and neural data (Turner et al., 2017). As such, model estimation is constrained by both the behavioral and neural data. Thus, with a greater range of outcome data to compare models against (i.e., behavioral, neural), there is better capacity to discriminate between competing models of the relationship between brain and behavior.

We illustrate an integrative, confirmatory joint model reported by Van Ravenzwaaij et al. (2017). Those authors investigated the specific, hypothesized link between the "rotation-related negativity," an index of mental rotation, and the efficiency of decision making (drift rate). They reanalyzed data from a mental rotation task originally conducted by Provost et al. (2013). Participants were presented with pairs of 3D block stimuli, one of which was rotated relative to the other. In "same" trials, the pair of items were identical (but rotated), so the correct response was "same." In "different" trials, the pair was made up of two unique items, so the correct response was "different." Decision difficulty was manipulated over five angles of rotation between the two stimuli: 0°, 45°, 90°, 135°, and 180°; the larger the rotation angle, the more difficult the same-different decision. The rotation-related negativity was operationalized as the mean amplitude of the ERP negative waveform at electrode Pz. Van Ravenzwaaij et al. (2017) divided the ERP data into 8 epochs of 100 ms each. They developed a generative process for the ERP data and the behavioral data, shown schematically in Fig. 3.
The choice and response data were assumed to be generated by an LBA model with the usual parameters: response threshold, drift rate mean and across-trial variability, start point of evidence accumulation, and non-decision time. Of primary interest, there were separate drift rates for correct and error responses in each of the five rotation angles, for a total of 10 drift rates. The rotation-related negativity data were assumed to be generated by a simple statistical model: ERPs in each epoch were normally distributed. The behavioral and neural models were structurally linked through the mean of the normal distribution. Formally, the ERP in epoch i of trial j with rotation angle k ∈ {0, 45, 90, 135, 180}, ERP_ijk, was

ERP_ijk ∼ N(α + v_k · β_i, σ),    (2)

where α was an offset parameter, v_k was the mean drift rate of the LBA for rotation angle k, β_i was the scale parameter for epoch i, and σ was a variance term. The β parameters were constant across rotation angle and correct/error responses, while the drift rates were freely estimated across those factors. In contrast, the drift rates were constant across epochs, while the linking parameters were freely estimated.²

² Van Ravenzwaaij et al. (2017) tested multiple joint models with different linking functions. For brevity, we describe just one of those models.
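To make the linking function concrete, here is a minimal sketch of Eq. 2 as a generative model and likelihood. All numeric values (drift rates, epoch scales, offset, σ) are invented for illustration and are not the published estimates.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative parameter values (not the published estimates).
v = {0: 3.0, 45: 2.6, 90: 2.2, 135: 1.8, 180: 1.4}         # mean drift per angle
beta = np.array([0.1, 0.3, 0.8, 1.0, 0.9, 0.5, 0.2, 0.1])  # 8 epoch scales
alpha, sigma = -1.0, 0.5                                   # offset and noise

def erp_loglik(erp, k):
    """Log-likelihood of one trial's 8-epoch ERP under Eq. 2:
    ERP_ijk ~ Normal(alpha + v_k * beta_i, sigma)."""
    mu = alpha + v[k] * beta
    z = (erp - mu) / sigma
    return float(np.sum(-0.5 * z**2 - np.log(sigma * np.sqrt(2 * np.pi))))

# Simulate a trial at 90 degrees; its likelihood under the generating
# angle should typically exceed that under a distant angle.
erp_trial = alpha + v[90] * beta + sigma * rng.standard_normal(8)
print(erp_loglik(erp_trial, 90), erp_loglik(erp_trial, 180))
```

Because the same v_k enters both the LBA likelihood for choices/response times and this ERP likelihood, estimating the model jointly forces the drift rates to accommodate both data streams.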


Fig. 3 Symbolic representation of the joint model in Van Ravenzwaaij et al. (2017). All parameters of the evidence accumulation model (top) inform the observed behavioral data (bottom left). The model's mean drift rate for observed responses informs the ERP data (bottom right). Reproduced with permission from Van Ravenzwaaij et al. (2017)

By crossing the factors over which each of the parameters is freely estimated, powerful constraint is placed on the parameter estimation. The primary theoretical focus is on the β parameters. They are a time-sensitive measure, each corresponding to a 100 ms window, of the link strength between the mean drift rate and the rotation-related negativity. However, the structure of the integrated model also has other important implications. For instance, the predicted ERP will change across conditions whenever the drift rate changes across conditions (i.e., with rotation angle); that is, the drift rate must accommodate changes in both the behavioral and neural data that are due to the rotation angle.

Figure 4 shows the integrated model provided a good explanation of aggregate data. For the behavioral data, as rotation angle increased the median response time increased and accuracy decreased; in both cases, the effect was greater for different vs. same judgments. Importantly, and unlike the regression-based linking approach, the integrated model can generate predictions for the neural data. The ERPs are shown separately for each epoch, rotation angle, and same vs. different judgment. The model captured the main trends in the data: stronger ERP signals for lower rotation angles and for same relative to different judgments. This is particularly impressive since the neural model was not afforded any free parameters to explain these differences in the ERP data. Rather, the model structure forced the effect to arise as a function of the drift rate of the behavioral model.

Figure 4 also shows the estimated linking parameters (β) from each epoch. As expected, the linking parameters show a rise-peak-fall trend that mirrors the trend in the ERP data. This indicates that condition-level drift rates most strongly


Fig. 4 Data and predictions for Van Ravenzwaaij et al.'s joint model. Behavioral data are shown in the left column: median response time (top) and accuracy (bottom). Neural data are shown in the right column: ERPs for same (top) and different (bottom) judgments. The lower right panel displays the posterior estimates of the parameters that link the drift rate of the LBA to the ERP data (β). Reproduced with permission from Van Ravenzwaaij et al. (2017)

modulated the rotation-related negativity at intermediate processing times (300–700 ms). This is consistent with the standard understanding of evidence accumulation models: stimulus encoding occurs in the first few hundred milliseconds of trial time, after which the accumulation process begins, driven by the model drift rates. In this window we would expect to see the strongest association between the cognitive and neural models, which is identified in the estimates of the linking parameters.

Future research on integrated joint models may aim to expand the complexity of the neural model so that it more closely resembles the presumed data-generating processes. The normal distribution of Van Ravenzwaaij et al. (2017) served its


purpose as a vehicle to demonstrate the utility of the integrated modeling approach. Nevertheless, more compelling, continuous generative models for the EEG data could be incorporated into the integrated framework. This would have the advantage of more faithfully representing the processes under investigation, which will deepen our understanding of brain-behavior dynamics. It would also minimize some of the arbitrary assumptions on the part of the modeler (e.g., the number and duration of epochs).

3 Linking in Practice: EEG and Reinforcement Learning Models of Decision Making

A similar evolution of complexity has been taking place within studies examining the relationship between EEG signals and reinforcement learning models of decision making. However, as we detail below, there have not yet been joint modeling approaches in this field, which we discuss in the Future Directions section at the end of this chapter.

Reinforcement learning (RL) algorithms describe how an agent can learn to repeat actions that led to reward (Rescorla and Wagner, 1972; Sutton and Barto, 1998; Thorndike, 1898). With flexible parameterization, RL can describe how one learns to motivate behavior after reward, inhibit other behaviors after punishment, make complex decisions about the nature of these relationships, and leverage just about any other process in the service of inference. Here we focus on how RL models and EEG signals have been used to inform each other in the discovery of latent cognitive states.

Any process of learning involves diminishing uncertainty, yet RL posits a special type of valenced surprise signal, with positive prediction errors (+PE) motivating behavior and negative prediction errors (−PE) inhibiting it. The following sections describe how the increasing sophistication of combined EEG and RL modeling has raised unique questions about the way the brain integrates RL processes. Figure 5 defines the RL and EEG terms we use.

3.1 Correlations and Qualitative Comparisons

The formal integration of combined EEG and RL modeling arguably began with the seminal paper by Holroyd and Coles (2002). This report described a network model based on RL principles that could qualitatively account for feedback- and response-locked EEG activities during learning. While fronto-central feedback-locked EEG activity corresponded to the degree of PE, response-locked EEG activity corresponded to the learned weight increase of the stimulus-response association. In addition, this paper showed how the temporal tradeoff in learning


Fig. 5 (a, b) A cartoon example of Q-learning during a probabilistic learning task. The difference between expected and actual reward is calculated as a prediction error conveying whether events are better or worse than expected. These prediction errors are then used to adjust future expectations (action value or weight), scaled by a learning rate. (c) ERPs at the vertex; error bars are SEM. Following punishments, increasingly surprising −PEs evoke an enhanced voltage negativity at 300 ms, sometimes termed the Feedback Related Negativity (FRN). (d) Increasingly surprising +PEs evoke a broad positivity from 200–400 ms termed the Reward Positivity (RewP). Cyan bars indicate the time point of the topographic plots (300 ms); only significant sensors are shown (FDR corrected). (e) Time-frequency plot from this vertex site reveals that this −PE contrast is dominated by frontal midline theta (FMT), an established candidate marker for cognitive control. (f) The +PE contrast is a delta band (2–4 Hz) effect that is unique to reward receipt. Spectral plots are +/−2 dB; contours reflect statistically significant differences (permutation corrected). (g) The absolute value of punishment surprise (|−PE|) correlates with FMT. (h) The degree of +PE correlates with delta power over the midline. Red horizontal bars show statistical significance (permutation corrected). Cyan vertical bars indicate the time point of the topographic plots (350–450 ms), +/−0.25 rho, only significant sensors (FDR corrected)
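The Q-learning scheme sketched in panels (a, b) of Fig. 5 can be written in a few lines. The task parameters below (reward probabilities, learning rate, softmax temperature) are illustrative choices, not values from any study discussed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal Q-learning on a two-armed probabilistic task.
p_reward = [0.8, 0.2]         # illustrative reward probabilities
lr = 0.1                      # learning rate
Q = np.zeros(2)               # action values (weights)

def softmax(q, temp=5.0):
    e = np.exp(temp * (q - q.max()))
    return e / e.sum()

pes = []
for t in range(1000):
    a = rng.choice(2, p=softmax(Q))
    r = float(rng.random() < p_reward[a])
    pe = r - Q[a]             # prediction error: better/worse than expected
    Q[a] += lr * pe           # value update scaled by the learning rate
    pes.append(pe)

print(np.round(Q, 2))         # Q should approach the reward probabilities
```

The single-trial `pe` values produced by such a model are exactly the quantities that later sections regress onto feedback-locked EEG amplitudes.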


from the environment toward relying on internal representations could be tracked in the EEG. Other studies replicated these comparisons, showing how average +PEs and −PEs were similar to feedback-locked EEG, and how activity over motor cortex mirrored subsequent weight changes in an algorithmic model (Cohen and Ranganath, 2007). While compelling, these findings only offered a qualitative comparison between model outputs and EEG signals. Some cracks in this account began to emerge when quantitative comparisons were leveraged to formally test these hypotheses.

Given established findings that the midbrain phasic dopaminergic system appears to underlie the computation of +PE and −PE (Schultz et al., 1997), there is a tacit assumption that we may expect to also see a single, bipolar, integrated system underlying slow integrative learning in scalp EEG. In an attempt to control for the intrinsic biophysical processes in the ERP, many studies investigate a single EEG feature by subtracting punishment from reward trials, creating a single, bipolar, integrated difference wave. A bivalent encoding of −PE vs. +PE is oftentimes apparent in this difference wave, suggesting a single integrative neural system (Holroyd et al., 2008; Proudfit, 2015; Sambrook and Goslin, 2014; Sambrook et al., 2018). However, increasing evidence suggests that the assumption of a single integrative neural system is incorrect.

The quantitative integration of formal RL model outputs with EEG signals has been critical to dissociate the different information content encoded by reward- and punishment-evoked EEG. The logic for this test rests on the idea that a signed prediction error is a special case of surprise, and thus carries a greater burden of empirical support. To be a true reflection of a +PE or −PE, EEG activity would need to specifically differentiate reward and punishment, and each of these unique effects would need to scale with the degree of −PE or +PE (Caplin and Dean, 2008).
A signal that functions as a signed prediction error needs to conform to such axiomatic criteria and behave in this manner in all cases; otherwise the signal would be unreliable and uninterpretable. The formal integration of modeled features has been critical to test whether the associations between +/−PE and EEG signals are mechanistically accurate or simply correlational.

3.2 Reinforcement Learning Model Estimates as Regressors in EEG

Many studies were initially supportive of the fidelity of −PE coding in feedback-locked EEG. The FRN was shown to scale with single-trial −PE (Correa et al., 2018; Ichikawa et al., 2010), as did FMT (Cavanagh et al., 2012a, 2010); both of these EEG activities also predicted subsequent response time slowing. Yet in some complex tasks, +PE also correlated with FRN amplitude (Chase et al., 2011; Philiastides et al., 2010) and FMT (Cavanagh et al., 2012a). Many studies that were supportive of a −PE representation in FMT/FRN used tasks with a win-stay/lose-switch response requirement that confounds outcome valence and the need for behavioral adjustment. When tested with more complex tasks without clear win-stay/lose-switch requirements, FMT/FRN amplitudes appear to scale with unsigned surprise (Cavanagh et al., 2012a; Hauser et al., 2014; Talmi et al., 2013) and appear to be a generic signal of the need for cognitive control (Cavanagh and Frank, 2014; Cohen and Donner, 2013).

In sum, while FMT and FRN are both sensitive to −PE, neither one specifically and uniquely codes the information quality of −PE. In fact, many decades of work have detailed how the same EEG feature as the FRN occurs as a differently named "N2 component" that is common to any event eliciting a need for cognitive control (Cavanagh et al., 2012b; Folstein and Van Petten, 2008; Holroyd et al., 2008). Given the tight integration with behavior, FMT/FRN/N2 activities appear to reflect the operations of a high-level actor that leverages a variety of information when informing future action selection.

In a surprising contrast, the reward-locked RewP (and its associated delta spectral activity) does appear to reflect an axiomatic +PE. The RewP is specific to +PE, in that this neural signal is not elicited by any event other than reward. The RewP is sensitive to +PE, in that the entire morphology of this EEG feature correlates with single-trial representations of +PE (Cavanagh, 2015; Cavanagh et al., 2019a). This reward-specific signal appears to reflect only variance associated with +PE, with an additional influence from motivation (Brown and Cavanagh, 2018; Threadgill and Gable, 2018) and mood (Proudfit, 2015; Whitton et al., 2016). The RewP notably does not predict RT adaptation (Cavanagh, 2015; Cockburn and Holroyd, 2018). Thus, the RewP appears to reflect the operations of a low-level critic without any influence of the actor's policy.
This dissociation between the computational properties of the FRN and the RewP strongly challenges the use of a single, bipolar, integrated difference wave for interpretation. This difference between a high-level control signal and a low-level reinforcement signal dovetails with other complexities that have arisen within the field of RL. It is increasingly appreciated that a variety of computational systems simultaneously contribute to RL in human participants. Many investigations of RL aim to identify how stimulus-action associations are learned by long-term integration of probabilities via dopamine-driven cortico-striatal plasticity increases (Frank, 2005; Schultz et al., 1997). Yet people also leverage working memory, task-set construction, and cognitive control to manage their state-action-outcome learning (Collins and Frank, 2012; Collins and Koechlin, 2012; Otto et al., 2014; Ribas-Fernandes et al., 2011). In this latter situation, EEG signals are particularly useful given their sensitivity to cortical (as opposed to striatal) activities.

There are a variety of hierarchical dissociations that can be made within RL, but for simplicity here we dissociate only higher-level (e.g., model-based, actor, control-based) from lower-level (e.g., model-free, critic, evaluative) contributions to learning. These systems run in parallel, so specific tasks are required to separate their influences. Both punishment and reward feedback-locked EEG signals appear to contribute unique variance to low-level and high-level hierarchical controllers (Collins and Frank, 2016; Sambrook et al., 2018), but further work needs to


investigate which aspects of these phenomena are associated with the formal FRN or RewP components.

Recent work has identified hierarchically differentiated control signals in the EEG and unique contributions across the time course of a decision. A PE about a change in high-level circumstances (i.e., without reward or punishment) is termed a state PE. State PE is sensitively (but not specifically) reflected in the EEG at the time of the P3 component (Cavanagh, 2015; Eppinger et al., 2017; Nassar et al., 2019; Rac-Lubashevsky and Kessler, 2019). This spatio-temporal area is a nexus where a multitude of decision processes converge, including a conjunction of learning rate update, trial-specific surprise, and action value (Fischer and Ullsperger, 2013). Early in the decision phase, the degree of learned state-action value is represented in the cue-locked EEG (Collins and Frank, 2018; Fischer and Ullsperger, 2013; Hunt et al., 2012), whereas the later cue-locked P3 predicts RT speeding during task-set formation (Cavanagh, 2015). In the post-decision evaluative phase, the feedback-locked P3 predicts switching responses on the next trial (Chase et al., 2011; Correa et al., 2018; Fischer and Ullsperger, 2013). Notably, this P3 component is highly similar to the centro-parietal decision-related signals discussed earlier in this chapter in relation to work by Karayanidis et al. (2009) and Van Ravenzwaaij et al. (2017); the implications of these common computations across cognitive domains are discussed further in the Future Directions section.
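In the regressor approach described in this section, single-trial quantities from a fitted RL model are regressed onto single-trial EEG amplitudes. The following sketch is purely illustrative (the simulated "FMT-like" signal and all values are invented): by including both signed PE and unsigned |PE| as simultaneous regressors, one can ask which quantity a signal actually tracks, which is the logic behind the signed-PE vs. unsigned-surprise dissociations above.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical single-trial quantities; in practice PEs come from a
# fitted RL model and amplitudes from feedback-locked EEG at one sensor.
n = 800
pe = rng.uniform(-1, 1, n)                     # signed prediction errors
# Simulate an "FMT-like" signal that tracks unsigned surprise |PE|
amp = 2.0 * np.abs(pe) + rng.standard_normal(n)

# Regress amplitude on signed PE and unsigned |PE| simultaneously
X = np.column_stack([np.ones(n), pe, np.abs(pe)])
b, *_ = np.linalg.lstsq(X, amp, rcond=None)
print(np.round(b, 2))   # the coefficient on |PE| should dominate
```

A signal that truly coded a signed −PE would instead load on the signed regressor; running both regressors together is what allows that distinction.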

3.3 Advanced Approaches

EEG can provide additional information above and beyond the contribution of response time and accuracy when comparing learning models. Collins and Frank (2018) demonstrated that EEG activity could provide evidence to adjudicate between competing models assessing the degree of simultaneous influence from working memory vs. slow integrative learning. Single-trial EEG-derived scalars applied to an RL model can also add manifest evidence for a process (Cavanagh et al., 2013). In this study, FMT was leveraged as a marker for control, and a parameter reflecting the effect of theta was added to each competing model to account for unique variance in the way that participants overcame task-specific action biases: did they learn to suppress these biases, learn to boost instrumental learning, or simply exert cognitive control in the moment? The inclusion of this effect-of-theta parameter suggested the best fitting model was one where participants adjusted their behavior via a learned suppression of biases.

EEG can also provide information for understanding how much the specific information content of a signal contributes to group effects. For example, anxiety linearly predicts a larger coupling between −PE and FRN ("EEG-PE coupling" as a measure of represented information content) as well as better learning from punishment (Cavanagh et al., 2018). In contrast, depression led to a smaller RewP but no difference in the EEG-PE coupling, and no effect on behavior. These dissociations suggest that anxiety contributes to more efficient learning via a control-related


avoidance system whereas depression may diminish RewP due to motivation—and critically does not imply an inability to use the information content for learning.

3.4 Summary Caveats

The stereotyped nature of event-related potentials (ERPs) is important to appreciate when regressing model features on EEG. The biophysical constraints of oscillatory phase reversals and overlapping frequency patterns are important to account for when considering how information content modulates intrinsic, obligatory processes within specific spatial and temporal windows. Regression of model outputs without a visual comparison to the natural EEG signal is likely to confuse readers who would like to integrate new findings in the context of multiple decades of literature.

Additionally, sometimes the cure can be worse than the disease, as demonstrated by subtractive methods like the reward-minus-punishment difference wave. Condition subtraction aims to control for biophysical constraints, but if each condition reflects a distinct process (i.e., signals are not purely additive), this method can be compromised by faulty logic and lead to invalid assumptions like a single, binary, integrated neural system. Future work will be best served by visually displaying "raw" EEG signals throughout each stage of filtering, transformation, and regression with model features.

As an overarching caveat to all model-based neural modeling, it is simply not adequate to regress a model output on EEG and assume that the correlated activity is a faithful representation of a presumed latent construct. For example, correlations may exist between FRN/FMT and −PE, but that does not mean that this neural system always and only codes for −PE. Nor does it imply an association with other neural systems that code −PE (i.e., dopamine): many systems are likely working in parallel in each task. Only clever experimental design can reveal the invariance or independence between parallel systems. Finally, researchers should consider whether neural systems should even be expected to faithfully reflect abstracted model features.
The brain evolved with many domain-general systems that compute combined motivational and informational (or affective-effective) content (Pessoa, 2008; Shackman et al., 2011). Algorithmic models, and the tasks they are based on, are oftentimes notoriously dry and purely cognitive. The brain is an emotional machine with a thin veneer of cognitive capabilities; our best models should account for this.

4 Future Directions and Outstanding Questions

• How can we evaluate or test models that link EEG and cognitive models? One of the challenges in the evaluation of joint models is that joint models are forced to provide a poorer account of the observed behavioral data relative to a restricted


version of the joint model that only has to account for behavioral data. This is because the parameters of the joint model are constrained to accommodate effects not only in the behavioral data but in the neural data as well (Van Ravenzwaaij et al., 2017). One evaluation approach is to use generalization tests, where the neural data are used to make (out-of-sample) predictions for the behavioral data, or the behavioral data are used to make predictions for the neural data (Turner et al., 2019; Cassey et al., 2016). However, more research needs to be conducted to understand the different ways to construct these generalization tests and how the choice of generalization test affects model evaluation and selection.
• What happens when joint models have a small amount of behavioral data and a large amount of EEG data (e.g., many electrodes and time points)? One common approach is to re-frame the need for data reduction into the mutually exclusive options of a priori vs. "data driven" (see Cohen, 2014). If one knows what they are looking for—like P3 or FMT—these features can be quantified as a priori spatio-temporal regions-of-interest. The alternative approach is to run a form of blind-source separation to reduce dimensionality (Dien et al., 2007; Dien, 2012; Cohen, 2018). These approaches can even be combined to use known EEG features as a "filter" to identify relevant covariance in other signals, for example using joint ICA (Calhoun et al., 2009; Cavanagh et al., 2019b). The implications of this advanced fusion approach for cognitive modeling have not yet been fully appreciated. Van Ravenzwaaij et al. (2017) avoided the issue by collapsing ERP data into a small number of time bins, but it remains to be investigated how the number and structure of these bins influence model estimation. The approach employed by Van Ravenzwaaij et al. (2017) was to summarize ERP data in the time domain: electrode signals were averaged across adjacent measurement periods. Many other approaches are possible and may be better. For example, the same binning could be applied in the frequency domain, after Fourier transform (average power in nearby frequency bands). Or, time- or frequency-domain signals could be represented as combinations of a small number of basis functions, and only their coefficients linked.
• This chapter described how similar neural signals like P3 and FMT are commonly implicated in accumulator and reinforcement models of decision making. Given the increasing appreciation of the role of decision processes in reinforcement learning, and a likely role of learning throughout decision tasks, this is no surprise. How can we best use this common neural evidence to merge these traditionally distinct modeling approaches into a more accurate account of human cognition (e.g., see Pedersen et al., 2017)?
• Why have some advances in the statistical and mathematical treatment of joint models been limited to accumulator modeling, and not spread to reinforcement models?
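As a concrete illustration of the time-domain binning strategy mentioned above, the following sketch averages an evoked time series within consecutive windows; the sampling rate and window length are assumptions for illustration, not the values used by Van Ravenzwaaij et al. (2017).

```python
import numpy as np

def bin_erp(signal, srate=250, win_ms=100):
    """Average an evoked time series within consecutive fixed-width windows."""
    samples_per_bin = int(srate * win_ms / 1000)
    n_bins = len(signal) // samples_per_bin
    trimmed = signal[: n_bins * samples_per_bin]
    return trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)

# A toy "ERP": 800 ms of data at 250 Hz reduced to 8 epoch means
erp = np.sin(np.linspace(0, np.pi, 200))
print(bin_erp(erp))   # 8 values, one per 100 ms epoch
```

The same reduction could be applied per frequency band after a Fourier transform, or replaced by a small set of basis-function coefficients; in each case the linking function then operates on a handful of summaries rather than the full signal.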


5 Further Reading

• The first section of this chapter provided just three examples of different methods to link EEG with accumulator-style models of decision making, and they are by no means exhaustive of the approaches used in the literature. For additional linking approaches that generated novel insights into the latent cognitive mechanisms, we recommend:
  – Twomey, D. M., Murphy, P. R., Kelly, S. P., & O'Connell, R. G. (2015). The classic P300 encodes a build-to-threshold decision variable. European Journal of Neuroscience, 42, 1636–1643.
  – Van Vugt, M. K., Simen, P., Nystrom, L. E., Holmes, P., & Cohen, J. D. (2012). EEG oscillations reveal neural correlates of evidence accumulation. Frontiers in Neuroscience, 6, 106.
  – Fischer, A. G., Nigbur, R., Klein, T. A., Danielmeier, C., & Ullsperger, M. (2018). Cortical beta power reflects decision dynamics and uncovers multiple facets of post-error adaptation. Nature Communications, 9, 1–14.
• This chapter provided one scheme for classifying different approaches to linking cognitive models to neural data. For an alternative classification scheme, we recommend:
  – de Hollander, G., Forstmann, B. U., & Brown, S. D. (2016). Different ways of linking behavioral and neural data via computational cognitive models. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 1, 101–109.
• There is a growing understanding of distinct EEG signals that are involved in unique aspects of decision making, including phenomena like transient neural networks based on theta-band phase-coupling. For an example of the integration of a multitude of theoretically motivated EEG signals with reinforcement modeling, we recommend:
  – Swart, J. C., Frank, M. J., Määttä, J. I., Jensen, O., Cools, R., & den Ouden, H. E. M. (2018). Frontal network dynamics reflect neurocomputational mechanisms for reducing maladaptive biases in motivated action. PLoS Biology, 16, 1–25.

6 Exercises

• In the Further Reading section there are three example applications of linking sequential sampling models with EEG signals (Twomey et al., 2015; Van Vugt et al., 2012; Fischer et al., 2018). How would you describe the linking approaches in these three applications? Try to think of the problem from the perspective of the linking approaches summarized in the first section of this chapter. Discuss


the linking approaches using terms defined in this chapter (e.g., "regression" or "correlation" approaches) and think about the strengths and weaknesses of each. Extension: propose how those same questions could be extended to more advanced linking methods.
• There have been few, if any, integrated joint models of EEG and reinforcement learning. Why do you think that is? Try to apply the principles of joint models reviewed in the tutorial section of this chapter to develop a joint EEG-RL model. You can use your own data if you wish, or data from Cavanagh et al. (2019a), which is freely available at www.predictsite.com (see data set d006).

Acknowledgments

Funding This work was supported by: Australian Research Council (ARC) Discovery Early Career Researcher Award (Hawkins, DE170100177); ARC Discovery Project (Hawkins and Brown, DP180103613); National Institutes of Mental Health (Cavanagh, NIMH RO1MH119382-01). The funding sources had no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

Conflict of Interest The authors declare that they have no conflict of interest.

References

Brown, D. R., & Cavanagh, J. F. (2018). Rewarding images do not invoke the reward positivity: They inflate it. International Journal of Psychophysiology, 132, 226–235.
Calhoun, V. D., Liu, J., & Adalı, T. (2009). A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. NeuroImage, 45(1), S163–S172.
Caplin, A., & Dean, M. (2008). Axiomatic methods, dopamine and reward prediction error. Current Opinion in Neurobiology, 18(2), 197–202.
Cassey, P. J., Gaut, G., Steyvers, M., & Brown, S. D. (2016). A generative joint model for spike trains and saccades during perceptual decision-making. Psychonomic Bulletin & Review, 23(6), 1757–1778.
Cavanagh, J. F. (2015). Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage, 110, 205–216.
Cavanagh, J. F. (2019). Electrophysiology as a theoretical and methodological hub for the neural sciences. Psychophysiology, 56(2), e13314.
Cavanagh, J. F., Bismark, A. W., Frank, M. J., & Allen, J. J. (2019a). Multiple dissociations between comorbid depression and anxiety on reward and punishment processing: Evidence from computationally informed EEG. Computational Psychiatry, 3, 1–17.
Cavanagh, J. F., Eisenberg, I., Guitart-Masip, M., Huys, Q., & Frank, M. J. (2013). Frontal theta overrides Pavlovian learning biases. Journal of Neuroscience, 33(19), 8541–8548.
Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421.
Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., & Frank, M. J. (2012a). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral Cortex, 22(11), 2575–2586.
Cavanagh, J. F., Frank, M. J., Klein, T. J., & Allen, J. J. (2010). Frontal theta links prediction errors to behavioral adaptation in reinforcement learning. NeuroImage, 49(4), 3198–3209.

Linking Behavior and EEG via cognitive models


Cavanagh, J. F., Kumar, P., Mueller, A. A., Richardson, S. P., & Mueen, A. (2018). Diminished EEG habituation to novel events effectively classifies Parkinson's patients. Clinical Neurophysiology, 129(2), 409–418.
Cavanagh, J. F., Rieger, R. E., Wilson, J. K., Gill, D., Fullerton, L., Brandt, E., & Mayer, A. R. (2019b). Joint analysis of frontal theta synchrony and white matter following mild traumatic brain injury. Brain Imaging and Behavior, 14(6), 2210–2223.
Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nature Neuroscience, 14, 1462–1467.
Cavanagh, J. F., Zambrano-Vazquez, L., & Allen, J. J. (2012b). Theta lingua franca: A common mid-frontal substrate for action monitoring processes. Psychophysiology, 49(2), 220–238.
Chase, H. W., Swainson, R., Durham, L., Benham, L., & Cools, R. (2011). Feedback-related negativity codes prediction error but not behavioral adjustment during probabilistic reversal learning. Journal of Cognitive Neuroscience, 23(4), 936–946.
Cockburn, J., & Holroyd, C. B. (2018). Feedback information and the reward positivity. International Journal of Psychophysiology, 132, 243–251.
Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. MIT Press.
Cohen, M. X. (2018). Using spatiotemporal source separation to identify prominent features in multichannel data without sinusoidal filters. European Journal of Neuroscience, 48(7), 2454–2465.
Cohen, M. X., & Donner, T. H. (2013). Midfrontal conflict-related theta-band power reflects neural oscillations that predict behavior. Journal of Neurophysiology, 110(12), 2752–2763.
Cohen, M. X., & Ranganath, C. (2007). Reinforcement learning signals predict future decisions. Journal of Neuroscience, 27(2), 371–378.
Collins, A. G. E., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024–1035.
Collins, A. G. E., & Frank, M. J. (2016). Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning. Cognition, 152, 160–169.
Collins, A. G. E., & Frank, M. J. (2018). Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory. Proceedings of the National Academy of Sciences, 115(10), 2502–2507.
Collins, A. G. E., & Koechlin, E. (2012). Reasoning, learning, and creativity: Frontal lobe function and human decision-making. PLoS Biology, 10(3), e1001293.
Correa, C. M., Noorman, S., Jiang, J., Palminteri, S., Cohen, M. X., Lebreton, M., & van Gaal, S. (2018). How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning. Journal of Neuroscience, 38(48), 10338–10348.
Dien, J. (2012). Applying principal components analysis to event-related potentials: A tutorial. Developmental Neuropsychology, 37(6), 497–517.
Dien, J., Khoe, W., & Mangun, G. R. (2007). Evaluation of PCA and ICA of simulated ERPs: Promax vs. Infomax rotations. Human Brain Mapping, 28(8), 742–763.
Eppinger, B., Walter, M., & Li, S. C. (2017). Electrophysiological correlates reflect the integration of model-based and model-free decision information. Cognitive, Affective, & Behavioral Neuroscience, 17(2), 406–421.
Fischer, A. G., & Ullsperger, M. (2013). Real and fictive outcomes are processed differently but converge on a common adaptive mechanism. Neuron, 79(6), 1243–1255.
Folstein, J. R., & Van Petten, C. (2008). Influence of cognitive control and mismatch on the N2 component of the ERP: A review. Psychophysiology, 45(1), 152–170.
Forstmann, B. U., Dutilh, G., Brown, S., Neumann, J., Von Cramon, D. Y., Ridderinkhof, K. R., & Wagenmakers, E. J. (2008). Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences, 105(45), 17538–17542.
Frank, M. J. (2005). Dynamic dopamine modulation in the basal ganglia: A neurocomputational account of cognitive deficits in medicated and nonmedicated parkinsonism. Journal of Cognitive Neuroscience, 17(1), 51–72.


G. E. Hawkins et al.

Fries, P. (2009). Neuronal gamma-band synchronization as a fundamental process in cortical computation. Annual Review of Neuroscience, 32, 209–224.
Friston, K. (2003). Learning and inference in the brain. Neural Networks, 16(9), 1325–1352.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
Hauser, T. U., Iannaccone, R., Stämpfli, P., Drechsler, R., Brandeis, D., Walitza, S., & Brem, S. (2014). The feedback-related negativity (FRN) revisited: New insights into the localization, meaning and network organization. NeuroImage, 84, 159–168.
Holroyd, C. B., & Coles, M. G. (2002). The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109(4), 679.
Holroyd, C. B., Pakzad-Vaezi, K. L., & Krigolson, O. E. (2008). The feedback correct-related positivity: Sensitivity of the event-related brain potential to unexpected positive feedback. Psychophysiology, 45(5), 688–697.
Hunt, L. T., Kolling, N., Soltani, A., Woolrich, M. W., Rushworth, M. F. S., & Behrens, T. E. J. (2012). Mechanisms underlying cortical activity during value-guided choice. Nature Neuroscience, 15(3), 470–476.
Ichikawa, N., Siegle, G. J., Dombrovski, A., & Ohira, H. (2010). Subjective and model-estimated reward prediction: Association with the feedback-related negativity (FRN) and reward prediction error in a reinforcement learning task. International Journal of Psychophysiology, 78(3), 273–283.
Karayanidis, F., Mansfield, E. L., Galloway, K. L., Smith, J. L., Provost, A., & Heathcote, A. J. (2009). Anticipatory reconfiguration elicited by fully and partially informative cues that validly predict a switch in task. Cognitive, Affective, & Behavioral Neuroscience, 9, 202–215.
Luck, S. J. (2014). An introduction to the event-related potential technique. MIT Press.
Ly, A., Boehm, U., Heathcote, A., Turner, B. M., Forstmann, B. U., Marsman, M., & Matzke, D. (2018). A flexible and efficient hierarchical Bayesian approach to the exploration of individual differences in cognitive-model-based neuroscience. In A. A. Moustafa (Ed.), Computational models of brain and behavior. Wiley Blackwell.
Nassar, M. R., Bruckner, R., & Frank, M. J. (2019). Statistical context dictates the relationship between feedback-related EEG signals and learning. eLife, 8, e46975.
Otto, A. R., Skatova, A., Madlon-Kay, S., & Daw, N. D. (2014). Cognitive control predicts use of model-based reinforcement learning. Journal of Cognitive Neuroscience, 27(2), 319–333.
Pedersen, M. L., Frank, M. J., & Biele, G. (2017). The drift diffusion model as the choice rule in reinforcement learning. Psychonomic Bulletin & Review, 24(4), 1234–1251.
Pessoa, L. (2008). On the relationship between emotion and cognition. Nature Reviews Neuroscience, 9(2), 148–158.
Philiastides, M. G., Biele, G., Vavatzanidis, N., Kazzer, P., & Heekeren, H. R. (2010). Temporal dynamics of prediction error processing during reward-based decision making. NeuroImage, 53(1), 221–232.
Proudfit, G. H. (2015). The reward positivity: From basic research on reward to a biomarker for depression. Psychophysiology, 52(4), 449–459.
Provost, A., Johnson, B., Karayanidis, F., Brown, S. D., & Heathcote, A. (2013). Two routes to expertise in mental rotation. Cognitive Science, 37(7), 1321–1342.
Rac-Lubashevsky, R., & Kessler, Y. (2019). Revisiting the relationship between the P3b and working memory updating. Biological Psychology, 148, 107769.
Rescorla, R., & Wagner, A. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. Black & W. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). Appleton-Century-Crofts.
Ribas-Fernandes, J. J., Solway, A., Diuk, C., McGuire, J. T., Barto, A. G., Niv, Y., & Botvinick, M. M. (2011). A neural signature of hierarchical reinforcement learning. Neuron, 71(2), 370–379.
Rouder, J. N., & Haaf, J. M. (2019). A psychometrics of individual differences in experimental tasks. Psychonomic Bulletin & Review, 26(2), 452–467.


Sambrook, T. D., & Goslin, J. (2014). Mediofrontal event-related potentials in response to positive, negative and unsigned prediction errors. Neuropsychologia, 61, 1–10.
Sambrook, T. D., Hardwick, B., Wills, A. J., & Goslin, J. (2018). Model-free and model-based reward prediction errors in EEG. NeuroImage, 178, 162–171.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.
Shackman, A. J., Salomons, T. V., Slagter, H. A., Fox, A. S., Winter, J. J., & Davidson, R. J. (2011). The integration of negative affect, pain and cognitive control in the cingulate cortex. Nature Reviews Neuroscience, 12(3), 154–167.
Siegel, M., Donner, T. H., & Engel, A. K. (2012). Spectral fingerprints of large-scale neuronal interactions. Nature Reviews Neuroscience, 13(2), 121–134.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
Talmi, D., Atkinson, R., & El-Deredy, W. (2013). The feedback-related negativity signals salience prediction errors, not reward prediction errors. Journal of Neuroscience, 33(19), 8264–8269.
Thorndike, E. (1898). Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements, 2(4), i–109.
Threadgill, A. H., & Gable, P. A. (2018). The sweetness of successful goal pursuit: Approach-motivated pregoal states enhance the reward positivity during goal pursuit. International Journal of Psychophysiology, 132, 277–286.
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.
Turner, B. M., Forstmann, B. U., & Steyvers, M. (2019). Joint models of neural and behavioral data. Springer.
Van Ravenzwaaij, D., Provost, A., & Brown, S. D. (2017). A confirmatory approach for integrating neural and behavioral data into a single model. Journal of Mathematical Psychology, 76, 131–141.
Wagenmakers, E. J., van der Maas, H. J. L., Dolan, C., & Grasman, R. P. P. P. (2008). EZ does it! Extensions of the EZ-diffusion model. Psychonomic Bulletin & Review, 15, 1229–1235.
Whitton, A. E., Kakani, P., Foti, D., Van't Veer, A., Haile, A., Crowley, D. J., & Pizzagalli, D. A. (2016). Blunted neural responses to reward in remitted major depression: A high-density event-related potential study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 1(1), 87–95.

Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses

Russell J. Boag, Steven Miletić, Anne C. Trutti, and Birte U. Forstmann

Abstract Working memory (WM) refers to a set of processes that makes task-relevant information accessible to higher-level cognitive processes including abstract reasoning, decision-making, learning, and reading comprehension. In this chapter, we introduce the concept of WM and outline key behavioral and neural evidence for a number of critical subprocesses that support WM and which have become recent targets of cognitive neuroscience. We discuss common approaches to linking brain and behavior in WM research seeking to identify the neural basis of WM subprocesses. We draw attention to limitations of common approaches and suggest that much progress could be made by applying several of the recent methodological advances in model-based cognitive neuroscience discussed throughout this book (see Chapters "An Introduction to EEG/MEG for Model-Based Cognitive Neuroscience", "Ultra-High Field Magnetic Resonance Imaging for Model-Based Neuroscience", "Advancements in Joint Modeling of Neural and Behavioral Data", "Cognitive Models as a Tool to Link Decision Behavior With EEG Signals", and "Linking Models with Brain Measures"). Overall, the purpose of this chapter is to give a broad overview of WM as seen through the lens of model-based cognitive neuroscience and to summarize our current state of knowledge of WM subprocesses and their neural basis. We hope to outline a path forward to a more complete neurocomputational understanding of WM.

Keywords Working memory · Neural basis of memory · Subcortical circuits

R. J. Boag () · S. Miletić · B. U. Forstmann
Integrative Model-Based Cognitive Neuroscience Research Unit, University of Amsterdam, Amsterdam, The Netherlands
e-mail: [email protected]
A. C. Trutti
Integrative Model-Based Cognitive Neuroscience Research Unit, University of Amsterdam, Amsterdam, The Netherlands
Cognitive Psychology Unit, Institute of Psychology & Leiden Institute for Brain and Cognition, Leiden University, Leiden, The Netherlands
© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_11


1 Introduction

1.1 Working Memory

The acronym "WM" stands for working memory. You have just encoded a representation – "WM" – of an abstract concept – "working memory" – into a distinct neural assembly in your own WM, probably somewhere in your prefrontal cortex. You are currently maintaining that representation in an activated state (e.g., via sustained neural firing), such that presenting the stimulus "WM" easily brings to mind the concept those letters stand for (and primes additional WM-related knowledge you may have stored in your long-term memory). Alongside "WM," you are also storing transient representations of individual words in this paragraph, most of which are briefly entering your focus of attention before being quickly replaced by new information as you progress through each sentence. If we give you additional items to hold in mind (e.g., O, +, , ×), you will quickly find yourself overloaded and struggling to keep up with the meaning of the text. Here your WM has reached its capacity limit, resulting in impaired performance. However, it is possible to free up capacity using volitional strategies, such as mentally grouping items into singular representations (e.g., ⊕, ) or removing items you no longer need. During this paragraph, you have experienced several subprocesses critical to WM, including encoding, maintenance, retrieval, updating, removal, and substitution. You have also experienced performance costs associated with WM's extremely limited storage capacity and their alleviation through a strategy called "chunking." In this section, we introduce the concept of WM and outline prevailing theoretical perspectives from cognitive psychology and neuroscience.
As you have just experienced, WM refers to a set of processes that makes task-relevant information accessible to higher-level cognitive processes, such as reading comprehension, planning, abstract reasoning, decision-making, problem solving, and learning (Baddeley, 1992; Cowan, 1988; Daneman & Carpenter, 1980; Kyllonen & Christal, 1990; Miller et al., 1960). WM has been described as a “mental workspace” (Logie, 1995) and “the sketchpad of conscious thought” (Baddeley, 1992; Miller et al., 2018). It allows information to be kept “in mind” over short timescales (Goldman-Rakic, 1995) and is central to the organization of goal-directed behavior (Chatham & Badre, 2015; Miller & Cohen, 2001). In addition, impairments of WM feature prominently in a wide variety of neuropsychiatric disorders (e.g., attention-deficit/hyperactivity disorder (Ortega et al., 2020), schizophrenia (Silver et al., 2003)). Understanding WM is thus of key interest to both cognitive and clinical neuroscientists. In terms of its structure, prominent cognitive theories view WM as an activated subset of mnemonic information stored in either long-term memory (LTM) or short-term memory (STM) (Fig. 1) (Cantor & Engle, 1993; Cowan, 1988, 1999, 2008, 2016, 2019; Oberauer, 2002, 2009). This activated subset rises to conscious awareness and becomes available for further processing via selective attention (D’Esposito, 2007; Oberauer, 2002, 2009). However, WM differs conceptually


Fig. 1 Oberauer’s (2002) model of working memory. Nodes and lines represent a network of mnemonic representations stored in LTM, some of which are activated (bold nodes). A subset of activated items is held (via item-context bindings) in a limited capacity region of direct access (shaded area) corresponding to the storage capacity of WM. Within this region, one item is selected for processing by the focus of attention (highlighted bold node). (From Bledowski et al. (2010) with permission)

from both LTM and STM (Cowan, 2008, 2017). LTM typically refers to unlimited capacity storage of information over long timescales (e.g., days, months, or years) (Cowan, 2008; Shiffrin & Atkinson, 1969). In contrast, WM is extremely capacity-limited (current work suggests a limit of between one and four representations or "items," Cowan, 2001; Garavan, 1998; Sewell et al., 2014, 2016), and its contents are quickly forgotten once attention is withdrawn (for a review, see Ricker et al. (2016)). Likewise, STM refers to passive storage (i.e., without manipulation) over short timescales (Cowan, 2017; Diamond, 2013; Engle et al., 1999), whereas WM involves actively "working on" or transforming information in a manner that supports flexible goal-directed behavior (for further distinctions, see Cowan (2008)). Much of this chapter will focus on the fundamental subprocesses of WM that carry out this cognitive "work." The information stored in WM is typically spoken of in terms of items or cognitive objects, which denote distinct mnemonic representations that are retrieved and operated on as a unit (Mathy & Feldman, 2012; Thalmann et al., 2019). The complexity of what an item may represent can vary greatly. For example, simple perceptual stimuli (e.g., line orientations, color and shape stimuli, numbers, single words), complex multi-attribute or composite stimuli (e.g., human faces, natural scenes, mechanical objects), and more abstract and internally generated mental quantities (e.g., learned associations, the subjective value of choice options, the results of mental calculations) can all be represented in WM and subsequently compared, reasoned about, and used to guide goal-directed behavior (Cary & Carlson, 2001; Hardman & Cowan, 2015; Logie et al., 1994; Luck & Vogel, 2013; Santangelo et al., 2015).
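The embedded-processes architecture in Fig. 1 (an activated subset of LTM, a capacity-limited region of direct access, and a single-item focus of attention) can be read as a simple data structure. The Python sketch below is our own illustrative simplification, not Oberauer's formal model; the four-item capacity and the oldest-item displacement rule are assumptions chosen purely for demonstration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class WorkingMemory:
    """Toy data-structure reading of Fig. 1 (illustrative only): WM as an
    activated subset of LTM, a capacity-limited region of direct access,
    and a single-item focus of attention."""
    ltm: Set[str] = field(default_factory=set)            # all stored representations
    activated: Set[str] = field(default_factory=set)      # activated subset of LTM
    direct_access: List[str] = field(default_factory=list)  # capacity-limited region
    focus: Optional[str] = None                           # one item selected for processing
    capacity: int = 4                                     # illustrative one-to-four-item limit

    def bring_to_focus(self, item: str) -> None:
        """Activate a stored item, bind it into the region of direct access
        (displacing the oldest item if capacity is exceeded; displaced items
        stay activated but outside direct access), and select it with the
        focus of attention."""
        assert item in self.ltm, "only stored representations can be activated"
        self.activated.add(item)
        if item not in self.direct_access:
            if len(self.direct_access) >= self.capacity:
                self.direct_access.pop(0)  # displacement rule (our assumption)
            self.direct_access.append(item)
        self.focus = item

wm = WorkingMemory(ltm={"A", "B", "C", "D", "E"})
for item in "ABCDE":
    wm.bring_to_focus(item)
print(wm.focus, wm.direct_access)  # "E" in focus; "A" displaced from direct access
```

Running the loop overfills the four-slot region of direct access, so the first item is displaced while remaining activated in LTM, mirroring the distinction between activation and direct access drawn in the figure.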
At a cellular level, single-cell recordings of prefrontal neural firing in monkeys show that items in WM are represented by distinct prefrontal neural assemblies that exhibit sustained item-specific neural firing while the item is being maintained (i.e., during the retention interval in delayed-response tasks, Fig. 2) (Funahashi et al., 1989, 1990, 1991; for a review, see Goldman-Rakic


Fig. 2 A single neuron's response to eight differently oriented stimuli in a monkey performing a delayed-response WM task. The histograms depict the neuron's mean firing rate (averaged over repeated trials) during the cue (C), delay (D), and response (R) phases for each WM item. The lower middle plot shows that this neuron exhibits a sustained increase in firing during the retention interval (i.e., during maintenance) only for the item oriented 270° from the central focal point and not for the other items. WM items are thus encoded in distinct neural assemblies. (From Funahashi et al. (1989) with permission)

(1995)). Functional neuroimaging in humans has found analogous activity in the prefrontal cortex (PFC) related to maintaining distinct item representations during delayed-response tasks (for a review, see Curtis and D’Esposito (2003)). Building on the early work with monkeys, cognitive neuroscientists have sought to identify the human brain networks involved in WM and to explain how information flows within those networks (D’Esposito, 2007). This work has used advanced neuroimaging and electrophysiological techniques – especially fMRI and EEG – to identify brain networks active during WM tasks in general, as well as neural signatures of specific operations (e.g., encoding, maintenance, updating, retrieval)


and experimental manipulations (e.g., WM load/set size, retention interval) (e.g., D’Esposito (2007); Kornblith et al. (2016); Murty et al. (2011); Riggall and Postle (2012); Roux et al. (2012)). Brain structures implicated in WM include prefrontal and parietal cortical regions, hippocampus, thalamus, basal ganglia, and dopaminergic midbrain nuclei (Bledowski et al., 2010; Bledowski et al., 2009; Durstewitz & Seamans, 2002; Murty et al., 2011; Ranganath et al., 2005). The brain’s dopamine networks are thought to play a central role in controlling WM (Cools, 2019; Cools & D’Esposito, 2011; Durstewitz & Seamans, 2002). This work has given rise to a number of neurophysiologically inspired computational models that provide detailed accounts of the neural networks involved in WM and offer a coherent lens through which to interpret patterns of brain activity in WM tasks (e.g. Frank et al. (2001); Hazy et al. (2006); O’Reilly (2006); O’Reilly and Frank (2006)). However, as will be discussed, our current understanding of the neural basis of WM is far from complete. Working in parallel, cognitive and mathematical psychologists have pursued a mechanistic understanding of various behavioral phenomena that arise in tasks involving WM (Baddeley, 1992; Cowan, 1988; Miller et al., 1960; Ratcliff, 1978). This work seeks to understand the latent cognitive processes and architectural constraints that give rise to WM-related behavioral phenomena. Much theoretical and practical interest lies in explaining failures of WM (e.g., unintentional forgetting) and in identifying the source of WM-related action slips and performance costs (e.g., choice/response errors, longer response latencies) that typically arise when additional demands are placed on the WM system (Oberauer et al., 2018). Typical demands include increasing the number of items to be remembered, preventing rehearsal, extending the retention interval, and adding distractors or secondary tasks. 
The resulting effects on behavior are considered to be fundamental or “benchmark” phenomena that good models of WM must be able to explain (Oberauer et al., 2018). The effort to understand these phenomena has generated rich theoretical debate and spawned a variety of plausible explanatory mechanisms (e.g., encoding/item fidelity, capacity limitations, temporal decay, inter-item interference, retrieval competition) that have been formalized in computational cognitive models (Farrell & Lewandowsky, 2002; Oberauer, 2009; Oberauer & Lewandowsky, 2011, 2016; Ratcliff, 1978; Sewell et al., 2016). Many of these models successfully explain a subset of the benchmark phenomena, although no model provides a unified account of them all. Overall, this work has provided significant insight into the mechanisms and structure of WM and its relation to other cognitive systems (e.g., attention, learning, decision-making). However, relatively little attention has been given to the question of how the information in WM remains relevant to the organism’s current situation and how that information is organized to best serve the organism’s goals. Recently, it has been proposed that WM relies on a set of operations or subprocesses such as the gating of new incoming information, item removal and substitution, and reorganization operations (e.g., Ecker et al. (2014b), Hazy et al. (2006), Oberauer and Lewandowsky (2016), Rac-Lubashevsky and Kessler (2016b), Timm and Papenmeier (2019)). Given the foundational importance of such algorithmic


subprocesses, this chapter seeks to summarize our current state of knowledge of the cognitive and neural basis of these subprocesses and to suggest a path forward using methods that we believe have potential to provide a more complete neurocomputational understanding of WM. This chapter is organized as follows. Having introduced the concept of WM and its theoretical background, we now discuss a central computational problem faced by the organism: the stability-flexibility tradeoff. Drawing on current neurocomputational theory, we explain the mechanisms by which WM is used to resolve this tradeoff and the brain networks in which they are instantiated. We then outline benchmark behavioral and neural evidence for key WM subprocesses involved in resolving this tradeoff and facilitating flexible goal-directed behavior. We draw attention to several limitations associated with common analysis methods and point to recent advances in model-based cognitive neuroscience that could further our understanding of WM subprocesses. We conclude with a discussion of several current “hot topics” in WM research which serve to place our picture of WM in the broader context of current theoretical work in cognitive neuroscience.
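To make the benchmark mechanisms discussed above concrete, the toy simulation below combines temporal decay and inter-item interference to produce a qualitative set-size effect on recall accuracy. It is a deliberately minimal sketch written for illustration, not any of the published models cited in this section, and all parameter values are arbitrary.

```python
import random

def simulate_recall(set_size, decay=0.15, interference=0.08,
                    threshold=0.35, n_trials=2000, seed=0):
    """Toy activation account of the set-size effect: each probed item
    starts at activation 1.0, loses activation through decay (one step per
    stored item, so larger sets imply longer retention) and through
    interference from the other items, then is recalled if its noisy
    activation exceeds a retrieval threshold. All parameters are arbitrary
    values chosen only to illustrate the qualitative effect."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        activation = 1.0
        for _ in range(set_size):                     # time-based decay
            activation -= decay * activation
        activation -= interference * (set_size - 1)   # inter-item interference
        activation += rng.gauss(0, 0.1)               # retrieval noise
        if activation > threshold:
            correct += 1
    return correct / n_trials

for n in (1, 2, 4, 6, 8):
    print(f"set size {n}: accuracy {simulate_recall(n):.3f}")
```

Accuracy falls monotonically as set size grows, the canonical benchmark pattern; in this sketch decay and interference are deliberately confounded, which is exactly the kind of ambiguity the competing formal models cited above are designed to tease apart.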

1.2 The Stability-Flexibility Tradeoff

As mentioned, WM is extremely capacity-limited. It is currently thought that only between one and four items can be maintained in an activated state in WM at a time (Cowan, 2001; Garavan, 1998; Oberauer, 2002; Sewell et al., 2014). This severe limit demands a high degree of control over WM content to ensure that only relevant information occupies WM (Dreisbach, 2012). To this end, WM must strike a balance between stability (i.e., protecting the current contents of WM from irrelevant or distracting information) and flexibility (i.e., keeping WM up to date with relevant new information and removing outdated information) (Dreisbach, 2012; Dreisbach & Fröber, 2019; Dreisbach et al., 2005; Hommel, 2015; Oberauer, 2009). Resolving this stability-flexibility tradeoff is a central problem for numerous executive control processes, including task switching, inhibition, and conflict monitoring/resolution (Musslick et al., 2018), and striking the right balance depends critically on the brain's dopamine systems (Cools, 2019; Cools & D'Esposito, 2011; Durstewitz & Seamans, 2008). As you experienced in the opening paragraph, managing this tradeoff is important for goal-directed behavior in dynamic environments, in which the relevance of information is continually changing, and distracting information frequently competes for attention. To resolve the tradeoff, WM performs information gating, which is accomplished by switching between two different operating modes, that is, maintenance and updating (Badre, 2012; Bledowski et al., 2010; Durstewitz & Seamans, 2008; Kessler & Oberauer, 2014; Miller & Cohen, 2001; Murty et al., 2011; O'Reilly, 2006; Roth et al., 2006). The maintenance mode prevents irrelevant and distracting information from interfering with the current contents of WM, while the updating mode allows new information to enter (and old information to exit)


WM. Maintenance is supported by volitional strategies (e.g., articulatory rehearsal, refreshing) and cognitive control processes (e.g., gating, inhibition) that serve to strengthen or refresh item representations and their bindings to contextual cues (Morey & Cowan, 2018) and filter out or reduce the influence of distractors (Oberauer & Lewandowsky, 2016). Strategies that support effective remembering by restructuring information into more memorable formats (e.g., via "chunking" and sorting) also contribute to robust maintenance (D'Esposito et al., 1999; Marshuetz, 2005; Nassar et al., 2018; van Dijck et al., 2013). Likewise, updating is supported by a number of subprocesses that ensure new information can enter WM and old information can leave (Ecker et al., 2014a, b; Oberauer, 2018; Rac-Lubashevsky & Kessler, 2016b). For example, removal deactivates previously relevant information, freeing up capacity to activate different information (Morey & Cowan, 2018), while substitution replaces one item with another, allowing new information to be "loaded in" to WM (Rac-Lubashevsky & Kessler, 2016b). Together, these subprocesses allow WM to alternate modes between flexible, when new information is encountered, and stable, when information must be shielded from distractors. In terms of the brain networks that control switching between maintenance and updating, the most prominent neurocomputational theory is the prefrontal cortex basal ganglia WM model (PBWM) (Frank et al., 2001; Hazy et al., 2006; O'Reilly & Frank, 2006). The PBWM proposes that switching between maintenance and updating modes is accomplished via "go/no-go" signaling between the basal ganglia, thalamus, and PFC. As illustrated in Fig. 3, gate opening is controlled by a striatal "go" signal¹ which inhibits substantia nigra pars reticulata (SNr) and disinhibits thalamus, which in turn excites specific neural populations of the PFC. This allows information to enter WM and updating to occur.
Gate closing is controlled by a striatal "no-go" signal which inhibits external globus pallidus (GPe), disinhibits SNr, and inhibits thalamus. This in turn keeps the PFC inhibited, which prevents WM from being updated (Hazy et al., 2006). In short, the "go" signal passes through two inhibitory connections (striatum-SNr-thalamus), which excites PFC, while the "no-go" signal passes through three inhibitory connections (striatum-GPe-SNr-thalamus), which inhibits PFC. This gating network has received broad support from a number of functional magnetic resonance imaging (fMRI) studies identifying regions that are selectively active during either updating or maintenance

¹ The PBWM model suggests a phasic dopaminergic signal from the midbrain dopamine structures only in the early phases of a WM task when the BG must learn when to update. Once WM updating rules are learned, BG nuclei no longer rely on a phasic dopaminergic response but control WM gating via the non-dopaminergic SNr. Any additional dopaminergic input reflects either reward associations or a feedback-based response which evaluates the updating process based on the reward prediction error coded by the same neurons (Schultz, 1998; Schultz et al., 1997). This response, in the form of bursts and dips in dopaminergic release onto striatal neurons, is thought to reinforce "go" and "no-go" activation, respectively. In addition, the PBWM assumes that WM sits in the maintenance or "gate closed" mode by default. This assumption is likely too strong since it implies that gate opening must always accompany updating. Under this assumption, the PBWM would fail to predict the different gating costs to WM updating that occur in behavioral data (Rac-Lubashevsky & Kessler, 2016a, b).


Fig. 3 Illustration of the PBWM model. Several parallel loops connect BG and the frontal cortex. Gate opening is controlled by a striatal “go” signal that inhibits SNr and disinhibits thalamus and PFC, enabling updating to occur. Gate closing is controlled by a striatal “no-go” signal that inhibits GPe, disinhibits SNr, and inhibits thalamus and PFC, preventing updating. BG nuclei learn when to perform this signaling through environmental rewards and punishments. Extending this model to include additional structures implicated in WM and cognitive control and their role in WM subprocesses beyond gate opening and closing is a key target for model-based cognitive neuroscience. (From Hazy et al. (2006) with permission)

(Cools et al., 2007b; Dahlin et al., 2008; Lewis et al., 2004; McNab & Klingberg, 2008; Murty et al., 2011; Riggall & Postle, 2012; Roth et al., 2006; Tan et al., 2007). Moreover, the same network plays a similar role in updating value representations in reinforcement learning and value-based decision making (see Chapter “Cognitive Modeling in Neuroeconomics”), suggesting that it may be a domain-general neural mechanism for accomplishing information gating (Bledowski et al., 2010; Cools et al., 2007a; Hazy et al., 2006; Jocham et al., 2011; Möller & Bogacz, 2019; O’Reilly, 2006; O’Reilly & Frank, 2006; Roth et al., 2006).

The PBWM provides a coherent framework through which to understand BG and PFC activity in WM-based tasks. However, there are numerous outstanding questions regarding the neural basis of several WM subprocesses. The following subsections take a closer look at behavioral and neural evidence for key WM subprocesses, through the lens of model-based cognitive neuroscience (Corrado & Doya, 2007; Forstmann & Wagenmakers, 2015; Friston, 2009; Love, 2016; Trutti et al., 2021; Turner et al., 2017a, 2019b). In doing so, we place several benchmark behavioral phenomena in the context of current cognitive and neurocomputational theory and draw attention to methodological challenges associated with linking brain and behavior, which may be resolved through the methods of model-based cognitive neuroscience.
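The pathway logic described above (two inhibitory links for the “go” pathway, three for the “no-go” pathway) can be checked with a tiny sign-propagation sketch. This is only an illustration of the anatomy stated in the text, not part of any PBWM implementation:

```python
# Toy sign propagation through the PBWM gating pathways (illustrative only).
# Each connection in the chain is inhibitory; the net effect on thalamus/PFC
# is the product of the signs along the pathway.

def net_effect(n_inhibitory_links):
    """Sign of the downstream effect after chaining inhibitory (-1) connections."""
    sign = 1
    for _ in range(n_inhibitory_links):
        sign *= -1
    return sign

# "Go" pathway: striatum -| SNr -| thalamus (2 inhibitory links).
# Net sign +1: thalamus (and hence PFC) is disinhibited, so the gate opens.
go = net_effect(2)

# "No-go" pathway: striatum -| GPe -| SNr -| thalamus (3 inhibitory links).
# Net sign -1: thalamus is inhibited, so the gate stays closed (maintenance).
nogo = net_effect(3)

print("go pathway net sign:", go)
print("no-go pathway net sign:", nogo)
```

An even number of inhibitory links yields disinhibition (updating enabled); an odd number yields net inhibition (maintenance), which is the asymmetry the model exploits.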

Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses

273

2 Revealing the Subprocesses of WM

WM relies on a number of subprocesses to manipulate item representations and ensure only relevant information is maintained in WM. Recent experimental work using novel WM tasks (e.g., the reference-back and multiple-item updating paradigms, Box 1) (Ecker et al., 2014a; Rac-Lubashevsky & Kessler, 2016b) has identified behavioral signatures of several WM subprocesses. These include information gating and rehearsal/refreshing processes that support maintenance and effective remembering, item-specific removal and substitution mechanisms that actually perform updating in WM, and processes that retrieve, select, and manipulate information when multiple items are represented in WM (Ecker et al., 2014a, b; Jongkees, 2020; Lewis-Peacock et al., 2018; Nir-Cohen et al., 2020; Rac-Lubashevsky & Kessler, 2016a, b, 2018, 2019; Rac-Lubashevsky et al., 2017; Verschooren et al., 2021). Before outlining behavioral and neural evidence for these processes, we briefly describe the features of experimental tasks used to study WM subprocesses in the laboratory.

The details of tasks used to study WM subprocesses can vary greatly (Box 1). However, they share a general structure: Participants must hold one or more items in WM over a retention interval (typically a brief delay period or a number of successive experimental trials), after which their memory for those items is tested. On each trial, participants are cued to maintain, update, remove, or otherwise manipulate one or more of the items in WM before responding to a memory probe via button press. Probe stimuli may be targets or distractors, which match or mismatch the contents of WM, respectively. Responses thus require a decision based on WM representations, the quality of which affects the difficulty of decisions and has systematic effects on errors and response latencies (Donkin et al., 2016; Ratcliff, 1978; Sewell et al., 2016).
Performance is often analyzed using Donders’ (1969) subtraction method, whereby behavioral measures (e.g., error rate, mean response time) are compared between trial types (e.g., maintain vs. update, high vs. low load) in order to isolate the behavioral costs unique to the WM subprocess of interest. Likewise, comparing neurophysiological measures between trial types (e.g., via event-based fMRI and EEG) isolates neural activity unique to particular subprocesses (e.g., fMRI BOLD activity unique to updating, EEG theta power unique to maintenance; Murty et al., 2011; Rac-Lubashevsky & Kessler, 2018). As we will see, a current priority of model-based cognitive neuroscience is to develop formal models of WM that explain behavioral measures in terms of latent cognitive processes and link these to brain data through joint and integrated “brain-cognition-behavior” models (Trutti et al., 2021) (see Chapters “Linking Models with Brain Measures” and “Advancements in Joint Modeling of Neural and Behavioral Data”).
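The subtraction logic can be made concrete in a few lines: the cost of a subprocess is estimated as the difference in a behavioral measure between trial types. A minimal sketch with invented numbers (the trial records and values are hypothetical, for illustration only):

```python
# Donders-style subtraction: isolate the updating cost as the difference in
# mean RT between update and maintain trials. All data values are made up.
from statistics import mean

trials = [
    {"type": "maintain", "rt": 0.52, "correct": True},
    {"type": "maintain", "rt": 0.55, "correct": True},
    {"type": "update",   "rt": 0.61, "correct": True},
    {"type": "update",   "rt": 0.66, "correct": False},
]

def mean_rt(data, trial_type):
    """Mean response time (s) for one trial type."""
    return mean(t["rt"] for t in data if t["type"] == trial_type)

# The subtraction: update minus maintain isolates the cost attributed to updating.
updating_cost = mean_rt(trials, "update") - mean_rt(trials, "maintain")
print(f"updating cost: {updating_cost * 1000:.0f} ms")
```

The same subtraction applies to error rates or, for the neural analyses mentioned above, to condition-wise BOLD or EEG power estimates.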


Box 1 Measuring WM Subprocesses with the Reference-Back and Multiple-Item Updating Paradigms

The Reference-Back Task

Most laboratory tasks used to study WM (e.g., n-back, delayed-match-to-sample) are designed to investigate the capacity and temporal properties of WM but are unable to differentiate the contribution of WM subprocesses to observed behavior (Ecker et al., 2010; Lewis-Peacock et al., 2018; Nir-Cohen et al., 2020; Rac-Lubashevsky & Kessler, 2016a, b; Roth et al., 2006). For example, in the widely used n-back task, in which the subject updates the serial position of items in WM on each trial, several cognitive processes occur together on every trial and thus cannot be dissociated. These include encoding the incoming stimulus and binding it to a position in WM, comparing the incoming stimulus with the correct reference item, inhibiting items in irrelevant positions (i.e., avoiding intrusion effects), updating the position of each item in WM, and removing items that are no longer relevant. A recently developed exception is the reference-back task (Rac-Lubashevsky & Kessler, 2016a, b), which provides dissociable measures of core WM subprocesses (gate opening, gate closing, updating, substitution) from behavioral (choice-response time) data. The task is illustrated in Fig. 4.

Fig. 4 Illustration of the reference-back task. On each trial, participants indicate whether the presented letter is the same as or different from the letter in the most recent red frame. On reference (red frame) trials, participants must also update WM with the currently displayed letter. On comparison (blue frame) trials, participants make the same/different decision but do not update WM. (Adapted from Rac-Lubashevsky and Kessler (2016b) with permission)

To perform the reference-back, participants hold one of two stimuli (e.g., an “X” or “O”) in WM while deciding whether a series of probes match the current item in WM. On reference trials (indicated by a red frame around the stimulus), the participant must update WM with the currently displayed stimulus. On comparison trials (indicated by a blue frame), the participant simply compares the current stimulus to the one held in WM (the one


appearing in the most recent red frame) without updating WM. Both reference and comparison trials require a same/different decision, but only reference trials require updating. Comparing performance on reference and comparison trials thus provides a behavioral measure of the cost of updating. By similar logic, switching from comparison to reference trials requires opening the WM gate (to allow for updating), while switching from reference to comparison trials requires closing the WM gate (to maintain the current contents). Gate opening is measured by comparing trials on which participants switch toward a reference trial to those on which reference trials are repeated. Likewise, gate closing is measured by comparing trials on which participants switch toward a comparison trial to those on which comparison trials are repeated. Finally, substitution is measured via the interaction effect of trial type (reference/comparison) and match type (same/different) and represents the cost of updating a new item into WM.

The benchmark behavioral finding from the reference-back task is that trials requiring additional WM processes tend to have slower RTs and/or more frequent errors than trials that do not require such processes (Jongkees, 2020; Lewis-Peacock et al., 2018; Nir-Cohen et al., 2020; Rac-Lubashevsky & Kessler, 2016a, b, 2018, 2019; Rac-Lubashevsky et al., 2017; Verschooren et al., 2021). In terms of latent cognitive processes, these costs are typically interpreted as reflecting a combination of the time required for additional subprocesses to run outside of the same/different decision stage and of subprocesses interfering with the primary task (e.g., creating noisier WM representations by drawing attention/capacity away from the decision process) (Pearson et al., 2014). Decomposing behavioral costs on the reference-back into latent cognitive processes and linking them to neural activity is a current goal of model-based cognitive neuroscience (for a review, see Trutti et al. (2021)).

The Multiple-Item Updating Task

A close relative of the reference-back is the multiple-item updating paradigm (Ecker et al., 2014a, b), which has been used to distinguish between item-specific and global removal processes in behavioral (choice-response time) data. In this task, participants are presented with a row of three stimuli (in black frames) to hold in WM. Participants then perform an unpredictable number of updating steps on which one or more items must be updated with a new stimulus. Crucially, the to-be-updated positions are sometimes pre-cued (indicated by an empty red frame), which is assumed to give the participant time to remove the corresponding WM item prior to encoding its replacement. Pre-cueing a position for removal typically results in performance benefits (e.g., in updating speed) for the cued items, which suggests the existence of an


item-specific removal process. Cueing participants to remove one, two, or all three items in this paradigm has also allowed researchers to make distinctions between item-specific and global removal processes (see main text for further details).
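The reference-back contrasts defined in this box can be written down directly as condition differences. A sketch computing the four measures from per-condition mean RTs (all numeric values are hypothetical; real analyses operate on full per-participant RT distributions):

```python
# Reference-back effects as condition contrasts on mean RT (values made up).
# Conditions are coded by trial type (reference/comparison) and whether the
# trial type switched from, or repeated, the previous trial.
rt = {  # mean RT in ms per condition
    ("reference",  "repeat"): 620, ("reference",  "switch"): 700,
    ("comparison", "repeat"): 540, ("comparison", "switch"): 610,
}

updating_cost = rt[("reference", "repeat")] - rt[("comparison", "repeat")]
gate_opening  = rt[("reference", "switch")] - rt[("reference", "repeat")]
gate_closing  = rt[("comparison", "switch")] - rt[("comparison", "repeat")]

# Substitution is the trial type x match type (same/different) interaction:
rt_match = {("reference", "same"): 630, ("reference", "different"): 690,
            ("comparison", "same"): 560, ("comparison", "different"): 580}
substitution = ((rt_match[("reference", "different")] - rt_match[("reference", "same")])
                - (rt_match[("comparison", "different")] - rt_match[("comparison", "same")]))

print(updating_cost, gate_opening, gate_closing, substitution)  # 80 80 70 40
```

Each contrast mirrors the verbal definition above: updating is reference minus comparison on repeat trials, gating costs are switch minus repeat within each trial type, and substitution is the interaction of trial type and match type.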

2.1 Closing the Gate (Entering Maintenance Mode)

The first subprocess we consider is gate closing, which puts WM into maintenance mode via the BG-PFC signaling mechanism described earlier. In this mode, the contents of WM are shielded from distracting information, which allows the organism to maintain stable item representations (Verschooren et al., 2020, 2021). Several mechanisms contribute to stability, including those that counteract temporal degradation by boosting the activation of relevant information (e.g., articulatory rehearsal, refreshing) and those that reduce interference by diminishing the influence of irrelevant information (e.g., filtering, inhibition) (Morey & Cowan, 2018; Oberauer & Lewandowsky, 2016). Based on the functional anatomy of the BG (Parent & Hazrati, 1995), it is typically assumed that WM sits in the gate-closed/maintenance mode by default (e.g., Frank et al. (2001), Hazy et al. (2006)), since this strategy frees the organism from having to continuously exert effort to maintain WM items in the face of distractors. The processing that occurs in the maintenance mode is also thought to facilitate the formation and consolidation of long-term memories, mediated by a network that includes the hippocampus and dorsolateral PFC (Ranganath et al., 2005).

Behaviorally, gate closing occurs when switching from a trial requiring updating to a trial requiring maintenance. Comparing performance to non-switch trials (i.e., repeated maintenance trials) measures the cost of closing the gate. The benchmark finding is that performance on trials that require closing the gate is slower and more error prone than on trials that do not (Kessler & Oberauer, 2014, 2015; Rac-Lubashevsky & Kessler, 2016a, b).
It is currently unknown whether these gating costs are due to time added outside of the decision-making stage or interference with the primary task (e.g., by drawing resources such as attention away from the decision process) (Pearson et al., 2014). As we discuss below, formal cognitive modeling that decomposes gating costs into latent cognitive processes is needed to answer such questions. Neuroscientific work has primarily used event-based contrasts of fMRI and EEG signals (e.g., on maintenance vs. updating trials) to identify statistically significant differences in neural activity between updating and maintenance modes that can be attributed to gating. Several cortical areas previously related to cognitive control, such as dorsolateral PFC, medial PFC, and parietal cortex, exhibit BOLD activity uniquely associated with maintenance and not updating (D’Esposito & Postle, 2015;


Fig. 5 Whole-brain analysis of neural substrates of several WM subprocesses as measured by the reference-back task (Rac-Lubashevsky & Kessler, 2016a). Red/yellow regions represent statistically significant contrasts in fMRI BOLD activation between reference-back trial types (see Box 1 for details). (From Nir-Cohen et al. (2020) with permission)

Feredoes et al., 2011; Nir-Cohen et al., 2020; Owen et al., 2005; Roth et al., 2006). Gate closing has been linked to increased theta power in EEG oscillatory signals (Rac-Lubashevsky & Kessler, 2018), which is also a widely reported neural signature of cognitive control (Cavanagh & Frank, 2014; Cohen, 2014; Cohen & Donner, 2013), while gamma-band oscillations have been linked to the active maintenance of WM information (Roux & Uhlhaas, 2014). Dorsolateral PFC and parietal cortex – particularly the superior intraparietal sulcus – are active when maintaining stable representations in the presence of distractors (Behrmann et al., 2004; Bettencourt & Xu, 2015; Clapp et al., 2010; Dolcos et al., 2007; Durstewitz et al., 2000; Lorenc et al., 2018; Sakai et al., 2002; Toepper et al., 2010), and the insula has been implicated in sustaining attention during maintenance (Dosenbach et al., 2008). Figure 5 shows some examples of fMRI BOLD activity related to WM processes, including gate closing, in a recent study using the reference-back paradigm (see Box 1).

2.2 Opening the Gate (Entering Updating Mode)

Gate opening is arguably the most widely studied of the WM subprocesses and is critical for flexible goal-directed behavior. Opening the gate allows other WM subprocesses, such as substitution and removal (discussed below), to alter the contents of WM (e.g., by removing and replacing outdated information). As outlined earlier, neurocomputational theory suggests that gate opening is controlled by striatal “go” signaling in BG-thalamus-PFC networks (Frank et al., 2001; Hazy et al., 2006; O’Reilly & Frank, 2006) in which BG nuclei learn to signal when to update items in response to updating cues or the expectation of reward. The behavioral costs of opening the gate parallel those of gate closing: Performance on trials that require the gate to open (e.g., when switching from maintenance to updating trials) is typically slower and more error prone than on trials that do not


require the gate to open (e.g., when performing repeated updating trials) (Kessler & Oberauer, 2014, 2015; Rac-Lubashevsky & Kessler, 2016a, b). In addition, gate opening and closing costs are often asymmetrical, with gate closing reportedly taking longer than gate opening (Ecker et al., 2014b; Rac-Lubashevsky & Kessler, 2016b). This may reflect additional effort required to reacquire representational stability versus “releasing” WM from the controlled maintenance state (Rac-Lubashevsky & Kessler, 2016a, b). Developing computational models that predict asymmetrical gating costs from a set of cognitively plausible mechanisms will be a key goal for the model-based cognitive neuroscience of WM.

Neuroscientific work largely supports the involvement of BG-thalamus-PFC networks in opening the gate to WM: striatal and dorsolateral PFC involvement has been reported in tasks broadly requiring mode switching and/or updating of WM (Dahlin et al., 2008; Lewis et al., 2004; McNab & Klingberg, 2008; Tan et al., 2007). Several studies have reported activity in subcortical structures (e.g., substantia nigra, ventral tegmental area, caudate) and frontoparietal cortical regions specific to updating and not maintenance (Bledowski et al., 2009; Lepsien et al., 2005; Murty et al., 2011; Roth et al., 2006). A recent fMRI study of the reference-back task found activity unique to gate opening in BG and frontoparietal cortex, as well as in task-relevant sensory areas such as visual cortex (Nir-Cohen et al., 2020), which may be involved in encoding new information during updating (Roth et al., 2006). Striatal dopamine receptor-expressing neurons and dopamine-producing midbrain structures have also been implicated in WM updating (Cools et al., 2007b; McNab & Klingberg, 2008; Murty et al., 2011), and dynamic causal modeling suggests that the BG plays a central role in gating information to PFC (van Schouwenburg et al., 2010).
Further indirect support for striatal dopamine involvement comes from a study linking event-based eyeblink rate (a proxy measure of striatal dopamine) to WM updating in the reference-back task (Rac-Lubashevsky et al., 2017). Electrophysiological correlates of gate opening and updating have also been found using EEG. Some examples are shown in Fig. 6. Delta power in EEG oscillatory signals has been associated with gate opening and updating in the reference-back task (Rac-Lubashevsky & Kessler, 2019). Delta power is a common neural signature of reactive or “event-driven” control and action selection processes (Cavanagh, 2015; Gulbinaite et al., 2014; Harmony, 2013) (see Chapter “An Introduction to EEG/MEG for Model-Based Cognitive Neuroscience”). The same study found that anterior cortical regions showed an enhanced negative ERP component that was unique to updating (Rac-Lubashevsky & Kessler, 2019). This latter signal is believed to be involved in controlled inhibition and action selection (Folstein & Van Petten, 2008) and may reflect a gate opening or updating signal consistent with the theorized role of the striatum in BG-thalamus-PFC signaling. Overall, the brain networks that control gate opening/updating in WM have clear links to the striatal reward prediction error signaling observed in reinforcement learning and value-based decision-making, in which the striatum signals when to update item representations in response to feedback from the environment (Schultz, 1998; Schultz et al., 1997). Indeed, recent work has begun to build integrated models of WM and reinforcement learning, in which behavior is the result of a mixture of


Fig. 6 EEG analysis of WM subprocesses as measured by the reference-back task. The left panel shows scalp map topography for trial type (updating), matching, and an interaction effect involving stimulus frequency. The color represents the t-value of the mean regression weights for each electrode at the times shown. The right panel shows average ERP signals from representative electrodes within the clusters on the left. (From Rac-Lubashevsky and Kessler (2019) with permission)

the two systems (e.g., Collins and Frank (2012), McDougle and Collins (2021)). This further highlights the close connection between WM and learning, and we pick up this point again when discussing current directions below.


2.3 Removing Information from WM

Forgetting is a critical aspect of WM. Due to its severely limited storage capacity, WM must be able to create storage space by removing information that is no longer relevant to the task at hand. This is thought to be facilitated by an active removal process akin to directed forgetting (e.g., Festini and Reuter-Lorenz (2013, 2014)), which can be deliberately engaged to remove information from WM (Ecker et al., 2014a, b; Lewis-Peacock et al., 2018; Ma et al., 2014). Removing irrelevant information from WM is crucial for goal-directed behavior because it reduces the capacity demands on WM and reduces the likelihood of retrieving false or outdated information (e.g., by eliminating potential sources of proactive interference and reallocating cognitive resources to the remaining items, which enhances their accessibility) (Festini & Reuter-Lorenz, 2013, 2014). Removal can be temporary or permanent: items can be temporarily deactivated by withdrawing attention or WM capacity (and subsequently refreshed or reactivated) or permanently removed by unbinding item-context associations from WM (Lewis-Peacock et al., 2018). It has been suggested that formal mechanisms such as Hebbian anti-learning and resetting weights on associative connections (e.g., between item-context bindings) may be involved in deactivating or reducing the accessibility of item representations (e.g., Ecker et al. (2014a)). At the neural level, removal may work by transforming items from activation-based storage to weight-based (synaptic) storage (O’Reilly & Munakata, 2000), which echoes the difference in how storage is accomplished in short- versus long-term memory (i.e., sustained neural activity vs. synaptic consolidation/long-term potentiation). Thus, removal may involve a transformation between WM and LTM.
Behavioral signatures of removal are evident in WM tasks in which participants are cued to either maintain the current contents of WM or remove one or more items before responding to a memory probe. A recent study compared trials on which participants removed either one, two, or three items from a three-item WM set (Ecker et al., 2014b). Removing all three items (i.e., “wiping the global workspace”) required less time than removing a subset of one or two target items (i.e., item-specific removal). This suggests two removal modes: In global removal, which is rapid but indiscriminate, all items are deactivated in parallel but potentially useful information is discarded, which causes retrieval failures and forgetting. In item-specific removal, items are selected and deactivated in a serial manner. This is slower and more effortful than global removal but has the advantage of allowing still-relevant items to remain in an active and accessible state (Ecker et al., 2014b; Kessler & Meiran, 2008). The benefits of removal are also evident in behavior: Performance decrements due to set-size/capacity-sharing effects are alleviated following removal (Lewis-Peacock et al., 2018; Ma et al., 2014). Conversely, access to removed information is reduced following removal (resulting in slower, more error-prone retrieval), while access to remaining items is enhanced (resulting in faster, less error-prone retrieval). These findings support the idea that removal


Fig. 7 Neural correlates of intentional and unintentional forgetting. In panel (a), the right parietal cortex showed heightened BOLD activity during intentional versus unintentional forgetting. In panel (b), left inferior frontal gyrus showed heightened BOLD activity during unintentional versus intentional forgetting. (From Rizio and Dennis (2013) with permission)

reduces the capacity demands on WM by making more cognitive resources (e.g., attention, WM capacity) available for processing remaining items. Consistent with these behavioral signatures of removal, neuroscientific work has identified a number of neural phenomena related to the behavioral effects. Several studies have demonstrated that removed items have reduced neural traces at the neurophysiological level, such that probing a previously removed item elicits a muted neural response compared with other active items or the same item before removal (e.g., Lewis-Peacock et al. (2012)). Work using fMRI has reported activity in areas related to cognitive control and inhibition (e.g., prefrontal and parietal cortex) uniquely related to intentional removal as opposed to unintentional forgetting (Fig. 7) (Nowicka et al., 2011; Reber et al., 2002; Rizio & Dennis, 2013; Wylie et al., 2008). Work with EEG has identified sustained positive activation in dorsolateral PFC in response to forgetting cues (Hauswald et al., 2011; Paz-Caballero et al., 2004). Overall, this is consistent with a prefrontal network involved in inhibitory control over the contents of WM (see also Anderson and Hanslmayr (2014), Lewis-Peacock et al. (2018), and Ma et al. (2014)). One goal of future research will be to delve deeper into the difference between global and item-specific removal and establish whether distinct neural substrates underlie each process.
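The serial versus parallel distinction between item-specific and global removal makes a simple quantitative prediction, which can be written down as a toy timing model. The parameter values are invented for illustration and this is not Ecker et al.'s model, but the sketch reproduces the qualitative pattern described above:

```python
# Toy timing model: item-specific removal is serial (time grows with the
# number of cued items), whereas global removal clears all items in parallel
# at a flat cost. Parameter values (ms) are invented for illustration.

T_SELECT_AND_REMOVE = 150  # hypothetical per-item cost: select + deactivate one item
T_GLOBAL_WIPE = 120        # hypothetical flat cost of wiping the whole workspace

def removal_time(n_cued, set_size=3):
    """Predicted removal time (ms) when n_cued of set_size items are cued."""
    if n_cued == set_size:
        # Removing everything: the fast, indiscriminate global mode is available.
        return T_GLOBAL_WIPE
    # Otherwise items must be selected and deactivated one at a time.
    return n_cued * T_SELECT_AND_REMOVE

for n in (1, 2, 3):
    print(n, "item(s) cued:", removal_time(n), "ms")
```

Under these assumptions, removing all three items is predicted to be faster than removing one or two, matching the counterintuitive "wiping the global workspace" result reported by Ecker et al. (2014b).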


2.4 Substituting Items in WM

Closely related to removal is substitution. Substitution occurs when an item in WM is replaced by an incoming item. This allows updating to occur when WM is already occupied by active representations. In the laboratory, researchers typically study substitution by comparing the relative cost of performing different kinds of substitutions, such as overwriting active items versus empty slots (Ecker et al., 2014a) or substituting an item for an identical item (i.e., refreshing) versus a different item (i.e., replacement) (Rac-Lubashevsky & Kessler, 2016a, b). Overwriting active representations typically takes longer than overwriting empty slots (e.g., following removal) (Ecker et al., 2014a). Similarly, refreshing the same representation is less costly than replacing it with a different one (Rac-Lubashevsky & Kessler, 2016a, b). This is presumably because replacing active information requires first running the removal process, which is not needed when writing to an empty position or refreshing an existing representation.

The neural basis of substitution has received less attention than the other subprocesses. However, it likely involves many of the same dopaminergic midbrain and prefrontal cortical regions as updating more broadly (e.g., Murty et al. (2011)). A recent fMRI investigation of the reference-back paradigm found that substitution elicited unique activation in the left dorsolateral PFC and inferior parietal lobule (Fig. 5) (Nir-Cohen et al., 2020). A whole-brain conjunction analysis revealed shared activity in the supplementary motor area related to updating and substitution. These structures have previously been implicated in cognitive control and the control of motor movement (Bogacz et al., 2010; Forstmann et al., 2008). However, further work is necessary to fully characterize the neural substrates unique to the substitution subprocess rather than the updating mode more broadly.
The connection to the motor system is particularly interesting in terms of broader theoretical perspectives in evolutionary psychology and embodied cognition, which draw links between the mental manipulation of cognitive representations and motor manipulation of physical objects in the environment (e.g., Anderson (2014), Wilson (2002)). This is likely to be an interesting and theoretically rich area of future research.

2.5 Retrieving, Selecting, and Operating on Multiple Items

When multiple items are maintained in WM simultaneously, a key problem that arises is how to single out a particular item from the set for retrieval or further processing. Processes that facilitate multiple-item WM include mechanisms of retrieval and selective attention (e.g., Cowan (1988), Ma et al. (2014), Sewell et al. (2016)), as well as grouping and reorganization operations (e.g., chunking and sorting) which aid retrieval by restructuring representational information into more memorable formats (Brady et al., 2009; D’Esposito et al., 1999; Marshuetz, 2005;


Nassar et al., 2018; Thalmann et al., 2019; van Dijck et al., 2013). These processes all involve an interplay between WM and attention such that various operations can be carried out selectively on some items and not others: Goal-directed attention to the external environment guides what enters WM and the fidelity with which it is represented. Conversely, goals and contextual information stored in WM influence what is attended to and guide action selection (Oberauer, 2019). Retrieval and selection processes also play a role in item-specific removal and substitution, since selectively removing or replacing one item from among many entails some ability to identify and select the target item. Current cognitive theory suggests that within WM, processing is further limited by a single-item focus of attention that can hold only one item at a time for further processing and which therefore imposes a serial processing constraint on WM operations (Oberauer, 2002, 2009).

Behavioral evidence for the single-item focus comes in the form of the item-repetition benefit (or, alternatively, item-switch cost), whereby after an item in WM has been selected, selecting the same item for an immediately subsequent operation is faster than selecting a different item (Garavan, 1998; Gehring et al., 2003). This suggests that the object of a cognitive operation remains in the single-item focus of attention after the operation has been completed and thus does not need to be selected again when the subsequent operation requires the same object (Garavan, 1998; Oberauer, 2003). The item-repetition benefit has been observed across modalities (e.g., verbal WM (Garavan, 1998; Oberauer, 2003), spatial WM (Hedge & Leonards, 2013; Kübler et al., 2003)) and for several kinds of operations on the selected item (e.g., arithmetic (Garavan, 1998; Oberauer, 2003), updating (Oberauer, 2003), recognition (Oberauer, 2006)), supporting the idea of a general attentional bottleneck for accessing items in WM.
Another issue that arises with multiple items is that of WM’s extremely limited storage capacity. Studies looking at capacity effects assess task performance at several WM set sizes (usually with at least one under- and one overcapacity condition, for example, a three-item condition and a six-item condition; Rademaker et al., 2012; Zhang & Luck, 2008), which gives insight into how different WM subprocesses respond (or fail) when the capacity of WM is exceeded. One benchmark finding is that selection and operating costs scale with set size (Oberauer et al., 2018), meaning that processes that operate on WM become slower and more error prone as WM load increases. Both fMRI and EEG measures yield signals that scale with the number of items in WM (Fig. 8) (Cowan et al., 2011; Howard et al., 2003; Kornblith et al., 2016; Leung et al., 2004; Manoach et al., 1997; Roux et al., 2012; Todd & Marois, 2004; Veltman et al., 2003; Vogel & Machizawa, 2004), while some work has identified signals more strongly related to the precision (Zhao et al., 2020) or complexity (Xu & Chun, 2006) of individual items rather than overall WM load. One explanation for set size effects is framed in terms of resource sharing: Distributing a limited resource among multiple items reduces the precision or fidelity with which each item can be represented (Bays et al., 2009; Ratcliff, 1978; Sewell et al., 2016). In support of this account, participants can be directed to treat a subset of items as more important, which produces a robust gain in recall precision in emphasized items. Moreover, this gain comes at a cost to non-emphasized items,


Fig. 8 Neural activity related to WM storage and capacity. In panel (a), short-term maintenance of visual information produces a sustained increase in BOLD activity (red and yellow) in prefrontal and posterior parietal areas and a decrease in occipital areas (Riggall & Postle, 2012). In panel (b), change in BOLD signal in parietal cortex scales in magnitude with the number of items being stored in WM (Todd & Marois, 2004). (Adapted from Ma et al. (2014) with permission)

which supports the idea that items are allocated resources from a limited-capacity continuous pool controlled by selective attention (Bays & Husain, 2008; Ma et al., 2014). A related idea is that capacity effects arise due to retrieval competition, such that the probability of retrieving a particular target item decreases in proportion to the number of competing items (Oberauer, 2009). The retrieval competition account is also supported by the finding that the severity of inter-item interference scales with inter-item similarity (Oberauer & Bialkova, 2011). There is ongoing debate concerning whether items are allocated capacity from a limited pool of continuous resources (Bays & Husain, 2008; Ma et al., 2014; Zhang & Luck, 2008) or held in a small number of discrete high-precision slots (or some combination of the two) (Donkin et al., 2016; Donkin et al., 2013).

Finally, capacity demands can be alleviated to a certain extent by grouping and reorganization operations that restructure information into more memorable or accessible formats. Grouping or “chunking” involves binding together previously separate items or combining features to form a single representation that can be retrieved as a unit. For example, Luck and Vogel (1997) found no performance costs when comparing memory for single- versus multiple-feature items, suggesting that multiple-feature items were processed with the same efficiency as single-feature items (Luck & Vogel, 1997; but see Hardman and Cowan (2015), who found effects of both feature and item load). Reorganization involves volitional mnemonic strategies, such as sorting items into alphabetical or chronological order or otherwise organizing the contents of WM in a manner that improves recall (D’Esposito et al., 1999; Marshuetz, 2005; Nassar et al., 2018; van Dijck et al., 2013). In terms of

Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses

285

neural correlates, “chunking” and similar encoding strategies have been linked to prefrontal and parietal networks involved in associative learning (Bor et al., 2004; Bouchacourt et al., 2020; Pang et al., 2019).
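The slot and resource accounts make distinct predictions for delayed-estimation error distributions. The following sketch contrasts them under assumed parameter values (the capacity k = 3, the precision budget, and the noise levels are illustrative choices, not estimates from the cited studies):

```python
import numpy as np

rng = np.random.default_rng(0)

def resource_model_errors(set_size, n_trials=20_000, j_total=20.0):
    """Continuous-resource account: a fixed precision budget is divided
    among items, so recall noise grows continuously with set size."""
    sd = 1.0 / np.sqrt(j_total / set_size)   # precision J/N -> SD = 1/sqrt(J/N)
    return rng.normal(0.0, sd, n_trials)

def slot_model_errors(set_size, n_trials=20_000, k=3, sd_item=0.3):
    """Discrete-slot account: up to k items are stored with fixed precision;
    items beyond capacity produce random guesses (uniform error)."""
    stored = rng.random(n_trials) < min(1.0, k / set_size)
    errors = rng.uniform(-np.pi, np.pi, n_trials)           # random guesses
    errors[stored] = rng.normal(0.0, sd_item, stored.sum())  # stored items
    return errors

for n in (1, 2, 4, 8):
    r, s = resource_model_errors(n), slot_model_errors(n)
    print(f"set size {n}: resource SD = {r.std():.2f}, slot SD = {s.std():.2f}")
```

Under these assumptions the resource model predicts error variability that rises smoothly with set size, whereas the slot model predicts fixed precision for stored items plus a growing uniform guessing component beyond capacity.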

3 Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses

It is clear from the preceding discussion that our current picture of WM is a complex one that we have yet to fully understand. WM is supported by numerous fundamental subprocesses that perform a variety of distinct but complementary functions and often interact in complex ways. Significant progress has been made in characterizing the behavioral signatures of various WM subprocesses, and recent work has begun to make connections to their neural substrates.

Behavioral measures of WM subprocesses are typically computed using the logic of the subtraction method (i.e., taking differences between trial types; Donders, 1969) and summarized using aggregate measures of task performance such as mean response time and error rate. These are then subjected to traditional statistical analyses such as ANOVA and regression. However, an important limitation of the subtraction method is that aggregate differences are not pure measures of the latent processes that produce them. Identifying the latent processes responsible requires detailed cognitive modeling of choice-response time distributions (Ratcliff, 1978). The pitfalls of using coarse behavioral measures to draw inferences about latent cognitive processes have been discussed extensively in the mathematical psychology literature (e.g., Ratcliff (1978), Wagenmakers et al. (2007)). Nevertheless, model-based analyses of WM subprocesses remain exceedingly rare. This issue is particularly pressing in WM, in which a large number of mechanisms operate in concert over extremely brief timescales (e.g., 10–50 ms). In practice, this means that there is a limit on how informative behavioral data (e.g., choice-response time) can be for selecting between mechanisms, because the latent cognitive processes of interest tend to mimic each other at the level of behavior (Hawkins et al., 2017).
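As an illustration, the subtraction logic reduces to a difference of aggregate means; the trial types, numbers, and effect size below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical RTs (seconds) for two trial types in a WM task.
rt_maintain = rng.lognormal(mean=-0.50, sigma=0.3, size=500)  # maintenance only
rt_update   = rng.lognormal(mean=-0.35, sigma=0.3, size=500)  # maintenance + updating

# Donders-style subtraction: the aggregate "updating cost".
updating_cost = rt_update.mean() - rt_maintain.mean()
print(f"updating cost: {updating_cost * 1000:.0f} ms")

# The limitation: the same mean difference could reflect slower evidence
# accumulation, longer nondecision time, or raised response caution.
# The scalar difference alone cannot distinguish these accounts.
```

The printed cost summarizes the effect but says nothing about which latent process produced it, which is precisely the problem that distributional modeling of choice-response time data addresses.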
This issue will likely be alleviated to some extent by the development of more detailed computational models of WM that decompose observed behavior into latent component processes or integrate WM subprocesses into models of higher-level cognition, such as reinforcement learning (e.g., Collins and Frank (2012), McDougle and Collins (2021)) and evidence accumulation models of decision-making (e.g., Brown and Heathcote (2008); Forstmann et al. (2016); Ratcliff (1978)), or more general cognitive architectures, such as ACT-R (Anderson & Lebiere, 2014; Anderson et al., 1996). Evidence accumulation models, which explain choices and response time distributions in terms of latent cognitive processes, are particularly well suited to reveal whether the WM-related phenomena outlined above occur because WM subprocesses add time outside of the decision stage (longer nondecision time), interfere with the decision process itself (reduced


R. J. Boag et al.

or noisier processing rate), or induce strategic adjustments engaging top-down cognitive control (increased response caution). Decomposing WM phenomena (e.g., gating, switching, updating costs) into a set of latent cognitive processes rather than relying on aggregate summary statistics (e.g., mean response time) enables exploring WM subprocesses in greater detail than is typically achieved.

Neuroscience has made impressive progress in identifying unique neural signatures of WM subprocesses that are detectable in fMRI and EEG data. As we have seen, typical analysis methods involve taking event-based contrasts of neurophysiological signals (e.g., the difference in BOLD signal between maintenance and updating trials), which are sometimes regressed on or correlated with behavioral measures. The high spatial resolution of fMRI has proven particularly useful in detecting activity in subcortical structures involved in WM, such as the striatum. However, most imaging of WM subprocesses has been conducted at a field strength of 3 Tesla, which offers relatively poor resolution of small subcortical structures (e.g., substantia nigra, ventral tegmental area, subthalamic nucleus) compared with ultrahigh field (e.g., 7 T and higher) machines (de Hollander et al., 2017; Miletić et al., 2020; Trutti et al., 2019). Ultrahigh field fMRI (with increased resolution and improved signal- and contrast-to-noise ratios) will likely play a significant role in identifying the neural substrates of WM with sufficient spatial precision. Despite advances in spatial resolution, however, fMRI still relies upon slow changes in the BOLD response following neural activity, which gives it poor temporal resolution. Due to the inherently sluggish BOLD response, even ultrahigh field fMRI is ill-equipped to detect activity stemming from extremely brief WM subprocesses that operate at timescales on the order of 10–50 ms.
This is particularly so if the structures that generate such brief pulses of activity are located deep within the subcortex. High temporal resolution methods like EEG, which offer a practically continuous-time read-out of neural activity, are likely to play a complementary role in detecting neural signatures of extremely brief subprocesses and resolving the spatiotemporal tradeoff in neurophysiological measurement. However, the low spatial resolution of EEG means that it cannot unambiguously attribute electrical activity recorded from the scalp to deep subcortical sources (doing so requires solving the inverse problem; Grech et al., 2008). As discussed elsewhere in this book (see Chapter "An Introduction to EEG/MEG for Model-Based Cognitive Neuroscience"), one reason for this is that the dendrites of subcortical neurons are not as regularly aligned as cortical pyramidal cells. This means that the polarity of electrical signals from subcortical structures tends to cancel out when summing over many neurons, and thus the signal from subcortical structures does not reach the scalp. A combination of fMRI and EEG techniques, along with experimental designs that elicit strong neural contrasts between conditions, will likely be required to achieve a complete understanding of the neural basis of WM.

From the perspective of model-based cognitive neuroscience, one of the most important limitations is that common approaches in the literature on WM subprocesses analyze brain and behavior separately, without attempting to link the two sources of data. Although conducting separate analyses is relatively simple, such approaches ignore the mutual constraint afforded by linking behavioral and neural data in a joint analysis or integrated neurocognitive model (Turner et al., 2013, 2017a, b, 2019a, b). In cognitive neuroscience, joint modeling involves constructing a model of the behavioral data and connecting it to a model of the neural data via a linking function (see Chapter "Linking Models with Brain Measures"). The choice of behavioral model typically depends on the experimental task (e.g., speeded decision-making, reinforcement learning) and describes the latent cognitive processes that give rise to behavior. The choice of neural model typically depends on the modality of the neural data (e.g., fMRI, EEG) and describes the shape of the neural response (e.g., hemodynamic and electrophysiological response functions). Parameters and mechanisms in each model are then linked, either directly, by parameterizing one model in terms of the other, or indirectly, by imposing a hierarchical structure in which parameters are related through a common overarching distribution (e.g., a multivariate normal; see Turner et al.'s discussion of the covariance approach). This allows for simultaneous fitting and parameter estimation for both models that properly accounts for the measurement noise in each data source (see Chapter "Cognitive Models as a Tool to Link Decision Behavior With EEG Signals"). Joint analyses impose stronger constraints on theory and ultimately produce more detailed and reliable inferences about the latent processes that generate behavior. Analyses that do not account for the mutual constraint of behavioral and neural data risk overfitting each data source and thus misattributing (or exaggerating) effects, which results in biased inferences that do not generalize well to new data.
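The indirect (covariance) linking idea can be sketched as follows; the parameter names and values here are hypothetical, and a real application would estimate the group-level mean and covariance hierarchically (e.g., via MCMC) rather than fixing them:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hierarchical link: each subject's behavioral parameter (e.g., drift rate v)
# and neural parameter (e.g., BOLD amplitude beta) are drawn from a common
# multivariate normal, so each data source constrains the other through the
# off-diagonal covariance.
mu = np.array([1.5, 0.8])              # group means: [v, beta] (assumed values)
cov = np.array([[0.20, 0.12],
                [0.12, 0.25]])         # off-diagonal term = brain-behavior link
subject_params = rng.multivariate_normal(mu, cov, size=40)

v, beta = subject_params.T
r = np.corrcoef(v, beta)[0, 1]
print(f"implied brain-behavior correlation across subjects: {r:.2f}")
```

In a fitted joint model, the covariance term is estimated from both data sources simultaneously, which is what supplies the mutual constraint described above.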
Joint modeling also allows closer contact to be made between cognitive models and biologically plausible neurocomputational models such as the prefrontal cortex basal ganglia working memory (PBWM) model (Frank et al., 2001; Hazy et al., 2006; O'Reilly & Frank, 2006), which would greatly improve our understanding of the relation between cognitive and neural mechanisms underlying WM. For example, joint approaches can alleviate issues of model mimicry (i.e., different models making the same predictions), since the additional structure in the neural data can be used to select between models that differ in their internal dynamics but make identical predictions at the level of choice and mean response time (Ditterich, 2010; Forstmann et al., 2011; Hawkins et al., 2017; Mack et al., 2013; Purcell et al., 2010; Purcell & Palmeri, 2017; Schall, 2019).

Joint modeling can also be used to "fuse" additional data sources (e.g., behavior + fMRI + EEG; Turner et al., 2016) into a single model, the benefit of which is to improve the model's predictive accuracy and generalizability. For example, through several cross-validation analyses, Turner et al. (2016) showed that including both EEG and fMRI data in a joint model allowed for better predictions of behavioral data than did either modality alone, suggesting that the EEG and fMRI data contained slightly different information about the decision-making process. Overall, such approaches hold great promise for improving our understanding of WM subprocesses and bringing us closer to the ideal of an integrated model of WM that generates both behavioral and neural data from a common set of latent mechanisms.
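A toy simulation of the mimicry problem: two hand-picked diffusion parameterizations (illustrative values, not fitted to any data) produce similar mean response times despite implying different internal dynamics, which behavioral means alone cannot distinguish:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_ddm(drift, threshold, ndt=0.3, dt=0.002, noise=1.0, n=1000):
    """Crude Euler simulation of a symmetric drift-diffusion decision process.
    Returns response times (nondecision time + first-passage time)."""
    rts = np.empty(n)
    for i in range(n):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts[i] = ndt + t
    return rts

# Two parameterizations chosen by hand to roughly mimic each other in mean RT:
slow_drift = simulate_ddm(drift=1.0, threshold=1.0)            # noisy accumulation
long_ndt   = simulate_ddm(drift=2.0, threshold=1.0, ndt=0.55)  # fast accumulation,
                                                               # longer nondecision time
print(f"mean RTs: {slow_drift.mean():.2f} s vs {long_ndt.mean():.2f} s")
```

Although the two accounts are nearly indistinguishable in mean RT, they predict different within-trial accumulation trajectories, which is exactly the kind of internal structure that neural data (e.g., ramping activity) can be used to adjudicate.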


We are confident that the methods of model-based cognitive neuroscience outlined above hold great promise for furthering our understanding of WM subprocesses and their neural substrates. However, some caution is warranted in interpreting joint analyses. When interpreting correlations between cognitive and neural mechanisms, it is important to keep in mind that numerous neurocognitive systems (e.g., WM, S/LTM, learning, cognitive control) operate in parallel even in very simple laboratory tasks. Identifying the brain networks that control particular WM subprocesses from a sea of parallel activity is a challenging task, requiring both clever experimental design and detailed joint brain-behavior modeling. It may also be the case that there is no clear correspondence between how a cognitive process is implemented at the neural level and its formal abstraction in a particular cognitive model. In such cases, we risk being blinded by the limited scope of our own models (i.e., there is no guarantee that the true model will be contained in the set of models we actually build). It is thus important to explore a large space of model variants across multiple levels of abstraction (Marr, 1982; see Chapter "Linking Models with Brain Measures") in order to converge on the so-called ground truth of the neural systems of WM.

4 Concluding Remarks

In addition to the ongoing investigations of WM subprocesses outlined above, there are a number of current "hot topics" in WM research worth mentioning that explore the relation between WM and other cognitive systems. One such topic is cognitive branching, which refers to the ability to postpone execution of a primary task until a secondary task has been completed (Hyafil & Koechlin, 2016). Cognitive branching is crucial for goal-directed behavior because it enables flexible switching between different task sets (i.e., contextually appropriate configurations of cognitive processes used to perceive and respond to the environment) depending on the immediate context in which the person is acting (cf. Broadbent, 1970, 1971; Kahneman, 1973; for a review, see Sakai (2008)). This work has shown that only one cognitive branch (task set) can be active at once, meaning that our actions are governed by one plan at a time. Moreover, previously active branches are maintained in a "quiet" inactive state and can be reactivated upon completion of the focal branch. Lateral prefrontal and frontopolar cortex have been implicated in switching between branches as goals and task demands change (Hyafil & Koechlin, 2016). EEG work shows that active and inactive WM items are encoded in dissociable neural patterns and that latent mnemonic representations can be reconfigured into an active, action-ready state as task demands change (Muhle-Karbe et al., 2021). This ability to postpone contextually relevant information and action plans – and reactivate them at the appropriate time – also underlies prospective memory, the ability to maintain deferred intentions until the appropriate time or context occurs in the future (Boag et al., 2019a, b; Einstein & McDaniel, 2005; Hyafil & Koechlin, 2016; Strickland et al., 2018). Cognitive branching is thought to be a domain-general cognitive function central to multitasking and to performing tasks that do not follow a pre-established plan. It has been suggested that selecting (and switching between) contextually appropriate branches or task sets is controlled by brain areas that track contextual cues such as uncertainty in expected reward and stimulus-response-outcome mappings (for a review, see Bland and Schaefer (2012)). In short, global changes in the external environment serve as an "alarm system" that signals when to switch to a more appropriate strategy (Bland & Schaefer, 2012). These ideas are consistent with broader "meta-control" theories of cognition that involve adopting contextually appropriate "processing styles" (e.g., Hommel (2020)), as well as with Miller et al.'s (1960) original conception of WM as supporting the execution of multiple possible "action plans."

A second active area of research concerns the interplay between WM and reinforcement learning (RL), that is, learning driven by rewards and punishments (Sutton & Barto, 2018). Recent work has shown that participants rely on a mixture of "automatic" feedback-driven learning and effortful WM-based processing during instrumental learning (Collins & Frank, 2012; McDougle & Collins, 2021). Here, the WM and RL systems operate in concert during learning: participants use both short-term memory of recent outcomes and long-term item-value associations consolidated through error-driven learning to guide behavior (i.e., action selection). The relative weighting of each system is modulated by WM load: when the number of items to learn is small (i.e., within capacity), the WM system dominates, whereas when the number of items to learn is large (i.e., capacity is exceeded), the RL system dominates (McDougle & Collins, 2021).
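A heavily simplified sketch of this weighting idea (inspired by, but not a faithful implementation of, the Collins and Frank (2012) model; the capacity and reliance parameters are assumptions, and the published model additionally includes WM decay and learned values):

```python
import numpy as np

def policy_weights(set_size, capacity=3, wm_reliance=0.9):
    """Weight on the WM system vs incremental RL: WM dominates within
    capacity and cedes control as set size exceeds capacity (simplified)."""
    w_wm = wm_reliance * min(1.0, capacity / set_size)
    return w_wm, 1.0 - w_wm

def mixture_policy(q_rl, p_wm, w_wm):
    """Blend softmaxed RL action values with WM-based action probabilities."""
    exp_q = np.exp(q_rl - q_rl.max())
    p_rl = exp_q / exp_q.sum()
    return w_wm * p_wm + (1.0 - w_wm) * p_rl

for n_stimuli in (2, 6):
    w_wm, w_rl = policy_weights(n_stimuli)
    print(f"set size {n_stimuli}: WM weight = {w_wm:.2f}, RL weight = {w_rl:.2f}")

# Example choice probabilities for one stimulus (values are arbitrary):
p = mixture_policy(np.array([0.2, 0.8, 0.5]), np.array([0.0, 1.0, 0.0]),
                   policy_weights(3)[0])
print("choice probabilities (set size 3):", np.round(p, 2))
```

Under these assumptions the WM weight falls as set size exceeds capacity, reproducing the qualitative pattern that WM dominates within capacity and RL dominates beyond it.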
This is consistent with the distinction between model-based and model-free learning in the RL literature (Doll et al., 2015; Otto et al., 2013), as well as with broader dual-systems theories of cognitive control that posit an interplay between automatic, associative processing (e.g., RL) and controlled, capacity-limited processing (e.g., WM) (Braver, 2012; Evans, 2003; Kahneman, 1973). Crucially, this work highlights the reciprocal relationship between WM and RL: WM supports learning by holding representations that guide value-based decision-making and are, in turn, updated by the learning system (Bakkour et al., 2018; Shadlen & Shohamy, 2016; Shohamy & Daw, 2015). WM subprocesses also "learn"; for example, the timing of updating is trained by striatal "go/no-go" signaling (Frank et al., 2001; Hazy et al., 2006; O'Reilly & Frank, 2006).

A final interesting line of research concerns the interaction between WM, LTM, and perception. Verschooren et al. (2021) developed a modified reference-back paradigm in which items are gated into WM from either LTM or perception, allowing a comparison of gating dynamics for perceptual versus long-term memory information. This work found evidence that a single gate and a shared attentional selection mechanism control access to WM for both perceptual and LTM information sources. Moreover, there are asymmetric costs associated with switching attention from internal (mnemonic) representations to external perceptual information and vice versa (Verschooren et al., 2020; for a review, see Verschooren et al. (2019); see also Roth and Courtney (2007)). In related work, Bartsch and Shepherdson (2021) showed that offloading items from WM to LTM reduces WM load. Recall performance in a WM task was unaffected when memory load increased through the addition of items stored in LTM but deteriorated with the addition of items stored in WM, suggesting that individuals can flexibly draw on representations held in LTM to support WM-based decisions. Overall, the work discussed here highlights the importance of viewing WM not as an isolated module but as part of a broader set of complementary cognitive systems (e.g., attention, perception, RL, LTM) that work together to support flexible goal-directed behavior. A promising avenue for future research would be to extend joint brain-behavior modeling approaches to account for data from multiple tasks, each of which may be explained by a different cognitive model or set of cognitive processes (e.g., Wall et al. (2021)).

In this chapter, we introduced the concept of WM and outlined key behavioral and neural evidence for a number of critical WM subprocesses. We highlighted several common approaches to linking brain and behavior in WM research and drew attention to their limitations. In doing so, we suggested that a number of recent advances in model-based cognitive neuroscience discussed throughout this book hold great promise for advancing our understanding of WM at the neural, cognitive, and behavioral levels. Overall, we are confident that such approaches will bring us closer to a unified model of WM that explains both behavioral and neural data in terms of a common set of neurocognitive mechanisms.

A.1 Exercises

1. A long-standing debate in the WM literature concerns whether items in WM are stored in a small number of discrete "slots" or allocated continuous resources from a limited capacity pool. Having read the review by Ma et al. (2014), how might you use neural data (e.g., EEG, fMRI) to distinguish between discrete slots and continuous resource models? What are two empirical findings that argue against the discrete slots perspective? Under what circumstances might discrete slots and continuous resource models mimic each other?
2. Concerning WM's severe capacity limit, Cowan (2010) says that two camps of theorists have emerged: one views capacity limits as a weakness, while the other views capacity limits as a strength. What are some of the arguments for capacity limits being a weakness? And for being a strength? Do you agree with this dichotomy? Can you think of any strengths or weaknesses that are not mentioned?
3. Cowan (2010) also mentions two phenomena that must be controlled for in WM experiments in order to obtain a pure measure of WM's capacity limit. What are the two phenomena, and what are some methodological or experimental design strategies that are used to mitigate their confounding influence?
4. Another long-standing debate concerns whether mnemonic representations degrade over time due to temporal decay or interference from other sources. Having read the review by Ricker et al. (2016), outline some evidence both for and against temporal decay. These authors state that "the biggest hurdle to providing evidence for or against the existence of decay has been preventing verbal rehearsal of memoranda while at the same time avoiding the introduction of interference." Why is this? How do these two confounds compare to those of Cowan (2010) identified in the previous question? (Hint: They are very similar!)
5. Considering what you have learned from this and other modeling chapters in this book, briefly sketch out a design for a cognitive model of WM that accounts for some of the subprocesses and behavioral phenomena discussed in this chapter. What kind of modeling architecture might you use as a starting point (e.g., neural network, evidence accumulation, etc.)? What mechanisms might be useful for explaining your target effects? How would you connect your model to neural data (e.g., EEG, fMRI)? Which neural signals might you link to which cognitive mechanisms and why?

B.1 Further Reading

• The classic review by Goldman-Rakic (1995) outlines evidence for the cellular basis of WM with a focus on early research using direct neuronal recordings in behaving animals.
• Cowan's classic (2001) paper provides detailed behavioral evidence for the "magical number 4" as WM's item capacity limit. Cowan (2010) provides additional interesting discussion and reflections on the capacity debate.
• Ma et al. (2014) review behavioral and neural evidence in support of the idea that WM relies on flexible allocation of continuous resources to keep items in an activated state.
• D'Esposito (2007) provides an interesting theoretical discussion of WM from a cognitive neuroscience perspective.
• Lewis-Peacock et al. (2018) discuss recent evidence for a number of distinct memory removal processes in the context of current cognitive and neural theories of WM.
• Ricker et al. (2016) give a historical overview of the debate surrounding whether mnemonic representations degrade over time due to temporal decay or interference from other items.

References

Anderson, M. L. (2014). After phrenology: Neural reuse and the interactive brain. MIT Press.
Anderson, M. C., & Hanslmayr, S. (2014). Neural mechanisms of motivated forgetting. Trends in Cognitive Sciences, 18(6), 279–292.
Anderson, J. R., & Lebiere, C. J. (2014). The atomic components of thought. Psychology Press.
Anderson, J. R., Reder, L. M., & Lebiere, C. (1996). Working memory: Activation limitations on retrieval. Cognitive Psychology, 30(3), 221–256.
Baddeley, A. (1992). Working memory. Science, 255(5044), 556–559. https://doi.org/10.1126/science.1736359
Badre, D. (2012). Opening the gate to working memory. Proceedings of the National Academy of Sciences, 109(49), 19878–19879.
Bakkour, A., Zylberberg, A., Shadlen, M. N., & Shohamy, D. (2018). Value-based decisions involve sequential sampling from memory. BioRxiv, 269290. https://doi.org/10.1101/269290
Bartsch, L. M., & Shepherdson, P. (2021). Freeing capacity in WM through the use of LTM representations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 48(4), 1–51.
Bays, P. M., & Husain, M. (2008). Dynamic shifts of limited working memory resources in human vision. Science, 321(5890), 851–854.
Bays, P. M., Catalao, R. F. G., & Husain, M. (2009). The precision of visual working memory is set by allocation of a shared resource. Journal of Vision, 9(10), 7.
Behrmann, M., Geng, J. J., & Shomstein, S. (2004). Parietal cortex and attention. Current Opinion in Neurobiology, 14(2), 212–217.
Bettencourt, K., & Xu, Y. (2015). Understanding the nature of visual short-term memory representation in human parietal cortex. Journal of Vision, 15(12), 292.
Bland, A. R., & Schaefer, A. (2012). Unexpected uncertainty, volatility and decision-making. Frontiers in Neuroscience, 6, 85.
Bledowski, C., Rahm, B., & Rowe, J. B. (2009). What "works" in working memory? Separate systems for selection and updating of critical information. Journal of Neuroscience, 29(43), 13735–13741.
Bledowski, C., Kaiser, J., & Rahm, B. (2010). Basic operations in working memory: Contributions from functional imaging studies. Behavioural Brain Research, 214, 172–179. https://doi.org/10.1016/j.bbr.2010.05.041
Boag, R. J., Strickland, L., Heathcote, A., Neal, A., & Loft, S. (2019a). Cognitive control and capacity for prospective memory in complex dynamic environments. Journal of Experimental Psychology: General, 148(12), 2181–2206. https://doi.org/10.1037/xge0000599
Boag, R. J., Strickland, L., Loft, S., & Heathcote, A. (2019b). Strategic attention and decision control support prospective memory in a complex dual-task environment. Cognition, 191, 103974. https://doi.org/10.1016/j.cognition.2019.05.011
Bogacz, R., Wagenmakers, E. J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed-accuracy tradeoff. Trends in Neurosciences, 33, 10–16. https://doi.org/10.1016/j.tins.2009.09.002
Bor, D., Cumming, N., Scott, C. E. L., & Owen, A. M. (2004). Prefrontal cortical involvement in verbal encoding strategies. European Journal of Neuroscience, 19(12), 3365–3370.
Bouchacourt, F., Palminteri, S., Koechlin, E., & Ostojic, S. (2020). Temporal chunking as a mechanism for unsupervised learning of task-sets. eLife, 9, e50469.
Brady, T. F., Konkle, T., & Alvarez, G. A. (2009). Compression in visual working memory: Using statistical regularities to form more efficient memory representations. Journal of Experimental Psychology: General, 138(4), 487.
Braver, T. S. (2012). The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences, 16, 105–112. https://doi.org/10.1016/j.tics.2011.12.010
Broadbent, D. E. (1970). Stimulus set and response set, two kinds of selective attention. In D. I. Mostofsky (Ed.), Attention: Contemporary theory and analysis (pp. 51–60). Appleton-Century-Crofts.
Broadbent, D. E. (1971). Decision and stress. Academic Press.
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178. https://doi.org/10.1016/j.cogpsych.2007.12.002

Cantor, J., & Engle, R. W. (1993). Working-memory capacity as long-term memory activation: An individual-differences approach. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(5), 1101.
Cary, M., & Carlson, R. A. (2001). Distributing working memory resources during problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(3), 836.
Cavanagh, J. F. (2015). Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage, 110, 205–216.
Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421.
Chatham, C. H., & Badre, D. (2015). Multiple gates on working memory. Current Opinion in Behavioral Sciences, 1, 23–31.
Clapp, W. C., Rubens, M. T., & Gazzaley, A. (2010). Mechanisms of working memory disruption by external interference. Cerebral Cortex, 20(4), 859–872.
Cohen, M. X. (2014). A neural microcircuit for cognitive conflict detection and signaling. Trends in Neurosciences, 37(9), 480–490.
Cohen, M. X., & Donner, T. H. (2013). Midfrontal conflict-related theta-band power reflects neural oscillations that predict behavior. Journal of Neurophysiology, 110(12), 2752–2763.
Collins, A. G. E., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024–1035.
Cools, R. (2019). Chemistry of the adaptive mind: Lessons from dopamine. Neuron, 104, 113–131. https://doi.org/10.1016/j.neuron.2019.09.035
Cools, R., & D'Esposito, M. (2011). Inverted-U-shaped dopamine actions on human working memory and cognitive control. Biological Psychiatry, 69(12), e113–e125.
Cools, R., Lewis, S. J. G., Clark, L., Barker, R. A., & Robbins, T. W. (2007a). L-DOPA disrupts activity in the nucleus accumbens during reversal learning in Parkinson's disease. Neuropsychopharmacology, 32(1), 180–189. https://doi.org/10.1038/sj.npp.1301153
Cools, R., Sheridan, M., Jacobs, E., & D'Esposito, M. (2007b). Impulsive personality predicts dopamine-dependent changes in frontostriatal activity during component processes of working memory. Journal of Neuroscience, 27(20), 5506–5514.
Corrado, G., & Doya, K. (2007). Understanding neural coding through the model-based analysis of decision making. The Journal of Neuroscience, 27(31), 8178–8180. https://doi.org/10.1523/JNEUROSCI.1590-07.2007
Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163–191.
Cowan, N. (1999). An embedded-processes model of working memory. In Models of working memory: Mechanisms of active maintenance and executive control (pp. 62–101). Cambridge University Press.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114. https://doi.org/10.1017/S0140525X01003922
Cowan, N. (2008). What are the differences between long-term, short-term, and working memory? Progress in Brain Research, 169, 323–338.
Cowan, N. (2010). The magical mystery four: How is working memory capacity limited, and why? Current Directions in Psychological Science, 19(1), 51–57.
Cowan, N. (2016). Working memory capacity: Classic edition. Psychology Press.
Cowan, N. (2017). The many faces of working memory and short-term storage. Psychonomic Bulletin & Review, 24(4), 1158–1170.
Cowan, N. (2019). Short-term memory based on activated long-term memory: A review in response to Norris (2017). Psychological Bulletin, 145(8), 822–847.
Cowan, N., Li, D., Moffitt, A., Becker, T. M., Martin, E. A., Saults, J. S., & Christ, S. E. (2011). A neural region of abstract working memory. Journal of Cognitive Neuroscience, 23(10), 2852–2863.

294

R. J. Boag et al.

Curtis, C. E., & D’Esposito, M. (2003). Persistent activity in the prefrontal cortex during working memory. Trends in Cognitive Sciences, 7(9), 415–423.
D’Esposito, M. (2007). From cognitive to neural models of working memory. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1481), 761–772.
D’Esposito, M., & Postle, B. R. (2015). The cognitive neuroscience of working memory. Annual Review of Psychology, 66, 115–142.
D’Esposito, M., Postle, B. R., Ballard, D., & Lease, J. (1999). Maintenance versus manipulation of information held in working memory: An event-related fMRI study. Brain and Cognition, 41(1), 66–86.
Dahlin, E., Neely, A. S., Larsson, A., Bäckman, L., & Nyberg, L. (2008). Transfer of learning after updating training mediated by the striatum. Science, 320(5882), 1510–1512.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19(4), 450–466. https://doi.org/10.1016/s0022-5371(80)90312-6
de Hollander, G., Keuken, M. C., van der Zwaag, W., Forstmann, B. U., & Trampel, R. (2017). Comparing functional MRI protocols for small, iron-rich basal ganglia nuclei such as the subthalamic nucleus at 7 T and 3 T. Human Brain Mapping, 38(6), 3226–3248. https://doi.org/10.1002/hbm.23586
Diamond, A. (2013). Executive functions. Annual Review of Psychology, 64, 135–168.
Ditterich, J. (2010). A comparison between mechanisms of multi-alternative perceptual decision making: Ability to explain human behavior, predictions for neurophysiology, and relationship with decision theory. Frontiers in Neuroscience, 4, 184.
Dolcos, F., Miller, B., Kragel, P., Jha, A., & McCarthy, G. (2007). Regional brain differences in the effect of distraction during the delay interval of a working memory task. Brain Research, 1152, 171–181.
Doll, B. B., Duncan, K. D., Simon, D. A., Shohamy, D., & Daw, N. D. (2015). Model-based choices involve prospective neural activity. Nature Neuroscience, 18(5), 767–772. https://doi.org/10.1038/nn.3981
Donders, F. C. (1969). On the speed of mental processes. Acta Psychologica, 30, 412–431. https://doi.org/10.1016/0001-6918(69)90065-1
Donkin, C., Nosofsky, R. M., Gold, J. M., & Shiffrin, R. M. (2013). Discrete-slots models of visual working-memory response times. Psychological Review, 120(4), 873.
Donkin, C., Kary, A., Tahir, F., & Taylor, R. (2016). Resources masquerading as slots: Flexible allocation of visual working memory. Cognitive Psychology, 85, 30–42.
Dosenbach, N. U. F., Fair, D. A., Cohen, A. L., Schlaggar, B. L., & Petersen, S. E. (2008). A dual-networks architecture of top-down control. Trends in Cognitive Sciences, 12(3), 99–105.
Dreisbach, G. (2012). Mechanisms of cognitive control. Current Directions in Psychological Science, 21(4), 227–231. https://doi.org/10.1177/0963721412449830
Dreisbach, G., & Fröber, K. (2019). On how to be flexible (or not): Modulation of the stability-flexibility balance. Current Directions in Psychological Science, 28(1), 3–9. https://doi.org/10.1177/0963721418800030
Dreisbach, G., Müller, J., Goschke, T., Strobel, A., Schulze, K., Lesch, K.-P., & Brocke, B. (2005). Dopamine and cognitive control: The influence of spontaneous eyeblink rate and dopamine gene polymorphisms on perseveration and distractibility. Behavioral Neuroscience, 119(2), 483.
Durstewitz, D., & Seamans, J. K. (2002). The computational role of dopamine D1 receptors in working memory. Neural Networks, 15(4–6), 561–572. https://doi.org/10.1016/S0893-6080(02)00049-7
Durstewitz, D., & Seamans, J. K. (2008). The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-O-methyltransferase genotypes and schizophrenia. Biological Psychiatry, 64, 739–749. https://doi.org/10.1016/j.biopsych.2008.05.015
Durstewitz, D., Seamans, J. K., & Sejnowski, T. J. (2000). Dopamine-mediated stabilization of delay-period activity in a network model of prefrontal cortex. Journal of Neurophysiology, 83(3), 1733–1750.

Toward a Model-Based Cognitive Neuroscience of Working Memory Subprocesses

Ecker, U. K. H., Lewandowsky, S., Oberauer, K., & Chee, A. E. H. (2010). The components of working memory updating: An experimental decomposition and individual differences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(1), 170.
Ecker, U. K. H., Lewandowsky, S., & Oberauer, K. (2014a). Removal of information from working memory: A specific updating process. Journal of Memory and Language, 74, 77–90. https://doi.org/10.1016/j.jml.2013.09.003
Ecker, U. K. H., Oberauer, K., & Lewandowsky, S. (2014b). Working memory updating involves item-specific removal. Journal of Memory and Language, 74, 1–15. https://doi.org/10.1016/j.jml.2014.03.006
Einstein, G. O., & McDaniel, M. A. (2005). Prospective memory: Multiple retrieval processes. Current Directions in Psychological Science, 14(6), 286–290.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory, and general fluid intelligence: A latent-variable approach. Journal of Experimental Psychology: General, 128(3), 309.
Evans, J. S. B. T. (2003). In two minds: Dual-process accounts of reasoning. Trends in Cognitive Sciences, 7(10), 454–459.
Farrell, S., & Lewandowsky, S. (2002). An endogenous distributed model of ordering in serial recall. Psychonomic Bulletin & Review, 9(1), 59–79.
Feredoes, E., Heinen, K., Weiskopf, N., Ruff, C., & Driver, J. (2011). Causal evidence for frontal involvement in memory target maintenance by posterior brain areas during distracter interference of visual working memory. Proceedings of the National Academy of Sciences, 108(42), 17510–17515.
Festini, S. B., & Reuter-Lorenz, P. A. (2013). The short- and long-term consequences of directed forgetting in a working memory task. Memory, 21(7), 763–777.
Festini, S. B., & Reuter-Lorenz, P. A. (2014). Cognitive control of familiarity: Directed forgetting reduces proactive interference in working memory. Cognitive, Affective, & Behavioral Neuroscience, 14(1), 78–89.
Folstein, J. R., & Van Petten, C. (2008). Influence of cognitive control and mismatch on the N2 component of the ERP: A review. Psychophysiology, 45(1), 152–170.
Forstmann, B. U., & Wagenmakers, E.-J. (2015). Model-based cognitive neuroscience: A conceptual introduction. In An introduction to model-based cognitive neuroscience (pp. 139–156). Springer. https://doi.org/10.1007/978-1-4939-2236-9_7
Forstmann, B. U., Dutilh, G., Brown, S., Neumann, J., Von Cramon, D. Y., Ridderinkhof, K. R., & Wagenmakers, E.-J. (2008). Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences of the United States of America, 105(45), 17538–17542. https://doi.org/10.1073/pnas.0805903105
Forstmann, B. U., Wagenmakers, E.-J., Eichele, T., Brown, S., & Serences, J. T. (2011). Reciprocal relations between cognitive neuroscience and formal cognitive models: Opposites attract? Trends in Cognitive Sciences, 15, 272–279. https://doi.org/10.1016/j.tics.2011.04.002
Forstmann, B. U., Ratcliff, R., & Wagenmakers, E.-J. (2016). Sequential sampling models in cognitive neuroscience: Advantages, applications, and extensions. Annual Review of Psychology, 67(1), 641–666. https://doi.org/10.1146/annurev-psych-122414-033645
Frank, M. J., Loughry, B., & O’Reilly, R. C. (2001). Interactions between frontal cortex and basal ganglia in working memory: A computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, 137–160. https://doi.org/10.3758/CABN.1.2.137
Friston, K. J. (2009). Modalities, modes, and models in functional neuroimaging. Science, 326, 399–403. https://doi.org/10.1126/science.1174521
Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1989). Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. Journal of Neurophysiology, 61(2), 331–349.
Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1990). Visuospatial coding in primate prefrontal neurons revealed by oculomotor paradigms. Journal of Neurophysiology, 63(4), 814–831.

Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1991). Neuronal activity related to saccadic eye movements in the monkey’s dorsolateral prefrontal cortex. Journal of Neurophysiology, 65(6), 1464–1483.
Garavan, H. (1998). Serial attention within working memory. Memory & Cognition, 26(2), 263–276.
Gehring, W. J., Bryck, R. L., Jonides, J., Albin, R. L., & Badre, D. (2003). The mind’s eye, looking inward? In search of executive control in internal attention shifting. Psychophysiology, 40(4), 572–585.
Goldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron, 14(3), 477–485.
Grech, R., Cassar, T., Muscat, J., Camilleri, K. P., Fabri, S. G., Zervakis, M., et al. (2008). Review on solving the inverse problem in EEG source analysis. Journal of Neuroengineering and Rehabilitation, 5(1), 1–33.
Gulbinaite, R., van Rijn, H., & Cohen, M. X. (2014). Fronto-parietal network oscillations reveal relationship between working memory capacity and cognitive control. Frontiers in Human Neuroscience, 8, 761.
Hardman, K. O., & Cowan, N. (2015). Remembering complex objects in visual working memory: Do capacity limits restrict objects or features? Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(2), 325.
Harmony, T. (2013). The functional significance of delta oscillations in cognitive processing. Frontiers in Integrative Neuroscience, 7, 83.
Hauswald, A., Schulz, H., Iordanov, T., & Kissler, J. (2011). ERP dynamics underlying successful directed forgetting of neutral but not negative pictures. Social Cognitive and Affective Neuroscience, 6(4), 450–459.
Hawkins, G. E., Mittner, M., Forstmann, B. U., & Heathcote, A. (2017). On the efficiency of neurally-informed cognitive models to identify latent cognitive states. Journal of Mathematical Psychology, 76, 142–155. https://doi.org/10.1016/j.jmp.2016.06.007
Hazy, T. E., Frank, M. J., & O’Reilly, R. C. (2006). Banishing the homunculus: Making working memory work. Neuroscience, 139(1), 105–118. https://doi.org/10.1016/j.neuroscience.2005.04.067
Hedge, C., & Leonards, U. (2013). Using eye movements to explore switch costs in working memory. Journal of Vision, 13(4), 18.
Hommel, B. (2015). Between persistence and flexibility: The Yin and Yang of action control. Advances in Motivation Science, 2, 33–67. https://doi.org/10.1016/bs.adms.2015.04.003
Hommel, B. (2020). Dual-task performance: Theoretical analysis and an event-coding account. Journal of Cognition, 3(1), 29.
Howard, M. W., Rizzuto, D. S., Caplan, J. B., Madsen, J. R., Lisman, J., Aschenbrenner-Scheibe, R., et al. (2003). Gamma oscillations correlate with working memory load in humans. Cerebral Cortex, 13(12), 1369–1374.
Hyafil, A., & Koechlin, E. (2016). A neurocomputational model of human frontopolar cortex function. BioRxiv, 37150. https://doi.org/10.1101/037150
Jocham, G., Klein, T. A., & Ullsperger, M. (2011). Dopamine-mediated reinforcement learning signals in the striatum and ventromedial prefrontal cortex underlie value-based choices. Journal of Neuroscience, 31(5), 1606–1613.
Jongkees, B. J. (2020). Baseline-dependent effect of dopamine’s precursor L-tyrosine on working memory gating but not updating. Cognitive, Affective, & Behavioral Neuroscience, 20(3), 521–535. https://doi.org/10.3758/s13415-020-00783-8
Kahneman, D. (1973). Attention and effort. Prentice-Hall.
Kessler, Y., & Meiran, N. (2008). Two dissociable updating processes in working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(6), 1339.
Kessler, Y., & Oberauer, K. (2014). Working memory updating latency reflects the cost of switching between maintenance and updating modes of operation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(3), 738.
Kessler, Y., & Oberauer, K. (2015). Forward scanning in verbal working memory updating. Psychonomic Bulletin & Review, 22(6), 1770–1776.

Kornblith, S., Buschman, T. J., & Miller, E. K. (2016). Stimulus load and oscillatory activity in higher cortex. Cerebral Cortex, 26(9), 3772–3784.
Kübler, A., Murphy, K., Kaufman, J., Stein, E. A., & Garavan, H. (2003). Co-ordination within and between verbal and visuospatial working memory: Network modulation and anterior frontal recruitment. NeuroImage, 20(2), 1298–1308.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14(4), 389–433. https://doi.org/10.1016/S0160-2896(05)80012-1
Lepsien, J., Griffin, I. C., Devlin, J. T., & Nobre, A. C. (2005). Directing spatial attention in mental representations: Interactions between attentional orienting and working-memory load. NeuroImage, 26(3), 733–743.
Leung, H.-C., Seelig, D., & Gore, J. C. (2004). The effect of memory load on cortical activity in the spatial working memory circuit. Cognitive, Affective, & Behavioral Neuroscience, 4(4), 553–563.
Lewis, S. J. G., Dove, A., Robbins, T. W., Barker, R. A., & Owen, A. M. (2004). Striatal contributions to working memory: A functional magnetic resonance imaging study in humans. European Journal of Neuroscience, 19(3), 755–760.
Lewis-Peacock, J. A., Drysdale, A. T., Oberauer, K., & Postle, B. R. (2012). Neural evidence for a distinction between short-term memory and the focus of attention. Journal of Cognitive Neuroscience, 24(1), 61–79.
Lewis-Peacock, J. A., Kessler, Y., & Oberauer, K. (2018). The removal of information from working memory. Annals of the New York Academy of Sciences, 1424(1), 33–44.
Logie, R. H. (1995). Visuo-spatial working memory. Lawrence Erlbaum Associates, Inc.
Logie, R. H., Gilhooly, K. J., & Wynn, V. (1994). Counting on working memory in arithmetic problem solving. Memory & Cognition, 22(4), 395–410.
Lorenc, E. S., Sreenivasan, K. K., Nee, D. E., Vandenbroucke, A. R. E., & D’Esposito, M. (2018). Flexible coding of visual working memory representations during distraction. Journal of Neuroscience, 38(23), 5267–5276.
Love, B. C. (2016). Cognitive models as bridge between brain and behavior. Trends in Cognitive Sciences, 20, 247–248. https://doi.org/10.1016/j.tics.2016.02.006
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390(6657), 279–281.
Luck, S. J., & Vogel, E. K. (2013). Visual working memory capacity: From psychophysics and neurobiology to individual differences. Trends in Cognitive Sciences, 17(8), 391–400.
Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory. Nature Neuroscience, 17(3), 347.
Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain’s algorithm for categorization from its neural implementation. Current Biology, 23(20), 2023–2027.
Manoach, D. S., Schlaug, G., Siewert, B., Darby, D. G., Bly, B. M., Benfield, A., et al. (1997). Prefrontal cortex fMRI signal changes are correlated with working memory load. Neuroreport, 8(2), 545–549.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W.H. Freeman.
Marshuetz, C. (2005). Order information in working memory: An integrative review of evidence from brain and behavior. Psychological Bulletin, 131(3), 323.
Mathy, F., & Feldman, J. (2012). What’s magic about magic numbers? Chunking and data compression in short-term memory. Cognition, 122(3), 346–362.
McDougle, S. D., & Collins, A. G. E. (2021). Modeling the influence of working memory, reinforcement, and action uncertainty on reaction time and choice during instrumental learning. Psychonomic Bulletin & Review, 28, 20–39.
McNab, F., & Klingberg, T. (2008). Prefrontal cortex and basal ganglia control access to working memory. Nature Neuroscience, 11(1), 103–107.
Miletić, S., Bazin, P.-L., Weiskopf, N., van der Zwaag, W., Forstmann, B. U., & Trampel, R. (2020). fMRI protocol optimization for simultaneously studying small subcortical and cortical areas at 7 T. NeuroImage, 219, 116992.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24(1), 167–202.
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. Henry Holt and Co. https://doi.org/10.1037/10039-000
Miller, E. K., Lundqvist, M., & Bastos, A. M. (2018). Working Memory 2.0. Neuron, 100, 463–475. https://doi.org/10.1016/j.neuron.2018.09.023
Möller, M., & Bogacz, R. (2019). Learning the payoffs and costs of actions. PLoS Computational Biology, 15(2), e1006285. https://doi.org/10.1371/journal.pcbi.1006285
Morey, C. C., & Cowan, N. (2018). Can we distinguish three maintenance processes in working memory? Annals of the New York Academy of Sciences, 1424(1), 45.
Muhle-Karbe, P. S., Myers, N. E., & Stokes, M. G. (2021). A hierarchy of functional states in working memory. Journal of Neuroscience, 41(20), 4461–4475.
Murty, V. P., Sambataro, F., Radulescu, E., Altamura, M., Iudicello, J., Zoltick, B., et al. (2011). Selective updating of working memory content modulates meso-cortico-striatal activity. NeuroImage, 57(3), 1264–1272. https://doi.org/10.1016/j.neuroimage.2011.05.006
Musslick, S., Jang, S. J., Shvartsman, M., Shenhav, A., & Cohen, J. D. (2018). Constraints associated with cognitive control and the stability-flexibility dilemma. In Proceedings of the 40th annual meeting of the Cognitive Science Society, CogSci 2018 (pp. 804–809). The Cognitive Science Society.
Nassar, M. R., Helmers, J. C., & Frank, M. J. (2018). Chunking as a rational strategy for lossy data compression in visual working memory. Psychological Review, 125(4), 486.
Nir-Cohen, G., Kessler, Y., & Egner, T. (2020). Neural substrates of working memory updating. Journal of Cognitive Neuroscience, 32(12), 2285–2302. https://doi.org/10.1162/jocn_a_01625
Nowicka, A., Marchewka, A., Jednorog, K., Tacikowski, P., & Brechmann, A. (2011). Forgetting of emotional information is hard: An fMRI study of directed forgetting. Cerebral Cortex, 21(3), 539–549.
O’Reilly, R. C. (2006). Biologically based computational models of high-level cognition. Science, 314, 91–94. https://doi.org/10.1126/science.1127242
O’Reilly, R. C., & Frank, M. J. (2006). Making working memory work: A computational model of learning in the prefrontal cortex and basal ganglia. Neural Computation, 18(2), 283–328. https://doi.org/10.1162/089976606775093909
O’Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. MIT Press.
Oberauer, K. (2002). Access to information in working memory: Exploring the focus of attention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(3), 411.
Oberauer, K. (2003). Selective attention to elements in working memory. Experimental Psychology, 50(4), 257.
Oberauer, K. (2006). Is the focus of attention in working memory expanded through practice? Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(2), 197.
Oberauer, K. (2009). Design for a working memory. Psychology of Learning and Motivation – Advances in Research and Theory, 51, 45–100. https://doi.org/10.1016/S0079-7421(09)51002-X
Oberauer, K. (2018). Removal of irrelevant information from working memory: Sometimes fast, sometimes slow, and sometimes not at all. Annals of the New York Academy of Sciences, 1424(1), 239–255.
Oberauer, K. (2019). Working memory and attention – A conceptual analysis and review. Journal of Cognition, 2(1), 36.
Oberauer, K., & Bialkova, S. (2011). Serial and parallel processes in working memory after practice. Journal of Experimental Psychology: Human Perception and Performance, 37(2), 606.
Oberauer, K., & Lewandowsky, S. (2011). Modeling working memory: A computational implementation of the Time-Based Resource-Sharing theory. Psychonomic Bulletin & Review, 18(1), 10–45.

Oberauer, K., & Lewandowsky, S. (2016). Control of information in working memory: Encoding and removal of distractors in the complex-span paradigm. Cognition, 156, 106–128.
Oberauer, K., Lewandowsky, S., Awh, E., Brown, G. D. A., Conway, A., Cowan, N., et al. (2018). Benchmarks for models of short-term and working memory. Psychological Bulletin, 144(9), 885.
Ortega, R., López, V., Carrasco, X., Escobar, M. J., García, A. M., Parra, M. A., & Aboitiz, F. (2020). Neurocognitive mechanisms underlying working memory encoding and retrieval in Attention-Deficit/Hyperactivity Disorder. Scientific Reports, 10(1), 1–13.
Otto, A. R., Gershman, S. J., Markman, A. B., & Daw, N. D. (2013). The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychological Science, 24(5), 751–761.
Owen, A. M., McMillan, K. M., Laird, A. R., & Bullmore, E. (2005). N-back working memory paradigm: A meta-analysis of normative functional neuroimaging studies. Human Brain Mapping, 25(1), 46–59.
Pang, J., Tang, X., Nie, Q.-Y., Conci, M., Sun, P., Wang, H., et al. (2019). Resolving the electroencephalographic correlates of rapid goal-directed chunking in the frontal-parietal network. Frontiers in Neuroscience, 13, 744.
Parent, A., & Hazrati, L.-N. (1995). Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Reviews, 20(1), 91–127.
Paz-Caballero, M. D., Menor, J., & Jiménez, J. M. (2004). Predictive validity of event-related potentials (ERPs) in relation to the directed forgetting effects. Clinical Neurophysiology, 115(2), 369–377.
Pearson, B., Raškevičius, J., Bays, P. M., Pertzov, Y., & Husain, M. (2014). Working memory retrieval as a decision process. Journal of Vision, 14(2), 1–15. https://doi.org/10.1167/14.2.2
Purcell, B. A., & Palmeri, T. J. (2017). Relating accumulator model parameters and neural dynamics. Journal of Mathematical Psychology, 76, 156–171.
Purcell, B. A., Heitz, R. P., Cohen, J. Y., Schall, J. D., Logan, G. D., & Palmeri, T. J. (2010). Neurally constrained modeling of perceptual decision making. Psychological Review, 117(4), 1113–1143. https://doi.org/10.1037/a0020311
Rac-Lubashevsky, R., & Kessler, Y. (2016a). Decomposing the n-back task: An individual differences study using the reference-back paradigm. Neuropsychologia, 90, 190–199. https://doi.org/10.1016/j.neuropsychologia.2016.07.013
Rac-Lubashevsky, R., & Kessler, Y. (2016b). Dissociating working memory updating and automatic updating: The reference-back paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(6), 951–969. https://doi.org/10.1037/xlm0000219
Rac-Lubashevsky, R., & Kessler, Y. (2018). Oscillatory correlates of control over working memory gating and updating: An EEG study using the reference-back paradigm. Journal of Cognitive Neuroscience, 30(12), 1870–1882. https://doi.org/10.1162/jocn_a_01326
Rac-Lubashevsky, R., & Kessler, Y. (2019). Revisiting the relationship between the P3b and working memory updating. Biological Psychology, 148, 107769. https://doi.org/10.1016/j.biopsycho.2019.107769
Rac-Lubashevsky, R., Slagter, H. A., & Kessler, Y. (2017). Tracking real-time changes in working memory updating and gating with the event-based eye-blink rate. Scientific Reports, 7(1), 1–9. https://doi.org/10.1038/s41598-017-02942-3
Rademaker, R. L., Tredway, C. H., & Tong, F. (2012). Introspective judgments predict the precision and likelihood of successful maintenance of visual working memory. Journal of Vision, 12(13), 21.
Ranganath, C., Cohen, M. X., & Brozinsky, C. J. (2005). Working memory maintenance contributes to long-term memory formation: Neural and behavioral evidence. Journal of Cognitive Neuroscience, 17(7), 994–1010.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. Retrieved from https://psycnet.apa.org/record/1978-30970-001

Reber, P. J., Siwiec, R. M., Gitleman, D. R., Parrish, T. B., Mesulam, M.-M., & Paller, K. A. (2002). Neural correlates of successful encoding identified using functional magnetic resonance imaging. Journal of Neuroscience, 22(21), 9541–9548.
Ricker, T. J., Vergauwe, E., & Cowan, N. (2016). Decay theory of immediate memory: From Brown (1958) to today (2014). Quarterly Journal of Experimental Psychology, 69(10), 1969–1995.
Riggall, A. C., & Postle, B. R. (2012). The relationship between working memory storage and elevated activity as measured with functional magnetic resonance imaging. Journal of Neuroscience, 32(38), 12990–12998.
Rizio, A. A., & Dennis, N. A. (2013). The neural correlates of cognitive control: Successful remembering and intentional forgetting. Journal of Cognitive Neuroscience, 25(2), 297–312.
Roth, J. K., & Courtney, S. M. (2007). Neural system for updating object working memory from different sources: Sensory stimuli or long-term memory. NeuroImage, 38(3), 617–630.
Roth, J. K., Serences, J. T., & Courtney, S. M. (2006). Neural system for controlling the contents of object working memory in humans. Cerebral Cortex, 16, 1595–1603. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.524.778
Roux, F., & Uhlhaas, P. J. (2014). Working memory and neural oscillations: Alpha–gamma versus theta–gamma codes for distinct WM information? Trends in Cognitive Sciences, 18(1), 16–25.
Roux, F., Wibral, M., Mohr, H. M., Singer, W., & Uhlhaas, P. J. (2012). Gamma-band activity in human prefrontal cortex codes for the number of relevant items maintained in working memory. Journal of Neuroscience, 32(36), 12411–12420.
Sakai, K. (2008). Task set and prefrontal cortex. Annual Review of Neuroscience, 31, 219–245.
Sakai, K., Rowe, J. B., & Passingham, R. E. (2002). Active maintenance in prefrontal area 46 creates distractor-resistant memory. Nature Neuroscience, 5(5), 479–484.
Santangelo, V., Di Francesco, S. A., Mastroberardino, S., & Macaluso, E. (2015). Parietal cortex integrates contextual and saliency signals during the encoding of natural scenes in working memory. Human Brain Mapping, 36(12), 5003–5017.
Schall, J. D. (2019). Accumulators, neurons, and response time. Trends in Neurosciences, 42, 848–860. https://doi.org/10.1016/j.tins.2019.10.001
Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27. https://doi.org/10.1152/jn.1998.80.1.1
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. https://doi.org/10.1126/science.275.5306.1593
Sewell, D. K., Lilburn, S. D., & Smith, P. L. (2014). An information capacity limitation of visual short-term memory. Journal of Experimental Psychology: Human Perception and Performance, 40(6), 2214.
Sewell, D. K., Lilburn, S. D., & Smith, P. L. (2016). Object selection costs in visual working memory: A diffusion model analysis of the focus of attention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(11), 1673.
Shadlen, M. N., & Shohamy, D. (2016). Decision making and sequential sampling from memory. Neuron, 90(5), 927–939.
Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory. Psychological Review, 76(2), 179.
Shohamy, D., & Daw, N. D. (2015). Integrating memories to guide decisions. Current Opinion in Behavioral Sciences, 5, 85–90.
Silver, H., Feldman, P., Bilker, W., & Gur, R. C. (2003). Working memory deficit as a core neuropsychological dysfunction in schizophrenia. American Journal of Psychiatry, 160(10), 1809–1816.
Strickland, L., Loft, S., Remington, R. W., & Heathcote, A. (2018). Racing to remember: A theory of decision control in event-based prospective memory. Psychological Review, 125(6), 851–887. https://doi.org/10.1037/rev0000113
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.

Tan, H.-Y., Chen, Q., Goldberg, T. E., Mattay, V. S., Meyer-Lindenberg, A., Weinberger, D. R., & Callicott, J. H. (2007). Catechol-O-methyltransferase Val158Met modulation of prefrontal–parietal–striatal brain systems during arithmetic and temporal transformations in working memory. Journal of Neuroscience, 27(49), 13393–13401.
Thalmann, M., Souza, A. S., & Oberauer, K. (2019). How does chunking help working memory? Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(1), 37.
Timm, J. D., & Papenmeier, F. (2019). Reorganization of spatial configurations in visual working memory. Memory & Cognition, 47(8), 1469–1480.
Todd, J. J., & Marois, R. (2004). Capacity limit of visual short-term memory in human posterior parietal cortex. Nature, 428(6984), 751–754.
Toepper, M., Gebhardt, H., Beblo, T., Thomas, C., Driessen, M., Bischoff, M., et al. (2010). Functional correlates of distractor suppression during spatial working memory encoding. Neuroscience, 165(4), 1244–1253.
Trutti, A. C., Mulder, M. J., Hommel, B., & Forstmann, B. U. (2019). Functional neuroanatomical review of the ventral tegmental area. NeuroImage, 191, 258–268.
Trutti, A. C., Verschooren, S., Forstmann, B. U., & Boag, R. J. (2021). Understanding subprocesses of working memory through the lens of model-based cognitive neuroscience. Current Opinion in Behavioral Sciences, 38, 57–65. https://doi.org/10.1016/j.cobeha.2020.10.002
Turner, B. M., Forstmann, B. U., Wagenmakers, E. J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage, 72, 193–206. https://doi.org/10.1016/j.neuroimage.2013.01.048
Turner, B. M., Rodriguez, C. A., Norcia, T. M., McClure, S. M., & Steyvers, M. (2016). Why more is better: Simultaneous modeling of EEG, fMRI, and behavioral data. NeuroImage, 128, 96–115.
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017a). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79. https://doi.org/10.1016/j.jmp.2016.01.001
Turner, B. M., Wang, T., & Merkle, E. C. (2017b). Factor analysis linking functions for simultaneously modeling neural and behavioral data. NeuroImage, 153, 28–48. https://doi.org/10.1016/j.neuroimage.2017.03.044
Turner, B. M., Forstmann, B. U., & Steyvers, M. (2019a). A tutorial on joint modeling. In Joint models of neural and behavioral data. Computational approaches to cognition and perception. Springer. https://doi.org/10.1007/978-3-030-03688-1_2
Turner, B. M., Palestro, J. J., Miletić, S., & Forstmann, B. U. (2019b). Advances in techniques for imposing reciprocity in brain-behavior relations. Neuroscience and Biobehavioral Reviews, 102, 327–336. https://doi.org/10.1016/j.neubiorev.2019.04.018
van Dijck, J.-P., Abrahamse, E. L., Majerus, S., & Fias, W. (2013). Spatial attention interacts with serial-order retrieval from verbal working memory. Psychological Science, 24(9), 1854–1859.
van Schouwenburg, M. R., den Ouden, H. E. M., & Cools, R. (2010). The human basal ganglia modulate frontal-posterior connectivity during attention shifting. Journal of Neuroscience, 30(29), 9910–9918.
Veltman, D. J., Rombouts, S. A. R. B., & Dolan, R. J. (2003). Maintenance versus manipulation in verbal working memory revisited: An fMRI study. NeuroImage, 18(2), 247–256.
Verschooren, S., Schindler, S., De Raedt, R., & Pourtois, G. (2019). Switching attention from internal to external information processing: A review of the literature and empirical support of the resource sharing account. Psychonomic Bulletin & Review, 26(2), 468–490.
Verschooren, S., Pourtois, G., & Egner, T. (2020). More efficient shielding for internal than external attention? Evidence from asymmetrical switch costs. Journal of Experimental Psychology: Human Perception and Performance, 46(9), 912–925.
Verschooren, S., Kessler, Y., & Egner, T. (2021). Evidence for a single mechanism gating perceptual and long-term memory information into working memory. Cognition, 212, 104668. https://doi.org/10.1016/j.cognition.2021.104668
Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in visual working memory capacity. Nature, 428(6984), 748–751.

Wagenmakers, E.-J., Van Der Maas, H. L. J., & Grasman, R. P. P. P. (2007). An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14(1), 3–22.
Wall, L., Gunawan, D., Brown, S. D., Tran, M.-N., Kohn, R., & Hawkins, G. E. (2021). Identifying relationships between cognitive processes across tasks, contexts, and time. Behavior Research Methods, 53(1), 78–95.
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9(4), 625–636.
Wylie, G. R., Foxe, J. J., & Taylor, T. L. (2008). Forgetting as an active process: An fMRI investigation of item-method–directed forgetting. Cerebral Cortex, 18(3), 670–682.
Xu, Y., & Chun, M. M. (2006). Dissociable neural mechanisms supporting visual short-term memory for objects. Nature, 440(7080), 91–95.
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453(7192), 233–235.
Zhao, Y., Kuai, S., Zanto, T. P., & Ku, Y. (2020). Neural correlates underlying the precision of visual working memory. Neuroscience, 425, 301–311.

Assessing Neurocognitive Hypotheses in a Likelihood-Based Model of the Free-Recall Task

Sean M. Polyn

Abstract In the free-recall task, a participant studies a list of words, and then reports those words in whatever order they come to mind. As such, the behavioral dynamics of free recall are revealed by the recall sequences produced by participants. A variety of cognitive models have been designed to account for these behavioral dynamics. These models describe the cognitive operations that give rise to the observed recall sequences. This chapter provides a tutorial overview of a likelihood-based free-recall model designed to connect these cognitive operations with neural signals recorded as participants search their memories during a free-recall task.

Keywords Free recall · Retrieved-context modeling · Context maintenance · Retrieval model

1 Introduction

In the free-recall task, a participant studies a list of words, usually presented one at a time. This is followed by a recall period in which the participant reports as many items as they can from the study list, in whatever order they come to mind. As such, behavioral responses in this task are characterized as recall sequences: The series of responses made by the participant during the recall period. Theorists have developed a variety of cognitive models designed to account for the behavioral dynamics observed in different versions of the free-recall task. These models often describe a hypothetical set of cognitive operations that gives rise to the observed behavioral data.

This work was supported by a grant from the National Science Foundation to SMP (#1756417).

S. M. Polyn
Psychology Department, Vanderbilt University, Nashville, TN, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_12


This chapter provides a tutorial overview of a particular application of free-recall modeling, in which we assess neural signals in terms of their potential correspondence with the cognitive processes embodied in a model of free recall. Specifically, we examine a set of analyses described by Kragel et al. (2015), in which blood oxygen level dependent (BOLD) signals recorded with fMRI were related to mechanisms from a retrieved-context model, using a direct-input approach, in which neural signals are used to directly control aspects of a behavioral model (Turner et al., 2017; Purcell et al., 2010). The reader is invited to download companion Python code that implements many of the simulations described in this chapter, hosted at github.com/vucml/ncms_toolbox. The README file at that URL will direct the reader to the Python script containing the tutorial (KragEtal15_tutorial.py) as well as directions for installing the cymr toolbox the tutorial uses to run the simulations.

Many models of free recall have been proposed and examined over the decades, such as the Search of Associative Memory model (SAM; Raaijmakers and Shiffrin, 1981), the Temporal Context Model (TCM; Howard and Kahana, 2002), and the Scale Invariant Memory, Perception, and Learning model (SIMPLE; Brown et al., 2007), among others (Davelaar et al., 2005; Lehman and Malmberg, 2013; Farrell, 2012). The examples developed in this chapter draw heavily from my own work with retrieved-context models of free recall (which include TCM). The family of retrieved-context models contains many variants with similar names, certain unique properties, and a strong family resemblance (Howard et al., 2005; Sederberg et al., 2008; Lohnas et al., 2015; Healey and Kahana, 2014). Here, we will examine the Context Maintenance and Retrieval model (CMR; Polyn et al., 2009), the retrieved-context variant examined by Kragel et al. (2015), though some other variants will be discussed as well.
Before getting to the tutorial, I briefly review the structure and operation of CMR, highlighting the mechanisms and processes that figure prominently in what follows.

2 Overview of the Context Maintenance and Retrieval (CMR) Model

A named model (e.g., CMR) is often a moving target. The particulars of its implementation can change from paper to paper, and often different versions of a model are examined in a single paper. As such, it can be useful to think of CMR (or TCM) as a modeling framework rather than as a singular model. This phrase emphasizes that in any given paper, different versions of the model will be constructed, assessed, and compared to one another. For example, in our paper introducing CMR (Polyn et al., 2009), we were interested in characterizing how shifting from one encoding task to another during study affected recall performance. We constructed three model variants in which task shifts had different effects on the operation of the model. The neural application
described below (from Kragel et al., 2015) used a version of CMR with the same basic structure, but task information was not explicitly simulated. Here, we review the Kragel et al. version of CMR, which we used to characterize the relationship of neural signals from the medial temporal lobe to the cognitive processes defined by the model.

CMR is a cognitive process model. It is an assembly of mechanisms and processes, and a set of rules determining the sequence in which they are engaged. Following the notation of Turner et al. (2017), the parameters θ control these model mechanisms, allowing the model to make predictions about the observed behavioral data B. A vector of parameter values θ, each with an allowable range, defines a parameter space, where a given point in the space corresponds to a particular configuration of the parameters. Optimization of the model for a particular set of behavioral data (B) involves searching through this parameter space to find the specific parameter set that allows the model to best predict, or fit, the data. This can be done by using the model to generate a large number of synthetic recall sequences, calculating a set of summary statistics (e.g., the serial position curve and lag-based conditional response probability curves), and evaluating how closely those summary statistics match the results of the same analyses on the observed data (B). Examples of this approach can be found in Sederberg et al. (2008) and Polyn et al. (2009).

Here, we used an alternative approach to optimization, in which the model's goodness of fit is quantified in terms of the likelihood of the specific sequences of recall events that comprise B. This allowed us to incorporate trial-specific neural signals into the model, as will be described below. Examples using this type of likelihood-based optimization approach with a free-recall task can be found in Kragel et al. (2015) and Morton and Polyn (2016).

2.1 Basic Operation of the Model

The CMR model is a simplified neural network with two representational layers, each implemented as a vector space. The model is depicted schematically in Fig. 1b. The F layer represents the features of particular items as they are studied or remembered. Activation of the ith study item representation is indicated by the vector f_i. Each of these item representations is a unit vector, with one element of the vector set to 1 and the other elements set to 0. The C layer contains a gradually changing representation of temporal context, with specific states of the context layer indicated by the vector c_i.

CMR is a linear associative network (Anderson et al., 1977) in which the two layers influence one another via two associative weight matrices. The M^FC weight matrix projects from the feature layer to the context layer, and M^CF contains the recurrent connections from the context layer back to the feature layer. Each matrix can be thought of as having two sets of associations: a set of pre-experimental
associations built into the network when it is initialized (M^FC_pre and M^CF_pre) and a set of experimental associations learned over the course of a trial (M^FC_exp and M^CF_exp). These two components combine additively, that is, M^FC = M^FC_pre + M^FC_exp. When the model is initialized, the pre-experimental associations allow a study item to retrieve information about the past contexts it has been experienced in (also known as the item's pre-experimental context). So, when an item representation f_i is activated, a matrix multiplication operation projects the item representation through these associative connections to determine the net input to the context layer:

c^IN = M^FC f_i    (1)

The units in the context layer have a special integrative property. Incoming activation (c^IN) only partially displaces the previous activation state (c_{i-1}):

c_i = ρ c_{i-1} + β c^IN    (2)
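To make the integration rule concrete, here is a minimal sketch of Eq. 2. The closed-form expression for ρ follows the standard retrieved-context formulation, assuming c^IN has been normalized to unit length; the companion toolbox implements the full model, so this fragment is purely illustrative.

```python
import numpy as np

def advance_context(c_prev, c_in, beta):
    """Integrate incoming context c_in into c_prev (Eq. 2), choosing rho
    so that the updated context vector stays at unit length.
    Both c_prev and c_in are assumed to be unit-length vectors."""
    d = float(np.dot(c_prev, c_in))
    rho = np.sqrt(1.0 + beta**2 * (d**2 - 1.0)) - beta * d
    return rho * c_prev + beta * c_in

# Toy example: 4-unit context space, orthogonal input, beta = 0.6
c = np.array([1.0, 0.0, 0.0, 0.0])
c_in = np.array([0.0, 1.0, 0.0, 0.0])
c_next = advance_context(c, c_in, beta=0.6)
print(c_next, np.linalg.norm(c_next))  # norm stays at 1.0
```

With an orthogonal input and β = 0.6, ρ works out to 0.8, so the old state fades as the new input blends in, exactly the recency-weighted behavior described above.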

Here, β is a parameter that controls the rate of contextual integration by scaling the incoming activation c^IN. The value of the ρ parameter is then calculated to ensure the magnitude of c_i is constant (at unit length). As new items are presented, new context units are activated, and activity in the other units fades away exponentially. As such, the state of context at a given point in time is a recency-weighted average of the model's past experience.

The second associative matrix, M^CF, connects the context layer back to the item layer, making the model a kind of simple recurrent network (SRN). The relationship of the temporal context model to other SRNs is discussed by Howard and Kahana (2002). We will return to the role of this second matrix momentarily, as it is more influential during the recall process.

When an item is studied, both associative matrices are updated using a Hebbian learning rule that links the item representation to the current contextual representation (the item's experimental context). These experimental associations (M^FC_exp and M^CF_exp) embody the set of episodic memories formed during the study period. Neighboring items on the study list are associated with similar states of context, and for any two study items, the similarity of their corresponding context states decreases as the spacing between the items increases. The experimental associations imbue the network with two special powers: They allow the context representation to act as a cue to prompt the retrieval of study items, and they allow an item representation to prompt the retrieval of contextual states from the study period. Starting with the contextual cueing process:

f^IN = M^CF c_i    (3)

In words, the context representation projects through the associative matrix to activate a blend of item features, f^IN. This sets in motion a retrieval competition, where the likelihood of a given item being recalled is proportional to how strongly
it is activated in the f^IN blend. When an item wins this retrieval competition, its features are reactivated on the item layer. Now the item representation can be used to reactivate the past state of context linked to that item during the study period, a process called temporal reinstatement. The dynamics of temporal reinstatement are given in Eqs. 1 and 2. The same process that updates context during the study period also updates context during memory search. The β parameter governing the degree of contextual updating is allowed to take different values during the study/encoding period (β_enc) and during the search/recall period (β_rec).

If β_rec is zero, there is no temporal reinstatement during recall; the temporal context representation is unaffected by the contextual information associated with the remembered item. If β_rec is one, there is complete integration: The temporal context representation is completely overwritten by the contextual information associated with the studied item, that is, the temporal context of the study event is perfectly reinstated. Between these two extremes, temporal reinstatement partially overwrites the current state of context. After the recall event, context is a blend of the prior contextual state and the retrieved context, with larger values of β_rec indicating more successful temporal reinstatement.

This process of temporal reinstatement gives rise to the behavioral phenomenon of temporal organization, whereby items that were studied in neighboring list positions tend to be recalled successively. When a past state of context is reinstated, it becomes part of the retrieval cue guiding the next retrieval competition. Because the context representation changes gradually during the study period, this reactivated context is a good cue for items from neighboring list positions, giving them a preferential boost in the retrieval competition.

In the neural simulations described below, we allow neural signal recorded from the medial temporal lobe to control the temporal reinstatement process (Fig. 1). Turner et al. (2017) refer to this as a direct-input approach. We estimated neural parameters δ which were used to control the behavioral parameter β_rec, allowing us to evaluate whether this neural influence improved the ability of the model to predict the behavioral data B.

The final relevant component of CMR dynamics is a process that determines when recall terminates. Each recall competition is governed by a probabilistic decision rule (Luce, 1986; Howard and Kahana, 2002) where the activation of each item representation in f^IN determines its likelihood of winning the recall competition. However, there is also the possibility that none of the items will be retrieved, meaning the recall process has terminated. We use an equation that captures the steady growth in the likelihood of recall termination as a function of output position (Dougherty and Harbison, 2007; Miller et al., 2012). The growth rate of the recall termination likelihood is an adjustable parameter of the model. A lower growth rate means the model will enjoy more recall successes (on average) before the search process eventually terminates.

In the neural application developed by Kragel et al. (2015), we allowed neural signals in medial temporal lobe to influence either the temporal reinstatement process (as mentioned above) or this recall success process. This allowed us to examine whether a given candidate signal indicated a generic boost in the likelihood of recall (recall
success), or a specific increase in the likelihood of a neighboring item (temporal reinstatement). In the tutorial simulations described below, we focus on the temporal reinstatement process.
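The retrieval competition and termination step can be sketched as a single categorical distribution over L + 1 outcomes. In this illustrative fragment, p_stop is treated as a fixed input; in the model itself the termination probability grows with output position, so this is a simplification.

```python
import numpy as np

def recall_probabilities(f_in, p_stop):
    """Probability of each item winning the retrieval competition, plus
    a final entry for recall termination (a Luce-style choice rule).
    f_in: nonnegative activations of the L study items.
    p_stop: placeholder probability that recall terminates here."""
    item_p = (1.0 - p_stop) * f_in / f_in.sum()
    return np.append(item_p, p_stop)

# Toy cue activations for a 4-item list; 10% chance of stopping here
f_in = np.array([0.5, 2.0, 1.0, 0.5])
p = recall_probabilities(f_in, p_stop=0.1)
print(p)  # L item probabilities followed by P(stop); the vector sums to 1
```

Note that the second item, with the strongest activation in the f^IN blend, dominates the competition, which is exactly the behavior the contextual cue is meant to produce.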

2.2 Evaluating the Model

There are two useful ways to evaluate the performance of the CMR model. For a given parameter set θ, one can generate synthetic recall sequences, or one can take a set of observed recall sequences and determine the likelihood they were produced by the model.

Polyn et al. (2009), for example, examined different variants of CMR. Each variant was used to generate a large number of synthetic recall sequences. Summary statistics were calculated for both the observed data and these synthetic recall sequences, and the goodness of fit of a given parameter set was a function of how well the observed and synthetic statistics matched one another (using, e.g., a chi-squared statistic or root mean squared deviation). The summary statistics used by Polyn et al. (2009) included serial position curves, probability of first recall curves, and lag-CRP curves (which characterize temporal organization). In all, 93 different behavioral data points were used, across these and other measures. A chi-squared statistic was used to determine goodness of fit. This statistic has the nice property of normalizing each data point in terms of the standard error of the underlying behavioral measure. This approach allows the modeler to specify which aspects of the data will be most influential in determining the best-fitting parameters. As such, it requires a number of decisions to be made regarding the relative importance of different behavioral measures. With this approach, the identity of the optimal set of parameters can change depending on which summary statistics are used to calculate goodness of fit.

Kragel et al. (2015) used the likelihood-based approach, in which a parameter set is evaluated in terms of the model's ability to predict the specific recall sequences observed in an experiment.
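The summary-statistic (chi-squared) approach can be sketched as follows; the serial-position values and standard errors below are hypothetical, purely to illustrate the calculation.

```python
import numpy as np

def chi_squared_fit(observed, synthetic, std_err):
    """Chi-squared goodness of fit over summary statistics: each data
    point is normalized by the standard error of the observed measure,
    so noisier measures carry less weight in the total."""
    z = (observed - synthetic) / std_err
    return float(np.sum(z ** 2))

# Hypothetical serial-position curves (recall probability by position)
observed  = np.array([0.70, 0.55, 0.45, 0.50, 0.80])
synthetic = np.array([0.68, 0.50, 0.47, 0.55, 0.74])
std_err   = np.array([0.04, 0.05, 0.05, 0.05, 0.03])
fit = chi_squared_fit(observed, synthetic, std_err)
print(round(fit, 2))  # smaller is better; here 6.41
```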
The model assigns probabilities to events, and the fitness of the model is dependent on how well those assigned probabilities match the actual probabilities of those events in the observed data. To use this approach in free recall, we treat the recall sequence on a given trial as a series of discrete responses: recall events. For each recall event, we calculate the probability of each possible response. This allows us to assign a probability to each recall event: The likelihood that the model would have produced the observed response made by the participant. We can then determine the likelihood of the model producing the full sequence of responses on a given trial, and by extension the likelihood of the model producing the dataset as a whole (for a given parameter set).

Let us look closer at the behavioral data collected in a standard free recall task. If the task uses spoken recall, each trial has an associated audio recording of the participant's verbal responses. An annotator (human and/or machine) marks the identity and onset time of each word that is reported by the participant. For now, we can imagine that we have done some simplifying steps, such as excluding intrusions
(any reported words that were not actually on the target study list) and repetitions. In certain applications we attempt to simulate the timing of the individual responses, but here we simply focus on the sequence without regard to the timing. Each valid response is labeled with an integer corresponding to the remembered item's position on the study list. These are the building blocks of the recall sequences. For a single trial, the length of the sequence can be anywhere from 0 (if no valid responses are made) to the length of the study list (if every studied item is successfully recalled).

If we include repeated responses and intrusions, the length of the sequence and the set of possible responses are not as well constrained. If intrusions are allowed, all words in an individual's lexicon are technically possible responses. Even with repeats and intrusions, a predictive approach is certainly possible but requires extra work to allow the model to specify the likelihood of these other responses.

Note that this model is not a stationary process over the course of the recall period. According to the basic theory described above, each recalled item alters the composition of the contextual retrieval cue. The model's predictions regarding the second recall response will be very different if the first recall response comes from an early list position versus a later list position. Each recall event alters the set of retrieval probabilities assigned to the not-yet-recalled items. The optimization process, instead of optimizing the match between observed and synthetic summary statistics, attempts to maximize the likelihood of observing a given dataset for a given model.

What can we say about this probability space? We start by considering an individual trial in more detail. Even with the restrictions outlined above (exclusion of repeats and intrusions), the set of possible outcomes on a given trial can be absurdly large, but it is not infinite.
Given that we have excluded repetitions, a sampling without replacement process can describe the set of possible recall sequences. How many possible recall sequences are there for a given list length? Equation 4 describes the necessary calculation to determine this. The set of possible recall sequences is equal to the set of distinct ordered subsequences (i.e., permutations) of all possible sequence lengths from 0 up to the list length. We count length zero as a possible outcome, as it represents the scenario where a person fails to recall any valid items on a given trial. This is a rare event in practice, but it does happen.

N_seq = Σ_{r=0}^{K} |P(K, r)|    (4)

In Eq. 4, K indicates list length, r is the length of a particular recall sequence, P(K, r) specifies the set of possible permutations when r items are chosen from a set of K items, and the vertical bars surrounding this indicate the cardinality of this set (i.e., the number of elements in the set). With K = 1, N_seq = 2 (nothing is recalled, or the one studied item is recalled). With K = 3, N_seq = 16. With K = 8, N_seq = 109,601, and a list length of 24 (a reasonable length for a free-recall task) yields something in the neighborhood of 1.7 × 10^24 possible recall sequences.
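Eq. 4 is easy to check numerically; this sketch reproduces the counts quoted above:

```python
from math import perm

def n_recall_sequences(K):
    """Eq. 4: count the ordered subsequences (permutations) of every
    length r from 0 through K, for a study list of length K."""
    return sum(perm(K, r) for r in range(K + 1))

for K in (1, 3, 8, 24):
    print(K, n_recall_sequences(K))
# Reproduces the counts in the text: 2, 16, 109601, and roughly 1.7e24
```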


We engage in this exercise not to despair at the possibility of considering each of these possible outcomes, but rather to set expectations regarding the scale of the likelihood values that will be produced by a computational model. Given these myriad possibilities, the likelihood of observing a specific recall sequence can be minuscule. In practice, we have no need to enumerate all possible outcomes of an experiment; we just need to calculate the likelihood of observing a particular outcome, given a particular model. Then we can compare the likelihood of that outcome under different (possibly nested) variants of the model, to evaluate which model would be most likely to produce the observed recall sequences. It is OK that particular recall sequences are assigned very low probabilities; the important thing is the relative likelihood of the recall sequences under different models. In other words, recall sequence X could have a one-in-ten-thousand likelihood under model A but a one-in-a-million likelihood under model B. If similar trends hold up across many recall sequences, model A will be preferred.

The above exercise demonstrates that likelihood values at the trial level (the probability of observing a particular recall sequence) will be very small. Following standard convention, we log-transform these probability values to log-probabilities. This allows us to avoid potential problems while running our simulations on digital computers, for example, where an exceedingly small floating-point number can be rounded off to zero. The difference between an exceedingly small number and zero is of great practical importance when dealing with probabilities, because zero indicates an event is impossible. If a model assigns any observed event a zero probability, then from the point of view of model comparison, it is impossible that the model gave rise to the observed data.

Calculating the likelihood of a given recall sequence under a given model is fairly straightforward.
As described above, each trial has an associated recall sequence, which is a series of recall events. Each recall event can be represented by a categorical distribution (sometimes called a generalized Bernoulli distribution) where each potential outcome has an associated probability. For a list of length L, it is useful to define L + 1 possible outcomes for a given event: One for the potential recall of each study item, and one to represent termination of the recall sequence. The probability associated with each of these outcomes is determined by the model, given a particular set of parameters. Equation 5 calculates the probability of a given recall sequence (p_seq), where p_i is the probability assigned to the ith recall event. The likelihood of the entire sequence is simply the product of the probabilities of the individual recall events.

p_seq = Π_{i=1}^{r+1} p_i    (5)

L_seq = Σ_{i=1}^{r+1} L_i    (6)


Probability models such as the binomial distribution and multinomial distribution take a somewhat different form than Eq. 5. These other distributions model the probability of counts of a particular outcome across a certain number of trials. In contrast, free recall is better described using a sequential sampling process, specifically, sequential sampling of a finite population without replacement (Mallows, 1973).

Equation 6 shows the same calculation as Eq. 5, but on the log-transformed probabilities of the individual recall events (L_i). In this case, the log-probability of the recall sequence is simply the sum of the L_i values assigned to each event in the sequence. To be clear, the p_seq and L_seq values refer to the same thing: log(p_seq) = L_seq, and e^{L_seq} = p_seq. The probability of an entire experiment is calculated as either the product of all the trial-level probabilities or the sum of all the trial-level log-probabilities (usually referred to as a log-likelihood). The aggregate log-likelihood number is not meaningful in and of itself; it is a sum over recall events, so simply adding more trials to a dataset will cause the log-likelihood to become more negative. However, if two models are applied to the same dataset, their corresponding log-likelihood scores can be meaningfully compared.

Regardless of whether one evaluates a model using summary statistics or event likelihoods, similar parameter optimization techniques can be used. Using the first approach described above, an optimization algorithm (e.g., a particle swarm) is used to find the set of parameters that allow the model to produce synthetic data whose summary statistics match the observed summary statistics in a given experiment. Using the event likelihood approach, the same optimization algorithm can be used, but now the goal is to find the set of parameters that maximizes L_seq, the log-likelihood of the data given the model.
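Eqs. 5 and 6 can be sketched in a few lines; the event probabilities below are hypothetical stand-ins for values the model would assign:

```python
import math

def sequence_log_likelihood(event_probs):
    """Eq. 6: the log-likelihood of one recall sequence is the sum of
    the log-probabilities assigned to its recall events (including the
    final termination event)."""
    return sum(math.log(p) for p in event_probs)

# Hypothetical model-assigned probabilities: three recalls + termination
event_probs = [0.20, 0.35, 0.10, 0.25]
L_seq = sequence_log_likelihood(event_probs)
p_seq = math.exp(L_seq)  # exponentiating recovers Eq. 5's product
print(round(L_seq, 3), round(p_seq, 5))  # p_seq = 0.20 * 0.35 * 0.10 * 0.25
```

Working in log space is what keeps products over many tiny event probabilities from underflowing to zero on a digital computer.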
Generally speaking, log-likelihood scores will be negative values, with values closer to 0 indicating that the model is making more accurate predictions. If the model makes perfect predictions (it will not), always assigning a probability of 1 to each observed response, the log-likelihood will be 0. However, many common optimization routines are designed to minimize rather than maximize whatever function is fed into them. In these cases we simply multiply the log-likelihoods by −1 and carry on with optimization.

One can then perform model comparison on a set of candidate model variants. Usually, parameter optimization is carried out for each model variant, yielding a best-fit log-likelihood for each model. The one with the maximal log-likelihood (i.e., closest to 0) is the one that is most consistent with the observed data. There are a variety of model comparison statistics that help us characterize whether differences in log-likelihood between models are reliable and worth further consideration. We will return to model comparison techniques below.
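The negate-and-minimize pattern can be illustrated with a deliberately tiny stand-in model (a one-parameter Bernoulli likelihood fit by grid search); fitting CMR follows the same logic over a much larger parameter space:

```python
import numpy as np

# Toy stand-in for a likelihood-based fit: one Bernoulli recall-success
# parameter. We negate the log-likelihood and minimize, exactly the
# pattern used with standard optimization routines.
data = np.array([1, 1, 0, 1, 0, 1, 1, 1])  # 6 recall successes of 8

def neg_log_likelihood(p):
    return -np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

grid = np.linspace(0.01, 0.99, 99)
best_p = grid[np.argmin([neg_log_likelihood(p) for p in grid])]
print(round(float(best_p), 2))  # maximum-likelihood estimate: 0.75
```

The estimate lands on the observed success rate (6/8), as it should for a Bernoulli likelihood; in a real application the grid search would be replaced by a smarter optimizer such as a particle swarm.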

3 Assessing a Neurocognitive Linking Hypothesis

Here, I provide a tutorial overview of an application of event likelihood modeling of free recall, in which we incorporated neural signals into CMR to assess a set of neural linking hypotheses (Kragel et al., 2015). Code is provided for you to
explore a simplified version of the analyses presented in that paper. First, I will review the questions of interest and the general technique. The neural linking hypotheses examined in this study link a particular neural signal with a cognitive process defined by the CMR model. The neural signals of interest are blood oxygen level dependent (BOLD) signals in medial temporal lobe (MTL) structures recorded with functional MRI. These signals can be thought of as neural parameters δ estimated from the fMRI data using a general linear modeling approach. These neural parameters δ are linked to the temporal reinstatement mechanism in the model described above.

Neural circuitry in the MTL is thought to be critically involved in the formation and retrieval of episodic memories. This idea is supported by neuropsychological studies demonstrating that damage to MTL structures devastates a person's ability to form new episodic memories (Milner et al., 1998). The Complementary Learning Systems model of McClelland et al. (1995) suggests that the hippocampus (a brain structure within the MTL) plays a key role in the formation of new episodic memories by allowing the rapid formation of associations linking the myriad details of a particular experienced event to one another, and to the broader spatiotemporal context in which the event occurs (Norman and O'Reilly, 2003; Schapiro et al., 2017). Endel Tulving, in his writings on the episodic memory system, described the phenomenon of mental time travel, whereby an individual can reactivate the contextual details of a past experience with enough vividness that it is as if they are revisiting the past experience (Tulving, 1993). To the extent that this kind of reminiscence relies on the recovery and reactivation of the spatiotemporal context of an event, we expect that successful mental time travel requires an intact hippocampal system.

Our goal in the Kragel et al. (2015) study was to determine whether we could relate moment-to-moment changes in blood flow in MTL regions to the behavioral performance of participants, thus allowing us to refine our understanding of how different subregions of MTL support memory-guided task performance. We used the CMR modeling framework to develop several model variants instantiating different neural linking hypotheses. In this context, a neural linking hypothesis is a proposal that links a particular neural signal to a particular cognitive operation in the model. Models with an embedded neural linking hypothesis are referred to as neurally informed, in that the strength of the neural signal influences the degree of engagement of the linked cognitive operation. Thus, fluctuations in the neural signal change the dynamics, and therefore the predictions, of the model, relative to a baseline, or neurally naive, version of the model that is not influenced by neural signal. To the extent that a neurally informed model does a better job predicting the participant's responses compared to a corresponding neurally naive model, the embedded neural linking hypothesis is supported, and deemed worthy of further examination.

As described above, retrieved-context models describe a set of candidate cognitive mechanisms that support mental time travel. When the model retrieves the details of a specific past event, this prompts the system to retrieve associated contextual information, which then supports the retrieval of other events that occurred nearby in time. Thus, if we have a neural signal whose engagement


Fig. 1 (a) A participant performs a free recall task while lying in an fMRI scanner. A microphone records their vocal recall responses. Brain signal is sampled at the time of each recall event. The signal strength is used to control the temporal reinstatement mechanism in the model. (b) When an item (e.g., colander) is recalled, the semantic representation of the item is activated in the F layer. This representation is projected along the M^FC associative connections to retrieve contextual information associated with the item. Here, the strong neural signal causes context information to be retrieved with high fidelity. Specifically, the strong neural signal increases the value of the β_rec parameter (see Eqs. 2 and 7). This will increase the likelihood that the next recalled item will come from an adjacent serial position (as depicted in Fig. 2)

indicates the successful reinstatement of temporal context, we can use the model to make detailed predictions about the behavioral consequences of that reinstatement. Kragel et al. (2015) constructed a set of neurally informed temporal reinstatement models, to examine whether signal in medial temporal lobe was plausibly related to the temporal reinstatement process engaged during memory search. This is depicted schematically in Fig. 1. One set of analyses examined each gray-matter voxel within MTL in turn. First, signal within that voxel was estimated at the time of each recall event. These neural response values (.δevent ) were z-score normalized by trial. The following equation describes how these neural response values were used to update a linked model parameter: θevent = θ + νδevent

.

(7)

For the temporal reinstatement models, θ corresponds to the base value of the β_rec parameter. For each recall event, the neural scaling parameter ν is multiplied by the voxel's neural response and then added to the base value of β_rec, yielding the event-specific parameter value θ_event. As such, fluctuations in the neural signal cause the β_rec parameter appearing in Eq. 2 to fluctuate. The neurally naive version of the model is realized by setting the ν parameter to zero. In this nested model, fluctuations in neural signal no longer influence the target parameter.
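Equation 7 amounts to a one-line transformation. The sketch below is illustrative rather than the tutorial's actual code; the clipping to [0, 1] is my added assumption, since the chapter notes that β parameters must stay within that range:

```python
import numpy as np

def neurally_informed_param(theta, nu, delta_events):
    """Eq. 7: theta_event = theta + nu * delta_event.
    Clipping to [0, 1] is an added assumption, since beta_rec
    is only valid within that range."""
    delta_events = np.asarray(delta_events, dtype=float)
    return np.clip(theta + nu * delta_events, 0.0, 1.0)

# z-scored neural responses for three hypothetical recall events
delta = [1.2, -0.5, 0.0]

# neurally informed model: nu > 0 lets the signal drive beta_rec
beta_informed = neurally_informed_param(theta=0.5, nu=0.15, delta_events=delta)

# nested, neurally naive model: nu = 0 yields a fixed beta_rec
beta_naive = neurally_informed_param(theta=0.5, nu=0.0, delta_events=delta)
```

Setting `nu=0.0` recovers the neurally naive model exactly, which is what makes the two variants nested.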


S. M. Polyn

Fig. 2 A lag-based conditional response probability analysis of synthetic recall sequences. This analysis calculates the probability of making recall transitions of particular lag distances, measured in terms of the items' serial positions on the study list. These two lag-CRP curves indicate how the temporal reinstatement parameter β_rec affects temporal organization. When β_rec is high (greater than 0.5, red triangles), there is an increased likelihood of nearby transitions relative to when β_rec is low (less than 0.5, black circles). For this parameter set, this manifests as an increased likelihood of +1 recall transitions (recalling the next item from the study list)

The model predicts that successful temporal reinstatement leads to temporal organization in the observed recall sequences. Figure 2 demonstrates how a behavioral measure of temporal organization, a lag-based conditional response probability analysis, or lag-CRP, is affected by the degree of success of the temporal reinstatement process. If temporal reinstatement is strong when a particular item is retrieved, the contextual state associated with that item's study event is strongly reactivated. This contextual state is a good retrieval cue for items from neighboring positions on the study list, so the next recalled item is likely to come from a nearby list position. Conversely, if temporal reinstatement is weak, there will be less of an advantage for the just-recalled item's neighbors. Kragel et al. (2015) used the CMR model to evaluate whether fluctuations in a given neural signal during the recall period correspond to fluctuations in the degree of success of this temporal reinstatement operation. Equation 7 provides the interface between the neural signal and the computational model. When estimates of a given voxel's activity are used to populate the δ_event matrix, fluctuations in voxel activity from recall to recall will influence the degree of temporal reinstatement and, through that, influence the model's predictions regarding the most likely behavioral response (i.e., the likelihood of recalling a particular item next).
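The lag-CRP statistic itself can be computed directly from its definition: count the actual transitions at each lag, and divide by the number of opportunities for that lag (transitions to items not yet recalled). The implementation below is an illustrative sketch, not the analysis code used in the chapter:

```python
import numpy as np

def lag_crp(recalls, list_length):
    """Lag-CRP: P(lag L) = (# actual lag-L transitions) /
    (# opportunities for a lag-L transition), excluding repeats.
    Each recall sequence lists 1-indexed serial positions."""
    lags = np.arange(-(list_length - 1), list_length)
    actual = np.zeros(lags.size)
    possible = np.zeros(lags.size)
    offset = list_length - 1          # maps lag L to array index L + offset
    for seq in recalls:
        seen = set()
        for prev, nxt in zip(seq[:-1], seq[1:]):
            seen.add(prev)
            # any not-yet-recalled item could have been recalled next
            for cand in range(1, list_length + 1):
                if cand not in seen:
                    possible[(cand - prev) + offset] += 1
            if nxt not in seen:       # skip repeated recalls
                actual[(nxt - prev) + offset] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        crp = np.where(possible > 0, actual / possible, np.nan)
    return lags, crp

lags, crp = lag_crp([[1, 2, 3], [5, 6, 4]], list_length=6)
```

The conditioning on "possible" transitions is what makes the measure fair: late in a recall sequence, few items remain, so raw transition counts alone would be misleading.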

Assessing neurocognitive hypotheses in a likelihood-based model of the free-recall task


The likelihood-based optimization procedure was used to find the parameter set that maximized the model's ability to predict each participant's recall sequences. The neural scaling parameter ν was one of the free parameters being estimated. If the best-fitting value of ν was positive, this meant the sampled neural signal was informative for the temporal reinstatement mechanism and improved the model's ability to predict the specific sequence of items that would be recalled. Specifically, with a positive value of ν, an increase in blood flow at that anatomical location increases the value of β_rec associated with that recall event. To the extent that these fluctuations in blood flow actually correspond to fluctuations in the likelihood of contiguous recalls, the neurally informed model will make more accurate predictions and end up with a better likelihood score. The likelihood score associated with the best-fitting neurally informed model was compared to the likelihood score of the best-fitting neurally naive model using a likelihood ratio test (Wilks, 1938), to determine whether any improvement in predictive power of the neurally informed model was statistically significant.¹

A second set of neurally informed models was created in which a given voxel's activity was associated with the recall success mechanism described earlier. This allowed us to test whether signal in a given voxel indicated whether the recall process would continue, without affecting the temporal organization of the responses. We will not go into detail regarding the recall success mechanism here; the interested reader is referred to the Kragel et al. (2015) report, where this model variant is described in more detail. Using this approach, the cognitive model becomes part of a neuro-behavioral statistical framework for interpreting the functional properties of neural activity.
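For this one-parameter nesting, the likelihood ratio test can be sketched as follows. This is an illustration, not the chapter's analysis code: the chi-square(1) tail probability is computed via the complementary error function, and the log-likelihoods are the representative values reported later in Table 1:

```python
import math

def lr_test_1df(logl_nested, logl_full):
    """Wilks (1938): for nested models differing by one free parameter,
    2 * (logL_full - logL_nested) is asymptotically chi-square with 1 df.
    For 1 df the survival function equals erfc(sqrt(stat / 2))."""
    stat = 2.0 * (logl_full - logl_nested)
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p

# representative log-likelihoods for the naive (nested) and informed models
stat, p = lr_test_1df(logl_nested=-2730.0, logl_full=-2707.0)
```

A small p-value indicates that the improvement from freeing ν is larger than chance would produce under the nested model.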
We constructed and tested neurally informed models (the temporal reinstatement and recall success variants) for each voxel in the MTL. This allowed us to make a map indicating which voxels contained signal that was informative for each model process. In other words, we were able to visualize the anatomical distribution of voxels whose activity showed a functional correspondence to these model-defined cognitive processes. One of the central results of the paper was the identification of a functional gradient across the anterior-posterior axis of the MTL. More anterior voxels in medial temporal lobe cortex (MTLC) were more informative to the retrieval success model, indicating an involvement in memory retrieval, but not necessarily in temporal reinstatement. More posterior voxels in MTLC, and posterior voxels along the hippocampal axis, were more informative to the temporal reinstatement model, potentially indicating their involvement in context-guided memory search. The theoretical implications of these results are discussed further by Kragel et al. (2015).

¹ This model comparison may not have been strictly necessary: The neurally naive model is nested within the neurally informed model (in that the neurally informed model becomes identical to the naive model when ν = 0). As such, a statistical technique demonstrating that the best-fit value of ν is reliably above zero would allow us to draw similar conclusions.


4 Simulation Exercises

In this section of the chapter we take a closer look at the CMR model and present simulations designed to familiarize you with the prediction of recall sequences and the generation of synthetic recall sequences. The URL for the companion code can be found in the Introduction section above. The code is divided into numbered sections that we will refer to in the text.

4.1 Exercise 1: Basic Parameter Recovery

In section one of the tutorial code, we create a data structure that specifies a number of model parameters. These parameters are set to reasonable values that allow the model to produce recall sequences generally consistent with the results of a standard immediate free-recall experiment (e.g., Kahana, 2012). After you have worked your way through the tutorial, you may wish to try changing these parameters to alter the dynamics of the model. It is certainly possible to pick parameter values that cause the model to perform poorly (e.g., never making a successful recall) or that will cause the code to execute improperly (e.g., if the β parameters are set outside of the range 0–1). While you are getting your bearings, try changing parameters one at a time.

In this section we also create variables that specify certain task parameters, such as list length (set to 24) and the total number of trials (set to 120). These values were chosen to match the methodological details of the Kragel et al. (2015) experiment and simulations. As with the model parameters, these task parameters can also be altered to explore how changes affect the model's performance. The code in section one uses the task parameters to create a pandas data structure containing synthetic study events. This data structure contains a row for each study event, specifying (among other things) the participant's identity and the serial position of the presented item.

In section two of the tutorial code, we generate synthetic behavioral data using these model parameters. The generative function takes the parameter structure and the study event data structure as input arguments. It returns a recalls matrix containing 120 trials worth of model-generated synthetic recall sequences.
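A study-event data structure of the kind described above might be built with pandas as follows. The column names (`subject`, `trial`, `position`) are illustrative guesses, not necessarily the tutorial code's exact schema:

```python
import pandas as pd

LIST_LENGTH = 24  # items per study list, matching Kragel et al. (2015)
N_TRIALS = 120    # free-recall trials for one simulated participant

# One row per study event: participant identity, trial index, serial position
study_events = pd.DataFrame(
    [
        {"subject": 1, "trial": t, "position": p}
        for t in range(N_TRIALS)
        for p in range(1, LIST_LENGTH + 1)
    ]
)
```

This long format (one row per event) makes it easy to attach additional per-event fields later, such as the neural signal values used in Exercise 2.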
Two example summary statistics are calculated for these recall sequences. The first is a serial position analysis, which calculates the probability of recalling particular items based on their serial position in the study list. The second is a lag-based conditional response probability analysis, which calculates the probability of making recall transitions of particular lag distances. For example, if a participant recalls the item from the fifth serial position followed by the item from the sixth serial position, this is a transition of lag +1. If item 6 was followed by item 4, this is a transition of lag −2. Detailed explorations of these and other common free recall analyses can be found in Kahana (2012). Figure 3 presents these two analyses for a sample run of the generative model.
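The serial position analysis can likewise be computed directly from its definition. The sketch below is illustrative code, not the tutorial's: for each serial position, it computes the fraction of trials on which that item was recalled.

```python
import numpy as np

def serial_position_curve(recalls, list_length):
    """P(recall) by study position: the fraction of trials on which the
    item from each serial position was recalled at least once."""
    counts = np.zeros(list_length)
    for seq in recalls:
        for pos in set(seq):      # count each item at most once per trial
            counts[pos - 1] += 1
    return counts / len(recalls)

# two toy trials with a 6-item list; item 6 is recalled on both trials
spc = serial_position_curve([[1, 2, 6], [6, 5]], list_length=6)
```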


Fig. 3 Examining recall performance from Exercise 1 of the tutorial code. (a) A serial position curve analysis of the synthetic recall sequences produced by the model. This calculates the probability of recalling each study item given its list position. (b) A lag-based conditional response probability (CRP) analysis on the synthetic recall sequences. This calculates the probability of successively recalling two items with a particular lag distance separating them. For example, recalling item 10 followed by item 11 is a recall transition of lag +1

Section 3 of the tutorial code runs a series of predictive simulations using the synthetic recall data from section 2. This part of the code provides a simple demonstration of parameter recovery. Parameter recovery generally refers to an attempt to determine the best-fitting parameters for synthetic data (i.e., a situation where the generating parameters are actually known). By evaluating the likelihood of a variety of parameter sets, one can determine whether the "recovered" best-fitting parameters match the parameters used to actually generate the synthetic data. This process helps to determine whether the parameters of a model are identifiable in a given simulation scenario. It is certainly possible for certain parameter settings to be recoverable, but for others to be ambiguous. For example, any parameter set that causes the model to produce no successful recalls will not be recoverable, as there are many parameter settings that can produce this particular failure state. The code creates an array of β_rec values (called B_rec_vals in the code). As the code iterates through this array, it alters the B_rec parameter, runs a predictive simulation (with the model.likelihood function), and places the returned log-likelihood score into a results array (logl). Instead of performing a full search across all parameters, we only adjust the β_rec parameter to determine whether the predictive model that matches the true generating value of β_rec (in this case 0.5) produces the best likelihood. The tutorial code will create a figure plotting the log-likelihood for each model variant. Figure 4a shows that indeed the data are best predicted by a model with β_rec set to 0.5.
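The logic of this grid sweep can be demonstrated with a stand-in likelihood function. The real tutorial calls model.likelihood from the CMR code; here a simple Gaussian toy model replaces it, purely to illustrate the structure of parameter recovery (generate with a known value, sweep a grid, check that the generating value wins):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "data" generated with a known parameter value (0.5)
TRUE_PARAM = 0.5
data = rng.normal(loc=TRUE_PARAM, scale=0.1, size=500)

def log_likelihood(param, data, scale=0.1):
    # Gaussian log-likelihood, omitting the additive normalizing constant
    return -np.sum((data - param) ** 2) / (2.0 * scale**2)

# Sweep a grid of candidate values, as the tutorial does with B_rec_vals
B_rec_vals = np.arange(0.0, 1.01, 0.1)
logl = np.array([log_likelihood(b, data) for b in B_rec_vals])
recovered = B_rec_vals[np.argmax(logl)]
```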


Fig. 4 (a) Exercise 1. A visualization of model fitness for 11 model variants with different amounts of temporal reinstatement (β_rec). The x-axis shows β_rec for a given model, and the y-axis shows the log-likelihood for that model, after running a predictive simulation using the synthetic data described in the text. Larger values (closer to zero) indicate better model fitness. The red dashed line indicates the β_rec value (0.5) used to create the synthetic data. This coincides with the best-fitting model variant, indicating an ability to recover the generating parameter. (b) Exercise 2. Model fitness for model variants in which both temporal reinstatement (β_rec) and a neural scaling parameter (ν) are manipulated. The synthetic recall data were generated by a model with β_rec = 0.5 and ν = 0.15. The best-fitting model variant has these parameters, again indicating an ability to recover the generating parameters. See text for details

4.2 Exercise 2: Fluctuating Temporal Reinstatement and Synthetic Neural Data

In most published work using retrieved-context models, the temporal reinstatement parameter (β_rec) is set to a fixed value. It does not change from recall event to recall event. I do not think this is a strong claim of the model, in that I think it is reasonable to hypothesize that when a memory is retrieved, sometimes temporal reinstatement is more successful, and sometimes it is less successful. Usually, we do not have a principled way of knowing what those fluctuations are, so it makes sense to find a fixed value of β_rec that captures the expected amount of temporal reinstatement, as representative of the average recall event. However, in the Kragel et al. (2015) study, we had some leverage to explore the possibility that temporal reinstatement indeed fluctuates from event to event. As described in the previous section, we examined the possibility that blood flow to certain medial temporal lobe (MTL) brain structures reflects the degree of engagement of the temporal reinstatement mechanism. We tested this hypothesis by allowing the strength of a neural signal to control the recall-to-recall fluctuations in the β_rec parameter and performing model comparison tests to see whether this improved the predictions made by the model.


Section 4 of the tutorial code allows you to examine a model in which the degree of temporal reinstatement fluctuates from recall event to recall event. Whereas the true underlying fluctuations in temporal reinstatement were unknowable in the Kragel et al. (2015) study (as they occurred in the participants' minds/cognitive systems), here we have created a simulated world in which we can know these fluctuations perfectly, because we construct them ourselves. This allows us to explore the conditions under which we can detect a correspondence between a noisy neural signal and a cognitive process. This approach certainly sidesteps many important complexities of the real world. For example, our predictive computational model is a rough and incomplete approximation of the true system generating a participant's behavior, but in this exercise the predictive computational model is a perfect match for the system generating the simulated participant's data. In any event, this approach allows us to demonstrate the logic and structure of this kind of analysis.

We first create synthetic neural signal values to represent signal recorded from the hippocampus of a participant. We create a matrix called signal and fill it with fluctuations by drawing random numbers from a Normal distribution with mean 0 and standard deviation 1. We then create a field in the recall data structure called hcmp (for hippocampus) and fill it with these signal values. We create a neural_scaling parameter to serve as ν from Eq. 7 and set it to 0.15. In the next section we try to recover this true neural scaling value during a parameter optimization process. We tell the code that the β_rec parameter is a dynamic parameter controlled by these neural signal values. Thus, for every recall event, the code will use the hcmp signal values to determine the value of B_rec (following Eq. 7).
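The key data-generation steps of this exercise can be sketched as below: drawing the synthetic neural signal, embedding it in noise, z-scoring the observed mixture within each trial, and applying Eq. 7. This is an illustrative reconstruction (variable names and the clipping of β to [0, 1] are my assumptions), not the tutorial's actual code:

```python
import numpy as np

rng = np.random.default_rng(42)
N_TRIALS, MAX_RECALLS = 120, 24

# Synthetic "hippocampal" signal: one value per potential recall event,
# drawn from a standard Normal (mean 0, sd 1), as in the tutorial
true_signal = rng.normal(loc=0.0, scale=1.0, size=(N_TRIALS, MAX_RECALLS))

# Embed the true signal in Normal noise; noise_weight = 0.0 means pure
# signal and 1.0 means pure noise (mirrors the tutorial's noise_weight)
noise_weight = 0.3
noise = rng.normal(size=(N_TRIALS, MAX_RECALLS))
mix = (1.0 - noise_weight) * true_signal + noise_weight * noise

# z-score the observed mixture within each trial, following the
# normalization procedure used by Kragel et al. (2015)
observed = (mix - mix.mean(axis=1, keepdims=True)) / mix.std(axis=1, keepdims=True)

# Eq. 7: the true signal drives event-to-event fluctuations in beta_rec
# (clipping to [0, 1] is an added assumption to keep beta valid)
B_REC_BASE, NEURAL_SCALING = 0.5, 0.15
B_rec = np.clip(B_REC_BASE + NEURAL_SCALING * true_signal, 0.0, 1.0)
```

The generative model would then use each event's `B_rec` value when simulating that recall, while the fitting procedure only ever sees `observed`.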
We now have a temporal reinstatement process that will vary randomly from recall event to recall event as we generate synthetic recall sequences. If you adjust the neural_scaling parameter, you can make these shifts in β_rec more subtle or more dramatic, and see how this affects the results in the next section.

In Section 5, we use likelihood-based optimization to fit the model to the synthetic recall data generated in Section 4. In order to make the simulation more interesting, we embed the true signal used to control β_rec in random noise designed to obscure the signal. We introduce a noise_weight parameter controlling the relative contributions of the true signal and the random noise. This can be adjusted from 0.0 for pure signal to 1.0 for pure noise. The kind of noise one observes in functional MRI data likely does not follow a Normal distribution (Bullmore et al., 2001), but for the sake of the simplicity of the exercise we will stick with this. Following the procedure used by Kragel et al. (2015), we normalize the mixture of signal and noise at the trial level. For each trial, we apply a z-score transformation (subtracting off the mean of the observations and dividing by the standard deviation). We then run likelihood-based optimization to attempt to recover the β_rec and neural scaling parameters used to generate our synthetic neural-behavioral data. The code performs a grid search, sweeping across 5 levels of β_rec (stepping from 0.3 to 0.7 in increments of 0.1; the generating value was 0.5), and 7 levels of the neural scaling parameter ν (stepping from 0.0 to 0.3 in steps of 0.05; the generating


value was 0.15). Figure 4b shows the log-likelihood scores for several of these model variants, demonstrating that the model that is best able to predict the recall sequences is the one where both β_rec and ν match the original parameter values used to generate the synthetic recall sequences.

Finally, in Section 6 we run model comparison statistics to compare the predictive power of the different model variants with one another. Generally speaking, if you have a set of models, the one with the largest log-likelihood (i.e., closest to zero) makes the most accurate predictions. However, as theorists we also prefer models with fewer free parameters, for the sake of parsimony. Consider two of the models used in the simulation exercises above: the neurally informed version of CMR, where the ν and β_rec parameters are both free to vary, and the neurally naive version, where β_rec is free but ν is fixed at 0. The neurally naive model variant is nested within the neurally informed model, as the neurally informed model contains the neurally naive model within its parameter space (i.e., when ν is set to 0). As such, it is not possible for the best-fitting neurally naive model to provide a larger log-likelihood (i.e., a better fit) than the best-fitting neurally informed model. Now, imagine that our observed neural signal was pure noise, that is, its fluctuations do not correspond to the engagement of the temporal reinstatement mechanism. If there happen to be some spurious correspondences between the pure-noise neural signal and the participant's behavior, this will lead to the neurally informed model yielding a better log-likelihood score than the neurally naive model. This potential advantage arises from the increased complexity of the neurally informed model relative to the naive model. Many model comparison methods account for this complexity by applying a penalty to model fitness that scales with the number of free parameters.
One commonly used model comparison technique is the Akaike information criterion (AIC), which takes the number of free parameters and the number of data points into account (Wagenmakers and Farrell, 2004). This technique produces a score for each candidate model, which attempts to quantify the information loss when the probability distribution of the true generating model is approximated by the probability distribution associated with the candidate model (Burnham and Anderson, 2004). In this tutorial, we generated the data ourselves, so the probability distribution of the true model is actually knowable. However, in most applications, the data will be generated by actual participants with unknown true probability distributions. Equation 8 shows a corrected form of AIC, called AICc. AICc includes an additive term that penalizes more complex models (2V, where V indicates the number of free parameters), plus a correction term involving n, the number of data points:

AICc = −2 log L + 2V + [2V(V + 1)] / (n − V − 1)    (8)

Here, we treat each recall event as a data point. As such, in our simulations above, the exact number of data points will depend on the stochastic recall processes implemented by the generative model. For a representative run of the generative model, the synthetic data contained 1246 events/data points.

Table 1 Exercise 2. Log-likelihood scores and model comparison scores for representative runs of the neurally naive and neurally informed models. See text for details

                            No. param.   log(L)    AIC     wAIC
  Neurally naive model          1        −2730     5462
  Neurally informed model       2        −2707     5418    0.99999

Once an AIC score is calculated for each candidate model, the raw AIC scores can be transformed into Akaike weights. These weights can be interpreted as conditional probabilities representing the probability that each candidate model is the best model of the set. In this context, "best" is in terms of the information-theoretic definition of AIC mentioned above. Equations 9 and 10 show how these weights are calculated. First, a difference score is calculated for each model, where the best model's AIC score is subtracted from the candidate model's AIC score. Then these difference scores are used to calculate the relative support for each model.

Δ_i(AIC) = AIC_i − min AIC    (9)

w_i(AIC) = exp(−½ Δ_i(AIC)) / Σ_{k=1}^{K} exp(−½ Δ_k(AIC))    (10)
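Equations 8 through 10 translate directly into code. The sketch below applies them to the representative values from Table 1; it is an illustration, not the tutorial's exact implementation:

```python
import numpy as np

def aicc(logl, n_params, n_points):
    """Corrected AIC (Eq. 8): -2 logL + 2V + 2V(V + 1) / (n - V - 1)."""
    v, n = n_params, n_points
    return -2.0 * logl + 2.0 * v + (2.0 * v * (v + 1)) / (n - v - 1)

def akaike_weights(scores):
    """Eqs. 9 and 10: difference scores relative to the best model,
    converted to normalized weights via relative likelihoods."""
    scores = np.asarray(scores, dtype=float)
    delta = scores - scores.min()          # Eq. 9
    rel = np.exp(-0.5 * delta)
    return rel / rel.sum()                 # Eq. 10

# Representative values from Table 1 (1246 recall events/data points)
scores = [aicc(-2730.0, 1, 1246), aicc(-2707.0, 2, 1246)]
weights = akaike_weights(scores)
```

With these inputs the neurally informed model receives essentially all of the weight, matching the wAIC value reported in Table 1.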

Given that the tutorial is built up around a stochastic generative model, if you run the tutorial code multiple times, each time you will get slightly different results. Table 1 provides results for a representative run of the tutorial simulations. The neurally informed model is usually preferred, which is to be expected, as we constructed the neural signal to have fluctuations corresponding to the fluctuations in temporal reinstatement.

We finish the tutorial with a demonstration of a potentially useful statistical tool, a permutation test (Hastie et al., 2001). The preceding demonstration used a neurally naive model as a baseline against which to compare the neurally informed model. One key difference between the neurally informed and neurally naive models is that in the neurally informed model, the β_rec parameter fluctuates from recall event to recall event, whereas in the neurally naive model, the parameter is stationary. One might hypothesize that the predictive advantage of the neurally informed model arises not because the variability in β_rec is specifically tracked by the fluctuations in the (synthetic) neural signal, but rather because it is generally advantageous to have this parameter vary as opposed to being stationary. We can call this the generic variability hypothesis. Our original neurally informed model represents a specific variability hypothesis in which the specific fluctuations observed on a particular trial are important. If the generic variability hypothesis is correct, we should be able to


scramble the neural signal while preserving its general statistical characteristics, and preserve the predictive power of the model (i.e., get similar log-likelihood scores). If the specific variability hypothesis is correct, scrambling the neural signal will harm the model's ability to predict behavioral performance.

Let us say we were also concerned that there could be temporal structure to our neural signal, such that neighboring recall events tend to have similar neural signals associated with them. We know this is not true in our synthetic data, but it is a reasonable concern for neural recordings. In this case it would not be fair to our assessment of the generic variability hypothesis to fully scramble the neural signals from recall event to recall event, as that would break this temporal structure. To address this concern, we partially scramble the synthetic neural signal at the level of trials. With 120 trials, we generate a permuted list of the integers from 1 to 120 and use these to rearrange the rows of the matrix carrying the synthetic neural signal. This preserves the structure of event-to-event fluctuations within a trial, while breaking the correspondence of these fluctuations to the events of a particular trial. A similar analysis was carried out by Kragel et al. (2015), where the goal was to determine whether a generic trend in the neural signal (e.g., on every trial the neural signal is gradually decreasing) could account for the neural-behavioral correspondence observed (it could not).

For this permutation test, we scramble the neural signal from trial to trial and then re-calculate the goodness of fit (log-likelihood) of the model. We perform a number of iterations of this procedure: for each iteration, we scramble trials and recalculate fit. Each iteration gives us a log-likelihood value, and together these form a distribution. We can then compare the original log-likelihood score (from Table 1: −2707) to this distribution.
The proportion of scrambled scores that exceed the original score can be interpreted as a p-value. If the scrambling does not make a difference (as predicted by the generic variability hypothesis), then the original score should fall somewhere in the middle of the permuted distribution. For the sake of efficiency, the permutation test in the tutorial code only runs 20 iterations. In our representative run, the original log-likelihood was greater than every value in the permutation distribution. This allows us to reject the generic variability hypothesis with p < 0.05. If you increase the number of iterations, you can get a more precise p-value.
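The trial-level permutation test can be sketched as below. The refit callable stands in for re-running the likelihood fit with the scrambled signal; the toy refit used in the usage example is purely illustrative (it scores worse whenever trials move), not the CMR fitting code:

```python
import numpy as np

rng = np.random.default_rng(3)

def trial_permutation_pvalue(original_logl, signal, refit, n_iter=20):
    """Permute whole trials (rows) of the neural signal, re-fit each
    time, and report the proportion of permuted fits that score at
    least as well as the original (an empirical p-value)."""
    perm_logls = []
    for _ in range(n_iter):
        shuffled = signal[rng.permutation(signal.shape[0])]
        perm_logls.append(refit(shuffled))
    return float(np.mean(np.asarray(perm_logls) >= original_logl))

# Toy stand-in: a "fit" that gets strictly worse whenever rows move
base_signal = np.arange(120.0).reshape(120, 1)
toy_refit = lambda s: -2707.0 - float(np.abs(s - base_signal).sum())
p = trial_permutation_pvalue(-2707.0, base_signal, toy_refit)
```

Because whole rows are permuted, the within-trial sequence of signal values is preserved, matching the partial-scrambling rationale described above.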

5 Conclusion

In this chapter, we took a close look at the Context Maintenance and Retrieval (CMR) model of free recall. We examined different ways of using the model, including an approach in which the model (using a given parameter set θ) is used to calculate the likelihood of observing a given behavioral dataset (B, consisting of a set of recall sequences). This likelihood-based approach allows one to optimize the model to find a set of parameters θ that maximize the model's ability to predict the set of recall events in B. A given parameter set θ can also be used to generate synthetic recall sequences, allowing one to determine goodness of fit


between observed summary statistics (calculated on B) and the same summary statistics calculated on the model-generated data. One of the strengths of an event-level likelihood-based technique is that it allows a model to be sensitive to features of the data that might not be captured by standard summary statistics. This could include the trial-specific semantic identity of studied items (Morton and Polyn, 2016), the latency of individual responses (Osth and Farrell, 2019), or event-specific fluctuations in a neural signal (Kragel et al., 2015). Kragel et al. (2015) used a likelihood-based modeling approach to examine the validity of different neural linking hypotheses and to create model-based maps of the functional properties of neural signals in the medial temporal lobe. Turner et al. (2017) refer to this as a direct-input approach, in which neural signal estimates are used to directly control the parameters of a cognitive model; this in turn affects the model's behavioral predictions. The tutorial simulations in this chapter provide an introduction to the retrieved-context model of free recall used by Kragel et al. (2015) and some of the techniques used to evaluate the model. Specifically, the tutorial demonstrates how one might establish a functional correspondence between fluctuations in fMRI signal and a computation carried out by the retrieved-context model. Of course, it may be that the true computation carried out by these brain regions is substantially different from the temporal reinstatement mechanism implemented by the model. But the demonstration of a reliable functional correspondence between brain signal and cognitive mechanism suggests that the temporal reinstatement mechanism is worthy of further study.

6 Further Exercises

• Try adjusting different model parameters and observe the effect on the serial position curve (SPC) and lag-CRP curve. For example, increasing parameter P1 will increase the primacy effect of the SPC, decreasing X2 will increase the overall number of items recalled, and altering B_enc will alter both the sharpness of the lag-CRP and the recency effect.
• The synthetic neural signal is embedded in noise, and the strength of the noise is controlled by the noise_weight parameter. Try increasing noise_weight to weaken the correspondence between the neural signal and model behavior. At some point, the permutation analysis will no longer identify a statistically significant correspondence.
• The tutorial code contains variables controlling certain methodological characteristics of the simulated experiment: the number of participants, the number of trials per participant, and the number of items on a given study list. Try altering these variables to get a better sense of how they affect model performance. For example, you can increase study list length and see how this affects the primacy and recency effects seen in the SPC analysis figure.


References

Anderson, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R. S. (1977). Distinctive features, categorical perception, and probability learning: Some applications of a neural model. Psychological Review, 84, 413–451.
Brown, G. D. A., Neath, I., & Chater, N. (2007). A temporal ratio model of memory. Psychological Review, 114(3), 539–576.
Bullmore, E., Long, C., Suckling, J., Fadili, J., Calvert, G., Zelaya, F., Carpenter, T. A., & Brammer, M. (2001). Colored noise and computational inference in neurophysiological (fMRI) time series analysis: Resampling methods in time and wavelet domains. Human Brain Mapping, 12(2), 61–78.
Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304. https://doi.org/10.1177/0049124104268644
Davelaar, E. J., Goshen-Gottstein, Y., Ashkenazi, A., Haarmann, H. J., & Usher, M. (2005). The demise of short-term memory revisited: Empirical and computational investigations of recency effects. Psychological Review, 112, 3–42.
Dougherty, M., & Harbison, J. (2007). Motivated to retrieve: How often are you willing to go back to the well when the well is dry? Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 1108.
Farrell, S. (2012). Temporal clustering and sequencing in working memory and episodic memory. Psychological Review, 119(2), 223–271.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer.
Healey, M. K., & Kahana, M. J. (2014). Is memory search governed by universal principles or idiosyncratic strategies? Journal of Experimental Psychology: General, 143(2), 575–596.
Howard, M. W., Fotedar, M. S., Datey, A. V., & Hasselmo, M. E. (2005). The temporal context model in spatial navigation and relational learning: Toward a common explanation of medial temporal lobe function across domains. Psychological Review, 112(1), 75–116.
Howard, M. W., & Kahana, M. J. (2002). A distributed representation of temporal context. Journal of Mathematical Psychology, 46, 269–299.
Kahana, M. J. (2012). Foundations of human memory (1st ed.). Oxford University Press.
Kragel, J. E., Morton, N. W., & Polyn, S. M. (2015). Neural activity in the medial temporal lobe reveals the fidelity of mental time travel. The Journal of Neuroscience, 35(7), 2914–2926.
Lehman, M., & Malmberg, K. J. (2013). A buffer model of memory encoding and temporal correlations in retrieval. Psychological Review, 120(1), 155–189.
Lohnas, L. J., Polyn, S. M., & Kahana, M. J. (2015). Expanding the scope of memory search: Modeling intralist and interlist effects in free recall. Psychological Review, 122(2), 337–363.
Luce, R. D. (1986). Response times. Oxford University Press.
Mallows, C. L. (1973). Sequential sampling of finite populations with and without replacement. SIAM Journal on Applied Mathematics, 24(2), 164–168.
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419–457.
Miller, J. F., Weidemann, C. T., & Kahana, M. J. (2012). Recall termination in free recall. Memory & Cognition, 40(4), 540–550.
Milner, B., Squire, L. R., & Kandel, E. R. (1998). Cognitive neuroscience and the study of memory. Neuron, 20(3), 445–468.
Morton, N. W., & Polyn, S. M. (2016). A predictive framework for evaluating models of semantic organization in free recall. Journal of Memory and Language, 86, 119–140.
Norman, K. A., & O'Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: A complementary learning systems approach. Psychological Review, 110, 611–646.

Assessing neurocognitive hypotheses in a likelihood-based model of the free-recall task

325

Osth, A. F., & Farrell, S. (2019). Using response time distributions and race models to characterize primacy and recency effects in free recall initiation. Psychological Review, 126(4), 578. Polyn, S. M., Norman, K. A., & Kahana, M. J. (2009). A context maintenance and retrieval model of organizational processes in free recall. Psychological Review, 116(1), 129–156. Purcell, B. A., Heitz, R. P., Cohen, J. Y., Schall, J. D., Logan, G. D., & Palmeri, T. J. (2010). Neurally constrained modeling of perceptual decision making. Psychological Review, 117(4), 1113–1143. Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93–134. Schapiro, A. C., Turk-Browne, N. B., Botvinick, M. M., & Norman, K. A. (2017). Complementary learning systems within the hippocampus: a neural network modelling approach to reconciling episodic memory with statistical learning. Philosophical Transactions of the Royal Society B: Biological Sciences, 372, 20160049. Sederberg, P. B., Howard, M. W., & Kahana, M. J. (2008). A context-based theory of recency and contiguity in free recall. Psychological Review, 115(4), 893–912. Tulving, E. (1993). What is episodic memory? Current Directions in Psychological Science, 2(3), 67–70. Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79. Wagenmakers, E.-J. & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1), 192–196. Retrieved 2014-02-05, from http://link.springer.com/ article/10.3758/BF03206482. https://doi.org/10.3758/BF03206482 Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9(1), 60–62. Retrieved 2014-12-04, from http://www.jstor.org/stable/2957648

Cognitive Modeling in Neuroeconomics

Sebastian Gluth and Laura Fontanesi

Abstract In this chapter, we discuss two lines of research in neuroeconomics. The first line of research focuses on the role of visual attention in value-based decision-making. In particular, it formalizes the cognitive processes behind eye movements and choices on the basis of the well-established computational modeling framework of sequential sampling models. The second line of research also draws on this framework and combines it with that of reinforcement learning models to ask how feedback in the form of rewards and punishments changes the dynamics of decisions. Finally, we discuss existing attempts and future avenues to connect the work on attention and reinforcement learning in neuroeconomics to arrive at a comprehensive computational theory of economic decision-making.

Keywords Value-based decision-making · Sequential sampling models · Reinforcement learning

1 Introduction

Neuroeconomics is the study of the neurobiological and cognitive bases of economic behavior. It is a rapidly developing research field that emerged at the outset of the twenty-first century (Glimcher & Rustichini, 2004). One of its central goals is to predict people's actions in the economic world. Achieving this goal can help individuals stay committed to their long-term financial goals and avoid getting stuck chasing immediate rewards. Moreover, neuroeconomic findings can help policymakers to prevent future economic breakdowns, whose emergence appears to elude standard economic theory.

S. Gluth, Department of Psychology, University of Hamburg, Hamburg, Germany, e-mail: [email protected]
L. Fontanesi, Department of Psychology, University of Basel, Basel, Switzerland, e-mail: [email protected]

© Springer Nature Switzerland AG 2024. B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_13


Primarily, economic behavior pertains to the research field of value-based decision-making (aka preferential choice), but it is also related to the study of reinforcement learning. A central tenet of research on value-based decision-making is to understand how our choices depend on other cognitive functions, such as attention and memory. To address this question, neuroeconomists have adopted techniques that measure cognitive functions in order to study their influence on preferential choice (Gluth et al., 2015; Krajbich et al., 2010). Reinforcement learning research, on the other hand, has been important for understanding how preferences emerge and change over time based on our experiences and interactions with the environment. In the first part of this chapter, we will focus on attention as one of the central cognitive processes that interact with value-based decision-making. Attention has not only been shown to exert a pervasive impact on our decisions, but there is also emerging evidence that our preferences influence how we distribute our attention in the first place. Notably, an important and favorable aspect of this branch of neuroeconomics is that new empirical findings have consistently been linked to a well-established mathematical framework of decision-making. This framework is known as sequential sampling modeling and allows one to quantify predictions of how, when, and with how much confidence people choose between different options (Busemeyer et al., 2019; Hanks & Summerfield, 2017; Ratcliff et al., 2016). Given that it is already covered in another chapter of this book, we will introduce this framework only briefly and then focus on specific sequential sampling models that have incorporated attentional mechanisms. In addition, we will give a short explanation of eye tracking, which is the primary tool to measure attention in cognitive neuroscience.
In the second part of the chapter, we will present a recent surge of studies that investigated how decisions dynamically change through experience (i.e., receiving monetary gains and losses associated with different options) and that described such changes by combining sequential sampling and reinforcement learning modeling (Fontanesi et al., 2019; Pedersen et al., 2017). Traditionally, reinforcement learning models focused on across-trial dynamics, that is, on how we learn to choose the most advantageous option as a function of the outcomes of previous decisions (Sutton & Barto, 1998). These models did not, however, make predictions about the timing and certainty of such decisions. This limited our understanding of the within-trial dynamics, that is, of the cognitive processes that lead to a single decision. Sequential sampling models, on the other hand, provide an explanation of the within-trial dynamics but do not feature a mechanism to explain the across-trial dynamics. By combining sequential sampling and reinforcement learning models, we can thus benefit from both approaches and achieve a better understanding of the dynamics of choice across different temporal scales (Miletić et al., 2019).


2 Attention and Value-Based Decision-Making

2.1 Sequential Sampling Models

Here are four statements about decision-making that most people would probably agree with:

(i) Decisions do not happen in an instant but take some time.
(ii) Difficult decisions take more time than easy decisions.
(iii) We spend more time on important decisions than on trivial decisions.
(iv) If we make decisions in a hurry, they are more likely to be wrong or suboptimal.

Even though these facts appear to be self-evident and are supported by empirical research, the vast majority of decision-making models in economics, psychology, and neuroscience cannot explain a single one of them. This is because models like expected utility theory (Neumann & Morgenstern, 1944) or prospect theory (Kahneman & Tversky, 1979) are only concerned with predicting how we decide, not how fast we make these decisions. Yet, there is one framework of models that is able to account for all of the above-mentioned effects: the sequential sampling modeling framework. This framework assumes decisions to emerge from sequentially sampling pieces of evidence in favor of the available choice options. This sampling process continues until a desired level of certainty about which option to choose has been achieved. Going back to the four statements, we can see that these two principles (i.e., sequential sampling of evidence and termination of sampling at a threshold of required evidence) are sufficient to cover all of them:

(i) Decisions take time because sampling happens sequentially.
(ii) Difficult decisions take more time because the evidence is accumulated more slowly.
(iii) We spend more time on important decisions because the desired level of certainty is higher.
(iv) Decisions made in a hurry are more error-prone because we reduce the level of certainty to make faster decisions.

Sequential sampling models have been applied in many domains of cognitive science, but we will focus on their use in value-based decision-making. Here, the first proposed sequential sampling model was Decision Field Theory (DFT; Busemeyer & Townsend, 1993), which was designed to explain decisions under risk (i.e., decisions between lotteries that are characterized by reward amounts and probabilities).
In brief, DFT assumes that attention switches between possible events depending on the events’ probabilities, and that evidence is sampled by evaluating the reward amounts of the currently attended event. For example, assume that you are offered a simple gamble, in which a die is rolled once and you win $50 if the die shows a 6 but you lose $10 otherwise. When choosing whether to accept or reject


this gamble, DFT predicts that you switch your attention between the two events "win" (6) and "lose" (1)–(5) and that you pay more attention to "lose" because it is the more probable event. On the other hand, every time you consider the "win" event, you accumulate a big piece of evidence in favor of accepting the gamble, because of the large gain amount of $50 compared to the small loss amount of $10. Overall, DFT predicts that the decision will take quite some time, because the apparent trade-off between probability and amount makes the choice difficult. If the gain amount were just $10, one would expect the decision to be made much faster, and this is exactly what DFT would also predict. DFT is a very elaborate theory of decisions under risk that contains many more features (such as forgetting and approach-avoidance tendencies), which we cannot cover here. It is important to note, however, that while DFT attracted considerable attention in cognitive psychology and was further developed to predict multi-alternative, multi-attribute decisions (Roe et al., 2001), it was not picked up by neuroeconomists. Instead, the first attempts to capture value-based decisions and their underlying neural mechanisms with sequential sampling models employed the Diffusion Decision Model (DDM; Ratcliff, 1978), which is the dominant sequential sampling model in neuroeconomics to this day. The DDM is covered extensively in another chapter of this book, so we will not discuss it here in detail. It suffices to say that even though the DDM is best known for its broad application in perceptual decision-making, it can be adapted to value-based decisions straightforwardly. Specifically, the drift rate (i.e., the average rate of evidence accumulation) can be coupled with the subjective value of an option (e.g., in accept vs.
reject decisions) or the value difference between two options (e.g., in two-alternative forced-choice decisions), which again leads to the prediction that decisions between options with larger value differences are more deterministic and faster. As perhaps the first study that applied the DDM to value-based decisions, Basten et al. (2010) studied the neural correlates of cost–benefit decision-making by combining a simplified DDM variant (Palmer et al., 2005) with functional Magnetic Resonance Imaging (fMRI). Participants made accept–reject decisions with stimuli whose shape indicated a certain negative monetary amount (cost) and whose color indicated a certain positive monetary amount (benefit). The sum of these costs and benefits was linked to the drift rate via a power function, which provided a very precise account of the empirical choice probabilities and response times (RT). On the neural level, the authors found that cost and benefit information were represented in the amygdala and ventral striatum, respectively, and integrated in the ventromedial prefrontal cortex (vmPFC). Furthermore, the vmPFC projected the integrated value information to the intraparietal sulcus, which implemented the evidence-accumulation process. Altogether, the study provided good evidence for a close match between the behavioral and neural patterns and the dynamics described by the DDM. Beyond this study, the DDM formed the basis of neuroeconomic research on the influence of attention on decision-making, which is the focus of the first part of this chapter. Before we address this research in detail, we will provide a brief introduction to eye tracking.


2.2 Eye Tracking: A Window into Attention

Eye tracking is a very affordable tool compared to other techniques commonly used in cognitive neuroscience and neuroeconomics, such as fMRI, electroencephalography (EEG), or magnetoencephalography (MEG). It allows researchers to acquire insights into cognitive processes beyond the information provided by behavioral measures. As the term indicates, the goal of this technique is to track movements (and other reactions) of the eyes while participants engage in a certain task. Most modern eye trackers achieve this goal by illuminating the pupil with an infrared or near-infrared light, which causes corneal reflections in the pupil. These reflections are recorded by a high-speed camera and further processed to determine where the gaze is directed. The temporal resolution of state-of-the-art eye-tracking devices goes up to the kHz range, implying that eye tracking can offer insights into the dynamics of cognitive processes with a temporal precision comparable to EEG/MEG and much higher than fMRI. Moreover, modern eye-tracking systems are unobtrusive (e.g., the illuminator and camera are positioned beneath the screen the participant is looking at) and are relatively robust against artifacts caused by head movements. Some devices such as eye-tracking glasses or head-mounted systems even allow measuring the eyes in naturalistic environments (though with some loss of temporal resolution and data quality). Obviously, the principal cognitive mechanism that can be measured with eye tracking is overt spatial attention. In his landmark study, Yarbus (1967) showed that the way people move their eyes when screening a natural scene strongly depends on the instructions given to them.
Even though there are cases in which spatial attention and eye movements do not align, the two are tightly coupled in most situations (Fiebelkorn et al., 2019; Hunt et al., 2019), and fixating attended information is also more effective than paying attention to something without looking at it (Deubel & Schneider, 1996; Eckstein et al., 2017). To measure attentional processes, the raw eye-tracking data, which consist of two time series of x- and y-coordinates of the recorded eye positions, are usually parsed into distinct events: fixations, which refer to relatively long time periods during which only very few and very small movements of the eye occur; saccades, which refer to short time periods between two fixations during which large changes in the eye's position take place; and eyeblinks, which refer to transient closings of the eyelids. The majority of eye-tracking studies focus on fixations, including analyses of their frequency, (total and average) duration, and order of occurrence. The studies discussed further below in this chapter were also concerned with fixations only. However, it is important to note that additional features such as the velocity and trajectory of saccades or the rate of spontaneous eyeblinks have also been linked to cognitive and neural mechanisms (Eckstein et al., 2017; Van der Stigchel et al., 2006). Furthermore, eye trackers are not only used to record eye movements but also to measure changes in pupil size, which have been linked to central neural and cognitive functions. In particular, pupil dilation is thought to reflect fluctuations in activity of the noradrenergic system, which are accompanied by changes in arousal, attention, and decision uncertainty (Nassar et al., 2012; Eldar et al., 2013; Urai et al., 2017).
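The segmentation of raw gaze samples into events can be illustrated with a toy dispersion-based parser in the spirit of classic I-DT algorithms: a fixation is a run of samples whose spatial spread stays small. The thresholds below are arbitrary illustration values, not those of any particular eye tracker or analysis package.

```python
def detect_fixations(samples, max_dispersion=25.0, min_length=5):
    """Toy dispersion-based (I-DT style) event parser: a fixation is a run of
    at least min_length (x, y) samples whose combined x/y spread stays below
    max_dispersion (in pixels). Returns a list of (start, end) index pairs."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        j = i + min_length
        if j > n:
            break
        xs = [p[0] for p in samples[i:j]]
        ys = [p[1] for p in samples[i:j]]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) <= max_dispersion:
            # grow the window while the dispersion stays small
            while j < n:
                xs.append(samples[j][0]); ys.append(samples[j][1])
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    break
                j += 1
            fixations.append((i, j - 1))
            i = j
        else:
            i += 1  # no fixation starts here; slide the window forward
    return fixations

# synthetic trace: a fixation near (100, 100), a saccade, a fixation near (300, 200)
fix1 = [(100 + k % 2, 100) for k in range(10)]
saccade = [(150, 130), (200, 160), (250, 190)]
fix2 = [(300 + k % 2, 200) for k in range(10)]
events = detect_fixations(fix1 + saccade + fix2)
```

On this synthetic trace, the parser recovers the two stable periods and skips the intervening saccade samples.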


2.3 The Attentional Drift Diffusion Model

A highly influential study by Krajbich et al. (2010) investigated the role of attention in value-based decision-making. Participants were first asked to evaluate several food snacks on a rating scale from −10 to 10. Subsequently, they made a series of binary decisions between the food snacks that they had rated positively. While making these decisions, participants' gazes were recorded with an eye tracker (Fig. 1a). The behavioral data were consistent with basic predictions of sequential sampling models: More difficult decisions (i.e., decisions between options with more similar subjective values) were less deterministic and took longer. More interestingly, the eye-tracking results revealed a profound influence of gaze on choices: Options that were looked at longer were more likely to be chosen. Furthermore, participants exhibited a very strong tendency to choose the last-fixated option. Only when this last-fixated option was rated much lower than the other option was it not chosen. Crucially, the authors ruled out an alternative explanation of these results, namely that higher-rated options are simply looked at longer. As a computational account of their behavioral and eye-tracking data, Krajbich et al. (2010) put forward the attentional Drift Diffusion Model (aDDM) (Fig. 1b). The aDDM is an extension of the DDM that takes the influence of gaze on preferences into account. Specifically, the authors proposed that the relative decision variable RDV, which reflects the accumulated evidence in favor of one option i compared to another option j, is updated at every time point t based on the value difference between the options (for which they took the rating values r_i and r_j). The critical element of the aDDM is the assumption that the value of the currently unattended option impacts the RDV to a lesser extent than the value of the attended option.
This assumption is implemented by multiplying the value of the unattended option with a parameter θ (with θ < 1). Thus, in case option i is fixated at time t, the RDV is updated as follows:

RDV_t = RDV_{t−1} + v_t + ϵ_t,    (1)

with

v_t = d · (r_i − r_j · θ),    (2)

where d refers to a drift constant that controls the speed of integration and ϵ_t refers to Gaussian noise with mean 0 and standard deviation σ. In case option j is fixated, θ "moves" to the left and is now multiplied with r_i, so that the influence of i's value on the RDV is lessened:

v_t = d · (r_i · θ − r_j).    (3)

A decision is made as soon as the RDV surpasses an upper bound of 1 (in which case option i is chosen) or a lower bound of −1 (in which case option j is chosen).
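A minimal simulation of Eqs. (1)–(3) shows how discounting the unattended value produces the gaze effects described above. This is an illustrative sketch with invented parameter values and a crude fixation process (random first fixation, then strict alternation with random fixation durations); it is not the original model code or the fitted parameterization of Krajbich et al. (2010).

```python
import random

def addm_trial(r_i, r_j, d=0.002, theta=0.3, sigma=0.02,
               fix_dur=(30, 100), rng=random):
    """One aDDM trial: the unattended option's rating is discounted by theta.
    Decision bounds are at +1 (choose i) and -1 (choose j)."""
    rdv, t = 0.0, 0
    attend_i = rng.random() < 0.5         # first fixation is random
    next_switch = rng.randint(*fix_dur)   # fixation duration in time steps
    while abs(rdv) < 1.0:
        if t == next_switch:              # saccade to the other option
            attend_i = not attend_i
            next_switch = t + rng.randint(*fix_dur)
        if attend_i:
            v = d * (r_i - theta * r_j)   # Eq. (2): i attended
        else:
            v = d * (theta * r_i - r_j)   # Eq. (3): j attended
        rdv += v + rng.gauss(0.0, sigma)  # Eq. (1)
        t += 1
    return ("i" if rdv >= 1.0 else "j"), t

random.seed(0)
results = [addm_trial(8, 4) for _ in range(500)]
p_i = sum(c == "i" for c, _ in results) / 500
```

With r_i = 8 and r_j = 4, the higher-valued option i wins the large majority of simulated trials.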

Fig. 1 The aDDM for binary decisions. (a) Schematic outline of the experimental design of the study by Krajbich et al. (2010). First, participants rated each snack’s subjective value. Then, they made binary choices between snacks. Eye movements were recorded during decision-making (credit: icons designed by Freepik from Flaticon). (b) Simulation of the aDDM process for an example decision between options i (magenta) and j (cyan). The currently fixated option is indicated by the background color. (c) Predictions of the aDDM. The first row shows the predicted choice probabilities and RT as a function of difficulty; the second row shows the influence of gaze time on choice probability (left), and the tendency to choose the option fixated last (right); the third row shows the predicted choice probabilities and RT as a function of the overall attractiveness of the options


The aDDM makes several predictions on the interaction of attention and choice that are in line with the above-mentioned results (Fig. 1c). First of all, as an extension of a sequential sampling model, the aDDM obviously predicts that more difficult decisions are less deterministic and slower than easy decisions. Second, the longer an option is fixated relative to the other option, the more likely it is to be chosen, because its value influences the RDV to a greater extent (more precisely, the value of the longer-fixated option exerts its full impact on the RDV for a longer period). Third, the model provides an elegant account of the tendency to choose the last-fixated option. If an option is fixated, the RDV usually drifts toward the boundary of the fixated option and eventually crosses the boundary. This crossing is less likely if the option is not fixated (because the RDV drifts away). Only if the attended option has a much lower utility than the unattended option does the RDV still drift toward the unattended option, and this option might be chosen without looking at it. Fourth, the model does not predict that more valuable options are attended to more, consistent with the independence of overall gaze duration from value reported by Krajbich et al. (2010) (but see below for more recent studies challenging this finding and assumption). Finally, the model predicts that decisions between high-value options are faster than decisions between low-value options, even if difficulty is held constant (Smith & Krajbich, 2019). Although not as stable as the difficulty effect on RT, this "magnitude" effect is often reported in value-based decision-making (Fontanesi et al., 2019; Gluth et al., 2018) (see also Sect. 3.2.1 and Fig. 4 further below) and has been found in perceptual decisions as well (Teodorescu et al., 2016; Polanía et al., 2014; Pirrone et al., 2018).
Notably, the aDDM makes this prediction because it assumes attention to exert a multiplicative (rather than an additive) influence on value accumulation (see Eqs. (2) and (3) above). Krajbich et al. (2010) were not the first to explore the influence of attention on decisions (see, e.g., Shimojo et al., 2003). However, the specific combination of behavioral and physiological data analyses with the development and testing of a computational model evoked a surge of interest in this topic within and beyond the field of neuroeconomics. Importantly, subsequent studies have provided compelling evidence that the dependency of decisions on eye movements is not restricted to (arguably simple) choices between food snacks but extends to many other decision-making domains, including decisions under risk (Stewart et al., 2016), moral decisions (Pärnamets et al., 2015), social preferences (Smith & Krajbich, 2018), and even perceptual decisions (Tavares et al., 2017). Furthermore, a handful of studies have provided some evidence for a causal influence of attention on decisions by forcing participants to look at particular options for a predetermined amount of time or at a specific point in time and assessing whether these manipulations have an influence on the decisions (Armel et al., 2008; Pärnamets et al., 2015; Ghaffari & Fiedler, 2018).

2.3.1 Extensions of the aDDM

Over the past 10 years since the seminal work of Krajbich et al. (2010), research on the interplay of attention and decision-making has flourished. Novel findings supported the central predictions of the aDDM but also challenged some of its assumptions. First of all, the aDDM has been extended to multi-alternative decisions, that is, decisions between three or more options (Krajbich & Rangel, 2011; Thomas et al., 2019). In the first of these proposed extensions (Krajbich & Rangel, 2011), it was assumed that the evidence for each option is accumulated according to a Wiener diffusion process whose drift rate depends on the option's subjective value and on whether the option is fixated. Formally, the evidence E_{i,t} for option i at time point t is updated according to

E_{i,t} = E_{i,t−1} + d · r_i · (θ + [1 − θ] · F_{i,t}) + ϵ_t,    (4)

where F_{i,t} is an index variable indicating whether option i is currently fixated (F_{i,t} = 1) or not (F_{i,t} = 0). The decision is then based on a best-vs-next comparison; that is, the difference between the currently highest and second-highest accumulator is required to surpass a threshold to elicit a decision. Technically, this comparison is achieved by subtracting from E_{i,t} the highest evidence for any other option,

V_{i,t} = E_{i,t} − max(E_{j,t}),  j ≠ i,    (5)

and checking whether V_{i,t} for any option i is higher than a specific value (i.e., the decision threshold). The second extension (Thomas et al., 2019) focused on solving the increased computational demands of predicting decisions with many alternatives. To this end, the aDDM was simplified by summing the total amount of fixation time per option and implementing these sums into a Gaze-weighted Linear Accumulator Model (GLAM). This accumulator model assumes a race between all options and thus does not require an additional best-vs-next comparison. The computational advantage also results from the fact that GLAM offers a (quasi) closed-form solution to the first-passage time problem (i.e., the likelihood of the decision variable hitting a boundary at a specific time), so that lengthy simulations of the model are not required to quantify its predictions. When introducing GLAM, Thomas and colleagues showed that the model yields robust estimates of the influence of attention on choice (i.e., quantified by the parameter θ) in various tasks and datasets, thus offering a tool to investigate inter-individual differences in a reliable manner. However, a debatable assumption of GLAM is that it does not feature a non-decision time parameter. This implies that simple shifts in the RT distribution (due to changes in sensory- or motor-processing time) cannot be accounted for.
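Eqs. (4) and (5) can likewise be sketched as a small simulation with one accumulator per option and a best-vs-next stopping rule. Again, all parameter values are invented for illustration, and the fixation process (uniformly random refixation targets) is a simplification rather than part of the model of Krajbich and Rangel (2011).

```python
import random

def multi_addm_trial(values, d=0.002, theta=0.3, sigma=0.02,
                     threshold=1.0, fix_dur=(30, 100), rng=random):
    """Multi-alternative aDDM sketch: each option's evidence grows per Eq. (4);
    a choice is made once the best accumulator leads the second-best by
    `threshold` (Eq. (5)). Returns (chosen index, number of time steps)."""
    n = len(values)
    evidence = [0.0] * n
    fixated = rng.randrange(n)
    next_switch = rng.randint(*fix_dur)
    t = 0
    while True:
        t += 1
        if t == next_switch:  # saccade to a randomly chosen other option
            fixated = rng.choice([k for k in range(n) if k != fixated])
            next_switch = t + rng.randint(*fix_dur)
        for i in range(n):    # Eq. (4): unattended options are discounted
            gain = theta + (1 - theta) * (1.0 if i == fixated else 0.0)
            evidence[i] += d * values[i] * gain + rng.gauss(0.0, sigma)
        for i in range(n):    # Eq. (5): best-vs-next comparison
            rest = max(e for j, e in enumerate(evidence) if j != i)
            if evidence[i] - rest >= threshold:
                return i, t

random.seed(2)
wins = [multi_addm_trial([8, 5, 2])[0] for _ in range(300)]
```

Over repeated simulated trials, the highest-valued option is chosen most often, with occasional wins for the middle option.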


A separate branch of studies addressed the potential causes of fixations, thereby challenging the aDDM's assumptions that fixations are random and independent of value.1 Towal et al. (2013) augmented the aDDM with a "gaze model" that determined the probability and duration of fixations on the basis of the options' salience and subjective value. To quantify salience, the authors employed the "Itti-Koch" algorithm, which is based on extensive research on bottom-up visual attention (Itti et al., 1998). The augmented model provided a superior fit to the choice and RT data of a four-alternative forced-choice task with food snacks. Moreover, parameter estimates and model comparisons suggested that both salience and value increased fixation frequency (and duration), and that they did so in a multiplicative way. Importantly, the influence of salience and value was assumed to be constant over time (see also Gluth et al. (2018)). In a recent eye-tracking study on decisions between three (food snack) options (Gluth et al., 2020), we obtained further evidence for the influence of subjective values on attention. In contrast to the previous studies discussed above (Towal et al., 2013; Gluth et al., 2018), however, this influence was not constant over time but developed dynamically over the course of each decision. More specifically, we found a small dependency of the first fixation on the options' values (i.e., high-value options were more likely to be looked at first) that developed into a much stronger value dependency over the course of the decision (Fig. 2a).2 Notably, the option with the lowest value was ignored at some point, especially when it was much worse than the other two alternatives (Fig. 2b). Thus, participants appeared to distribute their attention top-down by concentrating on the two most promising options.
Remarkably, this strategic allocation of attention influenced RT, such that the decision for one of the two good options was faster when the worst option had a particularly low subjective value. The observation that attention was dynamically shifted toward more favorable options led us to propose an extension of the aDDM. According to this extension, the probability f_{i,t} to fixate option i depends on the accumulated evidence E_{i,t} for that option relative to the accumulated evidence for the other options:

f_{i,t} = exp(γ · E_{i,t}) / Σ_j exp(γ · E_{j,t}),    (6)

where γ is a free parameter that modulates the strength of the association between accumulated evidence and fixation probability. This link between accumulated value and attention (together with the assumption that evidence starts being accumulated

1 As stated above, Krajbich et al. (2010) reported that fixation probability was independent of subjective value in binary decisions. Interestingly, this independence was violated in ternary decisions (Krajbich & Rangel, 2011). Therefore, the authors took subjective value into account when sampling from the empirical fixation distributions to fit the aDDM. However, no mechanistic account of the relationship between value and fixations was provided.

2 Importantly, this effect could not be driven by the fact that the chosen option is usually fixated last (see above), as we excluded the last fixation from our analysis.


Fig. 2 The aDDM for multi-alternative decisions, extended by value-based attention. (a) Simulation of the aDDM process for an example decision between three options of different utilities (Best/blue .> Second-best/red .> Distractor/green). Gluth et al. (2020) extended the aDDM with value-based attention by linking accumulated evidence to the probability to fixate an option p(fix). (b) Empirical and predicted within-trial fixation patterns. The extended aDDM accounts for the fact that fixations become increasingly focused on the two best options (left) and that the decrease of fixations on the worst option depends on its value (right). Figure adapted with permission from Gluth et al. (2020)

prior to the first fixation) allows the model to predict a small dependency of the first fixation on value which then increases over time (Fig. 2b). In a nutshell, the interplay of attention and value results in a “Gaze cascade effect” (Shimojo et al., 2003), so that we look at what we like and we like what we look at, and this effect can be captured by our proposed extension of the multi-alternative aDDM.3

3 It should be noted that the strength of the relationship between accumulated value and fixation probability is rather modest, as indicated by a comparatively low estimate of the parameter γ (i.e., γ ≈ 0.3). This is to be expected, because a strong relationship would imply too strong a Gaze cascade effect, such that the first fixated option would be very likely to attract all the attention and the remaining options might not be looked at even once.
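The value-based fixation rule of Eq. (6) is a softmax over the accumulated evidence. The sketch below (with an illustrative γ in the modest range discussed above) computes it in a numerically stable way by subtracting the maximum before exponentiating, which leaves the probabilities unchanged.

```python
import math

def fixation_probs(evidence, gamma=0.3):
    """Eq. (6): probability of fixating each option as a softmax of the
    accumulated evidence; gamma scales how strongly gaze follows value."""
    m = max(evidence)
    weights = [math.exp(gamma * (e - m)) for e in evidence]  # stable softmax
    total = sum(weights)
    return [w / total for w in weights]

# Early in a trial the accumulators are similar, so fixations are near-random;
# later, the leading option attracts disproportionately many fixations.
early = fixation_probs([0.2, 0.1, 0.1])
late = fixation_probs([3.0, 1.5, 0.4])
```

This reproduces the qualitative pattern in Fig. 2: fixation probabilities start out near-uniform and become increasingly concentrated on the leading options as evidence accumulates.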


2.4 Outstanding Questions on Attention and Value-Based Decision-Making

Although considerable progress has been made in understanding the computational principles of the interaction between attention, valuation, and decision-making, many unanswered questions remain to be addressed in future neuroeconomic research. First of all, the aDDM has been extended to multi-alternative decisions but not to multi-attribute decisions. The aDDM works on the level of (integrated) subjective values and thus ignores the fact that many (if not all) options are characterized by multiple attributes, such as goods of different price and quality or assets with different expected returns and risks. This is a critical shortcoming of the model, given that it is well established that people violate some fundamental principles of standard economic theories when making multi-alternative, multi-attribute decisions (Rieskamp et al., 2006) and that the ability to account for these violations has become a benchmark for decision-making models in cognitive science (Busemeyer et al., 2019). Interestingly, a recent eye-tracking experiment on multi-attribute decisions showed that people tend to compare multiple options with respect to a single attribute rather than trying to integrate multiple attributes within a single option (Noguchi & Stewart, 2014). Based on these findings, the authors proposed a sequential sampling model of pairwise attribute-based comparisons that is able to account for the mentioned violations of economic principles (Noguchi & Stewart, 2018). However, this model, as well as other models that assume attention to be a critical driver of the aforementioned violations (Tsetsos et al., 2012; Bhatia, 2013), does not model eye movements (contrary to the aDDM), so that a comprehensive model of visual attention and multi-alternative, multi-attribute decision-making is still lacking.
Another open question pertains to the neural mechanisms underlying the influence of attention on decision-making. In one fMRI study (Lim et al., 2011), participants were required to fixate on either the left or the right choice option in an alternating manner and for a predetermined amount of time. The authors showed that the vmPFC encoded a relative value signal of the attended minus the unattended option, providing evidence that an attention-dependent valuation process controls the decision. However, the findings were not directly linked to the aDDM (e.g., with model-based fMRI), and it also remains unclear whether the vmPFC is the central or the only cortical structure that instantiates the evidence-accumulation process (for contrasting evidence, see, for instance, Hare et al., 2011; Gluth et al., 2012; Pisauro et al., 2017). On the methodological level, it would be desirable to develop techniques that allow fitting computational models to eye-tracking data. Attention-inspired models such as the aDDM include eye movements in order to make more precise predictions of choice and RT data, but only these choices and RTs are subjected to parameter estimation. The models' predictions of fixations, fixation durations, or gaze trajectories themselves are not utilized in model fitting or in quantitative model comparisons. Notably, the joint modeling approach (Turner et al., 2015) has been put forward to enable the inclusion of neural (e.g., fMRI) data into the parameter estimation of computational models. This approach might thus be a promising starting point to achieve the same goal with respect to eye-tracking data.

3 Decisions in Reinforcement Learning

In the previous section, we described decisions as a function of options' values or attributes. By doing so, however, we made the implicit assumption that such decisions, if repeated, are interchangeable and independent. While this assumption holds in many cases, it is systematically violated when we experience different outcomes after each choice. In our daily lives, we often have to choose between options that have uncertain outcomes. For example, when choosing between different kinds of rice to make risotto, we might learn that one kind (e.g., Carnaroli) gives better results than another one (e.g., Arborio). We might have to learn this through experience by integrating the feedback based on our personal taste or provided by a demanding guest. Since such observations might be noisy (e.g., the risotto might taste bad for other reasons than just using the wrong kind of rice), we can be sure about which kind of rice delivers the best result only after several attempts. Thus, consecutive decisions cannot be considered independent in these situations, and we have to address the sequential effects of repeated choices. Traditionally, these kinds of decisions (often labeled experience-based) have been contrasted with description-based decisions, such as choosing between two medical therapies based on their estimated success rates. While in experience-based decisions noisy or uncertain information is obtained through feedback, in description-based decisions such information is explicitly described to the decision-makers. Reinforcement learning was formalized by Sutton and Barto in the late 1990s (Sutton & Barto, 1998) as an interdisciplinary framework to describe decisions based on feedback. By defining algorithms for the integration of past with newly received feedback in the form of updating rules, reinforcement learning models have been used to explain human and animal behavior in learning contexts. They were also developed to teach artificial agents to learn and act in previously unknown environments in order to maximize gains and minimize losses. A comprehensive overview of reinforcement learning can be found in a different chapter of this book. Notably, reinforcement learning has also inspired models in behavioral economics, in particular for explaining human behavior in multi-player repeated-play games (e.g., Camerer & Ho, 1999; Erev & Roth, 1998), and in finance, for explaining the dependency of investment behavior on previous experiences such as stock market crashes or recessions (e.g., Malmendier & Nagel, 2015; Kolm & Ritter, 2019). Here, we are concerned specifically with how these models can be developed to explain human behavior in terms of both the speed and the accuracy of choices during learning by feedback.
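The core updating logic described above, integrating past expectations with newly received feedback and then mapping expectations onto choice probabilities, can be sketched in a few lines of Python (a minimal illustration with our own variable names, not code from any cited study):

```python
import numpy as np

def delta_rule_update(q, feedback, eta):
    """Delta-rule (Rescorla-Wagner-style) update: move the expectation q
    a fraction eta toward the newly received feedback."""
    return q + eta * (feedback - q)

def softmax(q_values, beta):
    """Map a vector of expectations onto choice probabilities; beta
    controls how deterministically the best option is chosen."""
    z = beta * np.asarray(q_values, dtype=float)
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Example: after one good risotto with Carnaroli (feedback = 1), its
# expectation rises above the rival's, and it becomes the likelier choice.
q_carnaroli = delta_rule_update(0.5, feedback=1.0, eta=0.3)
probs = softmax([q_carnaroli, 0.5], beta=5.0)
```

With noisy feedback, several such updates are needed before the expectations reliably separate, which is exactly the point made above about repeated risotto attempts.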


Reinforcement learning has also been central to neuroeconomics, as it is relatively easy to implement a variety of tasks that explore different aspects of economic behavior (e.g., by varying characteristics of the environment and the reward schemes) and to measure the neural correlates of latent learning variables. The most notable of these latent variables is the reward prediction error (RPE), that is, the difference between the feedback newly received for an option and the previous expectation associated with that option. Correlates of the RPE have been found in the dopaminergic nuclei of the midbrain and in areas that these nuclei project to, such as the ventral striatum (Schultz, 1998, 2015; Arias-Carrión et al., 2010). These findings paved the way for what is today known as model-based fMRI, that is, the fMRI-based investigation of the neural correlates of computational model mechanisms quantified via model parameters (see, e.g., O'Doherty et al., 2003, 2004, 2007).

3.1 Modeling of Response Times During Reinforcement Learning

Reinforcement learning models, however, have some limitations. When fitting reinforcement learning models, the data consist of choices among n alternatives (from so-called n-armed bandit tasks). Thus, these often complicated models have to rely on a discrete measure: the outcome of each choice. This not only can lead to issues of model identifiability (see, e.g., Spektor & Kellen, 2018; Steingroever et al., 2013), but it also gives us only a very poor understanding of the processes underlying every single decision. Stated differently, while reinforcement learning models successfully describe the dynamics of choice across trials, they do not attempt to describe the dynamics of choice within each trial (i.e., how we come to a fast or slow decision based on what we have learned up to that point). As a matter of fact, reinforcement learning models typically use a softmax function to map expectations onto the probability of choosing each option. But the softmax function is only a mapping function and provides little insight into how decisions are made. Fortunately, the sequential sampling modeling framework, which we introduced in the previous part of this chapter, offers a much richer account of the within-trial dynamics of single decisions. In line with this notion, several studies in recent years have proposed new reinforcement learning models in which the softmax function is replaced by a sequential sampling model, and the learning dynamics modulate the decision parameters of the sequential sampling model. At first, sequential sampling models were fit to choice and RT data in reinforcement learning tasks without modeling across-trial dynamics. This can be achieved by simply fitting separate sets of sequential sampling model parameters per condition (e.g., difficult vs. easy), thus assuming trials to be independent and identically distributed.
For example, Ratcliff and Frank (2012) fit the DDM with and without collapsing boundaries⁴ to a set of trials after an initial learning period but where feedback was still provided. By comparing punishing and rewarding contexts, they found that the data were best explained either by a higher non-decision time parameter in the punishing context or by setting the initial threshold parameter higher in this context and then letting it decay throughout the decision. Later, Cavanagh et al. (2014) used a similar task but fit their model on trials without feedback. They examined correlations between eye-tracking measurements and some (but not all) DDM parameters (i.e., drift rate and threshold, but not non-decision time). They found that while gaze dwell time was predictive of an increase in the drift rate (in addition to the effect of the option's value), pupil dilation was more predictive of an increase in the threshold parameter. While these studies provided some crucial insights into decision-making during reinforcement learning, in particular regarding the effects of punishment and conflict on the decision parameters, they do not explain how learning occurs and how trial outcomes influence each single decision. The first attempt to inform a sequential sampling model (in this case, the DDM) with a learning model was made by Frank et al. (2015). The authors described learning with a Bayesian optimal observer model. This model approximates how participants behave in the learning task but cannot address individual differences in learning. To link this model to the DDM, they estimated the correlation between the drift rate and the difference in learned values predicted by the model. In particular, they estimated a drift-rate intercept (corresponding to a baseline drift rate) and a drift-rate coefficient for value differences (corresponding to modulations of the baseline drift rate due to learning). Using a similar method, they also regressed the threshold onto the absolute difference of the reward expectations.
Furthermore, neural correlates of value representation and conflict—as assessed by simultaneous electroencephalography (EEG) and fMRI recordings—were linked to drift rate and threshold, respectively. First of all, value difference influenced the drift rate. For the threshold, both main and interaction effects were reported. Specifically, they found a main effect of midbrain BOLD signal (overlapping with a subthalamic nucleus [STN] mask), an interaction effect of conflict and pre-SMA BOLD signal, and an interaction effect of conflict, midbrain signal, and mediofrontal EEG theta. This study confirmed that decision parameters during reinforcement learning are strongly influenced by learning variables at a trial level. While the accumulation of evidence is proportional to the difference in the learned values, the decision threshold appears to be affected by a complex neuromodulatory system for conflict. However, limitations of this study are that neither individual differences in learning

4 Sequential sampling models traditionally assume that decision boundaries are fixed throughout a decision, meaning that the level of cautiousness does not decrease over time. However, this assumption has been challenged in recent years by the proposal that decisions become progressively less cautious as more time passes (e.g., Cisek et al., 2009; Drugowitsch et al., 2012; Fudenberg et al., 2018). Especially when people make decisions under time pressure, there is good evidence that decision boundaries decrease over time (e.g., Miletić & van Maanen, 2019; Murphy et al., 2016). This mechanism leads to less skewed RT distributions and to the prediction that slower responses are less likely to be correct.


Fig. 3 Reinforcement learning diffusion decision models (RLDDM) describe both across- and within-trial dynamics. Participants are provided with feedback after each choice between two options, and one of these two options has a higher expected value (blue) than the other one (yellow; first row). Participants update their expectations (Q-values) so that the difference in Q-values tends to increase (second row). In each trial, the drift rate of the DDM-based evidence-accumulation process is proportional to this difference. While the predicted probability of choosing each option is similar to predictions of standard reinforcement learning models (third row), this model also makes predictions with respect to RT (fourth row): As the probability of choosing the higher value option increases, the predicted RT decreases

nor a more precise linking function between values and the accumulation of evidence was provided. Therefore, it remains unclear how exactly the mechanisms of learning and decision-making fit together.

3.2 Reinforcement Learning Diffusion Decision Models

To overcome these limitations, a new category of models (labeled RLDDMs, for Reinforcement Learning Diffusion Decision Models) has been proposed more recently (Fontanesi et al., 2019, 2019; Pedersen et al., 2017) (see Fig. 3). The central aim of RLDDMs is to jointly model learning and decision-making processes. To this end, the across-trial learning dynamics are described by the Rescorla–Wagner rule:

Q_t = Q_{t−1} + η · (f_t − Q_{t−1}),    (7)

where t is the trial number, f is the experienced feedback, and Q is the subjective value. This rule describes model-free learning, that is, learning that does not assume agents to develop and use a full representation or model of the environment but that simply maps rewards to states and actions (Dolan & Dayan, 2013). More specifically, new expectations are weighted averages of past expectations and a new outcome, where the weight is referred to as the learning rate η (and is treated as a free parameter to capture inter-individual differences). In the RLDDM, the difference in Q-values (i.e., the updated expectations corresponding to the available options) is not simply plugged into a softmax function (as in traditional reinforcement learning models) but is used to define the drift rate in each trial:

v_t = v_mod · (Q_higher,t − Q_lower,t),    (8)

where v_t is the drift rate at trial t, v_mod is a scaling parameter that reflects individual sensitivity to value differences, and Q_higher,t and Q_lower,t are the expectations at trial t corresponding to the higher- and lower-valued of the two presented options. Thus, the first and simplest RLDDM has four parameters: the learning rate η to update the subjective values according to the Rescorla–Wagner rule, the drift-rate scaling parameter v_mod, the threshold a, and the non-decision time T_er. This model can explain the most basic learning effects, that is, the increasing probability of choosing the option with the higher expected value and the corresponding decrease of RT across trials (Fig. 3). However, the most simple RLDDM could not explain other effects that occur during learning and that appear to depend on specifics of the learning environment. In particular, in two studies (Fontanesi et al., 2019, 2019), we tested this model across learning contexts that differed in value difference (i.e., small vs. large), outcome magnitude (i.e., high vs. low), valence (i.e., losses vs. gains), or information (i.e., partial feedback for the chosen option only vs. complete feedback for both chosen and unchosen options). In a different study (Pedersen et al., 2017), the model was used to capture the effects of stimulant medication in adult subjects with attention-deficit hyperactivity disorder (ADHD). In all of these empirical studies, the basic RLDDM required a few modifications (i.e., additional mechanisms) to provide a full and accurate account of the data.
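A minimal simulation of this basic four-parameter RLDDM might look as follows. This is a sketch under our own simplifying assumptions (Euler discretization of the diffusion, two options, unbiased starting point, feedback only for the chosen option, illustrative parameter values), not the estimation code of any cited study:

```python
import numpy as np

rng = np.random.default_rng(1)

def ddm_trial(v, a, t_er, dt=0.001, noise=1.0):
    """Simulate one diffusion trial: accumulate evidence with drift v
    between boundaries 0 and a (starting at a/2); return (choice, RT),
    where choice 1 means the upper boundary was hit."""
    x, t = a / 2.0, 0.0
    while 0.0 < x < a:
        x += v * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x >= a else 0), t + t_er

def simulate_rlddm(means, n_trials, eta, v_mod, a, t_er):
    """Across-trial Rescorla-Wagner learning feeding within-trial DDM
    decisions. `means` holds the true mean rewards of the two options
    (option 1 mapped to the upper boundary)."""
    q = np.zeros(2)
    choices, rts = [], []
    for _ in range(n_trials):
        v = v_mod * (q[1] - q[0])             # Eq. (8): drift from Q-difference
        choice, rt = ddm_trial(v, a, t_er)
        feedback = means[choice] + 0.1 * rng.standard_normal()
        q[choice] += eta * (feedback - q[choice])  # Eq. (7): update chosen option
        choices.append(choice)
        rts.append(rt)
    return np.array(choices), np.array(rts)

choices, rts = simulate_rlddm(means=(0.2, 0.8), n_trials=100,
                              eta=0.2, v_mod=3.0, a=2.0, t_er=0.3)
```

As the Q-value difference grows across trials, the drift rate grows with it, so the simulation reproduces the two signature learning effects shown in Fig. 3: choices of the higher-value option become more likely, and RTs shorten.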

3.2.1 Extensions of the RLDDM

In a recent study (Fontanesi et al., 2019), we manipulated value difference and outcome magnitude in a reinforcement learning task. In each trial, participants chose between two out of four options, whose associated outcomes were drawn from normal distributions with different means and equal variance. In addition to the learning effects described above, we observed several effects that the simple RLDDM could not capture. First, the simple RLDDM failed to fully capture difficulty effects, as it systematically underestimated the probability of choosing the high-value option in difficult trials (i.e., trials in which the difference between mean outcome values was small). To capture this effect, we proposed a new mapping function to link the difference in Q-values to the drift rate in each trial:

v_t = S(v_mod · (Q_higher,t − Q_lower,t)),    (9)

with

S(z) = 2 · v_max / (1 + e^{−z}) − v_max,    (10)

where S(z) is an S-shaped function centered at 0, and v_max is the maximum absolute value that the transformed drift rate S(z) can take: lim_{z→±∞} S(z) = ±v_max. While v_max only affects the maximum and minimum values that the drift rate can take, v_mod affects the curvature of the linking function: smaller values of v_mod lead to a more linear mapping between value difference and drift rate and therefore to less sensitivity to value differences. Introducing this mapping function allowed the RLDDM to explain both choices and RT, as well as their development across learning, in easier and more difficult trials alike. This non-linear relationship between value differences and drift rate has also been found in different tasks. Peters and D'Esposito (2019) used the same non-linear mapping function to explain intertemporal choices (i.e., where the outcome is experienced with some delay) and risky choices (i.e., where the outcome has a known probability of occurring). In a different reinforcement learning study, Sewell et al. (2019) proposed an RLDDM to explain participants' behavior in an associative task. Associative tasks differ from standard n-armed bandit tasks in that the correct classification (in this case, into two categories) for each option has to be learned. Therefore, instead of choosing between two options, participants learn to associate different actions with each option. Sewell et al. found that a non-linear mapping function (albeit different from the one we proposed) between the learned associations and the drift rate at each trial was necessary to capture the behavioral patterns in their task. The simple RLDDM also failed to capture magnitude effects, which refer to the observation that participants responded faster when choosing between a pair of overall high-value options compared to a pair of overall low-value options (see Fig. 4).
Fig. 4 Contextual learning effects on choices and RT. The first row refers to behavioral patterns observed in Fontanesi et al. (2019); the second row refers to behavioral patterns observed in Fontanesi et al. (2019). In both studies, we observed the main effects of learning on accuracy and RT, which increased and decreased, respectively. On top of these effects, we observed a speeding effect due to magnitude (first row, lower RT for high compared to low values) and valence of reward (second row, lower RT for positive compared to negative rewards), which was not accompanied by a change in accuracy. Finally, we found that difficult decisions were slower and less accurate than easy decisions (first row) and that providing full feedback led to faster and more accurate decisions than partial feedback (second row)

While getting faster, however, participants did not become less likely to choose the "better" option of the pair, speaking against a speed–accuracy trade-off. The basic RLDDM cannot capture this magnitude effect because it predicts identical performance whenever the difference in Q-values is the same. For example, a choice between two options of values 1 and 2 should lead to the same behavior as a choice between two other options of values 5 and 6. However, the magnitude effect we observed implies that responses were faster, but not more or less consistent, for the 5 vs. 6 compared to the 1 vs. 2 decision. To capture this effect, we proposed a threshold-modulating mechanism, allowing the threshold a to vary across contexts⁵:

a = exp(a_fix + a_mod · Q_pres),    (11)

where a_fix is the threshold baseline, a_mod is the threshold-modulating parameter, and Q_pres is the average subjective value of the presented options. When a_mod is negative, the higher Q_pres, the lower the threshold in that specific trial. This mechanism was crucial to explain the speeding effect elicited by pairs of options with overall higher values. Note that some may interpret this threshold-modulating mechanism as implementing a speed–accuracy trade-off. However, changes in the threshold parameter have different impacts on choice probability and RT: the higher the choice probability in favor of one option, the less it is affected by an increase in the threshold (similar to a ceiling effect), whereas the same increase in the threshold has a strong effect on RT. In the study discussed above (Fontanesi et al., 2019), we tested the RLDDM in an appetitive environment, that is, an environment in which outcomes can be higher or lower but are always positive. However, learning is not restricted to appetitive stimuli and often occurs in aversive environments as well. In this case, one should learn to select options that minimize unwanted outcomes (rather than maximizing wanted outcomes). These two environments are often referred to as rewarding and punishing contexts. In a separate study (Fontanesi et al., 2019), we found that the RLDDM failed to capture differences in behavior depending on whether people choose between rewards or between punishments. In this task, participants were presented with pairs of options either in the negative (losses) or in the positive (gains) domain. Similar to the magnitude effect, we found that participants were faster in the gain domain, but not more or less accurate (see Fig. 4). To account for this effect, the RLDDM was extended by a mechanism that allows the non-decision time to be modulated by the learned context value, such that shorter non-decision times in the gain domain explain the speeding effect in this condition.

5 Note that other mechanisms, such as implementing the accumulation of evidence as a race rather than a diffusion model, might predict the magnitude effect on RT but would also impact accuracy (or choice consistency) substantially. Changes of the threshold, however, have a comparatively small influence on accuracy, especially when accuracy is high. Potential mechanisms that explain such an effect should thus be evaluated by their joint effect on RT and accuracy.
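The drift mapping of Eqs. (9)–(10) and the threshold modulation of Eq. (11) are easy to verify numerically. The sketch below is our own illustration with arbitrary parameter values:

```python
import math

def drift_mapping(q_diff, v_mod, v_max):
    """Eqs. (9)-(10): sigmoidal drift mapping S(z) = 2*v_max/(1+e^-z) - v_max
    with z = v_mod * q_diff; S(0) = 0 and S saturates at +/- v_max."""
    z = v_mod * q_diff
    return 2.0 * v_max / (1.0 + math.exp(-z)) - v_max

def trial_threshold(a_fix, a_mod, q_pres):
    """Eq. (11): context-dependent threshold; the exponential keeps the
    threshold positive. With a_mod < 0, higher-valued contexts produce
    a lower threshold and hence faster, but similarly accurate, choices."""
    return math.exp(a_fix + a_mod * q_pres)

# The drift saturates for large value differences...
print(drift_mapping(10.0, v_mod=2.0, v_max=3.0))   # close to v_max = 3.0
# ...and a high-value pair yields a lower threshold than a low-value pair.
print(trial_threshold(0.5, -0.3, q_pres=5.0) < trial_threshold(0.5, -0.3, q_pres=1.0))
```

The saturation property is what rescues the model in difficult trials: even a small Q-value difference is mapped onto a disproportionately strong drift when v_mod is large, while extreme differences cannot push the drift beyond ±v_max.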
The reduced non-decision time for gains, which differs from the threshold-modulating mechanism proposed for the appetitive context, resembles a motor facilitation effect. In a recent study, Millner et al. (2018) investigated the effects of aversive-only learning contexts. Their goal was to elicit both avoidance and escape responses by changing the consequences of participants' actions (i.e., active vs. passive) in a go/no-go task. While in the avoidance condition participants had to learn to avoid the action (i.e., go vs. no-go) that would result in an aversive sound with higher probability, in the escape condition participants learned the action (again, go vs. no-go) that had a higher probability of making an aversive sound stop. Their results showed that participants were more accurate in learning no-go responses in the avoidance condition and go responses in the escape condition. However, they were generally faster in the active-escape as compared to the passive-avoidance learning context (for both go and no-go responses). The authors applied an RLDDM to explain the development of choice and RT data over learning and how this development was affected by the context. The most parsimonious RLDDM allowed the starting point of the diffusion process to be biased toward go responses in the active-escape context but toward no-go responses in the passive-avoidance context. While the starting-point bias could explain the accuracy bias, it did not help to capture the general speeding-up effect in the escape context (which could perhaps be explained by either a threshold- or a non-decision-time-based mechanism). This study showed the importance of investigating approach/avoidance responses in a wider variety of learning contexts and actions during reinforcement learning. Taken together, the two studies that tested the RLDDM in aversive contexts provide some evidence for effects on the speed of decisions that go beyond evidence-accumulation mechanisms (e.g., freezing and fight-or-flight responses; Rösler & Gamer, 2019). However, future studies should attempt to reconcile the different accounts within the same RLDDM.

3.2.2 Clinical Applications

Pedersen et al. (2017) compared the performance of adult patients with attention-deficit hyperactivity disorder (ADHD) in a learning task on and off medication. Performance on medication was generally better (with a steeper increase of accuracy throughout learning). They tested the effects of medication on both learning and decision parameters by fitting an RLDDM with separate parameters for the two conditions. They found that the drift-rate coefficient was higher on medication, as was the learning rate for positive prediction errors (i.e., the degree of value updating when outcomes exceeded expectations). Therefore, medication not only had an effect on learning itself but also on how decisions were informed by the learned values. Medication further affected the threshold and the non-decision time, with both parameters being higher on medication. This suggests that medication increased cautiousness (making responses slower and more accurate at the same time) but also generally slowed down motor responses.

3.2.3 Optimality of Behavior

By neglecting decision speed, reinforcement learning models have mostly framed optimality of behavior in terms of maximizing rewards and minimizing punishments. They did not, however, consider that decisions also take time, so that the reward obtained per unit of time (i.e., the reward rate) should be maximized as well. This concept has already been formalized in the sequential sampling framework (see, e.g., Bogacz et al., 2006; Simen et al., 2006) but has so far been restricted to situations in which every trial is independent. As we explained at the beginning of this section, decisions based on feedback typically violate this assumption. Therefore, the question of optimality of behavior in reinforcement learning, including information about decision speed, remains to be understood. Nguyen et al. (2019) investigated optimality of behavior in a task in which feedback was provided after a sequence of trials. By assuming a utility function that maximized the reward rate (thus taking decision speed into account), they found that an optimal observer adjusts their starting-point bias and threshold, and not the rate of evidence accumulation. This generally leads to faster and equally correct decisions.

All previously mentioned studies tested the RLDDM in static learning environments, where the mean reward associated with a certain option does not change over time. However, in most real-life applications, it is essential to keep track of changes that could occur due to various factors. For example, the quality of your favorite restaurant could suddenly drop due to a change in the kitchen staff. These sudden changes, typical of dynamic learning environments, can affect the certainty around previous estimates, which is likely to affect decision and learning parameters for future choices. Learning models based on the Rescorla–Wagner rule, like the ones used in the RLDDMs described so far, do not explicitly keep track of the certainty around reward expectations. Thus, certainty-based mechanisms have not yet been implemented in RLDDMs. However, in a recent study, Drugowitsch et al. (2019) addressed the problem of optimality of learning and decision-making in dynamic environments. To model the optimal learning of uncertainty, they relied on the Bayesian framework, which allowed them to quantify uncertainty in the form of a covariance matrix. In their learning rule, learning was (i) minimal after correct choices made with high certainty, (ii) high after incorrect choices made with high certainty, and (iii) high after choices made with low certainty. This rule also naturally explains a phenomenon often encountered in animal and human behavior, namely the tendency to repeat correct (i.e., previously rewarded) choices, especially in high-conflict contexts.
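Two of the quantitative ideas in this section can be made concrete with toy code. Both functions below are our own sketches: the reward-rate formula is the generic one from the sequential sampling literature, and the Kalman learner is a generic one-dimensional stand-in for certainty-weighted learning, not the covariance-based model of Drugowitsch et al. (2019):

```python
def reward_rate(accuracy, mean_rt, iti, reward=1.0):
    """Expected reward per second: probability of a rewarded choice
    divided by the total time one trial consumes (decision time plus
    inter-trial interval)."""
    return accuracy * reward / (mean_rt + iti)

def kalman_update(mean, var, outcome, obs_var=1.0):
    """One-dimensional Kalman update of a reward estimate. The effective
    learning rate k shrinks as the estimate becomes more certain (small
    var) and grows when it is uncertain (large var)."""
    k = var / (var + obs_var)  # certainty-dependent learning rate
    return mean + k * (outcome - mean), (1.0 - k) * var, k

# Slightly less accurate but much faster responding can yield more
# reward per unit of time:
careful = reward_rate(accuracy=0.95, mean_rt=1.2, iti=1.0)
hasty = reward_rate(accuracy=0.90, mean_rt=0.6, iti=1.0)

# An uncertain estimate moves a lot; a certain one barely moves:
_, _, k_uncertain = kalman_update(0.0, 10.0, outcome=1.0)
_, _, k_certain = kalman_update(0.0, 0.1, outcome=1.0)
```

The Kalman sketch captures point (iii) of the learning rule above (strong updating under low certainty); capturing points (i) and (ii), where updating also depends on choice correctness, requires the fuller Bayesian treatment of the original study.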

3.2.4 Methodological Advantages of Combined Learning and Choice Models

Recent studies have explored the methodological advantages of RLDDMs compared to standard reinforcement learning models. Shahar et al. (2019) compared the parameter recovery performance of an RLDDM and standard reinforcement learning models in a two-stage reinforcement learning task (Daw et al., 2011). This task has received considerable attention in recent years. It differs from standard n-armed bandit tasks in that participants can learn to map actions and rewards depending on which state of the environment they are in. In particular, there are two subsequent stages. The action taken in the first stage can have a higher or lower probability of leading to one of the two possible second-stage states. Transitions between the first and second stages can thus be either "common" or "rare," depending on their probability. Since one of the two second-stage states yields higher rewards, participants should choose actions associated with common transitions toward that state. In the second stage, participants make choices between options as in traditional n-armed bandit tasks. This task is widely used to compare model-free and model-based learning because only the latter form of learning takes transition probabilities into account. Intriguingly, Shahar et al. (2019) could show that parameter recovery was very low for standard reinforcement learning models in this task but improved substantially for an RLDDM. Furthermore, performance depended heavily on the number of trials, with the RLDDM reaching acceptable recovery rates with a more reasonable number of trials (i.e., about 200 trials per participant). Internal validity was also better for the RLDDM with fewer trials. Moreover, on a conceptual level, the RLDDM enabled new model-free predictions of RT data that discriminate between model-based and model-free participants. In particular, the RLDDM made the correct prediction that more model-based participants are quicker to make a second-stage choice after a common transition and slower after a rare transition. This effect arises because these participants, unlike the model-free ones, keep track of the transition probabilities and are thus less certain about their choices after a rare transition. Unfortunately, though, the temporal stability of the w parameter (which quantifies the proportion of model-based vs. model-free responses) was not improved by the RLDDM. In a different study, Ballard and McClure (2019) investigated learning in a dynamic environment and showed how jointly modeling choices and RT, as implemented by a softmax function for choices and a linear regression for RT, improved the ability to estimate learning-rate parameters. Even though an RLDDM was not tested in this study, the results support the idea that RT data in reinforcement learning provide additional, useful information that can inform the learning model (and not only vice versa).
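The joint-likelihood idea behind Ballard and McClure's approach, a softmax term for choices plus a regression term for RT, can be sketched generically. This is our own simplified single-trial version with a normal RT model; the function name and parameters are illustrative, not taken from the original study:

```python
import math

def joint_loglik(q_chosen, q_unchosen, rt, beta, b0, b1, sigma):
    """Log-likelihood of one trial under a softmax choice rule combined
    with a linear RT model: rt ~ Normal(b0 + b1*(q_chosen - q_unchosen),
    sigma). Both data streams jointly constrain the learned Q-values."""
    q_diff = q_chosen - q_unchosen
    # Softmax probability of the chosen option (two-option case)
    ll_choice = -math.log(1.0 + math.exp(-beta * q_diff))
    # Gaussian log-density of the observed RT
    mu = b0 + b1 * q_diff
    ll_rt = (-0.5 * math.log(2 * math.pi * sigma ** 2)
             - (rt - mu) ** 2 / (2 * sigma ** 2))
    return ll_choice + ll_rt

# A trial where the higher-valued option was chosen and the RT matches
# the value-based prediction receives a higher joint log-likelihood.
ll = joint_loglik(q_chosen=0.8, q_unchosen=0.2, rt=0.7,
                  beta=3.0, b0=1.0, b1=-0.5, sigma=0.2)
```

Summing this quantity over trials and maximizing it with respect to the learning-rate parameter (which generates the Q-values) is what lets the RT stream sharpen the learning-rate estimate, the methodological point made above.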

4 Open Challenges and a Warning

In this chapter, we have presented two core areas of current research in neuroeconomics, which address how mechanisms of attention and learning influence the process of value-based decision-making. An obvious open challenge for future research is to integrate these two topics into a more comprehensive picture of how we make decisions. Notably, a few studies have already gone in this direction by testing the role of attention in reinforcement learning settings. One of these studies is the above-mentioned work by Cavanagh et al. (2014), who modeled the influence of eye movements on decision-making in an n-armed bandit task. Interestingly, this study provided evidence for an additive impact of attention on preference formation, according to which the drift rate depends on the weighted sum of learned value and attention. This contrasts with the multiplicative assumption of the aDDM (i.e., value and attention are multiplied). Such accounts might point toward different mechanistic roles of attention in decisions from description and decisions from experience. On the other hand, the interaction between attention and valuation implemented in the aDDM provides an alternative explanation for the magnitude effect on RT in reinforcement learning, contrasting with the assumption of a modulation of the decision threshold (Fontanesi et al., 2019). Stated differently, the fact that people choose faster between stimuli associated with high as compared to low rewards (or punishments) need not be driven by an influence of overall reward amounts on the decision threshold (as currently assumed by the RLDDM). Instead, it could be an emergent property of the impact of attention on evidence accumulation (as assumed by the aDDM).
At the moment, however, we would argue that more empirical data on eye movements in reinforcement learning tasks are required to ascertain whether attention exerts an additive or a multiplicative influence on experience-based decisions.
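The two accounts can be contrasted with a toy computation. The multiplicative form below follows the aDDM's Eq. (3) as used in the exercises of this chapter; the additive form, with a constant gaze bonus eta toward the fixated option, is our illustrative simplification of an additive account and not the exact model fitted by Cavanagh et al. (2014):

```python
def addm_drift(r_i, r_j, fixate_i, d=1.0, theta=0.3):
    """Multiplicative aDDM drift of the relative decision value (positive = toward i):
    the value of whichever option is NOT fixated is discounted by theta (Eq. (3)-style)."""
    return d * (r_i - theta * r_j) if fixate_i else d * (theta * r_i - r_j)

def additive_drift(r_i, r_j, fixate_i, d=1.0, eta=1.0):
    """Additive variant (illustrative assumption): a constant gaze bonus eta is
    added toward the fixated option instead of scaling the option values."""
    return d * (r_i - r_j) + (eta if fixate_i else -eta)

# For equal-value pairs, the multiplicative attention effect grows with overall
# value, while the additive effect stays constant:
for r in (0, 5, 10):
    print(r, addm_drift(r, r, fixate_i=True), additive_drift(r, r, fixate_i=True))
# Exercise 3's rule: with r_j / r_i < theta, the RDV drifts toward i even when j
# is fixated: 0.3 * 10 - 2 = 1.0 (toward i).
print(addm_drift(10, 2, fixate_i=False))
```

Under the multiplicative account the attentional advantage for an equal-value pair scales with the reward magnitude r; under the additive account it does not, which is one behavioral signature the two models could be separated on.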

350

S. Gluth and L. Fontanesi

In addition to its potential influence on decisions, attention itself might be influenced by reinforcement learning. Along these lines, a recent study combined fMRI with eye tracking to study how the allocation of attention changes in multidimensional reinforcement learning (Leong et al., 2017). Participants learned to choose the best out of three options, each of them characterized by three dimensions (faces, tools, houses). Only one of the dimensions was relevant (i.e., predictive of reward) at any given point in time, but which dimension was the relevant one could change from one trial to the next. Analogous to the above-mentioned Gaze cascade effect (i.e., we look at what we like, we like what we look at), the authors reported a feedback loop of learning and attention. On the one hand, participants learned to attend to the relevant dimensions. On the other hand, both their decisions and their learning were influenced more by those dimensions they attended to the most. This interaction was accompanied by an increased functional coupling between the brain's valuation network (i.e., vmPFC) and parts of the fronto-parietal attention network. Future work should extend this learning-and-attention model to incorporate a cognitively plausible process model of decision-making (i.e., an RLDDM).

Finally, we would like to provide a warning against a practical pitfall when studying choice processes in reinforcement learning. In reinforcement learning tasks, participants are asked to make the same decisions over and over again. This bears the risk that participants anticipate and prepare their next decision before the trial even starts and the options are presented. In most reinforcement learning tasks, the next decision could actually be prepared right after the last feedback has been provided, because participants know that they will face the same options in the next trial again.
In this case, the "real" decision is unlikely to take place at the time when the choice options are presented, and neither the RT nor any time-locked neural or physiological response can be expected to reflect a DDM-like deliberation process. Notably, this phenomenon has been observed in a recent eye-tracking study on model-based vs. model-free reinforcement learning (Konovalov & Krajbich, 2016). The authors showed that participants whose learning dynamics were best characterized as model-based made very few fixations during the choice period of the task. It appeared as if they simply searched for the option that they had already decided to take beforehand. To minimize the risk of getting caught in this trap, we therefore advise ensuring that participants cannot anticipate which options will be shown in the upcoming trial and how they can select these options. Notably, these features were implemented in the study in which we introduced the RLDDM (Fontanesi et al., 2019). Here, participants always chose between two options out of a set of four potential options, and the association of options and responses (left vs. right) was randomized from trial to trial. Even though these features can be expected to make learning more difficult, they are necessary to ensure that the deliberation process occurs at the time point at which the experimenter expects it to occur, and thus to understand decision processes in learning from rewards and punishments.
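These two design recommendations, drawing a random pair from the option set and randomizing the response mapping, can be implemented in a few lines; the option labels and trial count below are hypothetical:

```python
import itertools
import random

random.seed(1)
options = ["A", "B", "C", "D"]                      # four learning stimuli (hypothetical labels)
pairs = list(itertools.combinations(options, 2))    # the six possible choice pairs

def make_trials(n_trials):
    """Build a trial sequence in which both the upcoming pair and the
    left/right assignment are unpredictable, as recommended above."""
    trials = []
    for _ in range(n_trials):
        left, right = random.sample(random.choice(pairs), 2)  # random pair, random sides
        trials.append({"left": left, "right": right})
    return trials

trials = make_trials(6)
for t in trials:
    print(t["left"], "vs", t["right"])
```

Because the pair and the response mapping are only revealed at stimulus onset, the deliberation process cannot start before the trial does.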

Cognitive Modeling in Neuroeconomics

351

5 Exercises

1. What is the "Gaze cascade effect"? Which aspect of it has been implemented in the original version of the aDDM? Which aspect of it has been implemented in the extended aDDM as proposed by Gluth et al. (2020)?
2. The aDDM implements the interaction between attention and preference formation in a multiplicative way (see Eqs. (1)–(3)). Thus, the influence of attention on choice is predicted to be higher when the value of the options is higher. Show this by comparing the drift rates (vt) for a decision between two options i and j with values ri = rj = 10 when i / j is fixated, with the drift rates for a decision between two options k and l with values rk = rl = 5 when k / l is fixated. Assume θ = 0.3. What happens if r = 0 for both options? What happens if the options have negative values?
3. The aDDM does not always predict that the relative decision value RDV (Eqs. (1)–(3)) drifts toward the attended option, because the RDV also depends on the value difference between the options. Find the general rule that specifies how θ, ri, and rj need to be related to each other, so that even if option j is fixated, the RDV still drifts toward the boundary of i.
4. What are the main advantages of combining reinforcement learning and sequential sampling models? In particular, how does modeling RT add to standard reinforcement learning, and how does the modeling of learning add to standard sequential sampling models?
5. Equations (9) and (10) describe the non-linear mapping of subjective value differences to the accumulation of evidence in the RLDDM. In the programming language of your choice, simulate how vt changes as a function of value differences. Do that (i) by fixing vmax to 3, letting vmod vary between 0 and 2 (in at least 6 steps), and letting Qhigher − Qlower vary between −15 and 15 (in at least 20 steps); and (ii) by fixing vmod to 0.5, letting vmax vary between 1 and 5 (in at least 6 steps), and letting Qhigher − Qlower vary between −15 and 15 (in at least 20 steps). What happens when vmod = 0? What are the effects of the vmax and vmod parameters? What is their psychological interpretation?
6. In Eq. (11), what happens when amod is 0? What would happen if amod were higher than 0?
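As a starting point for exercise 5, here is a minimal Python sketch. The symmetric-logistic form used for the mapping is an assumption on our part; take the exact functional form from Eqs. (9) and (10):

```python
import numpy as np

def drift_rate(q_diff, v_max, v_mod):
    """S-shaped mapping of value differences (Q_higher - Q_lower) to drift rates.
    Assumed form: v = v_max * (2 / (1 + exp(-v_mod * q_diff)) - 1), bounded at
    +/- v_max with steepness v_mod; check Eqs. (9)-(10) for the exact form."""
    q_diff = np.asarray(q_diff, dtype=float)
    return v_max * (2.0 / (1.0 + np.exp(-v_mod * q_diff)) - 1.0)

q_diff = np.linspace(-15, 15, 31)            # at least 20 steps, as the exercise asks
for v_mod in (0.0, 0.5, 1.0, 2.0):           # part (i): fix v_max, vary v_mod
    v = drift_rate(q_diff, v_max=3.0, v_mod=v_mod)
    print(f"v_mod={v_mod:.1f}: v(-15)={v[0]:+.2f}, v(0)={v[15]:+.2f}, v(+15)={v[-1]:+.2f}")
for v_max in (1.0, 2.0, 3.0, 4.0, 5.0):      # part (ii): fix v_mod, vary v_max
    v = drift_rate(q_diff, v_max=v_max, v_mod=0.5)
    print(f"v_max={v_max:.1f}: v(+15)={v[-1]:+.2f}")
```

Plotting v against q_diff for each parameter setting makes the roles of the asymptote (v_max) and the steepness (v_mod) immediately visible.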

6 Further Readings

The "bible" of neuroeconomics is the book entitled Neuroeconomics: Decision Making and the Brain, which features a large collection of chapters on all subdomains of the field, written by the leading neuroeconomists worldwide. Good introductions to sequential sampling models in the domain of value-based decision-making can be found in the recent review by Busemeyer et al. (2019) as well as in two reviews by John Clithero (2018). For overviews of research on the interaction


between attention and decision-making, see Orquin and Loose (2013) and Krajbich (2019). An interesting discussion of the potential benefits of combining sequential sampling and reinforcement learning models is provided by Miletić et al. (2019). Even though this chapter focused on value-based decisions, a recent study by Turner (2019) showed that adaptive learning mechanisms can be incorporated into sequential sampling models (in particular, exemplar-based ones) in the perceptual domain. We refer to that study for an overview of the literature on learning processes in the perceptual domain as well.

7 Exercises with Answers

1. What is the "Gaze cascade effect"? Which aspect of it has been implemented in the original version of the aDDM? Which aspect of it has been implemented in the extended aDDM as proposed by Gluth et al. (2020)?
ANSWER: The "Gaze cascade effect" refers to the phenomenon that we look at what we like and that we like what we look at, so that preference formation develops in a cascade-like acceleration. The original aDDM accounted for the second part of the Gaze cascade effect by assuming that the evidence tends to accumulate in favor of the attended option. The extended aDDM adds the first part by assuming that the probability of fixating an option increases as a function of the accumulated evidence for that option.
2. The aDDM implements the interaction between attention and preference formation in a multiplicative way (see Eqs. (1)–(3)). Thus, the influence of attention on choice is predicted to be higher when the value of the options is higher. Show this by comparing the drift rates (vt) for a decision between two options i and j with values ri = rj = 10 when i / j is fixated, with the drift rates for a decision between two options k and l with values rk = rl = 5 when k / l is fixated. Assume θ = 0.3. What happens if r = 0 for both options? What happens if the options have negative values?
ANSWER: The drift rate is vt = 7 / vt = −7 when option i / j is fixated. The drift rate is vt = 3.5 / vt = −3.5 when option k / l is fixated. Thus, the difference in drift rates depending on the fixation is more extreme in the case of higher values. When r = 0, there is no influence of attention on choice. When options have negative values, the aDDM predicts that the option that receives more fixations is less likely to be chosen.
3. The aDDM does not always predict that the relative decision value RDV (Eqs. (1)–(3)) drifts toward the attended option, because the RDV also depends on the value difference between the options.
Find the general rule that specifies how θ, ri, and rj need to be related to each other, so that even if option j is fixated, the RDV still drifts toward the boundary of i.
ANSWER: Start with Eq. (3):

vt = d · (ri · θ − rj).    (12)

We want to find a rule for vt > 0, so we set up the inequality:

0 < d · (ri · θ − rj).    (13)

First, we get rid of the drift-rate constant d by dividing by it. Note that if d < 0, the inequality sign would flip, but the aDDM enforces d > 0, so this is not a concern. Thus, we get

0 < ri · θ − rj.    (14)

This inequality just needs to be rearranged to reveal the relationship:

0 < ri · θ − rj    (15)
rj < ri · θ    (16)
rj / ri < θ.    (17)
In words, the ratio between rj and ri needs to be smaller than θ for the RDV to drift toward choosing i even if j is fixated. For example, if θ = 0.3, ri = 10, rj = 2, and d = 1, then the drift rate will be 1 · (10 · 0.3 − 2) = 3 − 2 = 1 in favor of option i when j is fixated (and 9.4 when i is fixated).
4. What are the main advantages of combining reinforcement learning and sequential sampling models? In particular, how does modeling RT add to standard reinforcement learning, and how does the modeling of learning add to standard sequential sampling models?
ANSWER: In general, combining these two classes of models leads to substantial theoretical progress in the field of computational cognitive modeling of behavior. While reinforcement learning models describe across-trial learning-from-feedback dynamics, sequential sampling models describe within-trial decision dynamics. Therefore, combined models are necessary to understand the interplay of the two. Furthermore, by adding the modeling of RT to reinforcement learning, researchers have been able to (i) increase model identifiability (Shahar et al., 2019) and (ii) explain effects of learning contexts that were not evident in choices alone (Fontanesi, Gluth, Spektor, & Rieskamp, 2019; Fontanesi, Palminteri, & Lebreton, 2019). These findings contribute to our understanding of learning by feedback. By adding the modeling of sequential effects to sequential sampling models, researchers have been able to (i) address violations of the i.i.d. assumption made by sequential sampling models, (ii) better understand the mapping of subjective values to decision parameters (Fontanesi et al., 2019), and (iii) define optimality of behavior in more naturalistic environments (Drugowitsch et al., 2019; Nguyen et al., 2019).
5. Equations (9) and (10) describe the non-linear mapping of subjective value differences to the accumulation of evidence in the RLDDM. In the programming language of your choice, simulate how vt changes as a function of value differences. Do that (i) by fixing vmax to 3, letting vmod vary between 0 and 2 (in at least 6 steps), and letting Qhigher − Qlower vary between −15 and 15 (in at least 20 steps); and (ii) by fixing vmod to 0.5, letting vmax vary between 1 and 5 (in at least 6 steps), and letting Qhigher − Qlower vary between −15 and 15 (in at least 20 steps). What happens when vmod = 0? What are the effects of the vmax and vmod parameters? What is their psychological interpretation?
ANSWER: vmax is the highest level the drift rate vt can assume. When vmax/2 is reached (e.g., when vmax = 1 and differences in value are around 10), the drift rate ceases to increase even for a higher amount of evidence toward the correct option. vmod determines the steepness of the mapping function. When vmod = 0, the drift rate is not sensitive to choices anymore. When vmod is below 0.4, the mapping is almost linear. vmax thus defines the upper limit of performance (in terms of both accuracy and RT) a participant can reach in the task, independently of the easiness of the decision problem. On the other hand, vmod describes the sensitivity to value differences, i.e., how much performance increases depending on the easiness of the decision problem (Fig. 5).

Fig. 5 The non-linear mapping function of value differences to the drift rate

6. In Eq. (11), what happens when amod is 0? What would happen if amod were higher than 0?
ANSWER: When amod = 0, Eq. (11) reduces to a = exp(afix)

(18)

meaning that there is no threshold modulation. In case of a positive amod parameter, responses would become slower and more accurate when pairs of options with overall higher value were to be presented.
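The reduction described in this answer can be checked numerically. Since Eq. (11) is not reproduced here, the log-linear form and the value_term regressor below are assumptions, chosen only so that amod = 0 recovers Eq. (18):

```python
import math

def threshold(a_fix, a_mod, value_term):
    """Assumed log-linear threshold: a = exp(a_fix + a_mod * value_term).
    value_term is a placeholder for the value-related regressor of Eq. (11)."""
    return math.exp(a_fix + a_mod * value_term)

print(threshold(0.5, 0.0, 10.0))   # reduces to exp(a_fix) when a_mod = 0
print(threshold(0.5, 0.05, 10.0))  # a positive a_mod raises the threshold for high-value pairs
```

A higher threshold for high-value pairs is what produces the slower, more accurate responses described above.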


References

Arias-Carrión, O., Stamelou, M., Murillo-Rodríguez, E., Menéndez-González, M., & Pöppel, E. (2010). Dopaminergic reward system: A short integrative review. International Archives of Medicine, 3(24), 1–6. https://doi.org/10.1186/1755-7682-3-24
Armel, K. C., Beaumel, A., & Rangel, A. (2008). Biasing simple choices by manipulating relative visual attention. Judgment and Decision Making, 3(5), 396–403.
Ballard, I. C., & McClure, S. M. (2019). Joint modeling of reaction times and choice improves parameter identifiability in reinforcement learning models. Journal of Neuroscience Methods, 317, 37–44.
Basten, U., Biele, G., Heekeren, H. R., & Fiebach, C. J. (2010). How the brain integrates costs and benefits during decision making. Proceedings of the National Academy of Sciences, 107(50), 21767–21772.
Bhatia, S. (2013). Associations and the accumulation of preference. Psychological Review, 120(3), 522.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700.
Busemeyer, J., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100(3), 432–459. https://doi.org/10.1037/0033-295X.100.3.432
Busemeyer, J. R., Gluth, S., Rieskamp, J., & Turner, B. M. (2019). Cognitive and neural bases of multi-attribute, multi-alternative, value-based decisions. Trends in Cognitive Sciences, 23(3), 251–263.
Camerer, C., & Hua Ho, T. (1999). Experience-weighted attraction learning in normal form games. Econometrica, 67(4), 827–874.
Cavanagh, J. F., Wiecki, T. V., Kochar, A., & Frank, M. J. (2014). Eye tracking and pupillometry are indicators of dissociable latent decision processes. Journal of Experimental Psychology: General, 143(4), 1476.
Cisek, P., Puskas, G. A., & El-Murr, S. (2009). Decisions in changing conditions: The urgency-gating model. Journal of Neuroscience, 29(37), 11560–11571.
Clithero, J. A. (2018). Improving out-of-sample predictions using response times and a model of the decision process. Journal of Economic Behavior and Organization, 148, 344–375.
Clithero, J. A. (2018). Response times in economics: Looking through the lens of sequential sampling models. Journal of Economic Psychology, 69, 61–86.
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204–1215.
Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36(12), 1827–1837.
Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron, 80(2), 312–325.
Drugowitsch, J., Mendonça, A. G., Mainen, Z. F., & Pouget, A. (2019). Learning optimal decisions with confidence. Proceedings of the National Academy of Sciences, 116(49), 24872–24880.
Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., & Pouget, A. (2012). The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience, 32(11), 3612–3628.
Eckstein, M. K., Guerra-Carrillo, B., Singley, A. T. M., & Bunge, S. A. (2017). Beyond eye gaze: What else can eyetracking reveal about cognition and cognitive development? Developmental Cognitive Neuroscience, 25, 69–91.
Eldar, E., Cohen, J. D., & Niv, Y. (2013). The effects of neural gain on attention and learning. Nature Neuroscience, 16(8), 1146.
Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88, 848–881.


Fiebelkorn, I. C., Pinsk, M. A., & Kastner, S. (2019). The mediodorsal pulvinar coordinates the macaque fronto-parietal network during rhythmic spatial attention. Nature Communications, 10(1), 215.
Fontanesi, L., Gluth, S., Spektor, M. S., & Rieskamp, J. (2019). A reinforcement learning diffusion decision model for value-based decisions. Psychonomic Bulletin and Review, 26(4), 1099–1121.
Fontanesi, L., Palminteri, S., & Lebreton, M. (2019). Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: A meta-analytical approach using diffusion decision modeling. Cognitive, Affective, and Behavioral Neuroscience, 19, 490–502.
Frank, M. J., Gagne, C., Nyhus, E., Masters, S., Wiecki, T. V., Cavanagh, J. F., & Badre, D. (2015). fMRI and EEG predictors of dynamic decision parameters during human reinforcement learning. Journal of Neuroscience, 35(2), 485–494.
Fudenberg, D., Strack, P., & Strzalecki, T. (2018). Speed, accuracy, and the optimal timing of choices. American Economic Review, 108(12), 3651–3684.
Ghaffari, M., & Fiedler, S. (2018). The power of attention: Using eye gaze to predict other-regarding and moral choices. Psychological Science, 29(11), 1878–1889.
Glimcher, P. W., & Rustichini, A. (2004). Neuroeconomics: The consilience of brain and decision. Science, 306(5695), 447–452.
Gluth, S., Kern, N., Kortmann, M., & Vitali, C. L. (2020). Value-based attention but not divisive normalization influences decisions with multiple alternatives. Nature Human Behaviour, 4, 634–645.
Gluth, S., Rieskamp, J., & Büchel, C. (2012). Deciding when to decide: Time-variant sequential sampling models explain the emergence of value-based decisions in the human brain. Journal of Neuroscience, 32(31), 10686–10698.
Gluth, S., Sommer, T., Rieskamp, J., & Büchel, C. (2015). Effective connectivity between hippocampus and ventromedial prefrontal cortex controls preferential choices from memory. Neuron, 86(4), 1078–1090.
Gluth, S., Spektor, M. S., & Rieskamp, J. (2018). Value-based attentional capture affects multi-alternative decision making. eLife, 7, e39659.
Hanks, T. D., & Summerfield, C. (2017). Perceptual decision making in rodents, monkeys, and humans. Neuron, 93(1), 15–31.
Hare, T. A., Schultz, W., Camerer, C. F., O'Doherty, J. P., & Rangel, A. (2011). Transformation of stimulus value signals into motor commands during simple choice. Proceedings of the National Academy of Sciences, 108(44), 18120–18125.
Hunt, A. R., Reuther, J., Hilchey, M. D., & Klein, R. M. (2019). The relationship between spatial attention and eye movements (pp. 255–278). Springer International Publishing.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–292. https://doi.org/10.2307/1914185
Kolm, P. N., & Ritter, G. (2019). Modern perspectives on reinforcement learning in finance. The Journal of Machine Learning in Finance, 1(1).
Konovalov, A., & Krajbich, I. (2016). Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning. Nature Communications, 7(1), 1–11.
Krajbich, I. (2019). Accounting for attention in sequential sampling models of decision making. Current Opinion in Psychology, 29, 6–11.
Krajbich, I., Armel, C., & Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13(10), 1292.
Krajbich, I., & Rangel, A. (2011). Multialternative drift-diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proceedings of the National Academy of Sciences, 108(33), 13852–13857.


Leong, Y. C., Radulescu, A., Daniel, R., DeWoskin, V., & Niv, Y. (2017). Dynamic interaction between reinforcement learning and attention in multidimensional environments. Neuron, 93(2), 451–463.
Lim, S. L., O'Doherty, J. P., & Rangel, A. (2011). The decision value computations in the VMPFC and striatum use a relative value code that is guided by visual attention. Journal of Neuroscience, 31(37), 13214–13223.
Malmendier, U., & Nagel, S. (2015). Learning from inflation experiences. The Quarterly Journal of Economics, 131(1), 53–87.
Miletić, S., Boag, R. J., & Forstmann, B. U. (2019). Mutual benefits: Combining reinforcement learning with sequential sampling models. Neuropsychologia, 136, 107261.
Miletić, S., & van Maanen, L. (2019). Caution in decision-making under time pressure is mediated by timing ability. Cognitive Psychology, 110, 16–29.
Millner, A. J., Gershman, S. J., Nock, M. K., & den Ouden, H. E. (2018). Pavlovian control of escape and avoidance. Journal of Cognitive Neuroscience, 30(10), 1379–1390.
Murphy, P. R., Boonstra, E., & Nieuwenhuis, S. (2016). Global gain modulation generates time-dependent urgency during perceptual choice in humans. Nature Communications, 7(1), 1–15.
Nassar, M. R., Rumsey, K. M., Wilson, R. C., Parikh, K., Heasly, B., & Gold, J. I. (2012). Rational regulation of learning dynamics by pupil-linked arousal systems. Nature Neuroscience, 15(7), 1040.
Neumann, J. V., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton University Press.
Nguyen, K. P., Josić, K., & Kilpatrick, Z. P. (2019). Optimizing sequential decisions in the drift–diffusion model. Journal of Mathematical Psychology, 88, 32–47.
Noguchi, T., & Stewart, N. (2014). In the attraction, compromise, and similarity effects, alternatives are repeatedly compared in pairs on single dimensions. Cognition, 132(1), 44–56.
Noguchi, T., & Stewart, N. (2018). Multialternative decision by sampling: A model of decision making constrained by process data. Psychological Review, 125(4), 512.
O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304(5669), 452–454.
O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal difference models and reward-related learning in the human brain. Neuron, 38(2), 329–337.
O'Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104(1), 35–53.
Orquin, J. L., & Loose, S. M. (2013). Attention and choice: A review on eye movements in decision making. Acta Psychologica, 144(1), 190–206.
Palmer, J., Huk, A. C., & Shadlen, M. N. (2005). The effect of stimulus strength on the speed and accuracy of a perceptual decision. Journal of Vision, 5(5), 1–1.
Pärnamets, P., Johansson, P., Hall, L., Balkenius, C., Spivey, M. J., & Richardson, D. C. (2015). Biasing moral decisions by exploiting the dynamics of eye gaze. Proceedings of the National Academy of Sciences, 112(13), 4170–4175.
Pedersen, M. L., Frank, M. J., & Biele, G. (2017). The drift diffusion model as the choice rule in reinforcement learning. Psychonomic Bulletin and Review, 24(4), 1234–1251.
Peters, J., & D'Esposito, M. (2019). The drift diffusion model as the choice rule in inter-temporal and risky choice: A case study in medial orbitofrontal cortex lesion patients and controls. PLoS Computational Biology, 16(4), e1007615. https://doi.org/10.1101/642587. https://www.biorxiv.org/content/early/2019/06/30/642587
Pirrone, A., Azab, H., Hayden, B. Y., Stafford, T., & Marshall, J. A. (2018). Evidence for the speed–value trade-off: Human and monkey decision making is magnitude sensitive. Decision, 5(2), 129.
Pisauro, M. A., Fouragnan, E., Retzler, C., & Philiastides, M. G. (2017). Neural correlates of evidence accumulation during value-based decisions revealed via simultaneous EEG-fMRI. Nature Communications, 8(1), 1–9.


Polanía, R., Krajbich, I., Grueschow, M., & Ruff, C. C. (2014). Neural oscillations and synchronization differentially support evidence accumulation in perceptual and value-based decision making. Neuron, 82(3), 709–720.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59.
Ratcliff, R., & Frank, M. J. (2012). Reinforcement-based decision making in corticostriatal circuits: Mutual constraints by neurocomputational and diffusion models. Neural Computation, 24(5), 1186–1229.
Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20(4), 260–281.
Rieskamp, J., Busemeyer, J. R., & Mellers, B. A. (2006). Extending the bounds of rationality: Evidence and theories of preferential choice. Journal of Economic Literature, 44(3), 631–661.
Roe, R. M., Busemeyer, J. R., & Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108(2), 370.
Rösler, L., & Gamer, M. (2019). Freezing of gaze during action preparation under threat imminence. Scientific Reports, 9(1), 1–9.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80(1), 1–27. https://doi.org/10.1152/jn.1998.80.1.1
Schultz, W. (2015). Neuronal reward and decision signals: From theories to data. Physiological Reviews, 95, 853–951. https://doi.org/10.1152/physrev.00023.2014
Sewell, D. K., Jach, H. K., Boag, R. J., & Van Heer, C. A. (2019). Combining error-driven models of associative learning with evidence accumulation models of decision-making. Psychonomic Bulletin and Review, 26(3), 868–893.
Shahar, N., Hauser, T. U., Moutoussis, M., Moran, R., Keramati, M., Dolan, R. J., et al. (2019). Improving the reliability of model-based decision-making estimates in the two-stage decision task with reaction-times and drift-diffusion modeling. PLoS Computational Biology, 15(2), e1006803.
Shimojo, S., Simion, C., Shimojo, E., & Scheier, C. (2003). Gaze bias both reflects and influences preference. Nature Neuroscience, 6(12), 1317.
Simen, P., Cohen, J. D., & Holmes, P. (2006). Rapid decision threshold modulation by reward rate in a neural network. Neural Networks, 19(8), 1013–1026.
Smith, S. M., & Krajbich, I. (2018). Attention and choice across domains. Journal of Experimental Psychology: General, 147(12), 1810.
Smith, S. M., & Krajbich, I. (2019). Gaze amplifies value in decision making. Psychological Science, 30(1), 116–128.
Spektor, M. S., & Kellen, D. (2018). The relative merit of empirical priors in non-identifiable and sloppy models: Applications to models of learning and decision-making. Psychonomic Bulletin and Review, 25(6), 2047–2068.
Steingroever, H., Wetzels, R., Horstmann, A., Neumann, J., & Wagenmakers, E. J. (2013). Performance of healthy participants on the Iowa gambling task. Psychological Assessment, 25(1), 180–193. https://doi.org/10.1037/a0029929
Stewart, N., Hermens, F., & Matthews, W. J. (2016). Eye movements in risky choice. Journal of Behavioral Decision Making, 29(2–3), 116–136.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
Tavares, G., Perona, P., & Rangel, A. (2017). The attentional drift diffusion model of simple perceptual decision-making. Frontiers in Neuroscience, 11, 468.
Teodorescu, A. R., Moran, R., & Usher, M. (2016). Absolutely relative or relatively absolute: Violations of value invariance in human decision making. Psychonomic Bulletin and Review, 23(1), 22–38.
Thomas, A. W., Molter, F., Krajbich, I., Heekeren, H. R., & Mohr, P. N. (2019). Gaze bias differences capture individual choice behaviour. Nature Human Behaviour, 3(6), 625–635.
Towal, R. B., Mormann, M., & Koch, C. (2013). Simultaneous modeling of visual saliency and value computation improves predictions of economic choice. Proceedings of the National Academy of Sciences, 110(40), E3858–E3867.


Tsetsos, K., Chater, N., & Usher, M. (2012). Salience driven value integration explains decision biases and preference reversal. Proceedings of the National Academy of Sciences, 109(24), 9659–9664.
Turner, B. M. (2019). Toward a common representational framework for adaptation. Psychological Review, 126(5), 660.
Turner, B. M., Van Maanen, L., & Forstmann, B. U. (2015). Informing cognitive abstractions through neuroimaging: The neural drift diffusion model. Psychological Review, 122(2), 312.
Urai, A. E., Braun, A., & Donner, T. H. (2017). Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications, 8, 14637.
Van der Stigchel, S., Meeter, M., & Theeuwes, J. (2006). Eye movement trajectories and what they tell us. Neuroscience and Biobehavioral Reviews, 30(5), 666–679.
Yarbus, A. L. (1967). Eye movements and vision. Plenum Press.

Cognitive Control of Choices and Actions

Andrew Heathcote, Frederick Verbruggen, C. Nico Boehler, and Dora Matzke

Abstract We review model-based neuroscience work on cognitive control of choices and actions. We consider both strategically deployed executive processes and more automatic influences, first in binary choice tasks and then in more complex tasks. These include "conflict" tasks, where automatic and executive control processes sometimes act in opposition; delay discounting tasks, which require self-control to obtain larger rewards; and tasks where routine actions are occasionally interrupted by cues requiring a different action or the inhibition of action. For all of these tasks, dynamic cognitive models have been developed based on the idea of accumulating evidence. They have also been studied by traditional neuroscience methods, but direct links to the cognitive models have not always been made. We detail the way in which progress has been made with model-based neuroscience methods in some cases and in others highlight how this points the way towards opportunities for progress. We emphasise generative Bayesian estimation methods that are well suited to the complexities of model-based neuroscience and provide exercises with open-source code that allow readers to develop skills with models relevant to cognitive control.

Keywords Cognitive control · Conflict tasks · Evidence accumulation models

1 Introduction

Choice and action control impacts on the entire gamut of psychological processes from signal detection to action selection and action execution (Verbruggen et al., 2014), and so it needs cognitive models that can encompass this full range.

A. Heathcote
University of Amsterdam, Amsterdam, The Netherlands
University of Newcastle, Callaghan, NSW, Australia

F. Verbruggen · C. N. Boehler
Ghent University, Ghent, Belgium

D. Matzke
University of Amsterdam, Amsterdam, The Netherlands

© Springer Nature Switzerland AG 2024
B. U. Forstmann, B. M. Turner (eds.), An Introduction to Model-Based Cognitive Neuroscience, https://doi.org/10.1007/978-3-031-45271-0_14


Evidence-accumulation models (EAMs), which choose actions based on gradually accruing required amounts of particular types of information (Donkin & Brown, 2018), fulfil this desideratum and have provided the framework of choice for models of cognitive control. Action control can also require complicated computations that resolve conflicting information and conflicting demands among two or more stimuli, stimulus attributes, and response options. For this reason, cognitive control models have increasingly used a flexible type of evidence-accumulation architecture constituted of accumulators racing, either independently or interactively, to select an internal state or an external action (for a review, see Heathcote and Matzke (2022)). In the exercises for this chapter, we enable the reader to fit models based on various such architectures, for instance, using an independent race between single-barrier diffusion processes (Tillman et al., 2020; see Box 1).


Box 1: The Racing Diffusion (RD) Model of Binary Choice

Each accumulator adds up noisy evidence (red and green lines) accruing at different rates (greater for the correct than error choice). The chosen response corresponds to the first accumulator to reach its threshold, with RT equal to the time to do so plus the sum of the times to encode the stimulus and produce a response (non-decision time). Rates are a locus for controlling signal detection through selective attention (evidence quality) and effort (overall speed), as is the encoding portion of non-decision time. Thresholds and the response-production portion of non-decision time provide loci for controlling action selection and action execution. The distribution of finishing times for a single accumulator follows a shifted Wald distribution (Heathcote, 2004; Matzke & Wagenmakers, 2009), with the right panel of Fig. 1 showing the RT distributions for the 78% of trials where the correct (i.e. higher mean rate) accumulator wins and for the remaining 22% of trials when the error (i.e. lower mean rate) accumulator wins.


Fig. 1 Ten simulated trajectories for correct (left panel; rate = 2) and error (middle panel; rate = 0.5) accumulators and corresponding RT distributions (right panel; threshold crossing time for the winner plus 0.25 s non-decision time). https://tinyurl.com/4n9h89ch under CC-BY 2.0 license (https://creativecommons.org/licenses/by/2.0/)
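To make the race in Box 1 concrete, the following minimal sketch simulates a racing diffusion model by crude Euler discretisation. The rates (2 and 0.5) and the 0.25 s non-decision time echo Fig. 1, but the threshold and step size are our own illustrative choices; in practice the shifted-Wald distribution has an exact likelihood, so real fits do not require simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def wald_fpt(rate, threshold=1.0, dt=1e-3, n=4000, max_t=20.0):
    """First-passage times of a single-barrier diffusion, by Euler steps.

    Evidence starts at 0 and accrues with mean `rate` plus unit-variance
    Gaussian noise until it reaches `threshold`. Parameter values are
    illustrative, not taken from the chapter.
    """
    x = np.zeros(n)
    fpt = np.full(n, np.inf)
    alive = np.ones(n, dtype=bool)
    t = 0.0
    while alive.any() and t < max_t:
        t += dt
        x[alive] += rate * dt + np.sqrt(dt) * rng.standard_normal(alive.sum())
        crossed = alive & (x >= threshold)
        fpt[crossed] = t
        alive &= ~crossed
    return fpt

# Race the correct (rate = 2) and error (rate = 0.5) accumulators of Fig. 1
t_correct = wald_fpt(2.0)
t_error = wald_fpt(0.5)
acc = (t_correct < t_error).mean()
rt = np.minimum(t_correct, t_error) + 0.25   # winner's time plus non-decision time
print(f"P(correct) = {acc:.2f}, mean RT = {rt.mean():.2f} s")
```

Because the accumulators are independent, each accumulator's finishing times can be simulated separately and the race resolved by taking the minimum per trial.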


EAMs specify what can be controlled but not how control is achieved, although identifying where control is exerted can often be informative on this question. Importantly, EAMs support rigorous theory testing by linking these details to a fine-grained characterisation of behaviour in terms of the responses that are executed and corresponding response times (RT)—the period that elapses between the appearance of a stimulus and the execution of a motor response—as well as how often responses are withheld.

Crucially, in the current context, EAM parameters provide a natural medium to link neuroscience and behavioural data. Unlike “manifest” (i.e. directly observable) behavioural quantities, model parameters are typically “latent” quantities that can only be estimated indirectly, by finding values that provide an accurate and parsimonious account of the data. Correspondences between latent parameters and manifest physiological measures create a synergy between cognitive modelling and neuroscience: the difficult (and in some cases impossible) task of inferring parameters is made easier by linking them to neuroscience measures, and the model aids in the interpretation of neuroscientific data that can otherwise be ambiguous.

In the following, we first briefly review early successes of model-based neuroscience combining standard EAMs and functional magnetic resonance imaging (fMRI) to understand the control of when to act. We then review two further areas of cognitive control, control over which actions to make and which actions to withhold, that make use of fMRI; electroencephalographic (EEG) signals, both in the amplitude and frequency domains; transcranial magnetic stimulation (TMS); and electromyography (EMG). In each of these three areas, we highlight recent advances in cognitive modelling that could benefit from the model-based neuroscience approach.
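The across-participant linking logic, and the reason estimation uncertainty matters for it, can be sketched with synthetic data. Every variable name and number below is hypothetical: a true per-participant parameter, a neural measure that tracks it, and a noisy model-based estimate of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                          # simulated participants (illustrative)

true_dthresh = rng.normal(0.5, 0.2, n)           # hypothetical true threshold change
neural = true_dthresh + rng.normal(0.0, 0.2, n)  # neural measure tracking the parameter
est_dthresh = true_dthresh + rng.normal(0.0, 0.3, n)  # noisy model-fit estimate

r_true = np.corrcoef(true_dthresh, neural)[0, 1]
r_est = np.corrcoef(est_dthresh, neural)[0, 1]
print(f"r(truth, neural) = {r_true:.2f}; r(estimate, neural) = {r_est:.2f}")
```

Because the behavioural estimates carry estimation noise, their correlation with the neural measure is systematically weaker than the correlation of the true parameter, which previews the attenuation problem with simple correlational linking discussed later in this chapter.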
Note that our focus here is on the cognitive and neural processes associated with cognitive control rather than the representations on which it is based (Freund et al., 2021). In terms of Nigg’s (2017) classification, these processes represent low-level executive functions exerting control over actions at short time scales. We do address control over impulsive choice in the delay-discounting paradigm, which is usually studied separately from action control and also associated to some degree with higher-level executive functions, longer time scales, and different manifestations in clinical groups (see also Dalley and Robbins (2017)). Our review shows that EAMs can provide a framework to integrate these two aspects of cognitive control at a process level. However, we do not address more complex cognitive control (e.g. Alexander & Brown, 2015; Botvinick et al., 2009; Dayan, 2008; Rougier et al., 2005; see O’Reilly et al., 2010, for a review).

2 Controlling When to Act

As illustrated in Box 1, EAM threshold parameters provide a means of controlling when to act. It takes longer to accrue enough evidence to reach a higher threshold, and so decisions are slowed. However, the decisions also become more accurate because longer accumulation diminishes the potential for noise in the accumulation


process (evident in the irregular trajectories in Fig. 1) to allow the wrong accumulator to win. Hence, the evidence threshold controls a trade-off between speed and accuracy that participants have to manage when making decisions under time pressure.

In an early example of model-based neuroscience, Forstmann et al. (2008) studied the speed-accuracy trade-off with fMRI and by fitting an EAM, the linear ballistic accumulator (LBA; Brown & Heathcote, 2008). The LBA is a racing accumulator model, but unlike the racing diffusion (RD) model shown in Box 1, accumulation is deterministic with a constant rate (i.e. the trajectories in Fig. 1 would be straight lines). Instead of decision noise being in the moment-to-moment evidence total, in the LBA it manifests in normally distributed trial-to-trial variations in the rates (an assumption shared with the LATER model; Carpenter & Williams, 1995) and uniformly distributed noise in the starting point of evidence accumulation.

Forstmann et al. cued participants on a trial-by-trial basis to make either quicker or more accurate decisions about the predominant direction in a cloud of moving dots and accounted for the associated changes in speed and accuracy with a change in the LBA’s threshold parameters. The fMRI data revealed that the right anterior striatum and the right pre-supplementary motor area (preSMA) were modulated by the speed vs. accuracy instructions. Across participants, the magnitude of these changes correlated with the magnitude of estimated LBA threshold changes. The results of the “two-stage” procedure (i.e. first fitting a process model to behavioural data and a descriptive model to brain data, and then correlating model parameter estimates) were consistent with the proposal that the propensity to act is controlled by a cortico-striatal loop additionally involving the sub-thalamic nucleus (StN; Frank, 2006; Wiecki & Frank, 2013).

Turner et al.
(2015) criticised the two-stage procedure on two grounds: it neglects trial-to-trial changes, and it is not statistically reciprocal, as the neural data do not influence estimation of the parameters of the behavioural model and vice versa. It also fails to properly account for uncertainty, which attenuates correlation estimates and makes inference overly confident (Ly et al., 2018; Matzke et al., 2017c). Turner et al. asserted that instead a “joint-modelling” framework is the way forward for model-based neuroscience.

Joint modelling requires two statistical models for the data on each trial, in their case an independent components analysis (ICA) model for the fMRI data (Eichele et al., 2008) and a Wiener diffusion model (Stone, 1960; often referred to as the drift diffusion model in the neuroscience literature) of the behavioural data. As depicted in Fig. 2, their behavioural model is similar to the RD model, but it consists of a single accumulator with two thresholds, one for each response in a binary choice task. The variant of this model most widely used with behavioural data alone, the diffusion decision model (Ratcliff & McKoon, 2008), makes the same assumptions about trial-to-trial rate and starting-point variability as the LBA. However, in both cases, these variations are mathematically integrated out, so that only estimates of the parameters of the overall distributions are obtained. In contrast, Turner et al. used a Bayesian hierarchical model with both trial and participant levels (Vandekerckhove et al., 2011), enabled by the numerical methods of Navarro and Fuss (2009), to estimate rates and starting points for individual


Fig. 2 The neural drift diffusion model. Parameters of the neural ICA model (δ) are linked to parameters of the behavioural model (θ) through parameters (Ω) quantifying the variance that they share. Four evidence trajectories are depicted for the behavioural (diffusion decision) model in the lower left panel exemplifying combinations of high and low starting points (HS and LS) and high and low drift rates (HD and LD). (Reproduced with permission from Turner et al. (2015))

trials and link them to trial-by-trial BOLD estimates through a multi-variate normal distribution quantifying the correlations they share, which they called the neural drift diffusion model (NDDM).

We will not describe Turner et al.’s (2015) results further here, as their analysis did not focus on cognitive control. However, a later analysis by Turner et al. (2017b) used a refined version of joint modelling to investigate the conventional wisdom that EAMs produce speed-accuracy trade-offs purely by threshold changes. More recent work with behavioural data alone has found that non-decision time and accumulation rates also change (e.g. Lerche & Voss, 2018; Rae et al., 2014). Turner et al.’s (2017b) refined method replaced the NDDM’s multi-variate normal link—which rapidly becomes impractical when investigating large brain networks because the required number of parameters is a quadratic function of the number of brain areas modelled—with a factor analysis model, where the increase is only linear. They used this model to examine a large number of brain areas and found evidence for changes in the brain networks associated with non-decision time and rates between speed and accuracy emphasis conditions, consistent with thresholds not being the only mechanism involved.

We close this section with an opportunity related to growing interest in the role of time in decision-making: in some circumstances (e.g. when decision difficulty varies unpredictably from trial to trial), more reward might be gained from correct responses per unit time by requiring less evidence as decision time progresses. This has led to the proposal of models with “collapsing bounds” (thresholds that decrease over time) or adding an “urgency” input that drives responding (e.g.


Churchland et al., 2008). Although it is now clear that neither mechanism provides much reward gain (Boehm et al., 2020), there are some circumstances, in both humans and non-human primates, that are better fit by collapsing than constant bounds (Palestro et al., 2018b; Evans et al., 2020). Unfortunately, collapsing-bound and urgency models are usually computationally expensive (Hawkins et al., 2015) and hence difficult to work with (see, however, Trueblood et al. (2021)). The first opportunity box describes a new, more tractable approach that is ripe for investigation with model-based neuroscience methods.
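A sketch of what a collapsing-bound mechanism does to behaviour: the simulation below compares a two-bound Wiener diffusion with constant symmetric bounds to one whose bounds collapse linearly over time. All parameter values are our own illustrative choices, and the 0.05 bound floor is purely a numerical convenience.

```python
import numpy as np

rng = np.random.default_rng(2)

def ddm(drift, bound0=1.0, collapse=0.0, ndt=0.3, dt=1e-3, n=4000, max_t=5.0):
    """Wiener diffusion between bounds +/-b(t), b(t) = max(bound0 - collapse*t, 0.05).

    collapse = 0 gives constant bounds; collapse > 0 gives linearly
    collapsing bounds. Illustrative parameters only.
    """
    x = np.zeros(n)
    rt = np.full(n, np.nan)
    resp = np.zeros(n, dtype=int)
    alive = np.ones(n, dtype=bool)
    t = 0.0
    while alive.any() and t < max_t:
        t += dt
        x[alive] += drift * dt + np.sqrt(dt) * rng.standard_normal(alive.sum())
        b = max(bound0 - collapse * t, 0.05)
        up, down = alive & (x >= b), alive & (x <= -b)
        resp[up], resp[down] = 1, -1
        rt[up | down] = t + ndt
        alive &= ~(up | down)
    return resp, rt

resp_c, rt_c = ddm(drift=1.0, collapse=0.0)   # constant bounds
resp_x, rt_x = ddm(drift=1.0, collapse=0.5)   # collapsing bounds
for name, resp, rt in (("constant", resp_c, rt_c), ("collapsing", resp_x, rt_x)):
    print(f"{name}: P(upper) = {(resp == 1).mean():.2f}, mean RT = {np.nanmean(rt):.2f} s")
```

With a positive drift, the upper bound is the correct response; collapsing bounds cut off slow trials, trading accuracy for speed, which illustrates why simulation-based estimation of such models (as in the papers cited above) is computationally costly.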

Opportunity 1: Further Exploring Control of Action Timing

Hawkins and Heathcote (2021) proposed an elaboration of the model depicted in Box 1, the timed racing diffusion model (TRDM) that, consistent with theories of human time estimation (Simen et al., 2016), measures the passage of time with an extra diffusion process. The TRDM, with a choice being guessed if the timer wins, provides as good or better fits to data than the leading alternative models in paradigms favouring both fixed and collapsing bounds. It can also simultaneously account for decision-making and time-estimation tasks performed by the same individuals. As the TRDM is analytically highly tractable, it is ideal for joint modelling, which could explore the overlap between the cortico-striatal decision-making circuit (Frank, 2006) and the cortico-thalamic-basal ganglia timing circuit (Merchant et al., 2013), as well as joint modelling of more than one type of task (Kvam et al., 2021). As discussed further in Sect. 4, it also provides a good model of the sustained attention to response task used to study failures of control due to mind wandering.
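The TRDM idea can be sketched by adding a third, timer accumulator to a racing-diffusion simulation: if the timer reaches its threshold before either evidence accumulator, the response is a guess. All rates and thresholds below are illustrative, not Hawkins and Heathcote's (2021) estimates, and simulation is only for exposition, since the model is analytically tractable.

```python
import numpy as np

rng = np.random.default_rng(3)

def wald_fpt(rate, threshold, dt=1e-3, n=4000, max_t=20.0):
    """First-passage times of a single-barrier diffusion (Euler simulation)."""
    x = np.zeros(n)
    fpt = np.full(n, np.inf)
    alive = np.ones(n, dtype=bool)
    t = 0.0
    while alive.any() and t < max_t:
        t += dt
        x[alive] += rate * dt + np.sqrt(dt) * rng.standard_normal(alive.sum())
        crossed = alive & (x >= threshold)
        fpt[crossed] = t
        alive &= ~crossed
    return fpt

n = 4000
t_correct = wald_fpt(2.0, 1.0, n=n)   # evidence accumulators as in Box 1
t_error = wald_fpt(0.5, 1.0, n=n)
t_timer = wald_fpt(1.0, 1.5, n=n)     # slow timer: usually loses, sometimes forces a guess

evidence_rt = np.minimum(t_correct, t_error)
timeout = t_timer < evidence_rt
guess = rng.integers(0, 2, n)                    # 50/50 guess when the timer wins
correct = np.where(timeout, guess == 0, t_correct < t_error)
print(f"P(timer wins) = {timeout.mean():.2f}, P(correct) = {correct.mean():.2f}")
```

Timer-driven guesses truncate trials where evidence accrues too slowly, giving the model the behavioural signature that collapsing bounds aim for while keeping each accumulator a simple shifted-Wald process.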

3 Controlling Which Actions to Take

We begin by discussing a series of papers focusing on choice selection in the delay-discounting task (see also Dai & Busemeyer, 2014). Rodriguez et al. (2014) set the stage with an LBA model of behaviour alone, which was first extended to two-stage fMRI modelling (Rodriguez et al., 2015) and joint modelling of EEG and fMRI (Turner et al., 2016). This work mainly addressed where and how value is represented in the brain, whereas the final study (Turner et al., 2019) focused more on control processes, using the Leaky Competing Accumulator (LCA; Usher & McClelland, 2001) model to address behaviour and returning to a two-stage modelling approach.

We then turn to a prototypical approach to studying cognitive control through how people resolve decision conflicts in tasks such as the Stroop (MacLeod, 1991), Simon (Hommel, 2011), and Flanker (Eriksen, 1995). There is an associated voluminous experimental literature, numerous applications to the assessment of individual differences in selective attention, and a range of cognitive


models that have been applied to one (e.g. White et al., 2011) or two (e.g. Ulrich et al., 2015) of these tasks. Here we focus on the Flanker task, where participants indicate the direction of a central arrow and the conflicting information comes from surrounding arrows (e.g.