Language Electrified: Principles, Methods, and Future Perspectives of Investigation (Neuromethods, 202). ISBN 1071632620, 9781071632628

Table of Contents:
Preface to the Series
Preface
Contents
Contributors
Part I: Principles and Tools
Chapter 1: From Neurons to Language and Speech: An Overview
1 The Neuron: Structure and Functions
2 Older Structures and Language
2.1 The Hindbrain
2.2 The Diencephalon
2.3 The Basal Ganglia
3 The Upper Floors and Language
3.1 The Cerebral Cortex
3.2 The Two Hemispheres
4 The Lobes
4.1 The Frontal Lobe
4.2 The Parietal Lobe
4.3 The Temporal Lobe
4.4 The Occipital Lobe
5 Assembling the Pieces of the Puzzle: Anatomical or Functional Differences?
6 The Neural Architecture of Language
6.1 The Dual Route Model
6.2 Anatomical-Functional Pathways
6.3 Memory, Unification, and Control
7 Conclusion and Future Challenges
References
Chapter 2: How the Brain Works: Perspectives on the Future of Human Neuroscience Research
1 A Brief Quantitative Anatomy of the Human Brain
2 Circuits, Networks, and Fields
3 Relationship Between Brain Structure and Measurements of Brain Function
4 What Does Localization of "Brain Activity" Really Mean?
5 Neurocognitive Models
6 Future Directions
References
Chapter 3: How Do We Get the Brain to Tell Us About Language Computations and Representations? Designing and Implementing Experiments
1 Introduction: How to Get Reliable Data to Test Your Hypotheses
2 Building and Running Experiments
2.1 Designing Experiments
Box 1. Examples of Some Typical Paradigms Used in Cognitive Neuroscience of Language
Box 2. Summary of the Guidelines for Designing Experiments
2.2 Experiment Building Tools
Box 3. Guidelines for Choosing an Experiment Building Software
2.3 Stimulus Preparation and Technical Setup
Box 4. Recommendations for Stimulus Preparation
2.4 Running Experiments
3 Setting Up a Psychophysiology Laboratory
3.1 Data Recording Infrastructure
3.2 Behavioral Recordings
Box 5. Summary of Recommendations for Building a Psychophysiology Lab
3.3 Laboratory Practicalities
4 Improving Data Quality
Box 6. Recommendations for Improving Data Quality
5 Combining Methodological Modalities
6 Summary and Conclusions
References
Chapter 4: Software and Resources for Experiments and Data Analysis
1 Single-Subject Analysis and Group Analysis
1.1 Sensor Space and Source Space
1.2 Unique Issues to MEG and EEG, Respectively
2 Data Analysis
2.1 EEG Single-Subject Analysis
2.1.1 Short Description of EEG Dataset
2.1.2 EEG Recording
2.1.3 Scripting
2.1.4 Reading in and Segmenting Data
2.1.5 Extracting EOG Channels
2.1.6 Rejecting Based on Objective Threshold
2.1.7 Create the Event-Related Potentials
2.1.8 Plotting the ERPs
2.1.9 Difference Waves
2.1.10 Source Reconstruction
2.1.11 Lead Field
2.1.12 Compute the Minimum Norm Estimate
2.1.13 Alternative Localization Using Dipole Fits: Lead Field and Source Model
2.1.14 Alternative Localization Using Dipole Fits: Fitting the Dipole
2.1.15 Alternative Localization Using Dipole Fits: Plot Dipole Fit
2.1.16 Alternative Localization Using Dipole Fits: Extract Time Courses
2.1.17 Alternative Localization Using Dipole Fits: Plotting Extracted Time Courses
2.2 Summary of Analysis
2.3 Single-Subject MEG Analysis: MNE-Python
2.3.1 Short Description of Dataset
2.3.2 MEG Recording
2.3.3 Set Paths and Import
2.3.4 Find Events in the Raw Data
2.3.5 Epoch the Data
2.3.6 Creating Evoked Responses
2.3.7 Plotting Evoked Responses
2.4 MRI Preprocessing
2.4.1 Importing the MRI
2.4.2 Segmenting the Brain
2.4.3 Creating the Boundary Element Method (BEM) Model
2.4.4 The Forward Solution
2.4.5 Inverse Solution
2.4.6 Estimate Source Time Courses
2.5 Summary of MNE Analysis
2.6 Single-Subject Analysis: Beamformer
2.6.1 Reading in and Segmenting Data
2.6.2 Creating Boolean Indices
2.6.3 Getting the Event-Related Responses
2.6.4 Plotting the ERFs for a Quick Inspection
2.6.5 Detailed Time-Frequency Representation
2.6.6 Plotting the Time-Frequency Representation
2.7 MR Preprocessing
2.7.1 Constructing a Head Model
2.7.2 Inspect the Match Between MR and MEG Sensors
2.7.3 Creating a Source Space and a Lead Field
2.8 Do Beamforming
2.8.1 Create a Time-Frequency Representation for Beamforming
2.8.2 Plot the Coarse Time-Frequency Representation
2.8.3 Beamforming the Time-Frequency Pairs
2.8.4 Plotting the Beamformer
2.8.5 Plotting the Contrast Between Two Conditions: Beamformer
2.8.6 Regularization
2.9 Summary of Beamformer Analysis of Oscillatory Activity
3 Group Analysis
3.1 Morphing and Warping
3.1.1 Morphing in MNE-Python
3.1.2 Warping in FieldTrip
3.2 Final Words
References
Chapter 5: Principles of Statistical Analyses: Old and New Tools
1 Introduction
Box 1. What Is Statistical Significance? What Are p-Values?
1.1 Experimental Design and Statistical Design
1.2 The Signal: Average Versus Single-Trial
1.2.1 Why Grand Averages and Not Single-Subject Averages?
1.2.2 Grand Averages Don't Stand Alone: Measures of Uncertainty
1.3 Time-Locking in Time and Space: Selecting a Region of Interest
2 The Classical Perspective: Analysis of Variance (ANOVA)
2.1 Structuring an ANOVA for Repeated-Measures
2.2 Problems with Classical ANOVAs
2.2.1 Language as a Fixed-Effect Fallacy
2.2.2 Significance and Estimation
2.2.3 Language Is Both a Continuous and a Categorical Independent Variable
2.2.4 Brain Activity Is Continuous Across Time and Space
3 A Modern Perspective: Mixed-Effects Models
3.1 Structuring a Regression Model for Repeated Measures
3.2 Misinterpretation as a Nuisance Variable
3.3 Keep It Optimal: Choosing an Appropriate Random-Effect Structure
3.4 ROIs Revisited: Channel Is Usually Not a Random Effect
3.5 Shrinkage and Partial Pooling
3.6 Contrast Coding for Fixed Effects
4 A Practical Example
5 Future Directions
5.1 Generalization of Regression and Correlation
5.2 Multivariate Pattern Analysis, Machine Learning, and Decoding Methods
5.3 Bayesian and Robust Approaches
6 Conclusion
References
Part II: Technologies and Methods
Chapter 6: Fundamentals of Electroencephalography and Magnetoencephalography
1 Basics of Neurophysiology
2 EEG Instrumentation
2.1 EEG Signal and Its Acquisition
Box 1 EEG Versus MEG
Box 2 MEEG Terminology and Definitions
3 MEG Instrumentation
3.1 MEG Basic Physics Principles
4 Main Functional Measures for MEG/EEG
4.1 Evoked Responses: Event-Related Potentials and Event-Related Fields
4.2 Neural Oscillations
5 Source Localization
6 From Isolated Sources to Connectivity
7 Multimodal Imaging
8 Future Perspectives
8.1 Naturalistic Stimulation Paradigms for Language Research
8.2 Body-Brain Interaction
8.3 New Sensor Technologies
8.4 Open Science
8.5 Imaging Genetics
References
Chapter 7: Event-Related Potentials (ERPs) and Event-Related Fields (ERFs)
1 Introduction: Using Evoked Responses (ERPs/ERFs) to Understand Language Processing in the Brain
2 The Main Responses in Cognitive Neuroscience of Language and Their Measurement
2.1 Early Auditory Processing: Brainstem Responses and Middle-Latency Responses
2.1.1 Auditory Brainstem Responses
2.1.2 Middle-Latency Responses
2.2 From Physical Features to Language Perception: Long-Latency Components P1-N1-P2-N2, Visual Responses, and Mismatch Negativity
2.2.1 P50
2.2.2 N100
2.2.3 P200 and N200
2.2.4 MMN
2.2.5 Visual Responses
2.3 Understanding Language: Cognitive Components (N400)
3 Source Modeling-Based Separation and Identification of the Underlying Neural Components in ERPs/ERFs
3.1 Dipole Modeling
3.2 Minimum Norm Estimate
3.3 Beamformers
3.4 Other Approaches
4 Practical Tips for Evoked Response Acquisition and Analysis Notes
4.1 Acquisition
4.2 Analysis
4.2.1 Visual Inspection
4.2.2 Preprocessing
4.2.3 Visual Inspection After Preprocessing
4.2.4 Forward Modeling and Determination of the Source Space
4.2.5 Inverse Modeling
4.2.6 Group-Level Analysis
References
Chapter 8: Neural Oscillations in EEG and MEG
1 Neural Oscillations: Generation and Functions
1.1 Brain Rhythms
1.2 What Is an Oscillator?
1.3 Oscillators in the Brain
1.4 Oscillations Across Frequency Bands
2 Mechanisms of Neural Oscillations
2.1 Two Hypotheses
2.2 Representing External Input
2.3 Neural Information Transmission
3 The Temporal Structure of Speech
3.1 Cortical Speech Tracking
3.2 How to Measure Neural Entrainment to Speech
3.3 Individual Differences in Speech Tracking
4 Neural Oscillations and Language Units
4.1 On the Possible Role of Rhythms in Language Perception
4.2 Frequency Bands for Language Perception
4.3 Oscillations for Syntactic Chunking?
References
Chapter 9: Human Intracranial Recordings for Language Research
1 Introduction
2 History
3 ECoG Data Acquisition
3.1 Participants
3.2 Recording and Clinical Procedures
3.3 Seizure Monitoring
3.4 Intraoperative Recording
3.5 Chronic iEEG Recording
3.6 Stimulation
4 iEEG Signal Processing
4.1 High-Frequency Neural Activity
4.2 Signal Processing
4.2.1 Conversion, Line Noise, Filtering, and Referencing
4.2.2 Frequency Decomposition
4.2.3 Normalization
4.3 Data Analysis
4.3.1 Feature Extraction
4.3.2 Encoding Versus Decoding Analyses
4.3.3 Single-Electrode Versus Population Analyses
4.3.4 Research Design
5 Future Directions in iEEG and Language Research
6 Conclusions
References
Chapter 10: Transcranial Magnetic Stimulation in Speech and Language Research
1 Transcranial Magnetic Stimulation (TMS)
1.1 Physiological Basis of TMS
1.2 TMS Protocols for Measurement and Neuromodulation
1.3 TMS-EEG Co-registration
2 TMS Protocols in Language and Speech Research
2.1 Interference Paradigms
2.1.1 Technical Challenges of Interference Paradigms in Language and Speech Research
2.2 Corticobulbar Excitability
2.2.1 EMG Technical Challenges
2.2.2 Coil Positioning
2.2.3 TMS Intensity and Threshold
2.2.4 Corticobulbar Response Properties
3 Brief Overview of TMS in Language and Speech Research
4 Conclusions
References
Chapter 11: Transcranial Direct Current Stimulation (tDCS)
1 Introduction
2 Transcranial Direct Current Stimulation (tDCS)
2.1 Physiological Principles and Mechanism of Action of tDCS
2.2 Monitoring the Physiological Effects of tDCS
2.2.1 Monitoring with Transcranial Magnetic Stimulation (TMS)
2.2.2 Monitoring with Electroencephalography (EEG) and Event-Related Potentials (ERPs)
2.2.3 Monitoring with Magnetic Resonance Imaging (MRI) and Magnetic Resonance Spectroscopy (MRS)
3 tDCS in the Laboratory
3.1 tDCS Devices and Procedures
3.2 Safety Versus Tolerability
3.3 Stimulation Protocol for Stimulating Language: Insights from tDCS
3.3.1 tDCS in Healthy Population
tDCS Over the Frontal Cortex
tDCS Over the Temporal Cortex
3.4 tDCS in Aphasic Population
3.4.1 tDCS Over Frontal Cortex
3.4.2 tDCS Over the Temporoparietal Cortex
3.4.3 Other Targets of tDCS Stimulation/Future Directions
4 Conclusions
References
Chapter 12: Electromyographic (EMG) Responses of Facial Muscles During Language Processing
1 Introduction: Facial Muscles as a Specific Category of Striate Muscles
1.1 Functional and Metabolic Differences Among Facial Muscles
1.2 Afferent Feedback Signals from the Face to the Brain
1.3 Motor Control of Facial Muscles by the Brain
2 Recording of Surface EMG Activity from Facial Muscles
2.1 Advantages of Recording Facial Muscle Activity Using EMG
2.2 Bipolar Versus Monopolar Recording of EMG and Signal Amplification
2.3 Different Types of Surface EMG Electrodes
2.4 Skin Preparation and Risk Reduction Procedures
2.5 Electrode Size, Interelectrode Distance, and Electrode Locations
2.6 Recording, Preprocessing, and Quantification of EMG Signals
2.7 Standardization of EMG Responses
2.8 Unilateral or Bilateral Facial EMG Recordings
3 Significance of Facial EMG Responses in Relation to Information Processing
3.1 Effects of Emotion
3.2 Effects of Attention to External Stimuli
3.3 Effects of Cognitive Information Processing Tasks
3.4 Effects of Linguistic Processes and Subvocal Speech
4 Conclusions and Future Directions/Future Challenges
References
Part III: From Sounds to Syntax
Chapter 13: Neurocomputational Properties of Speech Sound Perception and Production
1 Introduction
1.1 Neurobiological Properties of Speech and Language
1.2 Building Epistemological Bridges
1.3 Aims of the Present Work
2 Linguistic Primitives
3 Neurophysiological Primitives
4 Decoding the Brain Waves Related to Speech Sounds
4.1 Electroencephalography and Magnetoencephalography
4.2 Amplitudes and Latencies
4.3 Source Generators for Vowels and Consonants
4.4 Interlude
4.5 The Dynamical Flow of Vowel Representations
4.6 To Categorize or Not to Categorize: The Mismatch Negativity Component
4.6.1 Phonological Categories, Phonological Processes, and Dialect Variation
4.6.2 Understanding Mental Representation of Phonemes
4.6.3 Second Language Acquisitions and Categorization
5 The Dynamical Flow of Neural Oscillations
5.1 A Short Overview of Event-Related Oscillations
5.2 On the Perceptual Side
5.3 On the Production Side
6 Stimulating Motor and Auditory Areas
6.1 Motor Reverberations Evoked by the Acoustic Signal
6.2 Coming Back to Distinctive Features
7 When EEG Is Integrated with Magnetic Resonance Imaging: A Brief Look
8 Integrating Linguistic Theory and Neurobiological Research
8.1 The Role of Distinctive Features in Speech Perception and Production
8.2 Is Speech Processing Analog or Digital?
9 Conclusion and Further Remarks
References
Chapter 14: Neural Correlates of Morphology Computation and Representation
1 Introduction
1.1 ERP and MEG Components for the Study of Morphological Processing
2 The Existence of Morphology
2.1 Comparing Word Structures
2.1.1 Constraints on Derivation
2.1.2 Constraints on Inflection
2.2 Priming Studies for Morphology
2.3 Dissociations Between Morphology Types
2.4 Productivity Effects in Morphology
2.4.1 Productivity Effects in Derivation
2.4.2 Productivity in Inflection Versus Derivation
2.5 Semantic Transparency Effects on Morphology
2.5.1 Transparency and Compounding
2.5.2 Semantic Transparency and Derivation
2.5.3 Transparency and Inflection: Regular and Irregular Inflection
2.6 Discussion: Studies on Single-Word Processing
2.6.1 Issues in Studies on Single-Word Processing
3 Processing Morphology in Sentences
3.1 Inflection in Sentences
3.1.1 Inflection Regularity Effects in Sentences
3.1.2 Constraints on Inflection in Sentences
3.1.3 Subject-Verb Agreement in Sentences
3.1.4 Gender and Number Agreement in Sentences
3.2 Discussion: Morphology Processing in Sentences
3.3 Issues in Studies of Morphology Processing in Sentences
4 Morphological Processing in the First-Language Acquisition, Second-Language Learning, and Multilingualism
4.1 Morphological Processing in Children
4.1.1 Compounding in Children
4.1.2 Inflection in Children
4.1.3 Other Studies of Morphology in Children
4.2 Morphology in Second-Language Learners
4.2.1 Compounding in Second-Language Learners
4.2.2 Derivation in Second-Language Learners
4.2.3 Inflection in Second-Language Learners
4.3 Discussion: Morphological Processing in First-Language Acquisition and Second-Language Learning
5 General Discussion
5.1 Future Neuroimaging Studies on Morphology
References
Chapter 15: Electrophysiology of Word Learning
1 Introduction
2 ERP Indices of Word Processing
3 Implicit Learning of Novel Words
4 Explicit Learning and Consolidation of Novel Words
5 Beyond Simple Words: Learning New Morphological Units
6 Summary and Future Directions
References
Chapter 16: Neural Underpinnings of Semantic Processing
1 Introduction
Box 1 N400 as Implicit Prediction Error and Learning Signal
Box 2 N400 and Theories of Sentence Comprehension
2 The N400 ERP Component
2.1 Basic Findings
2.2 Meaning Beyond Sentences
2.3 More Subtle N400 Phenomena: Negation, Articles, and Role Reversals
2.4 Computational Models of the N400 and Meaning Processing During Language Comprehension
3 The P600 ERP Component
4 Future Perspectives and Challenges
References
Chapter 17: Sentence Processing: How Words Generate Syntactic Structures in the Brain
1 Introduction
2 Methodological Aspects of Electrophysiological Studies on Sentence Processing
3 Syntactic Processing in ERP Responses
4 Syntactic Processing in the Frequency Domain
4.1 Oscillatory Dynamics of Syntactically Correct Configurations
4.2 Oscillatory Dynamics of Syntactic Violations
5 The Emerging Picture and Future Directions
5.1 The Link Between Delta-Band Oscillations and Structure-Based Processing
5.2 Beyond Delta-Band Oscillations and Structure-Based Processing
Box 1: Predictive Processing
5.3 On the Relationship Between Neural Oscillations and ERPs
6 Conclusions and Further Remarks
Box 2: Outstanding Questions
References
Part IV: Beyond Segments
Chapter 18: Pragmatics Electrified
1 From Grice to the Electrodes
1.1 Pragmatics in a Nutshell
1.2 The Electrophysiological Landscape Relevant for Pragmatics
2 Non-literal Language
2.1 EEG Indexes for Lexical Pragmatics
2.1.1 Functional Interpretation
2.2 EEG Indexes for Mind-Reading Pragmatics
2.2.1 Functional Interpretation
3 Context, Discourse, and Conversation
3.1 Decomposing Contextual Factors
3.2 Linking Discourse
3.3 Conversation
4 ERP Responses Related to Pragmatics in Clinical Conditions
5 Conclusions and Future Directions
References
Chapter 19: Electrophysiology of Non-Literal Language
1 Introduction
2 Metaphors
2.1 Metaphors and Literal Language
2.2 Conventional and Novel Metaphors
2.3 Metaphor and Embodiment
3 Idioms
3.1 EEG and MEG Studies of Idiomatic Language
3.2 Brain Stimulation Studies of Idiomatic Language
4 Irony
4.1 Verbal Irony, Prosody, and Context
4.2 The Roles of Extralinguistic Information and Speaker Attributes in Irony
4.3 ERP Correlates of Irony
4.4 Irony and Brain Oscillations
4.5 Irony Summary
5 Jokes
5.1 Electrophysiology of Joke Processing
5.2 Different Types of Jokes
5.3 Individual Differences
6 Conclusion
References
Chapter 20: Neurological Evidence of the Phonological Nature of Tones
Abbreviations
1 Introduction
2 Organization of this Chapter
3 Tones in Linguistics
4 Neurological Evidence
4.1 Left Versus Right Versus Both Hemispheres
4.2 Lexical Versus Non-Lexical Tones
4.3 Performance Versus System (Phonetics Versus Phonology)
4.4 Learned Versus Innate
4.5 Categorical Versus Continued
5 Tonochrony
6 Abstractness
7 Discussion
8 Conclusions and Further Suggestions
References
Chapter 21: Neurophysiological Underpinnings of Prosody
1 Introduction
1.1 Electrophysiological Markers of Linguistic Prosody Processing
1.2 Electrophysiological Markers of Non-linguistic Affective Prosody Processing
1.3 Electrophysiological Markers of Social Prosody Processing
1.4 Influencing Factors and Characteristics
1.4.1 Task Focus
1.4.2 Sex
1.4.3 Age
2 Culture and Language Background
2.1 Future Perspectives and Challenges
References
Chapter 22: Using Facial EMG to Track Emotion During Language Comprehension: Past, Present, and Future
1 Introduction
2 Measuring Emotion
2.1 Why Self-Report Is Not Enough
2.2 Measuring Emotion via Facial Electromyography
3 A Review of Facial EMG Research on Affective Language Comprehension
3.1 Single-Word Studies
3.1.1 Manipulations of Lexical Emotional Contents
3.1.2 Manipulations of Prosodic Emotional Contents
3.2 Phrase and Sentence Studies
3.2.1 Manipulations of Lexical Emotional Contents
3.2.2 Manipulations of Prosodic Emotional Contents
3.2.3 Manipulations of Processing Complexity
3.3 Discourse Studies
3.3.1 Moral and Other Social Norm Manipulations
3.3.2 Media Research Manipulations
3.3.3 Other Manipulations in Discourse
3.4 Some General Observations
4 The fALC Model
4.1 Language Comprehension at Multiple Levels
4.2 Emotion-Based Evaluation
4.3 Emotion as Simulation
4.4 Emotional Mimicry and Other Factors
4.5 Using the Model
5 Challenges and Opportunities
References
Chapter 23: Eye-Tracking Methods in Psycholinguistics
1 Getting Started with Eye-Tracking Methodology: An Overview
2 Eye-Tracking Implementation in Psycholinguistics
2.1 Eye-Tracking in Reading Research
2.2 Eye-Tracking in Spoken Language Comprehension
2.3 Eye-Tracking in Language Production
2.4 Eye-Tracking in Language Learning and Bilingualism Research
3 Beyond Eye Movements: Co-Registration of Brain-Ocular Activity
3.1 Brain Potentials Locked to Oculomotor Behavior
3.2 Word-by-Word Versus Free-Viewing Paradigms
3.3 Methodological Challenges of Co-Registration
4 Conclusions and Future Directions
References
Chapter 24: Neurophysiology of Language Pathologies
1 Introduction
2 Forms of Language Disorders
2.1 Pathologies Due to Focal Damage
2.2 Pathologies Due to Diffuse Damage
2.3 Disorders of Perception Versus Production
2.4 Developmental Language Disorders
3 Role of Subcortical Structures in Language Processing
4 Cortico-Subcortical Networks Involved in Language Pathologies
5 Novel Therapeutic Approaches
6 Summary and Future Directions
7 Conclusion
References
Chapter 25: Electrophysiological Correlates of Second-Language Acquisition: From Words to Sentences
1 Introduction
1.1 Experiential Factors
1.2 The ERP Technique in L2 Studies
2 The Impact of Experiential Factors on L2 Processing
3 L2 Phonology
4 L2 Semantics
5 L2 Syntax
6 General Conclusion
7 Future Perspectives and Challenges
References
Index

Neuromethods 202

Mirko Grimaldi, Elvira Brattico, Yury Shtyrov, Editors

Language Electrified: Principles, Methods, and Future Perspectives of Investigation

NEUROMETHODS

Series Editor: Wolfgang Walz, University of Saskatchewan, Saskatoon, SK, Canada

For further volumes: http://www.springer.com/series/7657

Neuromethods publishes cutting-edge methods and protocols in all areas of neuroscience as well as translational neurological and mental research. Each volume in the series offers tested laboratory protocols, step-by-step methods for reproducible lab experiments and addresses methodological controversies and pitfalls in order to aid neuroscientists in experimentation. Neuromethods focuses on traditional and emerging topics with wide-ranging implications to brain function, such as electrophysiology, neuroimaging, behavioral analysis, genomics, neurodegeneration, translational research and clinical trials. Neuromethods provides investigators and trainees with highly useful compendiums of key strategies and approaches for successful research in animal and human brain function including translational “bench to bedside” approaches to mental and neurological diseases.

Language Electrified: Principles, Methods, and Future Perspectives of Investigation

Edited by

Mirko Grimaldi Centro di Ricerca Interdisciplinare sul Linguaggio (CRIL), University of Salento, Lecce, Italy

Elvira Brattico Department of Clinical Medicine, Aarhus University, Aarhus, Denmark

Yury Shtyrov Center of Functionally Integrative Neuroscience (CFIN), Department of Clinical Medicine, Aarhus University, Aarhus, Denmark

Editors

Mirko Grimaldi, Centro di Ricerca Interdisciplinare sul Linguaggio (CRIL), University of Salento, Lecce, Italy

Elvira Brattico, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark

Yury Shtyrov, Center of Functionally Integrative Neuroscience (CFIN), Department of Clinical Medicine, Aarhus University, Aarhus, Denmark

ISSN 0893-2336 ISSN 1940-6045 (electronic) Neuromethods ISBN 978-1-0716-3262-8 ISBN 978-1-0716-3263-5 (eBook) https://doi.org/10.1007/978-1-0716-3263-5 © Springer Science+Business Media, LLC, part of Springer Nature 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Preface to the Series

Experimental life sciences have two basic foundations: concepts and tools. The Neuromethods series focuses on the tools and techniques unique to the investigation of the nervous system and excitable cells. It will not, however, shortchange the concept side of things as care has been taken to integrate these tools within the context of the concepts and questions under investigation. In this way, the series is unique in that it not only collects protocols but also includes theoretical background information and critiques which led to the methods and their development. Thus it gives the reader a better understanding of the origin of the techniques and their potential future development. The Neuromethods publishing program strikes a balance between recent and exciting developments like those concerning new animal models of disease, imaging, in vivo methods, and more established techniques, including, for example, immunocytochemistry and electrophysiological technologies. New trainees in neurosciences still need a sound footing in these older methods in order to apply a critical approach to their results. Under the guidance of its founders, Alan Boulton and Glen Baker, the Neuromethods series has been a success since its first volume published through Humana Press in 1985. The series continues to flourish through many changes over the years. It is now published under the umbrella of Springer Protocols. While methods involving brain research have changed a lot since the series started, the publishing environment and technology have changed even more radically. Neuromethods has the distinct layout and style of the Springer Protocols program, designed specifically for readability and ease of reference in a laboratory setting.

The careful application of methods is potentially the most important step in the process of scientific inquiry. In the past, new methodologies led the way in developing new disciplines in the biological and medical sciences. For example, Physiology emerged out of Anatomy in the nineteenth century by harnessing new methods based on the newly discovered phenomenon of electricity. Nowadays, the relationships between disciplines and methods are more complex. Methods are now widely shared between disciplines and research areas. New developments in electronic publishing make it possible for scientists that encounter new methods to quickly find sources of information electronically. The design of individual volumes and chapters in this series takes this new access technology into account. Springer Protocols makes it possible to download single protocols separately. In addition, Springer makes its print-on-demand technology available globally. A print copy can therefore be acquired quickly and for a competitive price anywhere in the world.

Saskatoon, SK, Canada
Wolfgang Walz

Preface

At this very moment you are receiving a mental transplant. The same sequences of phonemes, syllables, words, and sentences our brains generated some time ago are being reproduced in your brain as you are reading these lines. Writing and reading, though, emerged only recently, around 3000 BC. The first mental transplant of this kind happened even earlier—at least about 50–40 thousand years ago (although estimates vary tremendously)—when we became capable of using language and speech for transmitting thoughts to our conspecifics in real time. To the best of our knowledge, no other living creature on our planet has to date developed a similarly efficient communication capability.

Language, as a system we use to communicate, represents the brain's biologically perfected machinery for converting thoughts (ideas, concepts, and reflections of both the outside world and our inner feelings) into external entities. Thanks to the properties of the vocal apparatus, the air, the auditory system, etc., these entities can be immediately shared among speakers and listeners and converted, in turn, back into mental representations. Very likely, already before the emergence of language, our ancestors possessed the capacity for symbolic thought: that is, the ability to generate and manipulate complex representations of persons, objects (even those not within the immediate environment), actions, sensations, emotions, and events (past, present, and future). For instance, observing a piece of flint on the ground, analyzing its properties, and imagining chipping it to obtain a sharp object useful for killing and skinning prey requires the intentional control of complex mental representations, together with the ability to see oneself mentally working the flint and to imagine the finished product (i.e., abstraction from the immediately present reality). Our ancestors had to have a very detailed representation of the form and properties of the object they wanted to obtain; they had to be able to grasp its (a)symmetries to understand how to best shape it, and, finally, they had to have the ability to plan the chipping work in detail.

Symbolic thought, language, speech, and consciousness became possible thanks to the highly developed brain areas we now call the neocortex (frontal, temporal, parietal, and occipital cortices), which are strongly interconnected and synchronized with each other and with the evolutionarily older parts of the brain (such as the thalamus, basal ganglia, archi- and paleocortex, cerebellum, various limbic structures, etc.). These synchronizations underpin continuous top-down and bottom-up neural activity, which enables, among many things, not just the efficient control of both the articulatory and auditory systems, but also their tight coordination, essential for the linguistic function. When symbolic thought was integrated with language and speech, the possibility of naming reality led us to "recreate" the world, giving it a new, systematized shape and structure: once it is named, the sharp object obtained by chipping the flint gets a new life in the system of verbal symbols; in this verbal world, the concept the name refers to may be easily transmitted between speakers even when the object itself does not change hands (and may not be present at all).

To achieve this, evolution came up with a very economical technique: given the vocal apparatus' ability to produce a wide range of articulatory gestures, it is sufficient to use a finite set of speech sounds (30 on average) that can be recombined into different units and linked to concepts, thereby creating words (the external entities mentioned above).
Operations of this kind are possible because we can contrastively exploit specific acoustic-articulatory features characterizing sounds: for instance, a simple change of a single phoneme, [k] to [r] in [kæt] cat vs. [ræt] rat, changes the meaning of the word dramatically. Crucially, although the inventory of speech sounds we use is limited, the potential for producing their different (arbitrarily chosen) combinations and associating them with different concepts is not: so, with a small number of sounds we may create an infinite number of words or linguistic signs, which, in turn, can be recombined further into larger constructs (such as sentences) whose possible number is also infinite. In this way, a creative act of a single speaker, when accepted (as such or in a modified form) by a community of speakers, becomes part of their lexicon and a property of the grammar. Thanks to sophisticated learning and memory processes, the vocabulary of a specific community can be transmitted from generation to generation, forming no less than the very backbone of our society and all of its cultural, economic, and other activities.

As real-life actions and behaviors can be iterative and recursive, so can be the thoughts and concepts that describe them. This intrinsic feature has been projected onto the morphological and syntactic structures of language, which, as a result, makes heavy use of embedding in linguistic constructions. Similar to all biological functions, language forms a system in which everything holds together: when we produce linguistic output, everything in its internal structure (sentences, words, the sounds of which they are formed) follows the relations generated by the syntax through such iterative merging operations. This implies sophisticated hierarchical architectures, containing, for instance, the rules for deriving complex words and the rules for building syntactic structures. Thus, the morphology and phonology of our verbal output is modelled according to the generative machinery of syntax. All these features are transiently coded into the acoustic signal, which can then be decoded online into abstract mental representations. Language and speech are therefore characterized by rich hierarchical properties and complex temporal dynamics that must be mastered by the brain, which is able to both produce and decode spoken (and written) messages in real time with seemingly little effort.

How is it possible that tens of billions of neurons within the dark of the skull, apart from regulating breathing, blood circulation, hormonal release, and so on, control a multitude of intelligent behaviors and cognitive functions, including directed action, memory, learning, thinking and reasoning, decision-making, and language? This is an ancient question, presently known as the mind-brain (or mind–body) problem. Although it has been debated in different ways for over two thousand years, starting from pre-Aristotelian philosophers, it remains unresolved to this day. Different lines of thought (not all of which are mutually exclusive) developed over time: (1) mind and brain are two different entities (dualism); (2) mental properties can be identified with and can be explained in terms of neurophysiological processes (reductionism); (3) mental processes cannot be reduced to their neurophysiological properties; (4) mental states do not objectively exist and thus the mind is an artificial construct that has no place in the scientific understanding of the world (eliminativism).
The challenge the mind-brain problem poses for philosophers and neuroscientists is not unlike the one theoretical physicists are faced with: trying to find the ultimate overarching laws governing both the micro- and the macro-universe. Not surprisingly, the relations between quantum physics and higher brain functions have often been discussed in recent years. These two scientific pursuits—understanding the laws governing the universe and understanding the laws governing the brain—have so far proven to be the most difficult in human scholarly history. Both require highly sophisticated methods and techniques and complicated theoretical frameworks.
Yet, whereas the major advances that have taken place in physics in recent decades seem to have brought us closer to the "theory of everything," the puzzle of the human brain remains the biggest challenge in our quest for knowledge. While the laws of physics that govern the universe generate measurable substances, forces, and energies, the laws of the brain generate abstract representations that are too elusive to be quantified objectively with the tools we currently have. This challenge is exacerbated by a very special paradox: neuroscientists are the only scholars who must use, for their studies, the very same organ they are trying to study and understand. This critical circularity may well turn out to be the ultimate insurmountable obstacle to achieving their goal.

Cognitive neuroscience arguably started from observations of how lesions of specific brain parts impaired different cognitive functions, inferring, in this way, that a certain area controls the specific function. This localizationist approach essentially continued with the advent of neuroimaging and neurophysiological techniques that allow direct observation of increases and decreases in various measures of neural activity and their correlation with cognitive processing. The underlying assumption is that the complexity of cognitive processes may be reduced to the activation of specific regions of the brain. At a gross level, some cognitive functions are, of course, strongly associated with certain brain regions (for instance, the hippocampus seems to be the chief seat of at least some aspects of memory formation and retrieval). The risk that such an approach entails is that of accumulating thousands and thousands of datasets with enormous descriptive power, but with very little explanatory power.

The brain, however, is not made up of gray matter alone; in fact, much of its volume is taken up by connecting tracts, many of which traverse very long distances, enabling information transfer between remote areas; an average neuron is believed to form thousands of connections with other brain cells. This is one of the reasons why more recently the focus of investigations—particularly for complex cognitive functions—has been steadily shifting from circumscribed brain areas to large-scale networks that bind together even distant neurons into functional ensembles whose activity can be dynamically regulated depending on the specific processing demands. This, of course, does not mean that functional localization is invalid; rather, it has to be integrated within a broader range of research paradigms and theoretical frameworks. Indeed, in recent years, neuroscientists have come to realize that, to understand the brain-mind relationship, theoretical and experimental approaches must be integrated to make sense of the enormous amount of existing data, as well as to guide future experiments. So, empirically testable theories are emerging with the aim of capturing the tremendously complex computational and representational processes taking place in the brain. One particularly influential theoretical approach assumes that the brain works as a predictive machine which continuously compares sensory inputs against internally generated predictions; the latter are, in turn, constantly updated based on the dynamic changes in the incoming information. This stresses the importance of the time domain as a crucial factor for information coding, processing, and transmission across the neural system.

The high speed and rapid temporal dynamics of neural activity are underpinned by the electrochemical signaling properties of the neural tissue, which enable information transfer from peripheral receptors to high-level processing centers, as well as between different brain structures, within (tens of) milliseconds. To track such neural dynamics in time, we need physiological tools, such as the electroencephalographic techniques highlighted in this volume, capable of following temporal patterns of neural activity on a similarly fine-grained time scale. These neural signals are characterized by synchronization and desynchronization, reflecting the dynamic involvement of functional networks that integrate different types of information in computational and representational processes.
Within this perspective, we are beginning to uncover the connectome: that is (by analogy with the genome), the entirety of the brain's structural and functional neural networks that control cognitive processing (together with other essential biological functions) in space and time. Language and speech are ideal candidates for this kind of investigation, due to the high temporal variability of linguistic information flow and its hierarchical-recursive properties. Furthermore, starting from the middle of the last century, linguistics has developed powerful theoretical models which can be put to the test on the basis of neural data. One immediately obvious problem is that linguistic ontologies such as phonemes, syllables, morphemes, the lexicon, syntax, and their operations cannot be directly linked to neurophysiological realities, such as neurons, dendrites, spines, synapses, cortical columns, action potentials, and their functions. As these two architectures are not directly commensurable, the only way we have is to exploit the explanatory power of theoretical models to correlate different levels of knowledge of the world, including structural and functional interconnections at and between all levels. In other words, solid theoretical primitives can guide us in developing appropriate experiments which will help us reach a deeper understanding of the neurobiological basis of language on the one hand, while, on the other hand, informing and further refining the theoretical frameworks themselves. This can ultimately lead to developing cross-disciplinary theories building epistemological bridges between different fields of research and different domains of knowledge.

Indeed, building a neurobiological theory of language is certainly a demanding enterprise, which can only be successful if it is placed on interdisciplinary foundations, incorporating different theoretical and methodological approaches both within and across disciplines. To address this challenge, we feel it is necessary for this book to begin to provide a genuinely interdisciplinary academic background for scholars wishing to embark on this field of study. Presently, linguists and psycholinguists who wish to delve into the exciting world of neuroscience in order to understand the biological foundations of language encounter serious difficulties in obtaining the necessary competences in the domain of neurophysiological techniques and methods. On the other hand, neuroscientists working in the field of language processing in the brain often do not have sufficient knowledge of linguistic theoretical frameworks, and sometimes even brush them off as biologically irrelevant. These are just two of many limitations on the development of a fruitful dialogue between linguists and neuroscientists. To enable future researchers to coherently manipulate different linguistic, psychological, neurophysiological, and methodological tools, this disconnect between linguistics and neuroscience must be reduced, if not eliminated altogether. The success of this enterprise relies on a common effort to build cross-disciplinary bridges.

The present book was conceived taking all of these issues into account. Unlike many similar efforts, we designed this book as a hands-on tool offering the reader the possibility to progressively acquire the principles, techniques, and methods necessary to pursue research in this field. We opted to focus mainly on neurophysiological techniques and methods which can help track linguistic processes in time—the dimension which is of critical importance for the highly dynamic language function. Furthermore, most of these techniques are (relatively) easily acquirable and learnable and can provide high-quality laboratory data without the excessive costs associated with modern brain scanners. The latter considerations also apply to other, less "brain-centered" techniques, like eye-tracking and electromyography, which are also highlighted in this volume.

This volume is divided into four parts. Part I contains contributions discussing and illustrating neural principles and tools for an effective approach to the field of investigation. Mirko Grimaldi and Cosimo Iaia in Chapter 1 offer a general overview of the basic principles of human brain functionality, focusing on language and speech processes (future research perspectives and challenges are also outlined). Chapter 2, by Ramesh Srinivasan, develops an inspiring discussion of the general principles and the kinds of questions that can be asked about the human brain using non-invasive tools and measures. This topic is taken further by Miika and Alina Leminen in Chapter 3, which discusses questions related to experimental designs and their implementation in more detail, aimed at newcomers to the field and placing particular stress on electroencephalographic experimentation. Lau Møller Andersen gets even more specific in Chapter 4, focusing on software and resources for experiments and data analysis, as does Phillip Alday in Chapter 5, thrashing out the principles of statistical analysis of electrophysiological data on language processing.

Part II addresses the issues and perspectives concerned with the use of a range of neurophysiological technologies and methods to investigate the neural computations of speech and language processes. It starts off with the fundamentals of electroencephalography (EEG) and magnetoencephalography (MEG) by Antonio Criscuolo and Elvira Brattico in Chapter 6, and then continues with two main strands of EEG and MEG research: event-related potentials (ERPs) and event-related fields (ERFs) in Chapter 7 by Tiina Parviainen and Jan Kujala, and neural oscillations in Chapter 8 by Alessandro Tavano. William Schuerman and Matthew Leonard describe in Chapter 9 the use of intracranial recordings—the closest we can get to the brain in language research. Following these chapters on electrophysiological methods of recording brain activity, Part II switches to equally important neurophysiological techniques for non-invasive brain stimulation that allow causal inferences about the functions of specific brain areas by modulating their activity externally: Chapter 10, by Alessandro D'Ausilio, Maria Concetta Pellicciari, Elias Paolo Casula, and Luciano Fadiga, introduces transcranial magnetic stimulation (TMS) in relation to speech and language research, while Roberta Ferrucci, Fabiana Ruggiero, Francesca Mameli, Tommaso Bocci, and Alberto Priori do the same for transcranial direct current stimulation (tDCS) methodology in Chapter 11. Finally, Anton van Boxtel, in Chapter 12, discusses the potential significance of facial electromyography (EMG) responses as an index of language processing and subvocal speech.

Part III is devoted to an in-depth exploration of the neural processes associated with the main types of linguistic information, from phonemes and prosody to syntax, pragmatics, and figurative language. In Chapter 13, Mirko Grimaldi dives into the murky world of neurocomputational properties of speech sound perception and production, making a theoretical proposal to integrate linguistic and neurobiological perspectives. In Chapter 14, Phaedra Royle and Karsten Steinhauer critically review experiments on morphological processing with a focus on compounds, derivations, and inflections, dealing with the neural correlates of morphological computations and representations. Chapter 15, by Alina Leminen, Eino Partanen, and Yury Shtyrov, is dedicated to the electrophysiology of word learning, with a particular emphasis on the time-course of the brain's learning-related activity. This is followed by Chapter 16, by Milena Rabovsky, which explores semantic processing, chiefly focusing on the N400 brain response—perhaps the most widely used ERP component in language research. In Chapter 17, Jordi Martorell, Piermatteo Morucci, Simona Mancini, and Nicola Molinaro go to the next level—that of sentence processing and the dynamic generation of syntactic structures in the brain.

The next chapters, in Part IV, leave the realm of basic linguistic units and representations and focus on phenomena that go beyond segments. In Chapter 18, Paolo Canal and Valentina Bambini present an account of the brain's electrophysiological correlates of pragmatic processing. In turn, Vicky Tzuyin Lai, Ryan Hubbard, Li-Chuan Ku, and Valeria Pfeifer discuss the electrophysiology of non-literal language use in Chapter 19, with metaphors, idioms, irony, and jokes as the four main types of figurative language. Following this, the book dives into a different kind of key phenomena in speech: the nature of tones in the brain is addressed by Amedeo De Dominicis in Chapter 20, whereas the neurophysiological underpinnings of prosody are tackled in Chapter 21 by Silke Paulmann.

We finish this volume with four distinct special topics. Since face-to-face linguistic communication is to a large degree accompanied—and helped—by the transmission of emotional expressions, Chapter 22, by Jos van Berkum, Marijn Struiksma, and Björn 't Hart, is dedicated to facial electromyography, used to investigate language-elicited emotions. Our eyes are the most accessible window to the brain, and ocular behavior is deeply implicated in different linguistic activities, making it the easiest proxy for investigating complex cognitive dynamics; hence, Chapter 23, by Mikhail Pokhoday, Beatriz Bermúdez-Margaretto, Yury Shtyrov, and Andriy Myachykov, is dedicated to the use of eye-tracking methodology in language research, both as a stand-alone method and in combination with electroencephalography. Another important—particularly in the applied sense of this word—area of neurolinguistic research is the physiology of speech pathologies, addressed by Laura Verga, Michael Schwartze, and Sonja Kotz in Chapter 24. Finally, Chapter 25 is dedicated to one of the most intriguing processes controlled by our brain: second-language acquisition. Sendy Caffarra and Manuel Carreiras discuss electrophysiological studies on this topic, presented at different levels, from single words to sentences.

No book can claim to be a complete reflection of the state of the art in the field in its entirety, and this one is no exception. We focused on the need to introduce the reader to the main topics, techniques, and questions in the field, placing a particular stress on the main tools that we believe to be optimal for pursuing this research, and omitting some techniques and areas of investigation that could currently be considered of lesser relevance to these main questions. In this effort, we have tried to address the glaring need to bring together multidisciplinary—linguistic, psychological, neurobiological, philosophical—perspectives on language in order to advance its study. This is because we firmly believe that language research cannot progress without tighter integration and coordination between these different fields and without better cross-disciplinary exchange of empirical and, crucially, theoretical knowledge.

Our team of authors wishes you an enjoyable and safe journey through the chapters of this book. Bon voyage!

Mirko Grimaldi, Lecce, Italy
Elvira Brattico, Aarhus, Denmark
Yury Shtyrov, Aarhus, Denmark

Contents

Preface to the Series . . . v
Preface . . . vii
Contributors . . . xv

Part I: Principles and Tools
1 From Neurons to Language and Speech: An Overview . . . 3
  Mirko Grimaldi and Cosimo Iaia
2 How the Brain Works: Perspectives on the Future of Human Neuroscience Research . . . 29
  Ramesh Srinivasan
3 How Do We Get the Brain to Tell Us About Language Computations and Representations? Designing and Implementing Experiments . . . 43
  Miika Leminen and Alina Leminen
4 Software and Resources for Experiments and Data Analysis . . . 65
  Lau Møller Andersen
5 Principles of Statistical Analyses: Old and New Tools . . . 123
  Franziska Kretzschmar and Phillip M. Alday

Part II: Technologies and Methods
6 Fundamentals of Electroencephalography and Magnetoencephalography . . . 163
  Antonio Criscuolo and Elvira Brattico
7 Event-Related Potentials (ERPs) and Event-Related Fields (ERFs) . . . 195
  Tiina Parviainen and Jan Kujala
8 Neural Oscillations in EEG and MEG . . . 241
  Alessandro Tavano, Johanna M. Rimmele, Georgios Michalareas, and David Poeppel
9 Human Intracranial Recordings for Language Research . . . 285
  William L. Schuerman and Matthew K. Leonard
10 Transcranial Magnetic Stimulation in Speech and Language Research . . . 311
  Alessandro D'Ausilio, Maria Concetta Pellicciari, Elias Paolo Casula, and Luciano Fadiga
11 Transcranial Direct Current Stimulation (tDCS) . . . 339
  Roberta Ferrucci, Fabiana Ruggiero, Francesca Mameli, Tommaso Bocci, and Alberto Priori
12 Electromyographic (EMG) Responses of Facial Muscles During Language Processing . . . 367
  Anton van Boxtel

Part III: From Sounds to Syntax
13 Neurocomputational Properties of Speech Sound Perception and Production . . . 389
  Mirko Grimaldi
14 Neural Correlates of Morphology Computation and Representation . . . 447
  Phaedra Royle and Karsten Steinhauer
15 Electrophysiology of Word Learning . . . 505
  Alina Leminen, Eino Partanen, and Yury Shtyrov
16 Neural Underpinnings of Semantic Processing . . . 527
  Milena Rabovsky
17 Sentence Processing: How Words Generate Syntactic Structures in the Brain . . . 551
  Jordi Martorell, Piermatteo Morucci, Simona Mancini, and Nicola Molinaro

Part IV: Beyond Segments
18 Pragmatics Electrified . . . 583
  Paolo Canal and Valentina Bambini
19 Electrophysiology of Non-Literal Language . . . 613
  Vicky Tzuyin Lai, Ryan Hubbard, Li-Chuan Ku, and Valeria Pfeifer
20 Neurological Evidence of the Phonological Nature of Tones . . . 647
  Amedeo De Dominicis
21 Neurophysiological Underpinnings of Prosody . . . 669
  Silke Paulmann
22 Using Facial EMG to Track Emotion During Language Comprehension: Past, Present, and Future . . . 687
  Jos J. A. van Berkum, Marijn Struiksma, and Björn 't Hart
23 Eye-Tracking Methods in Psycholinguistics . . . 731
  Mikhail Pokhoday, Beatriz Bermúdez-Margaretto, Anastasia Malyshevskaya, Petr Kotrelev, Yury Shtyrov, and Andriy Myachykov
24 Neurophysiology of Language Pathologies . . . 753
  Laura Verga, Michael Schwartze, and Sonja A. Kotz
25 Electrophysiological Correlates of Second-Language Acquisition: From Words to Sentences . . . 777
  Sendy Caffarra and Manuel Carreiras

Index . . . 795

Contributors PHILLIP M. ALDAY • Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands LAU MØLLER ANDERSEN • Aarhus Institute of Advanced Studies, Aarhus, Denmark; Department of Linguistics, Cognitive Science and Semiotics, Aarhus University, Aarhus, Denmark; Center of Functionally Integrative Neuroscience, Aarhus University, Aarhus, Denmark VALENTINA BAMBINI • Department of Humanities and Life Sciences, University School for Advanced Studies IUSS, Pavia, Italy JOS J. A. VAN BERKUM • Institute for Language Sciences, Utrecht University, Utrecht, The Netherlands BEATRIZ BERMU´DEZ-MARGARETTO • University of Salamanca, Salamanca, Spain TOMMASO BOCCI • University of Milan, Milan, Italy ANTON VAN BOXTEL • Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands ELVIRA BRATTICO • Center for Music in the Brain (MIB), Department of Clinical Medicine, Aarhus University & Royal Academy of Music Aarhus/Aalborg, Aarhus C, Denmark; Department of Education, Psychology, Communication, University of Bari Aldo Moro, Bari, Italy SENDY CAFFARRA • Basque Center on Cognition, Brain and Language, Donostia-San Sebastian, Spain; Stanford University, Developmental-Behavioral Pediatrics Department, Stanford Medical School, Stanford, CA, USA; University of Modena and Reggio Emilia, Department of Biomedical, Metabolic and Neural Sciences, Modena, Italy PAOLO CANAL • Department of Humanities and Life Sciences, University School for Advanced Studies IUSS, Pavia, Italy MANUEL CARREIRAS • Basque Center on Cognition, Brain and Language, Donostia-San Sebastian, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain; University of the Basque Country, UPV/EHU, Bilbao, Spain ELIAS PAOLO CASULA • Department of System Medicine, University of Tor Vergata, Rome, Italy ANTONIO CRISCUOLO • Department of Neuropsychology & Psychopharmacology, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands ` di Ferrara, Ferrara, Italy; Fondazione Istituto Italiano ALESSANDRO D’AUSILIO • Universita di Tecnologia, Genoa, Italy AMEDEO DE DOMINICIS • University of Tuscia, Viterbo, Italy ` di Ferrara, Ferrara, Italy; Fondazione Istituto Italiano di LUCIANO FADIGA • Universita Tecnologia, Genoa, Italy ROBERTA FERRUCCI • University of Milan, Milan, Italy MIRKO GRIMALDI • Centro di Ricerca Interdisciplinare sul Linguaggio (CRIL), University of Salento, Lecce, Italy RYAN HUBBARD • Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Champaign, IL, USA COSIMO IAIA • Centro di Ricerca Interdisciplinare sul Linguaggio (CRIL), University of Salento, Lecce, Italy


PETR KOTRELEV • Center for Cognition and Decision Making, HSE, Moscow, Russia; Sirius University of Science and Technology, Sochi, Russia
SONJA A. KOTZ • Maastricht University, Maastricht, The Netherlands
FRANZISKA KRETZSCHMAR • CRC 1252 Prominence in Language, University of Cologne, Cologne, Germany; Leibniz-Institute for the German Language, Mannheim, Germany
LI-CHUAN KU • Department of Psychology, Cognitive Science Program, University of Arizona, Tucson, AZ, USA
JAN KUJALA • University of Jyväskylä, Jyväskylä, Finland
VICKY TZUYIN LAI • Department of Psychology, Cognitive Science Program, University of Arizona, Tucson, AZ, USA
ALINA LEMINEN • Laurea University of Applied Sciences, Vantaa, Finland; University of Helsinki, Helsinki, Finland
MIIKA LEMINEN • AI and Analytics Services, Helsinki University Hospital, Helsinki, Finland; University of Helsinki, Helsinki, Finland
MATTHEW K. LEONARD • Department of Neurological Surgery, University of California (UCSF), San Francisco, CA, USA
ANASTASIA MALYSHEVSKAYA • Center for Cognition and Decision Making, HSE, Moscow, Russia; Sirius University of Science and Technology, Sochi, Russia
FRANCESCA MAMELI • Fondazione IRCCS Ca' Granda - Ospedale Maggiore Policlinico, Milan, Italy
SIMONA MANCINI • Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastián, Spain
JORDI MARTORELL • Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastián, Spain
GEORGIOS MICHALAREAS • Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
NICOLA MOLINARO • Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastián, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain
PIERMATTEO MORUCCI • Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastián, Spain
ANDRIY MYACHYKOV • Sirius University of Science and Technology, Sochi, Russia; Northumbria University Newcastle, Newcastle upon Tyne, UK
EINO PARTANEN • Cognitive Brain Research Unit, University of Helsinki, Helsinki, Finland
TIINA PARVIAINEN • University of Jyväskylä, Jyväskylä, Finland
SILKE PAULMANN • Department of Psychology, Centre for Brain Science, University of Essex, Colchester, UK
MARIA CONCETTA PELLICCIARI • Department of Human Sciences, LUMSA University, Rome, Italy
VALERIA PFEIFER • Department of Psychology, Cognitive Science Program, University of Arizona, Tucson, AZ, USA
DAVID POEPPEL • Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA; Ernst Strüngmann Institute for Neuroscience, Frankfurt am Main, Germany
MIKHAIL POKHODAY • Center for Cognition and Decision Making, HSE, Moscow, Russia; Sirius University of Science and Technology, Sochi, Russia
ALBERTO PRIORI • University of Milan, Milan, Italy
MILENA RABOVSKY • University of Potsdam, Potsdam, Germany


JOHANNA M. RIMMELE • Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany; Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
PHAEDRA ROYLE • Université de Montréal, CRBLM, BRAMS, Montreal, QC, Canada
FABIANA RUGGIERO • Fondazione IRCCS Ca' Granda - Ospedale Maggiore Policlinico, Milan, Italy
WILLIAM L. SCHUERMAN • Department of Neurological Surgery, University of California (UCSF), San Francisco, CA, USA
MICHAEL SCHWARTZE • Maastricht University, Maastricht, The Netherlands
YURY SHTYROV • Centre of Functionally Integrative Neuroscience (CFIN), Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
RAMESH SRINIVASAN • Department of Cognitive Sciences, University of California, Irvine, CA, USA
KARSTEN STEINHAUER • McGill University, CRBLM, Montreal, QC, Canada
MARIJN STRUIKSMA • Institute for Language Sciences, Utrecht University, Utrecht, The Netherlands
BJÖRN 'T HART • Institute for Language Sciences, Utrecht University, Utrecht, The Netherlands
ALESSANDRO TAVANO • Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
LAURA VERGA • Maastricht University, Maastricht, The Netherlands; Comparative Bioacoustics Research Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

Part I Principles and Tools

Chapter 1
From Neurons to Language and Speech: An Overview
Mirko Grimaldi and Cosimo Iaia

Abstract
In this chapter, we describe some of the basic principles of human brain functionality, focusing on language and speech processes. After discussing the structure of neurons and their communication system, we delineate the functional anatomical organization of the brain. We can think of this organ as a building with multiple floors, built at different times, where the whole architecture makes sense because of the interconnections between the different floors. The lower parts of the building represent the older structures (the cerebellum, the thalamus, the limbic system, and the basal ganglia), while the upper parts are more recent (the cerebral cortex): the cortex contains more neurons than any other cerebral structure and performs exceptional cognitive functions thanks to continuous bottom-up and top-down neural connections. In this way, the frontal, temporal, and parietal cortices were synchronized with each other—through groups (bundles) of neurons devoted to this task—and all together were synchronized with the thalamus, the limbic system, the basal ganglia, and the cerebellum. Thereby, symbolic thought, higher consciousness, and language emerged. From this perspective, we also discuss how archaic structures of the brain (such as the basal ganglia and the cerebellum) were re-functionalized in order to mediate language processing as a result of complex synchronized ascending/descending pathways. Finally, some theoretical models that try to capture the linguistic neural organization are briefly outlined, and future perspectives of investigation and challenges are addressed.

Key words: Brain structures, Brain areas, Neural architecture of language, Language perception, Speech production

1 The Neuron: Structure and Functions

Neurons essentially consist of a cell body (soma), which contains the nucleus, and two specialized cellular extensions for different functions: the dendrites (usually branched) and the axon. Communication between neurons occurs as the dendrites of one neuron receive information from the axons of other neurons through specific terminations called synaptic boutons: the information received is processed within the nucleus and transmitted, again, through the axons to other dendrites (see Fig. 1). Dendrites are therefore part of the postsynaptic neuron (as they receive information after the synapse), while axons are part of the presynaptic neuron.



Fig. 1 Representation of a neuron. (Adapted from [3])

Neurons can have different shapes, features, and functions, based on the role they play and the areas of the brain in which they are located. Axons are wrapped by a myelin sheath formed by the membranes of Schwann cells. The sheath acts like an "electrical insulator." Like all living cells, neurons contain electrically charged molecules, that is, ions whose internal charge is different from the external charge. Each neuron, therefore, has an electrical membrane potential that, at rest, is about -70 millivolts (mV): see Fig. 2. Typically, the inside of a neuron has a negative charge, while the outside has a positive charge. The distribution of positive ions is not uniform: usually there are more sodium ions (Na+) on the outside and more potassium ions (K+) on the inside. In other words, we can compare a neuron to a battery with a form of potential energy that allows it to generate electrical signals through the variation of the membrane potential, mediated by active processes at the level of the membrane itself [2].
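To make the battery analogy concrete, the resting potential can be approximated with the Goldman-Hodgkin-Katz equation, which combines the ion gradients just described with the membrane's relative permeability to each ion. The following Python sketch is purely illustrative: the concentration and permeability values are typical textbook figures assumed for the example, not values given in this chapter.

```python
import math

# Illustrative mammalian ion concentrations (mM) and relative permeabilities.
# These are assumed textbook values, not measurements reported in this chapter.
K_in, K_out = 140.0, 5.0      # potassium: high inside
Na_in, Na_out = 15.0, 145.0   # sodium: high outside
Cl_in, Cl_out = 10.0, 110.0   # chloride: high outside
p_K, p_Na, p_Cl = 1.0, 0.05, 0.45  # permeabilities relative to K+

RT_over_F = 26.7  # mV, at body temperature (~37 degrees C)

# Goldman-Hodgkin-Katz equation (Cl- terms are swapped because of its negative charge)
v_rest = RT_over_F * math.log(
    (p_K * K_out + p_Na * Na_out + p_Cl * Cl_in)
    / (p_K * K_in + p_Na * Na_in + p_Cl * Cl_out)
)

print(f"Estimated resting potential: {v_rest:.1f} mV")  # roughly -65 mV
```

With these particular numbers the estimate comes out near -65 mV; slightly different (but equally plausible) concentrations or permeabilities shift it toward the -70 mV quoted above.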


Fig. 2 Representation of the action potential. The variation in potential can be observed on the y-axis, while the time of potential generation is described on the x-axis. (Adapted from [1])

A key feature of neurons is excitability, that is, neurons respond to a stimulus (acoustic, visual, tactile, etc.) with a rapid change in membrane polarity (depolarization). For a few milliseconds, sodium ions enter the cell; then the original state is restored (repolarization). If the stimulus is intense enough to reach a threshold value, this change in polarity propagates very quickly (from 1 to about 100 meters per second) across the cell membrane (see Fig. 2). The reversal of charge and its immediate recovery is called the action potential, which lasts a few milliseconds in each restricted area of a neuron: once a peak of about +50 mV is reached, the potential returns below the threshold. Its propagation occurs as a wave along the entire nerve cell, without losing intensity as it proceeds along the axon. Because Schwann cells are spaced out along the axon, the myelin sheath covering the axon is interrupted at intervals, forming the so-called nodes of Ranvier (see Fig. 1). These nodes are rich in voltage-gated sodium channels, and it is here that the action potential is regenerated: nerve impulses thus propagate along axons even faster, by jumping from one node to another (see [2] for a detailed discussion). The action potential enables synaptic transmission, which is the process that allows billions of neurons to communicate with each other. There are two types of synapses: electrical and chemical. The former is quite rare in humans and allows a direct flow of electrical current from one neuron to another: the current flows through gap junctions (communicating junctions), specialized channels of the membrane that connect two cells. Chemical synapses, by contrast, allow communication between neurons through the secretion of chemicals (neurotransmitters) released from the axons, which generate a secondary current flow in the dendrites thanks to special receptor molecules (cf. Fig. 1).
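The threshold-and-spike behavior just described can be caricatured with a leaky integrate-and-fire model. This is a deliberate simplification rather than a biophysical (Hodgkin-Huxley) description of the ionic currents, and every parameter below is an assumption chosen only to echo the numbers mentioned in the text (about -70 mV at rest, a firing threshold, a brief overshoot toward +50 mV).

```python
import numpy as np

# Leaky integrate-and-fire sketch (illustrative parameters, not fitted to data)
dt = 0.1                                        # time step (ms)
t = np.arange(0, 100, dt)
v_rest, v_thresh, v_peak = -70.0, -55.0, 50.0   # mV
tau_m = 10.0                                    # membrane time constant (ms)
r_m = 10.0                                      # membrane resistance (arbitrary units)

i_input = np.where((t > 20) & (t < 80), 2.0, 0.0)  # step current standing in for a stimulus

v = np.full_like(t, v_rest)
spike_times = []
for k in range(1, len(t)):
    dv = (-(v[k - 1] - v_rest) + r_m * i_input[k]) / tau_m
    v[k] = v[k - 1] + dv * dt
    if v[k] >= v_thresh:            # threshold reached: emit a spike...
        spike_times.append(t[k])
        v[k - 1] = v_peak           # draw the overshoot for plotting purposes
        v[k] = v_rest               # ...then repolarize back to rest

print(f"{len(spike_times)} spikes, first at {spike_times[0]:.1f} ms" if spike_times else "no spikes")
```

Running the sketch with a stronger or weaker input current changes how quickly the threshold is reached, which is exactly the all-or-none logic described above.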


Fig. 3 Sagittal view of the central nervous system. (Adapted from [5])

We can think of the human brain as a building with multiple floors, built up at different times, where older structures are interconnected with newer ones, so that the whole architecture makes sense thanks to its structural interconnections. In fact, we will see that the enormous potential of the human brain is not due to its size, but to the way in which older anatomical parts began to interact with newer ones. The neurons composing the nervous system are differentially distributed within the human body. The brain (also known as encephalon), along with the spinal cord, forms the central nervous system and is located in the cranial cavity. The peripheral nervous system, on the other hand, starts in the spinal cord and branches throughout the entire body (see Fig. 3).

2 Older Structures and Language

2.1 The Hindbrain

The oldest part of the brain, the hindbrain (also called rhombencephalon), is composed of the cerebellum and the brainstem, and is accommodated in the posterior cranial fossa. The brainstem contains the medulla oblongata (or simply medulla), the pons, and the midbrain (also called mesencephalon). The cerebellum has a key function in the planning and execution of movements. Recent studies, however, have highlighted its role in the mental organization of articulatory gestures useful to shape sounds (cf. Figs. 3 and 4). The pre-articulatory planning of speech is important for the rapid and rhythmically organized production of sequences of syllables, then organized into words and sentences: this process is directly managed by the frontal lobe motor cortex, with which the cerebellum is connected through thalamic pathways. Pathologies of the cerebellum can result in impaired stability of sound production, slowed articulation, and disrupted coordination of the larynx and orofacial muscles' activities (ataxic dysarthria). Therefore, the cerebellum's processing properties appear to contribute to the modulation of the rhythmic structure of a word sequence. Recent evidence supports the hypothesis that the cerebellum is also involved in the perception of speech sounds. Lesions of the cerebellum, along with some areas of the frontal cortex, result in difficulties in the temporal processing of sounds. When reconsidered from an evolutionary perspective, these data suggest that the ability to produce and perceive language sounds relies on ancient neural properties [4]. The medulla connects the brain with the spinal cord: here run the long pathways that carry motor commands from the brain to the peripheral nervous system. This area, through the cranial nerves, is responsible for sensibility and motor control of the face, mouth, tongue, throat, and the respiratory system (as well as the heart). Visual and auditory reflexes, instead, are regulated by the mesencephalon, while facial expressions and eye movements are regulated by the pons.

2.2 The Diencephalon

Above the hindbrain, we find the diencephalon, which contains the thalamus, the hypothalamus, and the pituitary gland or hypophysis (cf. Figs. 3 and 4). The thalamus can be considered a hub of exchange for the brain, as all the sensory pathways (except for olfaction) pass through here before continuing to the upper areas. As will be discussed later, some parts of the thalamus receive information from the retina and send it to the areas responsible for vision, while other areas receive information from the inner ear and send it to the auditory areas. Finally, the thalamus is directly connected with the part of the motor cortex that controls the larynx and the movements of the vocal tract during phonation.



Fig. 4 Anatomy of the brain. Representation of the lobes in the left hemisphere and main structures of the limbic system. (Adapted from [6, 7])

Below the thalamus, we find the hypothalamus: it is about the size of a sugar cube and composed of tiny clusters of nerve cells called nuclei. Along with the hypophysis, at the base of the hypothalamus, the nuclei regulate body temperature, hunger, blood flow, sleep-wake cycle, etc., as well as emotional responses such as anger and fear. The thalamus, hypothalamus, and hippocampus, along with the cingulate gyrus and the amygdala, form the limbic system, a primitive area of the central nervous system that controls instinctual behavior (fight or flight response) and emotions: it is also involved in attention and memory processes (see Fig. 4). Moreover, the limbic system is to some extent involved in speech with regard to emotional vocalizations: fear, panic, anger, crying, laughter, etc.


2.3 The Basal Ganglia


Around and above the limbic system, there is a set of neuronal aggregations, the basal ganglia, consisting of the caudate nucleus, the putamen, and the globus pallidus (see Fig. 3). The main function of the basal ganglia is to plan and control movements by interacting with the upper areas of the brain (the motor cortex). The basal ganglia receive information from the motor cortex, plan whether a particular motor action should be performed, and through the thalamus send the decision back to the cortex. Any damage to these structures can cause, for example, an inability to change the direction of thoughts and thus to make decisions about any action to be performed. Therefore, it is not surprising that the basal ganglia play a central role in speech production, where precise control of the movement of many muscles is required to control the face, jaw, diaphragm, and tongue [8]. In fact, subjects suffering from Parkinson's disease and Huntington's disease (both of which affect the basal ganglia) show problems in controlling sound articulation, as well as in controlling and perceiving tone and volume of the voice (prosodic phenomena). In some cases, it has also been observed that difficulties emerge in the comprehension of sentences that have a moderately complex syntax. There is considerable evidence that even in healthy subjects the basal ganglia are involved in the perception of linguistic sounds (especially when fast auditory processes are required), in syntactic computations, and in semantic operations, that is, in the intentional retrieval of words regardless of whether they are nouns, verbs, etc. [9]. The basal ganglia are also involved in learning processes based on the association and memorization of two events that influence each other (associative learning). The caudate nucleus is activated in bilingual subjects when they need to select one language over another and thus need to perform language production control (select the desired language and suppress the unnecessary one). In contrast, the left putamen appears to be implicated in the articulatory processes of a second language, but only when the speaker has a low level of proficiency [10]. In addition, damage to the basal ganglia appears to impair the recognition of emotional intonations produced by other speakers. Involvement of the basal ganglia in language control seems related to the mutation of a gene, called FOXP2, which during evolution adapted the circuits of this group of neurons by enhancing interactions with the upper areas of the brain specifically designated for language. When this genetic mutation occurred is difficult to say: some studies speculate that it appeared about 200,000 years ago, while others place it much later [11].

3 The Upper Floors and Language

3.1 The Cerebral Cortex

The diencephalon, the limbic system, and the basal ganglia are surrounded by the cerebral cortex. The cortex, which is composed of more neurons than any other cerebral structure, performs functions that have greatly enhanced our cognitive abilities. Thanks to the cortex, we make decisions, organize the world, store our individual experiences in memory, produce and understand language, look at and appreciate paintings, and listen to music. To fit into the cramped volume of the skull, the cortex is folded in on itself: its thickness ranges from 1.5 to 4.5 mm, and it is formed by the cell bodies of neurons, their dendrites, and some axons, including axons that project information from the underlying structures (thalamus) to the cortical areas. Because it is folded in on itself, the cortex has ridges and grooves: the ridges are called gyri, while the depressions between the folds are called sulci, and, if they are particularly deep, fissures (fissures divide the brain into lobes). Approximately 90% of the human cerebral cortex is neocortex, which developed later in evolution and is characterized by six cell layers (see Fig. 5). The layers are numbered starting at the surface of the brain (first layer) and continuing inward to the white matter (sixth layer). The neocortex is characterized by pyramidal neurons and small inter-neurons called stellate cells: both differ in size and quantity and are organized vertically in a columnar pattern. Along these layers, starting from the bottom and following the pyramidal neurons, the electrical activity of neurons ascends toward the scalp, where it can be recorded with different methods, allowing us to investigate cognitive processes. A classic partition of the cortex was proposed early in the last century by the neurologist Korbinian Brodmann. Based on differences in the shape and organization of neurons, Brodmann identified about 52 regions to which he attributed different functions (see Fig. 6). Although later studies, using more sophisticated techniques, have come to hypothesize about 180 distinct areas [14], most researchers still follow Brodmann's model, schematically described as follows (a compact lookup-table sketch of the same mapping follows the list; for more details see Subheadings 4.1, 4.2, 4.3, and 4.4):

• Areas 1–3, 5, 31, 40 represent the somatosensory cortex (centers of touch, position sense, pressure, pain, and temperature).
• Areas 4, 6, 32 include the motor cortex.
• Area 8 controls eye movements, but it is also implicated in language, memory, and attention.
• Areas 9, 10, 11, 12, 23, 24, 28, 36, 46, 47 are involved in motor programming, planning, decision-making, emotion control, and creative intelligence.

Fig. 5 The six layers of the neocortex (I. molecular layer; II. external granular layer; III. external pyramidal layer; IV. internal granular layer; V. internal pyramidal layer; VI. multiform layer). On the left, the section of neocortex was "stained" using Nissl's method to highlight the number and shape of neurons, while in the center the staining was done using Golgi's method, which highlights the outlines of a limited number of neurons. (Adapted from [12], originally from [13])


Fig. 6 Illustration of the Brodmann areas. Above: lateral view of the left hemisphere; below: sagittal view of the left hemisphere. The anterior part of the brain is located on the left, and the posterior part on the right. (Adapted from [15])

• Areas 44, 45 represent Broca's area (cf. Subheading 4.1).
• Areas 41, 42, 22 are the auditory areas (in the posterior part of 22 we find Wernicke's area).
• Area 38 is involved in the control of emotional states.
• Area 17 (and adjacent areas 18 and 19) circumscribe the primary visual cortex.
• Areas 20, 21, 37 are involved in processing the characteristics of visual stimuli.
• Areas 7, 39 are responsible for processing information related to movement and position of stimuli in visual space.
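As announced above, the same mapping can be written down as a small lookup table. This is only a restatement of the list (and of Brodmann's scheme as presented here), useful for quickly checking which functional grouping a given area belongs to.

```python
# Illustrative lookup table restating the Brodmann-area groupings listed above.
# Keys are Brodmann area numbers; values are the functions as summarized in this chapter.
BRODMANN_FUNCTIONS = {
    (1, 2, 3, 5, 31, 40): "somatosensory cortex (touch, position sense, pressure, pain, temperature)",
    (4, 6, 32): "motor cortex",
    (8,): "eye movements; also implicated in language, memory, and attention",
    (9, 10, 11, 12, 23, 24, 28, 36, 46, 47): "motor programming, planning, decision-making, emotion control, creative intelligence",
    (44, 45): "Broca's area",
    (41, 42, 22): "auditory areas (posterior part of 22: Wernicke's area)",
    (38,): "control of emotional states",
    (17, 18, 19): "primary visual cortex and adjacent areas",
    (20, 21, 37): "processing the characteristics of visual stimuli",
    (7, 39): "movement and position of stimuli in visual space",
}

def functions_of(area: int) -> list[str]:
    """Return every listed function group that includes the given Brodmann area."""
    return [label for areas, label in BRODMANN_FUNCTIONS.items() if area in areas]

print(functions_of(22))  # e.g., the auditory-areas grouping
```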


3.2 The Two Hemispheres


The brain is divided into two hemispheres by the longitudinal fissure. Each of the two hemispheres governs the opposite side of the body: the left cerebral hemisphere controls the right side, while the left side depends on the right hemisphere. The two hemispheres communicate with each other through the corpus callosum (Fig. 4), the major nerve pathway of the brain, formed by about three hundred million nerve fibers. For a long time, it was thought that the two hemispheres were specialized for different cognitive functions (hemispheric lateralization): for example, the left hemisphere was thought to exclusively control language, while the right hemisphere was alleged to regulate emotions and creativity (so the perception of music and art, including the perception of "beauty," was supposed to occur in the right side of the brain). However, recent studies have shown that hemispheric specialization is far from being absolute: the right side of the brain is never "switched off" during the perception and production of language, and, as we will see in the following chapters, only some levels of language processing are exclusively managed by the left hemisphere (the same applies to other cognitive processes).

4 The Lobes

The two major landmarks on the brain surface of each hemisphere are the central sulcus (or sulcus of Rolando) and the lateral fissure (or the Sylvian fissure) (cf. Fig. 7). Both divide each hemisphere into four areas, called lobes: the frontal lobe, the parietal lobe, the temporal lobe, and the occipital lobe (Figs. 4 and 7). The most prominent cortical gyri on the surface of each hemisphere are (i) the precentral gyrus, site of the motor cortex; (ii) the postcentral gyrus, which contains the somatosensory cortex; and (iii) the superior temporal gyrus. Each lobe is the predominant site of a cognitive process. However, cognitive processes generally involve multiple lobes and, as we have seen, subcortical structures. A firm point reached by current knowledge about the brain is that cognitive processes are generated by a dense network of cortical and subcortical connections in ongoing communication with each other.

4.1 The Frontal Lobe

The frontal lobe contains very important motor areas (located in front of the central sulcus): Brodmann area 4 is considered the motor area par excellence, as it controls peripheral muscle contraction, while areas 6 and 8 are considered premotor areas, with a planning function (see Fig. 6). The lower part of area 4 controls a very important organ of the phonatory apparatus: the larynx, which contains the vocal folds. The motor area of the larynx is responsible for the activation of about 100 muscles for the production of linguistic sounds and the control of intonation, as well as the management of breathing and deglutition: it interacts with (and is


controlled by) a network of subcortical structures such as the brainstem and spinal cord (which contain a phonological sensorimotor neural nucleus for the control of innate vocalizations), the limbic system, the basal ganglia, and the cerebellum [16]. In monkeys, the motor cortex of the larynx is located in premotor area 6 and thus only indirectly controls the motor neurons of this organ: in fact, its impairment does not affect the production of vocalizations. These differences could result from an evolutionary process that led the laryngeal cortex to perform the crucial function of intentional motor control in the production of linguistic sounds [17]. In the inferior part of area 4 is also located the neural (somatotopic) map of the tongue that, together with the supplementary motor area and the inferior part of area 6, would play a key role in the production but also in the perception of linguistic sounds (Fig. 6): according to this hypothesis, the areas that control the articulation of sounds are to some extent involved in the processes of perception [18] (cf. Chap. 13). Since the beginning of the last century, there have been countless discussions on this issue that have not yet led to a shared theory of how, starting from the acoustic signal, the brain comes to generate precise representations of every single sound produced, allowing us to reconstruct words or sequences of words.

Fig. 7 Representation of the principal gyri and sulci of the brain. Also, the arcuate fasciculus is highlighted, connecting regions of the temporal, parietal, and frontal lobes. (Modified from [21])


The frontal lobe also includes other key areas for language and speech. The traditional model considered area 44/45 of the inferior frontal gyrus (Fig. 7), called Broca's area (named after the French neurologist who discovered it), to be involved in verbal fluency: in fact, a lesion in this area causes Broca's aphasia, which compromises the ability to articulate coordinated sequences of sounds. Recent studies have allowed us to better define the size of this area and its complex functionality. Available data suggest the following:

1. Areas 44/46, close to motor areas 4 and 6, are implicated in the articulation of sounds and their assembly into syllables.
2. Areas 45/44 are involved in processing syntactic relations between words.
3. Areas 47/45 are responsible for processing semantic relations between words.

In short, these areas seem to be involved in combinatorial and integration operations between phonological, syntactic, and lexical levels [19]. Other studies have also highlighted the importance of areas 9, 8, and 46 (dorsolateral prefrontal cortex): these areas of the left frontal lobe have been found to be active during the comprehension and production of sentences [20].

4.2 The Parietal Lobe

The parietal lobe contains the somatosensory cortex (Brodmann areas 1, 2, and 3), which, through connections ascending from the thalamus, is responsible for the perception and processing of the sensory stimuli of some parts of the body: touch, sense of position, pressure, pain, and temperature. The parietal cortex also contains the supramarginal gyrus and the angular gyrus (Figs. 7 and 9), which seem to play a fundamental role in the processing of meaning (e.g., deciding whether a noun is animate or inanimate) or in the syllabic processing of words, but also in word formation (morphology) [19]. In recent studies, scholars have highlighted that the somatosensory areas that control the vocal apparatus (the lower parts of areas 1, 2, and 3) are involved in both the perception and production of linguistic sounds. The mapping (topography) of the anatomical parts of the phonatory apparatus—controlled by area 4 of the motor cortex and areas 1, 2, and 3 of the somatosensory cortex—can be seen in detail in the somatotopic map in Fig. 8. Since motor and somatosensory areas are located next to each other and physiologically integrated (the motor output is often guided by the sensory input), they are on the whole named sensorimotor areas (or cortices).

Fig. 8 Left: somatotopic map of the motor cortex (Brodmann area 4). Right: somatotopic map of the somatosensory cortex (Brodmann areas 1, 2, 3). (Adapted from [5], originally from [22])

4.3 The Temporal Lobe

The temporal lobe is the auditory area par excellence. Here, in Heschl's gyrus, is located the primary auditory cortex (areas 41 and 42), and right below it is the secondary auditory cortex (area 22) in the


superior temporal gyrus. Neurons in these areas are directly connected to the ear. It is thought that these two cortices have two different roles in processing linguistic sounds: in general, areas 41/42 would be responsible for the acoustic analysis of linguistic sounds, while area 22, with the superior temporal gyrus and the superior temporal sulcus (Fig. 7), would be responsible for their subsequent processing to generate distinct categorical representations for each type of sound processed. A hallmark of auditory cortices is that they have groups of neurons spatially distributed according to the different frequencies that characterize sounds (not only speech sounds). For this reason, the auditory area can be considered a tonotopic map, that is, a neuronal map where acoustic inputs are processed and represented at different points by different groups of neurons (cf. Chap. 13).


The temporal cortex can also be seen as an interface between the analysis of speech sounds and meaning: here the association between a sequence of sounds and meaning is likely to occur, forming words. This process is managed by Wernicke's area (described by the German neurologist of the same name), in the posterior part of area 22: subjects with a lesion in this area can speak fluently, but the organization of meaning is mostly absent and the understanding of meaning is also impaired. Wernicke's area is connected to Broca's area via the arcuate fasciculus (Fig. 7). The temporal lobe controls, at the same time, other linguistic functionalities; according to some studies, the area right below the auditory areas (area 21), in the middle temporal gyrus, is involved in the retrieval of information related to the syntactic properties of words, namely, those morpho-syntactic properties that allow groups of words to be merged according to their features in order to generate a well-formed sentence. The processing of these properties is then carried out by areas 45/44 of the frontal lobe [19]. As far as nouns and verbs are concerned, it has been highlighted that the activation of the temporal and frontal areas is modulated by morphosyntactic processes involving these categories, rather than by lexical categories or morphological peculiarities per se. Thus, it would seem that from a neuronal point of view, verbs and nouns are treated in relation to their denotation and predication properties. In other words, verbs and nouns are first retrieved from long-term memory in the non-inflected form (the root for the noun, the infinitive for the verb); then, morpho-syntactic relations are elaborated in the temporal area. A following process projects this information into the frontal area, where the relationships between verbs and nouns are computed with respect to syntactic and semantic processing [23–25]. Within the Sylvian fissure, in the upper part of Wernicke's area at the parietal-temporal boundary, bordering the angular and supramarginal gyri, there is a neural circuit that seems to perform sensory-motor transformation for speech, that is, the Sylvian parietal-temporal (Spt) circuit (cf. Figs. 7 and 9). Investigations conducted in recent years have found that this area has multiple functions: on the one hand, it plays the role of a sensorimotor circuit for the vocal tract (a sort of converter of auditory inputs into motor inputs); on the other hand, the Spt functions as a hub of information to and from the auditory areas. Although this neural circuit is not exclusively devoted to language—it is also activated by tone sequences, musical sequences, audio-visual processing of speech, sign language, etc.—its dysfunction causes conduction aphasia, a particular pathology that prevents the repetition of what has been heard (while language comprehension skills remain intact). As the frontal lobe cortical motor system does not work alone, further research has highlighted that the Spt contributes


Fig. 9 Simplified representation of cortical structures and cranial nerves involved in language perception and speech production. The lines in bold represent the most important connections between cortical areas. (Adapted from [31])

actively to laryngeal control by providing sensory feedback inputs (the frontal and Spt areas are connected through the arcuate fasciculus: Fig. 7). This implies that the human brain has developed the required efferent motor pathway integrated with cortical circuits for controlling those efferent signals. Overall, the Spt may play an important role in language processes by providing a prosodic frame for speech planning [26–28].

4.4 The Occipital Lobe

Finally, the occipital lobe includes the visual cortices. Areas 17, 18, and 19 are responsible for processing visual information coming from the retina and optic nerve (through thalamic pathways) and initiate the decoding of the physical properties of stimuli: color, brightness, spatial frequency, orientation, and movement. A neuronal pathway connects this area with the temporal lobe in order to transmit information about the properties of stimuli, while another pathway projects to the parietal lobe to send information concerning movement and position of stimuli in visual space.


5 Assembling the Pieces of the Puzzle: Anatomical or Functional Differences?

The picture outlined in the previous paragraphs suggests that the emergence of language and speech is likely due to special properties that the human brain developed during evolution. Thus, it was not just anatomical differences that distinguished us from chimpanzees, but rather functional differences. What does this new functionality consist of? Let us think about an orchestra: without the dynamic and continuous synchronization of all the elements that are part of it, even the most beautiful piece by Mozart would lack the expressiveness and harmony that the composer wanted to give it. Now, the elements of the orchestra are neurons inside the brain: if groups of cortical and subcortical neurons, which are far from each other, stimulate and are stimulated simultaneously by other populations of neurons, a reentrant process is generated (from top to bottom and vice versa) that gives rise to primary consciousness. Animals are endowed with primary consciousness, which allows them to categorize sensory input (perceptual categorization), manipulate mental objects (build mental scenes), generate motor commands, etc. This property, however, is connected with the awareness of the present and the immediate past and is suited to giving adequate responses to stimuli from the surrounding world. In other words, primary consciousness does not allow symbolic and linguistic abilities [29]. At some point the reentrant activity developed further, involving a much larger number of neural connections: the frontal, temporal, and parietal cortices were synchronized with each other—through groups (bundles) of neurons devoted to this task—and all together were synchronized with the limbic system, the basal ganglia, and the cerebellum. Thereby, symbolic thought and language emerged, together with higher consciousness. Among vertebrates, only humans possess higher consciousness, that is, states of conscious awareness: we are the only vertebrates having self-consciousness (we are aware of being conscious), and we can talk about our mental states (how we perceive reality) and "tell" them in various linguistic forms [29]. We have developed a higher consciousness as a result of the emergence of language. For this faculty to be fully realized in a specific language, a vast cortical and subcortical network needed to control the phonatory and auditory systems in a completely new way. Those systems are composed of very ancient structures originally designed to perform rather different functions. As we have seen, the centers controlling breathing, articulation, and the larynx are located in older neural structures such as the brainstem and spinal cord, although they are also controlled by the motor cortex of the frontal lobe. We now add the final pieces of the puzzle.


Surrounding the midbrain is a group of neurons, called the periaqueductal gray matter, which in mammals has a key function in controlling emotions, generating specific respiratory and laryngeal motor activities related to vocalization. Damage to the periaqueductal gray matter can cause mutism in humans [30]. Twelve cranial nerves emerge from the brainstem, 8 of which are implicated in the control of the phonatory and auditory systems, and 31 spinal nerves emerge from the spinal cord, one-third of which control respiratory activity, including the diaphragm and intercostal muscles (Fig. 9). The V, VII, IX, XI, and XII cranial nerves manage the muscles of the face and oropharynx, which make key contributions to sound formation. The X cranial nerve, or vagus nerve, provides nearly all the sensory and motor innervation necessary for the functioning of the larynx, as well as sensory information from the thorax and abdomen [30]. The phonatory process involves both sensory (auditory cortices) and motor structures (basal ganglia, sensorimotor cortices, and Broca's area), as the speaker must continuously check whether mental representations of sounds are consistent with the sequences of sounds produced. In brief, an enormous amount of information (in the form of electrochemical signals) flows from bottom to top and from top to bottom, forming a vast and intricate neural network that reorganized and re-functionalized ancient structures, synchronizing them with more recent structures (the cortex). This network is responsible for reprocessing incoming information in a new way, generating new cognitive abilities and an integrated control of thinking, memory, and learning. Short-term memory (information stored for about 1 min) has been synchronized with long-term memory (information stored for days, months, years, or a lifetime), generating continuous computational and representational processes. Thus emerges the faculty of retrieving and managing the information necessary to generate linguistic signs, their recursive combination, and, in turn, the production of sentences. How likely is it that the same faculty of language would emerge in another possible world through the same evolutionary concatenation? The probability is close to 0. If, and we cannot rule it out, there are other forms of life in the universe, we can be sure that evolution will have shaped other forms of consciousness, other forms of knowledge of the world, another faculty of language. A very schematic representation of the cortical network involved in language processes can be seen in Fig. 9. The connections with subcortical structures (thalamus, limbic system, and basal ganglia) are not visible: their functionalities are briefly described in Table 1.


Table 1 Lower and higher neural structures involved in language perception and production

Lower floors
Cerebellum: Mental planning (pre-articulatory) of the motor gestures that will shape sounds (connection with area 4 of the frontal lobe). Modulation of the rhythmic structure of a sequence of words. Temporal perception of sounds.
Thalamus: Check of information arriving from the inner ear and transmission to auditory areas. Connections with the part of the motor cortex that controls the larynx and vocal tract movements during phonation.
Limbic system: Emotional vocalizations: fear, panic, anger, crying, laughter, etc.
Basal ganglia: Articulation of sounds. Control of tone and volume of voice. Comprehension of sentences. Learning processes. Intentional word retrieval. Language selection in bilingual subjects. Articulation of the sounds of a second language not yet acquired. Recognition of emotional intonations.

Upper floors
Frontal lobe: Area 4: control of the larynx and the tongue; perception of sounds in which the tongue is predominantly involved (in collaboration with area 6). Areas 44/46: articulation of sounds and their assembly into syllables. Areas 45/44: elaboration of syntactic relations between words. Areas 47/45: processing of semantic relations between words. Areas 9, 8, and 46: comprehension and production of sentences.
Parietal lobe: Areas 1, 2, and 3 (lower parts): perception and production of speech sounds. Supramarginal and angular gyri: analysis of meaning (animate or inanimate nouns), syllabic analysis of words, word formation.
Temporal lobe: Areas 41/42 (primary auditory) and area 22 (secondary auditory, connected to Broca's area through the arcuate fasciculus): elaboration and categorization of sounds; association of a sequence of sounds to a concept. Area 21: retrieval of information related to syntactic properties of words that will be processed by areas 45/44 of the frontal lobe. Superior planum temporale (Sylvian parietal-temporal area): transforms auditory input into motor input; hub of information to and from the auditory areas; laryngeal control.

6 The Neural Architecture of Language

The evidence previously discussed clearly suggests a functional-anatomical organization of language and speech in the brain. However, the development of theoretical models that capture this linguistic neural organization has turned out to be an arduous challenge. In this final section, we will try to briefly outline a few of them. When neurolinguistic research started, more than a century ago, language was thought to be located exclusively in the left hemisphere, where parceled areas of the brain were involved in language and speech functions. Actually, the classical Wernicke-Lichtheim-Geschwind model was based on the idea that each brain region subserves a specific cognitive function: language production and speech comprehension were thought to be subserved by Broca's and Wernicke's areas, respectively.


Early models of the neurobiology of language were limited by their exclusive reliance on clinical cases and their almost exclusive attention to single-word processing [20, 33, 34]. The first studies were conducted on patients suffering from aphasic deficits: to determine the localization of different aspects of language in a certain area, the brain had to be analyzed after the patient's death [35, 36]. In fact, aphasia syndromes were useful indicators of the function of one area: patients with a deficit in language production manifested a lesion in Broca's area; patients with a lesion in Wernicke's region often showed a deficit in language comprehension. Subsequent research has seen a paradigm shift thanks to the introduction of neuroimaging methods. A central question in the neurobiology of language research has become how brain regions interact with each other, thus adding the functional perspective discussed above [37, 38]. Within this perspective, several models have attempted to capture how speech and language are implemented in the brain.

6.1 The Dual Route Model


A first proposal is based on the interplay between a sensory-conceptual route and a sensory-motor route [39–41]. According to this model, early speech processing involves auditory regions, with the superior temporal gyrus being responsible for spectrotemporal analysis and the superior temporal sulcus being responsible for phonological processing [40, 41]. Although both hemispheres are involved in early speech processing, this model points out a "computational asymmetry" in hemispheric information processing: on the one hand, the left and right hemispheres might selectively process temporal versus spectral information, respectively [42–44]; on the other hand, the left and right hemispheres might process short versus long temporal information, resulting in information sampling at high and low frequencies ("asymmetric sampling in time") [45–47]. Finally, a trade-off has been suggested between spectral and temporal information, such that the higher sensitivity to (short) temporal windows in the left would be associated with a lower sensitivity to spectral modulations and vice versa [41, 42, 48]. Importantly, at the core of the model are two streams that relate to the sensory-conceptual and sensory-motor aspects of speech and language: a ventral bilateral stream (with "ventral" referring to areas or structures located toward the bottom of the brain), more spread over the temporal lobe, oversees lexical access and semantic processes ("sound to meaning"); structures in the posterior frontal lobe and in the posterior planum temporale constitute a mostly left-lateralized dorsal stream ("dorsal" referring to areas or structures located toward the top of the brain), which is responsible for sensory-motor integration ("sound to action") [40, 41].


6.2 Anatomical-Functional Pathways

Another proposal takes into account lexical-syntactic operations and is founded on the hypothesis that four anatomical pathways connect brain regions involved in language within the frontal and the temporal cortices [33]: these connections are fiber bundles that trace two dorsal pathways and two ventral pathways, with different functions [49, 50]. According to this proposal, the superior temporal gyrus and middle temporal gyrus are dorsally connected to the premotor cortex via the parietal cortex for "sensory-to-motor mapping" [33, 51]. Syntactic integration at the phrasal level is subserved by a ventral network which recruits the anterior superior temporal gyrus and the frontal operculum, connected by the uncinate fasciculus [33]. A second dorsal pathway connects BA 44 and the posterior superior temporal gyrus via the arcuate fasciculus, and it is responsible for more complex syntactic operations, such as syntactic hierarchy building [33]. Finally, BA 45 and BA 47 are ventrally connected to the superior temporal gyrus via the inferior fronto-occipital fasciculus (IFOF), supporting semantic processes [33, 51]. An important distinction made by this model is between the posterior and anterior parts of BA 45: the former, located close to BA 44, would be responsible for syntactic processing, while the latter, adjacent to BA 47, would support semantic processing, thus making BA 45 an integration area where syntactic and semantic information interface with each other [52]. Moreover, BA 44 is proposed to be divided into a ventral and a dorsal part, with the ventral part being highly specialized for syntactic processing and functionally different from the frontal operculum [53, 54], while the dorsal part is involved in phonological processing [52, 54].
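Because the four pathways are easiest to keep apart when laid out side by side, here is the same information expressed as a small data structure. The field names and the Python representation are mine; the endpoints, fiber tracts, and functions simply restate the description above.

```python
from dataclasses import dataclass

@dataclass
class Pathway:
    """One fronto-temporal language pathway as described in the text."""
    kind: str       # "dorsal" or "ventral"
    endpoints: str  # cortical regions it connects
    tract: str      # fiber bundle
    function: str

FRONTO_TEMPORAL_PATHWAYS = [
    Pathway("dorsal", "superior/middle temporal gyrus to premotor cortex",
            "dorsal fiber bundle via the parietal cortex", "sensory-to-motor mapping"),
    Pathway("dorsal", "BA 44 to posterior superior temporal gyrus",
            "arcuate fasciculus", "complex syntactic operations (hierarchy building)"),
    Pathway("ventral", "anterior superior temporal gyrus to frontal operculum",
            "uncinate fasciculus", "syntactic integration at the phrasal level"),
    Pathway("ventral", "BA 45 and BA 47 to superior temporal gyrus",
            "inferior fronto-occipital fasciculus (IFOF)", "semantic processes"),
]

for p in FRONTO_TEMPORAL_PATHWAYS:
    print(f"[{p.kind}] {p.endpoints} | {p.tract} | {p.function}")
```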

6.3 Memory, Unification, and Control

Shifting from single word processing to sentence processing, the Memory, Unification, and Control model [20, 34, 54] relies on the interplay of three major components: information pertaining to linguistically meaningful units (phonological, semantic, and syntactic) would be reflected in the activation of temporal and parietal cortices, based on the type of representation required during memory retrieval and language processing [20, 34]. Once the information is retrieved from memory, minimal units of language (i.e., lexical items) are combined into larger structures, such as phrases. Following Jackendoff’s theory, access to linguistic levels (syntax, phonology, semantics) is supported by a combinatorial process, referred to as Unification [55]: the involvement of the left inferior frontal gyrus and the left temporal cortex defines the “Unification Space” where the combinatorial process occurs [20, 34, 54]. More specifically, phonological, syntactic, and semantic unification are, respectively, subserved by Brodmann areas 44/6, areas 45/44, and areas 47/45 [20, 54, 34] (cf. Figs. 6 and 9). Furthermore, a “frequency-segregation” approach has been proposed to track


neural dynamics during sentence processing to finely disentangle overlapping neural structures involved in syntactic and semantic processes [56–59]. Finally, since language is generally used in an appropriate context during conversational interactions, an attentional control system would be supported by the dorsolateral prefrontal cortex, the anterior cingulate cortex, and parts of the parietal cortex [20].
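The "frequency-segregation" idea mentioned above can be illustrated with a few lines of generic signal processing: band-pass filter the recorded signal into canonical frequency bands and take the amplitude envelope of each. The sketch below runs on a synthetic signal and is meant only to convey the general approach, not the specific pipelines used in the studies cited here.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0                       # sampling rate (Hz), assumed for the example
t = np.arange(0, 10, 1 / fs)
eeg = np.random.randn(t.size)    # stand-in for one EEG/MEG channel

# Canonical frequency bands often linked to different linguistic timescales
bands = {"delta": (1, 4), "theta": (4, 8), "beta": (13, 30), "gamma": (30, 45)}

def band_envelope(signal, low, high, fs):
    """Band-pass filter a signal and return its amplitude envelope."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, signal)
    return np.abs(hilbert(filtered))

for name, (low, high) in bands.items():
    env = band_envelope(eeg, low, high, fs)
    print(f"{name:>5}: mean band-limited amplitude = {env.mean():.3f}")
```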

7 Conclusion and Future Challenges

In this chapter, we outlined basic principles and processes that have led the human brain to control language and speech on the basis of unique biological properties. A key issue is that, during human evolution, old structures in the lower part of the brain (originally developed to control basic physiological functions) were re-functionalized to perform new cognitive processes thanks to ascending synchronized connections with the cerebral cortex. In this way, vast and intricate bottom-up and top-down processes emerged: as a consequence, a sophisticated neural network developed in which distant brain areas are continuously interconnected to oversee thinking, memory, consciousness, and learning processes. When this neural network began to control the vocal apparatus in a new way, our ancestors were provided with an exceptional instrument able to externalize and share mental states in the form of language and speech. As we have seen in the previous sections, the neurobiology of language has made remarkable progress in understanding the functionality of the neural systems involved in language and speech processing; in recent times, enormous efforts have been made to characterize the connectome of human language (cf. [60, 61]). However, little progress has been made in deeply understanding how linguistic entities (phonemes, syllables, nouns, verbs, etc.), their internal structures, and the relationships developed in the continuous embedding of elements during the construction of sentences are related to neural computation and representational processes (cf. Parts III and IV of this volume). For many years, we tried to address this issue by cautiously handling experimental variables, conditions, and stimuli within a laboratory setting. The stimuli used were not, of course, representative of naturalistic language and speech acts, since experimental paradigms were generally focused on phonemes, words, sentences, and so on, in isolation. Recently, we have seen the rapid advancement of neuroscientific methods useful for investigating natural language using complete narrative stories or audio-book chapters as stimuli (or other linguistic material developed ad hoc). This approach makes it possible to simulate (at least at the level of natural language perception) the same conversational context as it would occur in real life, outside of an


experimentally controlled setting (cf. [62] and the articles published in the special issue). Importantly, using natural stimuli we can look at how the different levels of language (from phonemes to discourse and pragmatic phenomena) are integrated online at different timescales. This opportunity is offered by another approach that has enriched the neurophysiological toolbox in recent years, that is, the study of event-related fluctuations in rhythmic, oscillatory electroencephalography/magnetoencephalography activity (cf. Chap. 8). Thus, the linguistic timescales may be correlated with different frequencies of oscillatory activity in the brain which, in turn, may correspond to linguistic computations and representations that are hierarchically and dynamically organized. Crucially, moving from isolated sound/word/sentence experiments to larger contexts opens up enormous possibilities for new research questions that cannot be studied using traditional designs. In support of this new perspective of investigation, there are computational models, previously only available for the analysis of behavioral data, which now allow the inclusion of large numbers of predictor variables in neuroimaging and neurophysiological analyses (cf. Chap. 2). Such tools can be used to estimate linguistic predictions, model linguistic features, and specify a sequence of processing steps that may be quantitatively fit to neural signals collected while participants are actively involved in language processing. Progress has been helped by advances in machine learning, attention to linguistically interpretable models, and openly shared data sets that allow researchers to compare and contrast a variety of models (cf. [63]). If we are able to take advantage of these new avenues and to take up the challenges posed by this field of investigation, there is a good chance that in the future the brain will reveal to us the jealously guarded secret of the nature of language and speech.
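As a toy illustration of the kind of computational modeling mentioned above (many predictor variables regressed against a neural signal), the following sketch fits a regularized linear encoding model from word-level predictors to a simulated response. All variable names and data are invented for the example; real analyses would typically use time-resolved variants such as temporal response functions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Simulated word-level predictors (e.g., frequency, length, surprisal) for 500 "words"
n_words = 500
predictors = rng.standard_normal((n_words, 3))

# Simulated single-channel neural response per word: a weighted mix of predictors plus noise
true_weights = np.array([0.2, -0.1, 0.6])
response = predictors @ true_weights + 0.5 * rng.standard_normal(n_words)

# Regularized linear encoding model, evaluated with cross-validation
model = Ridge(alpha=1.0)
scores = cross_val_score(model, predictors, response, cv=5, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.2f}")
```

Cross-validated fit is reported so that the model is evaluated on data it was not trained on, the usual safeguard in this kind of analysis.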

References

1. Krol LR. https://commons.wikimedia.org/wiki/File:Action_potential_schematic.svg. Accessed 19 Dec 2022
2. Kandel ER, Koester JD, Mack SH et al (eds) (2021) Principles of neural science, 6th edn. McGraw Hill, New York
3. https://pixabay.com/it/illustrations/disegnocellula-nervosa-neuroni-730778/. Accessed 19 Dec 2022
4. Ackermann H, Brendel B (2016) Cerebellar contributions to speech and language. In: Hickok G, Small SL (eds) Neurobiology of language. Academic Press, Cambridge, pp 73–84
5. Luo L (2016) Principles of neurobiology. Garland Science, New York
6. https://commons.wikimedia.org/wiki/File:Brain_headBorder.jpg. Accessed 19 Dec 2022
7. Biaigo I. https://commons.wikimedia.org/wiki/File:Brain_latino.jpg. Accessed 19 Dec 2022
8. Lieberman P (2009) Human language and our reptilian brain: the subcortical bases of speech, syntax, and thought. Harvard University Press, Harvard
9. Crosson B, McGregor K, Gopinath KS et al (2007) Functional MRI of language in aphasia: a review of the literature and the methodological challenges. Neuropsychol Rev 17:157–177. https://doi.org/10.1007/s11065-007-9024-z


10. Abutalebi J, Della Rosa PA, Gonzaga AKC et al (2013) The role of the left putamen in multilingual language production. Brain Lang 125(3):307–315 11. Fisher SE (2016) A molecular genetic perspective on speech and language. In: Hickok G, Small SL (eds) Neurobiology of language. Academic Press, Cambridge, pp 13–24 1 2 . h t t p s : // b a s i c m e d i c a l k e y. c o m / h i g h e rfunctions-of-the-nervous-system/. Accessed 19 Dec 2022 13. Brodmann K (1909) Vergleichende Lokalisation lehre der Grosshirnrinde in ihren prinzipien Dargestellt auf Grund des Zellenbaues. Barth, Leipzig 14. Bruner E (2022) A network approach to the topological organization of the Brodmann map. Anat Rec 305(12):3504–3515. https:// doi.org/10.1002/ar.24941 15. https://commons.wikimedia.org/wiki/File: Brodmann_areas.jpg. Accessed 19 Dec 2022 16. Simonyan K, Ackermann H, Chang EF et al (2016) New developments in understanding the complexity of human speech production. J Neurosci 36(45):11440–11448. https://doi. org/10.1523/JNEUROSCI.2424-16.2016 17. Simonyan K (2014) The laryngeal motor cortex: its organization and connectivity. Curr Opin Neurobiol 28:15–21. https://doi.org/ 10.1016/j.conb.2014.05.006 18. Skipper JI, Devlin JT, Lametti DR (2017) The hearing ear is always found close to the speaking tongue: review of the role of the motor system in speech perception. Brain Lang 164:77–105. https://doi.org/10.1016/ j.bandl.2016.10.004 19. Grimaldi M (2012) Toward a neural theory of language: old issues and news perspectives. J Neurolinguistics 24(5):304–327 20. Hagoort P (2016) MUC (Memory, Unification, Control) a model on the neurobiology of language beyond single word processing. In: Hickok G, Small SL (eds) Neurobiology of language. Academic Press, Cambridge, pp 339–347 21. https://www.pinterest.it/pin/268667933 996501566/. Accessed 27 Dec 2022 22. Penfield W, Rasmussen T (1950) The cerebral cortex of man: a clinical study of localization of function. Macmillan, Oxford 23. Crepaldi D, Berlingeri M, Paulesu E et al (2011) A place for nouns and a place for verbs? A critical review of neurocognitive data on grammatical-class effects. Brain Lang 116(1):33–49 24. Vigliocco G, Vinson DP, Druks J et al (2011) Nouns and verbs in the brain: a review of

behavioural, electrophysiological, neuropsychological and imaging studies. Neurosci Biobehav Rev 35(3):407–426 25. Lukic S, Borghesani V, Weis E et al (2021) Dissociating nouns and verbs in temporal and perisylvian networks: evidence from neurodegenerative diseases. Cortex 142:47–61 26. Pa J, Hickok G (2008) A parietal–temporal sensory–motor integration area for the human vocal tract: evidence from an fMRI study of skilled musicians. Neuropsychologia 46(1):362–368 27. Hickok G, Okada K, Serences JT (2009) Area Spt in the human planum temporale supports sensory-motor integration for speech processing. J Neurophysiol 101:2725–2732 28. Hickok G (2017) A cortical circuit for voluntary laryngeal control: implications for the evolution of language. Psychon Bull Rev 24:56–63 29. Edelman GM, Tononi G (2000) A universe of consciousness: how matter becomes imagination. Basic Books, New York 30. Davis PJ, Zhang SP, Winkworth A et al (1996) Neural control of respiration: respiratory and emotional influences. J Voice 10:23–38 31. Kreiman J, Sidtis D (2011) Foundations of voice studies: an interdisciplinary approach to voice production and perception. Wiley-Blackwell, New York/London. https://doi.org/10.1002/9781444395068 32. Grimaldi M (2019) Il cervello fonologico. Carocci, Roma 33. Friederici AD (2016) The neuroanatomical pathway model of language: syntactic and semantic networks. In: Hickok G, Small S (eds) Neurobiology of language. Academic Press, Cambridge, pp 349–356 34. Hagoort P (2013) MUC (Memory, Unification, Control) and beyond. Front Psychol 4. https://doi.org/10.3389/fpsyg.2013.00416 35. Broca P (1861) Remarques sur le siège de la faculté du langage articulé, suivies d'une observation d'aphémie (perte de la parole). Bulletins de la Société Anatomique de Paris 6:330–357 36. Wernicke C (1874) Der aphasische Symptomencomplex. Springer-Verlag, Berlin 37. Kopell NJ, Gritton HJ, Whittington MA et al (2014) Beyond the connectome: the dynome. Neuron 83:1319–1328 38. Murphy E (2015) The brain dynamics of linguistic computation. Front Psychol 6. https://doi.org/10.3389/fpsyg.2015.01515 39. Hickok G, Poeppel D (2004) Dorsal and ventral streams: a framework for understanding

aspects of the functional anatomy of language. Cognition 92:67–99 40. Hickok G, Poeppel D (2007) The cortical organization of speech perception. Nat Rev Neurosci 8:393–402 41. Hickok G, Poeppel D (2016) Neural basis of speech perception. In: Hickok G, Small SL (eds) Neurobiology of language. Academic Press, Cambridge, pp 299–310 42. Zatorre RJ, Belin P, Penhune VB (2002) Structure and function of auditory cortex: music and speech. Trends Cogn Sci 6:37–46 43. Obleser J, Eisner F, Kotz SA (2008) Bilateral speech comprehension reflects differential sensitivity to spectral and temporal features. J Neurosci 28(32):8116–8123 44. Albouy P, Benjamin L, Morillon B, Zatorre RJ (2020) Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367:1043–1047 45. Poeppel D (2003) The analysis of speech in different temporal integration windows: cerebral lateralization as "asymmetric sampling in time". Speech Commun 41:245–255 46. Flinker A, Doyle WK, Mehta AD et al (2019) Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries. Nat Hum Behav 3(4):393–405 47. Giroud J, Trébuchon A, Schön D et al (2020) Asymmetric sampling in human auditory cortex reveals spectral processing hierarchy. PLoS Biol 18(3):e3000207 48. Norman-Haignere SV, Long LK, Devinsky O et al (2022) Multiscale temporal integration organizes hierarchical computation in human auditory cortex. Nat Hum Behav 6(3):455–469 49. Friederici AD (2009) Pathways to language: fiber tracts in the human brain. Trends Cogn Sci 13:175–181 50. Friederici AD (2009) Allocating function to fiber tracts: facing its indirectness. Trends Cogn Sci 9:370–371 51. Saur D, Kreher BW, Schnell S et al (2008) Ventral and dorsal pathways for language. PNAS 105:18035–18040. https://doi.org/10.1073/pnas.0805234105 52. Friederici AD (2017) Language in our brain: the origins of a uniquely human capacity. MIT Press, Cambridge


53. Zaccarella E, Friederici AD (2015) Merge in the human brain: a sub-region based functional investigation in the left pars opercularis. Front Psychol 6:1818 54. Hagoort P (2005) On Broca, brain, and binding: a new framework. Trends Cogn Sci 9:416–423 55. Jackendoff R (2002) Foundations of language: brain, meaning, grammar, evolution. Oxford University Press, Oxford 56. Bastiaansen M, Magyari L, Hagoort P (2010) Syntactic unification operations are reflected in oscillatory dynamics during on-line sentence comprehension. J Cogn Neurosci 22(7):1333–1347 57. Bastiaansen M, Hagoort P (2015) Frequency-based segregation of syntactic and semantic unification during online sentence level language comprehension. J Cogn Neurosci 27(11):2095–2107 58. Lewis AG, Bastiaansen M (2015) A predictive coding framework for rapid neural dynamics during sentence-level language comprehension. Cortex 68:155–168 59. Grimaldi M (2019) From brain noise to syntactic structures: a formal proposal within the oscillatory rhythm perspective. In: Franco L, Lorusso P (eds) Linguistic variation: structure and interpretation. De Gruyter Mouton, Berlin/Boston, pp 293–316. https://doi.org/10.1515/9781501505201-017 60. Rolls ET, Deco G, Huang CC et al (2022) The human language effective connectome. NeuroImage 258:119352 61. Roger E, De Almeida LR, Loevenbruck H et al (2022) Unraveling the functional attributes of the language connectome: crucial subnetworks, flexibility and variability. NeuroImage 263:119672 62. Hauk O, Weiss B (2020) The neuroscience of natural language processing. Lang Cogn Neurosci 35(5):541–542. https://doi.org/10.1080/23273798.2020.1761989 63. Hale JT, Campanelli L, Li J, Bhattasali S et al (2022) Neurocomputational models of language processing. Annu Rev Linguist 8(1):427–446

Chapter 2

How the Brain Works: Perspectives on the Future of Human Neuroscience Research

Ramesh Srinivasan

Abstract

Many books and review articles have been written about the structure and function of the brain. That is beyond the scope of this one chapter or, indeed, of this entire volume. Instead, this chapter will emphasize general principles and salient features of human brains and attempt to provide a useful theoretical framework for considering how to design experiments and perform data analysis that yields a better understanding of the vast amounts of information being obtained about the function of the human brain by neuroscientists using EEG, MEG, ECoG, fMRI, NIRS, etc. The details of these techniques can be found in other chapters in this volume, and I will not consider the strengths and weaknesses of these methods in detail. Rather, I will try to assess what kind of questions we can ask with non-invasive measures of brain function in humans. In my view, the field of cognitive neuroscience has to grow beyond the marriage of experimental psychology to brain mapping, and I consider some potential directions.

Key words Theoretical neuroscience, Neurocognitive models, Complex systems

1 A Brief Quantitative Anatomy of the Human Brain

The three primary divisions of the human brain are the brainstem, cerebellum, and cerebrum. The brainstem (the brain's stalk) is the structure through which nerve fibers relay signals (action potentials) in both directions between the spinal cord and higher brain centers. The thalamus, composed of two egg-shaped structures at the top and to the side of the brainstem, is a relay station and important integrating center for all sensory input to the cortex except smell. The cerebellum, which sits on top and to the back of the brainstem, has long been associated with the fine control of muscle movements. More recently, the cerebellum has been shown to play additional roles in cognition, especially learning. The large part of the brain that remains when the brainstem and cerebellum are excluded is the cerebrum, which is divided almost equally into two halves. The outer portion of the cerebrum, the cerebral cortex (or neocortex in mammals), is a folded structure





varying in thickness from about 2–5 mm, having a total surface area (in humans) of roughly 1600–4000 cm², and containing about 10¹⁰ neurons (nerve cells) [1]. Cortical neurons are strongly interconnected. For example, the surface of a large cortical neuron may be covered with as many as 10⁴–10⁵ synapses that transmit inputs from other neurons. The synaptic inputs to a neuron are of two types: those which produce excitatory postsynaptic potentials (EPSPs) across the membrane of the target neuron, thereby making it easier for the target neuron to fire an action potential, and the inhibitory postsynaptic potentials (IPSPs), which act in the opposite manner. EPSPs produce local membrane current sinks with corresponding distributed passive sources to preserve current conservation. IPSPs produce local membrane current sources with more distant distributed passive sinks. The cortex is also believed to be the structure that generates most of the electric potential measured on the scalp with EEG and the magnetic field recorded with MEG [2]. Much of our conscious experience must involve, in some largely unknown manner, the interaction of cortical neurons. The cortex is composed of gray matter, so-called because it contains a predominance of cell bodies that turn gray when stained by anatomists; but gray matter is actually pink when alive. Just below the gray matter is a second major region, the white matter, composed of nerve fibers (axons). In humans, white matter volume is somewhat larger than that of the neocortex. White matter interconnections between cortical regions (association fibers or cortico-cortical fibers) are quite numerous. A patch of the gray–white matter boundary with an area of one cm² may contain 10⁷ input and output fibers, mostly cortico-cortical axons interconnecting different regions of the cortex. Early attempts to map these connections in humans were relatively rare studies in deceased brains [3, 4]. Recent advances in neuroimaging have allowed for in vivo estimates of some of the white matter connections of the brain using diffusion-weighted imaging. Figure 1 shows an example of a structural connectome estimated by a tractography analysis of the group average of diffusion imaging from 842 subjects from the HCP 842 dataset [5]. We combined the resultant streamlines (estimated axon fiber bundles) with the Lausanne parcellation [6] to define 114 cortical regions of interest (ROI) and identify the cortico-cortical and callosal connectivity between these regions. A much smaller fraction (perhaps less than 1%) of axons that enter or leave the underside of the human neocortical surface radiates from the thalamus (thalamocortical fibers) [7]. This fraction is only a few percent in humans, but substantially larger in lower mammals [1]. This difference partly accounts for the strong emphasis on thalamocortical interactions (versus cortico-cortical interactions) in the animal electrophysiological literature. The extreme dominance of cortico-cortical over thalamocortical



Fig. 1 Structural connectome: We show the streamlines derived by probabilistic tractography analysis of diffusion-weighted imaging of 842 individuals [5] on the left and the structural connectome for the 114 areas of the Lausanne parcellation of the cortex [6] on the right. We have labeled a subset of areas each with one to three subdivisions (see [6] for all subdivisions of the Lausanne parcellation). We show the binarized structural connectome, with any non-zero edge being shown in yellow

connections may be the critical distinction of human brains. The implication of this admittedly oversimplified anatomical picture is that, in humans, the connections between neocortical neurons are mostly with neurons in other regions of the neocortex. The brain mostly operates on input signals emanating from other areas of the brain rather than on sensory inputs, and mostly sends outputs to other parts of the brain rather than to the motor systems of the body!
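The coarse-grained connectivity summarized in Fig. 1 is usually handled computationally as a region-by-region matrix. The following Python sketch is purely illustrative and is not the pipeline used for Fig. 1: it assumes a hypothetical streamline-count matrix for 114 parcellated regions and shows how such a matrix can be binarized (any non-zero edge counts as a connection) and summarized.

```python
import numpy as np

# Hypothetical streamline-count matrix for 114 cortical regions.
# In practice this would come from a tractography pipeline combined
# with a parcellation; here it is random data for illustration only.
n_rois = 114
rng = np.random.default_rng(0)
counts = rng.poisson(lam=1.5, size=(n_rois, n_rois)).astype(float)
counts = np.triu(counts, k=1)
counts = counts + counts.T        # symmetric: undirected connections
np.fill_diagonal(counts, 0.0)     # no self-connections

# Binarized structural connectome: any non-zero edge is a connection
adjacency = (counts > 0).astype(int)

# Simple summaries: edge count, connection density, nodal degree
n_edges = int(adjacency.sum() // 2)
density = n_edges / (n_rois * (n_rois - 1) / 2)
degree = adjacency.sum(axis=0)

print(f"edges: {n_edges}, density: {density:.2f}, mean degree: {degree.mean():.1f}")
```

In real analyses the thresholding step matters: a single streamline is weak evidence for a connection, so a higher threshold or a weighted (non-binarized) connectome is often preferable.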

2 Circuits, Networks, and Fields

A common view of brain operation is that of a complex circuit or neural network. In this view, groups of cortical cells are imagined as analogous to electric circuit elements. Cortical columns, i.e., the organization of cortical activity across the layers of the cortex, at several scales, are candidates for such local cell groups (cf. Fig. 5 in Chapter 1). The smallest scale is the minicolumn, which has been identified as (potentially) the smallest processing unit in the brain [8–10]. When a sensory signal enters the cortex via the thalamus, it activates a canonical circuit involving interactions between excitatory pyramidal cells and both excitatory and inhibitory



interneurons, producing a functional unit with both strong excitation across layers and inhibition of surrounding tissue. The details of this fundamental structure have been elucidated over the past 40 years by a number of prominent investigators. The most important point is that columnar organization is a functional property of the activity of neurons rather than a fixed wiring diagram like a circuit. Columns form and dissipate dynamically in response to input to the cortex that excites pyramidal cells and interneurons. The interneurons in a minicolumn have axons that remain within the gray matter and only synapse on neighboring cells. The pyramidal cells within a minicolumn have axons with diverse targets, as discussed below. Neocortical neurons within each cerebral hemisphere are connected by short intracortical fibers with axon lengths mostly less than 5 mm. The macrocolumn, with a typical radius of 3 mm, reflects the extent of the intracortical axons of pyramidal cells and is another level of circuit definition. The macrocolumn typically contains hundreds or thousands of minicolumns, which often share functional properties. For example, in the visual cortex, the macrocolumn (or hypercolumn, per [11]) contains all the cells that respond to different stimulus features in one location of space. The macrocolumn is potentially a much more useful functional unit than the minicolumn for the modeling of human brain data. Macrocolumns are comparable in size to a typical voxel (5 mm³) in MRI research. Macrocolumns are also a reasonable size of tissue for modeling the current sources of EEG/MEG as a dipole source [2]. In addition to the intracortical axons, each pyramidal cell projects an axon which enters the white matter and synapses at one (or more) distant locations in the brain. Thus, each of the pyramidal cells in the cortex receives excitatory input from (possibly many) cells at other locations. If this input is sufficient to depolarize the cell, the cell transmits action potentials to other locations in the brain. Thus, the neocortex is densely interconnected by about 10¹⁰ cortico-cortical axons with axon lengths in roughly the 1–15 cm range. Cross-hemisphere interactions occur by means of about 10⁸ callosal axons through the corpus callosum and several smaller structures connecting the two brain halves. A comparably small number of fibers project to and from subcortical structures such as the thalamus and basal ganglia. The cortico-cortical, callosal, and subcortical axons might then be analogous to wires connecting the circuit elements. In this oversimplified (and probably mostly wrong) electric network picture, "circuit elements" are also under external control by means of electrical and chemical input from the brainstem neuromodulatory systems. More detailed computational models that retain the essential aspects of this picture but provide more intricate anatomical details are still gross approximations. For one thing, even a single neuron is far more complex and diverse



than the most complex model of neural networks likely to be created in the near future [12]. Transmission times for action potentials along cortico-cortical axons may range from roughly 10–30 ms between the most remote cortical regions. Local delays due to capacitive-resistive properties of single neurons are typically in the 1–10 ms range, but may also be longer. As the brain's awareness of an external event seems to require multiple feedback exchanges between remote regions [13], perceptual consciousness may take several hundred ms to develop as a consequence of brain network activity. The multiple mechanisms by which neurons interact across and within brain areas in integrative brain functions are often labeled by the term cell assemblies [14]. The label "cell assembly" denotes a diffuse cell group capable of acting briefly as a single structure. We may reasonably postulate cooperative activity within cell assemblies without explicitly specifying interaction mechanisms or relying on the specificity of a neural network. Brain processes may involve the formation of cell assemblies at several spatial scales [1, 15]. At smaller spatial scales, corresponding to recordings from individual cells, such groups of neurons may be described by neural network models that can incorporate details of physiologically realistic features such as feedforward and feedback connections [9, 16]. Such anatomical specificity may not have a direct bearing on recordings at a macroscopic scale of human neural data. That is, while these detailed anatomical models (largely derived from animal models) have a strong influence on the behavior of individual cells, they may not be easily related to the coarse-grained variables at macrocolumn or larger spatial scales accessible in non-invasive recordings in human subjects. Even the coarse-grained measures of anatomy, such as the structural connectome shown in Fig. 1, and the network dynamics measured in EEG, MEG, and fMRI signals have complex relationships, which are an active field of research. Field descriptions of brain dynamics may be required to model dynamic behavior and make contact with macroscopic data measured in humans such as EEG, MEG, or fMRI. In this context, the word "field" refers to mathematical functions expressing, for example, the numbers of active synapses or action potentials in macroscopic tissue volumes. Alternatively, the probability of neural firing in a tissue mass may be treated as a field variable. In this view, cell assemblies are pictured as embedded within synaptic and action potential fields [1, 17]. Electric and magnetic fields (EEG and MEG) provide large-scale, short-time measures of the modulations of synaptic and action potential fields around their background levels. Similarly, fMRI or fNIRS provides information about the modulation of blood flow or oxygen consumption from a background level. These fields are analogous to common physical fields, for example, sound waves, which are short-time modulations of



pressure or mass density about background levels. We distinguish these short-time modulations of synaptic activity from long-time scale (usually minutes but sometimes seconds) modulations of brain chemistry controlled by neuromodulators.

3 Relationship Between Brain Structure and Measurements of Brain Function

Figure 2 shows a conceptualization of the complexity of relating brain measurements in humans (fMRI and EEG) to each other and to behavior [2]. If we imagine there are cell assemblies distributed in different cortical regions that give rise to behavior, then with an fMRI or EEG experiment we can establish correlations between the behavior and the fMRI and/or EEG signals. Both fMRI and EEG are spatially and temporally filtered representations of the activity of the cell assemblies, with the details of the filtering depending on the specific characteristics of the recording method. For instance, it is well known that the EEG has excellent temporal resolution but poor spatial resolution, providing a representation of space-averaged synaptic activity [2] (see, however, the discussion in Chapter 7). This makes EEG especially sensitive to synchronous synaptic activity in populations of neurons and insensitive to asynchronous activity. fMRI is sensitive to the metabolic demand and consequent blood flow also resulting from synaptic activity of the cell assemblies; however, some of the cell groups contributing to fMRI, e.g., inhibitory basket cells, produce no external electric field. Thus, in general, different cell groups can be expected to generate the EEG and fMRI signals.

Fig. 2 Conceptual framework for brain signals in cognitive experiments. Double arrows indicate experimental correlative relationships between behavior/cognition and EEG, MEG, MRI, or PET. By definition, Cell Groups 1 generate EEG or MEG, and Cell Groups 2 generate MRI or PET. While theoretical models of Cell Groups 1 are well developed (see [2]), Cell Groups 2 are not known, but are the subject of intensive study in animal models [18]. But in actuality, there exist unknown cell assemblies that underlie the behavior/cognition which are not directly accessible with either recording technique. Cell Groups 1 and 2 may be part of this cell assembly or may be influenced by this cell assembly producing the observation of correlations between EEG and fMRI signals and behavior/cognition



Early efforts to combine separately measured EEG and fMRI signals focused on using source localization of EEG signals to obtain information about the dynamics of each fMRI activation. This is a spatial model that assumes that the cell groups generating EEG and fMRI signals are at identical positions. Methods ranging from the simple (equivalent dipoles) to the sophisticated (distributed Bayesian solutions with fMRI-informed priors) have been developed and applied to localize EEG signals to the activation sites detected with fMRI. Although the technical problems of EEG inverse solutions remain a formidable challenge, this approach suffers from far more significant conceptual problems. As indicated in Fig. 2, fMRI and EEG record from different cell groups, and there is no reason to expect a simple spatial correspondence between the cell groups that generate fMRI and EEG signals. Even with the same stimulus and task conditions, EEG and fMRI emphasize different neural populations and may lack substantial spatial or temporal overlap. Structure-function relationships for macroscopic field variables have become the subject of intense study with the advent of diffusion-weighted imaging, which provides estimates of structural connectivity at a macroscopic scale (as shown in Fig. 1) that is more readily comparable to fMRI or EEG/MEG data. The integration of such structural and functional data is a crucial step in establishing the physiological basis of network models of brain function.

4 What Does Localization of "Brain Activity" Really Mean?

A considerable amount (perhaps the majority) of cognitive neuroscience research is concerned with documenting the relationship between "brain activity" and cognitive functions, usually by obtaining experimental evidence that the signal recorded from some region of the brain has been modulated by a cognitive task. Clever task manipulations, gleaned from experimental psychology, are used to generate contrasts for statistical tests to associate brain activity with hypothesized cognitive processes. This "spatial" model of brain function takes an overly simplistic view of brain networks as a series of "activations" in brain areas: the strength of fMRI is to tell us where to find these activations, and the job of EEG (or, usually, evoked responses, ERPs) is to tell us when the activation occurred. However, a fundamental unknown (and in many cases unknowable) in any neurophysiological study is whether observed modulations of neural responses at one location in the brain by cognitive processes should be interpreted only as the action of a local network in the specific cortical region or as due to the interactions between this cortical region and the rest of the brain in global



networks. Non-local interactions between cortical regions are mainly mediated by the cortico-cortical (also labeled association) fibers. These axons range in length from less than 1 cm (the U fibers connecting adjacent gyri) to the total length through the white matter between the frontal and occipital lobes. The total number of cortico-cortical fibers is roughly equal to the number of pyramidal cells, about 10¹⁰. Thus, the issue of global networks interacting in cognitive processes is salient to the interpretation of physiological signals obtained from the brain with any technique: EEG, MEG, fMRI, LFPs, or even spiking activity of a single neuron. The physiology and anatomy of the brain indicate that our model of the underlying cognitive processes should favor global networks over local networks unless there is strong evidence of functional localization. This is the great strength of human neuroscience! Studies in animal models necessarily place electrodes in a limited number of hypothesized brain regions, while human neuroscience research views the function of the whole brain, allowing us to understand how brain networks give rise to intelligent brain function. This dense network connectivity suggests that it is far less common for brain function to be a purely local operation in one location of the brain. Examples of such local processing might include feature extraction in the processing of sensory systems. However, if we consider the entire processing stream involved in figure-ground segregation of (perceptual) objects such as a written or spoken word, we find that the processing involves feedforward and feedback processing along anatomical pathways linking neurons distributed in different cortical areas into a functional network [13]. Thus, even "low-level" perceptual and motor processes involve distributed brain function in hierarchically organized neural systems whose complexity is beyond the simple measures of localized brain activity that predominate in studies of brain function. The future of human brain research is in the study of whole brain systems.

5 Neurocognitive Models

The study of the human brain has been closely linked to advances in cognitive science, which provides the theoretical foundation for studies of human brain function. In recent years, cognitive science has married mathematical theories of behavior with experimental data via computational modeling. More recently, neurocognitive models have been developed that formally integrate neural signals into mathematical theories of cognitive function [19]. Computational models of behavior usually propose a mechanistic or algorithmic description of the computations that may be happening in the brain to support behavior. These models usually



have parameters (e.g., drift rate or learning rate) that quantitatively modulate the computations made by the model. Model-fitting techniques allow us to infer the parameters that are most likely to give rise to the observed behavior. Then, given a set of parameters for a model, it is also possible to obtain the latent variables that are part of the model's computations and putatively are the underlying variables needed to account for the observed behavior. Thus, cognitive modeling may provide two types of benefits for relating behavior and neural signals. First, fitting computational models to behavioral data allows researchers to extract model parameters that are related to mechanisms underlying behavior: rather than an implicit specification of a cognitive process being manipulated, an explicit model is made of how the behavioral data are generated. These parameters may or may not capture variability (between conditions or individuals) better than raw behavioral data, but they are more scientifically meaningful because the generative mechanisms are specified. Second, model fitting also allows researchers to extract latent variables that putatively reflect the computations supporting behavior. These variables may then be better candidates to reflect the trial-by-trial neural signal. Perhaps the most active area of neurocognitive modeling is the modeling of perceptual decision-making [20, 21]. The drift-diffusion model (DDM) is a specific model of perceptual decision-making that has been integrated with neural signals. The DDM is used to account simultaneously for accuracy and reaction time observations in binary perceptual decision tasks, such as the random dot motion task [22]. Specifically, the DDM formalizes a decision as a noisy accumulation of evidence toward one of two bounds; it assumes that once the decision variable reaches a bound, the corresponding choice is made. The DDM is usually parameterized with three parameters: non-decision time, drift rate, and decision threshold. The non-decision time reflects a fixed period of time during which no information is accumulated; mechanistically, it may include both initial perception latency and motor command latency after the decision is made. The drift rate reflects the rate at which information is accumulated, or the strength of each new piece of evidence. The threshold indicates the level the evidence must reach before a decision is made. Other parameters are sometimes included in the DDM to better capture behavior; for example, a bias term may be needed to capture participants' tendency to select one option more than another. Using the DDM of quick decision-making as an example, single-trial estimates of evidence accumulation rate during quick decision-making and non-decision time (time in milliseconds of a human reaction time not related to a decision) have been obtained using hierarchical Bayesian modeling with ERP amplitude estimates on single trials, time-locked to the onset of visual stimuli [23, 24]. Hierarchical Bayesian modeling (HBM) of human



cognition is one of the most powerful methods to integrate EEG and behavior, since these datasets are linked with respect to the cognitive function specified by the model and shared relationships are estimated simultaneously. The HBM framework is ideally suited for the joint analysis of multiple modes of data. In addition, the EEG data can also provide new and additional information about the cognitive process that cannot be discerned from behavior alone. This flexible framework can inform the building and testing of theoretical models of the relationship between neural signals from the human cortex (EEG, fMRI, etc.), human cognition, and human behavior. Figure 3 summarizes the results of our studies linking EEG signals to diffusion models of perceptual decision-making. ERP measures described trial-to-trial differences in visual encoding time (a component of non-decision time during reaction time) and trial-to-trial differences in evidence accumulation rate, as described by trial-level estimates of the drift parameter [23, 24]. EEG correlates of additional cognitive processes, such as visual attention, can also add inference about the overall human cognitive process when used in combination with behavioral modeling. Nunez et al. (2015) [26] found evidence that differences in experimental participants' attention (both visual noise suppression and visual signal enhancement), as measured by SSVEPs, related to specific differences in participants' cognition during decision-making. Lui et al. (2020) [25] showed that the duration of decision-making is indexed by the readiness potential in the motor cortex. Hierarchical Bayesian modeling also allows discovering complex relationships between multiple data types within cognitive neuroscience [27] by allowing the simultaneous estimation of posterior distributions of multiple parameters. Fitting procedures produce samples from probability distributions that display knowledge (i.e., "uncertainty") about parameter estimates and thus certainty about the effects of cognition or neural data in specific theoretical models.
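To make the drift-diffusion framework described above concrete, the sketch below simulates choices and response times from a basic DDM with the three parameters discussed (drift rate, decision threshold, non-decision time). It is a minimal illustration only: all parameter values are arbitrary, and it does not implement the hierarchical Bayesian estimation or the single-trial ERP regressors used in the studies cited.

```python
import numpy as np

def simulate_ddm(drift=0.8, threshold=1.0, ndt=0.3,
                 noise_sd=1.0, dt=0.001, n_trials=1000, seed=0):
    """Simulate choices and RTs from a basic two-boundary drift-diffusion model.

    Evidence starts at 0 and accumulates with mean rate `drift` until it
    crosses +threshold (choice 1) or -threshold (choice 0); the fixed
    non-decision time `ndt` is then added to the decision time.
    """
    rng = np.random.default_rng(seed)
    choices = np.empty(n_trials, dtype=int)
    rts = np.empty(n_trials)
    for i in range(n_trials):
        evidence, t = 0.0, 0.0
        while abs(evidence) < threshold:
            evidence += drift * dt + noise_sd * np.sqrt(dt) * rng.normal()
            t += dt
        choices[i] = int(evidence > 0)
        rts[i] = t + ndt                 # decision time + non-decision time
    return choices, rts

choices, rts = simulate_ddm()
print(f"accuracy: {choices.mean():.2f}, mean RT: {rts.mean():.2f} s")
```

In the joint-modeling studies summarized in Fig. 3, parameters such as non-decision time and drift rate are not fixed constants as in this sketch but are allowed to vary trial by trial as a function of single-trial EEG measures (e.g., N200 latency or SSVEP amplitude), with all parameters estimated hierarchically across participants.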

6 Future Directions

Human brains are typically viewed as the pre-eminent complex systems, with cognition believed to emerge from dynamic interactions within and between brain sub-systems [17, 28–33]. Here, we cite two salient anatomical and physiological features that contribute to brain complexity and, by implication, to the conditions apparently required for healthy cognition. These features give rise to multi-scale spatial-temporal patterns of brain activity, revealed with imaging techniques like EEG and fMRI, which are strongly correlated with mental states. One such salient feature is anatomical and physiological nested hierarchy: as we have seen, cortical



Fig. 3 A theoretical representation of some modeling studies to discover cognitive mechanisms of decision-making using neurocognitive modeling of EEG and human behavior during visual decision-making tasks. Bold text represents observed data (EEG measures or human behavioral data), while italic text represents derived cognitive parameters that can be estimated through joint modeling of time-domain EEG collected from the scalp (top left: cartoon with three scalp electrodes sitting above the brain, CSF, skull, and skin that result in time-domain waveforms), frequency-domain EEG collected from high-density arrays (bottom left: EEG amplitudes that were spline-interpolated between electrodes on a flat representation of the human scalp), and/or choice RTs (response time distributions shown for correct responses, top, and error responses, bottom, flipped). Event-related potentials (ERPs) can be calculated from event-locked EEG averages and embedded in neural drift-diffusion models (NDDMs) to discover the cognitive time course of decision-making, separating visual encoding time (VET), decision time (DT), and motor execution time (MET) that together add up to each trial's response time (RT) [23, 24]. Correct and error responses are described after the evidence accumulation path passes one of two boundaries during decision time (this trial is represented as a black line, with two other gray lines representing other simulations from the same process that describe response times and possibly EEG potentials). Particular ERPs of interest are the N200, P300/CPP, and RP waveforms. N200 waveforms are thought to reflect VET and the onset of evidence accumulation [24]. The P300 or centro-parietal positivity (CPP) is thought to reflect DT and possibly the evidence accumulation process itself. The readiness potential (RP) is a motor-related preparatory signal thought to reflect DT and MET under certain experimental conditions [25]. Steady-state visual evoked potentials (SSVEPs) can be calculated from band-limited frequency-domain EEG data using frequency-tagging experiments. Amplitude measures of SSVEPs across electrodes can then be used to estimate visual attention and, in particular, signal enhancement and noise suppression that could affect the rate and variance of evidence accumulation [26]



anatomy and physiology consist of neurons within minicolumns within modules within macrocolumns [1, 9]. Emergence and complexity generally occur in hierarchically nested physical and biological systems where each higher level of complexity displays novel emergent features based on the levels below it, their interactions, and their interactions with higher levels. Such systems may follow general principles that underlie many complex systems studied in fields including anthropology, artificial intelligence, chemistry, economics, meteorology, molecular biology, neuroscience, physics, psychology, and sociology [2, 12, 30, 32–34]. A second salient feature of many complex systems is non-local interactions, in which dynamic activity at one location influences distant locations without affecting intermediate regions, as enabled in human brains by long (up to 15–20 cm) cortico-cortical fibers [1, 3, 4, 7, 35] and in human social systems by modern long-distance communications facilitating small-world behavior [36]. The label "small world" originates from the purported maximum of six steps separating any two persons in the world; small worlds are widely studied in graph theory. The high density of short-range (mm-scale) intracortical connections coupled with an admixture of cortico-cortical axons favors small-world behavior in the brain, which may be the essence of the dynamic sculpting of network architectures in brain function. For example, the path length between any pair of neocortical neurons is estimated to be no more than two or three synaptic connections [7]. Small worlds often promote high complexity; they also appear to be abundant in brain structural networks, across systems, scales, and species [32, 33]. This complex-system view is critical to a future genuine understanding of brain networks. However, because of the complexity of these types of analysis, to date most of the studies that have attempted to characterize such brain networks have focused on resting-state networks in fMRI and EEG data. Very few studies of human brain function have linked network properties to cognitive operations. This is an open field with great potential for the future of brain sciences.

References

1. Nunez PL (1995) Neocortical dynamics and human EEG rhythms. Oxford University Press, New York 2. Nunez PL, Srinivasan R (2006) Electric fields of the brain: the neurophysics of EEG. Oxford University Press, New York 3. Krieg WJS (1963) Connections of the cerebral cortex. Brain Books, Evanston 4. Krieg WJS (1973) Architectonics of human cerebral fiber system. Brain Books, Evanston

5. Yeh F, Panesar S, Fernandes D, Meola A, Yoshino M, Fernandez-Miranda JC, Vettel JM, Verstynen T (2018) Population-averaged atlas of the macroscale human structural connectome and its network topology. NeuroImage 178:57–68 6. Cammoun L, Gigandet X, Meskaldji D, Thiran JP, Sporns O, Do KQ, Maeder P, Meuli R, Hagmann P (2012) Mapping the human connectome at multiple scales with diffusion spectrum MRI. J Neurosci Methods 203:386–397

7. Braitenberg V (1978) Cortical architectonics: general and areal. In: Brazier MAB, Petsche H (eds) Architectonics of the cerebral cortex. Raven Press, New York, pp 443–465 8. Mountcastle VB (1979) An organizing principle for cerebral function: the unit module and the distributed system. In: Schmitt FO, Worden FG (eds) The neurosciences 4th study program. MIT Press, Cambridge 9. Szentagothai J (1979) Local neuron circuits of the neocortex. In: Schmitt FO, Worden FG (eds) The neurosciences 4th study program. MIT Press, Cambridge, MA, pp 399–415 10. Mountcastle VB (1998) Perceptual neuroscience. The cerebral cortex. Harvard University Press, Cambridge 11. Hubel DH, Wiesel TN (1977) Ferrier lecture. Functional architecture of macaque monkey visual cortex. Proc R Soc Lond B Biol Sci 198:1–59 12. Scott A (1995) Stairway to the mind. Springer-Verlag, New York 13. Lamme VA, Roelfsema PR (2000) The distinct modes of vision offered by feedforward and recurrent processing. Trends Neurosci 23:571–579 14. Hebb DO (1949) The organization of behavior. Wiley, New York 15. Freeman WJ (1975) Mass action in the nervous system. Academic Press, New York 16. Lamme VA, Supèr H, Spekreijse H (1998) Feedforward, horizontal, and feedback processing in the visual cortex. Curr Opin Neurobiol 8:529–535 17. Haken H (1996) Principles of brain functioning: a synergetic approach to brain activity, behavior and cognition. Springer, Berlin 18. Logothetis NK (2002) On the neural basis of the BOLD fMRI signal. Philos Trans R Soc Lond Ser B Biol Sci 357:1003–1037 19. Turner BM, Forstmann BU, Wagenmakers EJ, Brown SD, Sederberg PB, Steyvers M (2013) A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage 72:193–206 20. Forstmann BU, Ratcliff R, Wagenmakers E-J (2016) Sequential sampling models in cognitive neuroscience: advantages, applications, and extensions. Annu Rev Psychol 67:641–666 21. Palmieri TJ, Love BC, Turner BM (2017) Model-based cognitive neuroscience. J Math Psychol 76(Pt B):59–64 22. Roitman JD, Shadlen MN (2002) Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci 22(21):9475–9489


23. Nunez MD, Vandekerckhove J, Srinivasan R (2017) How attention influences perceptual decision making: single-trial EEG correlates of drift-diffusion model parameters. J Math Psychol 76(Pt B):117–130. https://doi.org/10.1016/j.jmp.2016.03.003 24. Nunez MD, Gosai A, Vandekerckhove J, Srinivasan R (2019) The latency of a visual evoked potential tracks the onset of decision making. NeuroImage 197:93–108. https://doi.org/10.1016/j.neuroimage.2019.04.052 25. Lui KK, Nunez MD, Cassidy JM, Vandekerckhove J, Cramer SC, Srinivasan R (2020) Timing of readiness potentials reflect a decision-making process in the human brain. Comput Brain Behav 4:264–283. https://doi.org/10.1007/s42113-020-00097-5 26. Nunez MD, Srinivasan R, Vandekerckhove J (2015) Individual differences in attention influence perceptual decision making. Front Psychol 8:18. https://doi.org/10.3389/fpsyg.2015.00018 27. Wiecki TV, Sofer I, Frank MJ (2013) HDDM: hierarchical Bayesian estimation of the drift-diffusion model in Python. Front Neuroinform 7:14. https://doi.org/10.3389/fninf.2013.00014 28. Freeman WJ (2003) The wave packet: an action potential for the 21st century. J Integr Neurosci 2:3–30 29. Friston KJ, Tononi G, Sporns O, Edelman GM (1995) Characterising the complexity of neuronal interactions. Hum Brain Mapp 3:302–314 30. Edelman GM, Tononi G (2000) A universe of consciousness. Basic Books, New York 31. Buzsaki G (2006) Rhythms of the brain. Oxford University Press, New York 32. Tononi G, Edelman GM (1998) Consciousness and complexity. Science 282:1846–1851 33. Bassett DS, Gazzaniga MS (2011) Understanding complexity in the human brain. Trends Cogn Sci 15:200–209; Sporns O (2011) Networks of the brain. MIT Press, Cambridge 34. Gell-Mann M, Lloyd S (1996) Information measures, effective complexity, and total information. Complexity 2:44–52; Pesenson M (ed) (2013) Multiscale analysis and nonlinear dynamics. Wiley-VCH, Weinheim 35. Braitenberg V, Schuz A (1991) Anatomy of the cortex: statistics and geometry. Springer-Verlag, New York 36. Bassett DS, Bullmore ET (2009) Human brain networks in health and disease. Curr Opin Neurol 22:340–347

Chapter 3

How Do We Get the Brain to Tell Us About Language Computations and Representations? Designing and Implementing Experiments

Miika Leminen and Alina Leminen

Abstract

Planning and conducting electrophysiological experiments involves considering numerous factors, such as meticulously matching linguistic stimuli on their psycholinguistic features, finding optimal data-recording settings, and choosing data-analysis criteria. All these factors may considerably alter the results of your experiment. This chapter is intended for those who are new to the field and wonder where to start. It is also suitable for readers who wish to update their knowledge on the topic of EEG data recording and analysis. While many laboratories around the world already have EEG equipment, this chapter offers recommendations also for those who wish to update their EEG equipment or lab settings. We focus particularly on settings intended for planning and conducting experiments with linguistic stimuli.

Key words Experiment, Laboratory, EEG, MEG, Guidelines, Psycholinguistics, Cognitive neuroscience, Language

1 Introduction: How to Get Reliable Data to Test Your Hypotheses

This chapter is intended particularly for young investigators who are at the beginning of their path to designing and conducting neuroscientific studies. Building experiments in the field of cognitive neuroscience of language means not only presenting linguistic stimuli but, in a wider scope, being able to control different features of language, which opens a window onto various cognitive aspects of language processing. However, due to limitations in behavioral and psychophysiological measurements, some neurocognitive effects are still very difficult, if not impossible, to illuminate. The main bottlenecks one can face when conducting a study are the manipulation of critical stimulus features and measurement errors. These topics are partly covered in this chapter's Subheadings 2 and 3, respectively. Once the studied phenomena are revealed by a suitable experimental manipulation, the





experiment can still be jeopardized by data recording, data analysis, statistical approaches, or result interpretation. Hence, it is essential to understand what can and cannot be measured with present methodology, to understand the limitations, and to use the methods in a competent manner. This is the reason why both experimental design and data recording issues are linked in our chapter: they go hand in hand and cannot be planned independently. Thus, in our view, the window onto linguistic phenomena is constrained both by the limited capability to isolate or categorize abstract features of a language and by the limited window onto neurocognitive processes. Clear and simple recommendations would be easy to apply. Unfortunately, such recommendations would be oversimplified and could not cover all the different needs of different labs, experiments, and situations. However, we will try to give some hints that we hope will be useful to the readership. For every topic covered in this chapter, we will first explain the typical setups and try to elaborate on the different aspects that affect practical decision-making. At the end of each topic, we will give simplified guidelines. The recommendations are based on our experience in using, developing, maintaining, and operating psychophysiology laboratories.

2 Building and Running Experiments

2.1 Designing Experiments

It is important to build general knowledge about the different experimental paradigms used in your research field. This can be complicated and time-consuming, particularly because you should be able to collect your "library" or toolkit of competent experiments not only from your own methodological area (e.g., EEG/event-related potentials (ERPs)), but from other areas as well (e.g., functional magnetic resonance imaging (fMRI), behavioral experiments, eye-tracking). A good way to start is to take an experiment that has already been used repeatedly and successfully in the field (see Box 1 for some examples) and make some modifications to it according to your needs and research questions. This will make sure that your interpretations and conclusions do not hinge on an unproven experimental design, while your scientific argumentation can be partly corroborated by the existing literature. Making some modifications to an existing experimental paradigm allows you to ensure the coverage of your phenomena of interest while minimizing the risk of getting null or laboriously explainable results. Developing your own experimental design from scratch is undoubtedly possible as well; however, piloting and developing such an experiment typically takes much longer, on a timescale of weeks or months.


Box 1. Examples of Some Typical Paradigms Used in Cognitive Neuroscience of Language

Isolated phonemes. Used, for instance, in behavioral phoneme discrimination or identification paradigms, or in Mismatch Negativity paradigms (cf. Chap. 6). Phonemes can be recorded naturally or synthesized. There can either be a single item representing the whole class or feature (e.g., /e/), or tens of naturally varying items that belong to the same class but are different exemplars of it (e.g., different voices, sexes, intonations, base frequencies). Natural variation can lead to more variance in brain responses.

Isolated syllables. Similar natural variation as with isolated phonemes [1, 2]. In this case, a syllable is composed of two or more phonemes. A syllable can exist in a language or it can be novel. This aspect can also lead to large differences in brain responses.

Violation paradigms and linguistic judgment paradigms typically involve modifications or violations of one or several linguistic rules (e.g., "The boy is tuning the guitar/*the sock before the concert" [3–5]). The participants typically judge whether the sentence is acceptable or whether there is anything strange in it. While violation paradigms can provide interesting insights into language processing in the brain, the processing of violated structures may not be equivalent to natural language processing.

Listening to or reading lists of isolated words [6]. Using single-word stimuli is more straightforward to analyze compared to clauses or sentences, which can also be a weakness, as a list of words is a mere simplification of natural language. For example, in a natural language context, the context guides the extraction of word meanings, and single words are rarely presented alone without any context in naturally unfolding language.

Lexical decision tasks. Typically, word list paradigms that require a lexicality judgment (i.e., whether a linguistic item is a real word or not) [7].

Natural listening or reading. Presenting a participant with natural texts and passages without strict control of stimulus characteristics [8].

Priming paradigms. In priming paradigms, linguistic relationships between different words are examined by presenting a prime, such as "cat," and a target word, such as "dog." With this approach, it is possible to investigate the extent to which prior presentation of a word facilitates the recognition of another word [9].




It is also important to note that the neurocognitive processing of a specific task may differ depending on the exact formulation of the participant instructions. For instance, if the participants are asked to judge whether a word is real or not, some participants might think that they should have heard someone using that word, while others might think that the meaning should be understandable according to linguistic rules irrespective of its usage. Hence, it is crucial to be as explicit as possible and to use examples when formulating instructions for the participants. In addition, it is always advisable to use practice trials to ensure that every participant has understood the task instructions properly. A potential difficulty in many experimental paradigms is related to the behavioral responses required by the experimental task: sometimes the patterns of behavioral and brain responses are incompatible with each other. For instance, when comparing two experimental conditions (e.g., words and pseudowords), brain responses can differ significantly while no significant differences are observed in the behavioral data, and vice versa. Such an outcome may certainly create challenges for interpreting the results and comparing them with previous behavioral findings. On the other hand, if we do not collect behavioral responses during the EEG/MEG measurement, we cannot be certain that participants indeed paid attention to the stimuli and performed the task correctly. A typical way to verify this is to use a control condition, in which motor preparation and performance do not differ from the condition of interest. As a rule of thumb: make sure that "you know what your participants are thinking." If you do not collect behavioral evidence, spend some extra time planning the paradigm. Other challenges that need to be considered when designing experiments are, for instance, overlapping brain responses and time-locking of the ERP/ERF responses. Brain responses are often difficult to separate from each other, and overlap occurs both spatially and temporally. Thus, data-analysis approaches are often chosen to validate an assumption that a specific brain response is indeed related to a specific cognitive function and/or activity of some specific neural network (cf. Chap. 6). A well-designed experiment goes hand in hand with the analysis methodology: it attempts to minimize the complexity of brain responses and allows you to isolate different neurocognitive functions. Moreover, most electrophysiological analysis techniques require using some point in time to correlate stimulus features with the corresponding neural data. Such time-locking can be done in several ways. The most typical method is using the onset of a stimulus item as the reference time point, such as the onset of an auditorily or visually presented word. In natural language, words have very different durations, and if a brain response of interest (e.g., an ERP/ERF component) is small and



focal, you may lose your effects in averaging [10]. This may happen, for instance, if a response of interest occurs at the end of the word rather than at the beginning, or at some other point in time when important information becomes available. In this case you could time-lock the responses to another point, such as the suffix onset [6, 11], a disambiguation point [12, 13], or a button press (a minimal sketch of suffix-locked epoching is given after Box 2). However, these unconventional time-locking methods are often not straightforward to operationalize. For example, strong responses related to word onset can cause unwanted variance in the response baseline and can disturb analyses. Furthermore, the processing of linguistic stimuli and the electrophysiological responses associated with them do not end once the stimuli have been presented to the participants. For instance, when using sentence-level and discourse-level stimuli, integration of individual words into the sentential context may continue for several seconds after stimulus presentation has ended. If the next sentence is presented too soon after the previous one, it is possible that the brain responses of interest will overlap with other responses and will be more difficult to quantify and separate. The same challenge is faced when presenting the target stimulus too soon after the prime stimulus in priming experiments, especially if the prime and target are presented in different stimulus modalities (e.g., auditory and visual) and if stimulus lengths and durations are not well controlled. Hence, in such paradigms, we recommend including sufficiently long inter-stimulus and inter-trial intervals. Box 2 summarizes the recommendations presented above.

Box 2. Summary of the Guidelines for Designing Experiments

It is easier and less risky to begin neuroscience experiments using a modification of an existing paradigm, using well-defined and known ERP/ERF components. This also facilitates the formulation of a priori hypotheses. An entirely novel paradigm requires careful testing and piloting, and it is important to invest time and resources into this preparative work. In addition, novel paradigms may complicate hypothesis setting, as it may be challenging to predict the exact brain responses that will be elicited by the novel paradigm. This may lead to exploratory analyses, which have weaker explanatory scientific power.

Carefully plan your experiments to isolate ERP/ERF components of interest from other potentially overlapping components; carefully explore the previous literature.

Plan to which point in time you time-lock the ERP/ERF responses of interest; this may affect the results greatly.
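As a concrete illustration of the time-locking choice discussed above, the sketch below shows how epochs could be re-aligned from word onset to a later point, such as suffix onset, using MNE-Python. The file names, event code, and per-trial suffix latencies are hypothetical placeholders; only the general reading, event-finding, and epoching calls reflect the MNE-Python API.

```python
import numpy as np
import mne

# Hypothetical inputs: a preprocessed raw recording and, for each
# word-onset trigger, the latency (in seconds) of the suffix onset
# within that word. File names and event code are placeholders.
raw = mne.io.read_raw_fif("subject01_clean_raw.fif", preload=True)
events = mne.find_events(raw, stim_channel="STI 014")
word_onsets = events[events[:, 2] == 1]                 # code 1 = word onset
suffix_latencies = np.loadtxt("suffix_latencies.txt")   # one value per trial

# Shift each event from word onset to suffix onset (in samples)
shifted = word_onsets.copy()
shifted[:, 0] += np.round(suffix_latencies * raw.info["sfreq"]).astype(int)

# Epoch relative to the suffix, with a short pre-suffix baseline.
# Note that this baseline still contains word-onset activity, which is
# one of the caveats of unconventional time-locking mentioned above.
epochs = mne.Epochs(raw, shifted, event_id={"suffix": 1},
                    tmin=-0.1, tmax=0.8, baseline=(-0.1, 0.0), preload=True)
evoked = epochs.average()
```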



2.2 Experiment Building Tools

What is a good strategy for obtaining experiment execution software for a research lab or unit? There is no solution that suits everyone's needs; however, we recommend that the decision be based on strategic planning. When planning and evaluating different experiment software tools, you should also consider their costs, technical accuracy, usability, and difficulty of implementation. You can then select the software according to your budget, laboratory settings, and/or programming skills. For example, you can program any experiment from scratch in a lower-level programming language, such as C or C++. In this case, any solution can be programmed, since development of an experiment is not limited by features missing from an existing software package or toolbox. However, experiment development and programming are time-consuming tasks that require careful testing and piloting to exclude any errors in the code. Moreover, if there is a wide researcher community in the lab already skilled in, for instance, the Python or Matlab programming languages, it would be easier to use tools in which this existing expertise can be applied. If you are not a programmer and cannot invest months or years in learning a new programming language, a more cost-efficient and time-efficient way is to use readily available tools to program experiments. These can be code libraries, toolboxes (such as the Psychophysics Toolbox for the Matlab environment), or software packages (such as PsychoPy, E-Prime, Presentation by Neurobehavioral Systems, and Experiment Builder by SR Research). Irrespective of the experiment building software, if you are new to research, designing experiments, and/or programming them, you might need support to get you started or when you experience problems or errors in the code. In some labs, every researcher is using different experimental tools, making it undoubtedly more challenging to get support and advice from colleagues. Fortunately, nowadays, international online communities offer plenty of support. It is worth considering, however, whether you would like to develop a library of paradigms that can be used and modified in your lab, ultimately saving time on paradigm software development and testing in the future. Some experiment building software is open source and freely available, and some is commercial. If a tool is commercial, then it is important to estimate how many separate experiments you would need to run in parallel over the next few years. Licensing options for different software products can also be quite different. For instance, if each license purchase costs 2000 euros, it might be challenging to obtain 30 licenses for a whole classroom test lab. On the other hand, some software manufacturers offer other licensing options, such as online test licenses with an annual fee, and the costs can be more easily adapted to varying needs. Check the manufacturer's licensing options and discuss them in your lab.


Crucially, different experiment building tools differ from each other in their technical (timing) accuracy. For instance, some tools offer a wide range of algorithms for manipulating visual stimuli but can be rather inaccurate in presenting auditory stimuli. Hence, it is advisable to choose the software or platform depending on the needs of your experiment. Furthermore, different tools require and support different computer hardware. For instance, one software package may work optimally only with a certain manufacturer's audio cards, whereas another may require a different card. Similarly, a certain video card may be optimized for high refresh rates with one software package, whereas another tool may miss frames every now and then and run optimally only on some other video card. Make sure to verify your lab computer settings before you start programming your experiment. It is also recommended to use free or low-cost demo versions to get first-hand experience of usability and technical accuracy. As a final note, it is worth mentioning that the implementation of any experimental tool in a lab environment requires engineering work and testing. Thus, it can be quite costly to implement more than one experimental environment in the lab, since all of them require resources for development and maintenance. For instance, timing accuracy and other important features (such as the presence of skipped frames with visual stimuli) should be tested systematically on a regular basis. Hence, the more tools and systems a lab keeps running in parallel, the more working hours will be needed. As mentioned above, not all hardware combinations are supported by all tools, which makes it challenging, if not impossible, to implement all possible tools simultaneously in the same laboratory or experiment unit. The abovementioned recommendations are summarized in the guidelines presented in Box 3.
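Before turning to those guidelines, a quick hands-on check of visual timing can be done directly from the presentation software by estimating the display refresh rate and timing consecutive screen flips. The sketch below assumes PsychoPy; the number of test frames and the dropped-frame criterion are arbitrary example values, and a photodiode-based measurement (discussed below) remains the more rigorous test.

# Quick check of display timing in PsychoPy: estimate the refresh rate and
# look for dropped frames by timing consecutive flips. Thresholds are illustrative.
import numpy as np
from psychopy import visual, core

win = visual.Window(fullscr=True)
refresh = win.getActualFrameRate(nIdentical=60, nWarmUpFrames=60) or 60.0
clock = core.Clock()
flip_times = []
for _ in range(300):
    win.flip()
    flip_times.append(clock.getTime())
win.close()

intervals = np.diff(flip_times) * 1000.0          # ms between frames
expected = 1000.0 / refresh
n_dropped = int(np.sum(intervals > 1.5 * expected))  # crude dropped-frame criterion
print(f'Estimated refresh rate: {refresh:.1f} Hz')
print(f'Mean flip interval: {intervals.mean():.2f} ms; suspected dropped frames: {n_dropped}')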

Box 3. Guidelines for Choosing an Experiment Building Software
Find out which experiment building tools work optimally with the linguistic stimuli you frequently use (e.g., auditory, visual, audio-visual).
Plan how many simultaneous experiment units you want to have in your lab and which licensing options are optimal for them.
Consider whether you need online paradigms. Some tools support running the same experiment both in the lab and online via an Internet browser (e.g., PsychoPy).
It always takes time to learn to use a new tool. Invest time in it and be patient. Typically, learning a new experiment building tool means learning a new programming syntax as well as finding optimal technical settings. Moreover, the experiment-designing philosophy can be quite different in different software (i.e., how to optimally build hierarchical experiment structures with trials and blocks of trials, and how to control their randomization), and it can be rather time-consuming to transfer a ready-made experiment from one platform to another.
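One practical task that recurs regardless of the chosen package is controlling the randomization of trials and blocks, mentioned in Box 3. A simple, package-independent approach is sketched below: reshuffle the trial list until no condition repeats more than a set number of times in a row. The condition labels, trial counts, and run-length limit are placeholders.

# Sketch of constrained trial randomization: shuffle until no condition
# repeats more than `max_run` times in a row. Labels and counts are placeholders;
# most experiment packages offer comparable functionality.
import random

def make_trial_list(conditions, n_per_condition, max_run=3, seed=None):
    rng = random.Random(seed)
    trials = [c for c in conditions for _ in range(n_per_condition)]
    while True:
        rng.shuffle(trials)
        runs_ok = all(
            len(set(trials[i:i + max_run + 1])) > 1
            for i in range(len(trials) - max_run)
        )
        if runs_ok:
            return trials

trial_list = make_trial_list(['idiom', 'literal', 'unrelated'], 40, max_run=3, seed=1)
print(trial_list[:10])

Rejection sampling like this is easy to reason about and reproduce (via the seed); for heavily constrained designs, a dedicated trial-sequencing tool may be preferable.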

2.3 Stimulus Preparation and Technical Setup

Electrophysiological responses are typically very sensitive to physical differences between stimuli. Hence, you must avoid unintentional and undesirable stimulus differences, as they may lead to differences in brain responses overall, while your effects of interest remain hidden. It is therefore essential to invest time in learning to prepare and edit your stimuli meticulously. With visual stimuli, you often need to control for at least the length of the stimuli (such as words) and the size of an item or text on the display, which is usually reported in visual field angles and takes into account the participant's distance from the display. In addition to size, you need to consider the resolution of the stimuli or of the computer display (i.e., the number of pixels in the x and y dimensions), colors, brightness (note that the thickness of the text font affects perceived brightness), the location in the visual field (you should know where the participant's gaze is fixated when your stimuli appear), visual frequencies (the density of lines or stripes, i.e., whether your image is "busy with lines" or not), and the luminance at different spatial locations on the computer display. Be very careful if you aim to present different stimuli at asymmetric locations on the display (e.g., when comparing brain responses to stimuli located in the middle of the display vs. at the top); differences in response size can easily be driven by unwanted differences in stimulus brightness. Note that especially in low-quality displays the luminance and other properties vary greatly across different locations of the display. Unfortunately, the technical specifications reported by display manufacturers often do not help here; instead, the lab setup should be verified by calibration measurements in the lab. When preparing auditory stimuli, you need to carefully match at least the duration of the auditory items, such as syllables or words (keep in mind that the duration of an audio file can differ from the duration of the auditory item if there is silence at the beginning of the file; avoid this), and their intensity (the raw audio intensity, how well the dynamic range is used, and how loudly the stimuli are presented). Note that loudness depends not only on the raw file but also on the technical setup of stimulus presentation, including the volume adjustment on the computer and a potentially separate audio/headphone amplifier.


As mentioned above, check that you do not have stray silence at the beginning of your audio files. Such silence leads to jitter in triggering, because your experiment building software assumes that the stimulus starts when the audio file starts. Simply go through all of your audio files and delete any unnecessary silent periods. You also need to control how similar the perceived intensity is across the different stimuli, which can occasionally be quite challenging, as, for example, our perception of loudness differs for consonants and vowels. Moreover, the algorithm used for measuring intensity always has certain parameters, such as the length of the time window and the frequency weighting (check these settings both in physical loudness meters and in the loudness normalization algorithms provided by your audio editing software). Hence, it is a good idea to pay attention to these parameters. Keep in mind that if your stimulus is very short in duration, the time window of the intensity normalization algorithm should also be short. Furthermore, rapid transitions in the audio signal can sometimes cause additional sounds in the headphones or loudspeakers and even generate brain responses of their own. To avoid these unwanted responses, use, for instance, fade-in and fade-out ramping functions (e.g., linear or logarithmic) to make the different stimuli comparable. These ramps ensure that the amplitude of your audio signal rises from or falls to zero (silence) within a specific, controlled time window; researchers typically use, for instance, 5 ms rise/fall ramp times. However, be careful when ramping the onsets of consonants in particular. You also need to control for prosody (intonation), speaker sex (male or female speaker), and the speed and pace of speech, as well as for other psycholinguistic features of the stimuli (e.g., lexical frequency, phonological neighborhood size). During stimulus recording, make sure that factors such as background noise, microphone quality, and echo are taken care of. You do not necessarily need a professional studio for recordings, but you need at least a high-quality microphone and audio card. Some higher-end USB microphones are also a viable option, as they have a built-in AD converter and do not require a separate audio card. Do make sure that the background noise in your recording venue is low. Echoes can be dampened by, for instance, simple echo attenuators placed around the microphone; these are sold, for instance, in music instrument stores. If you go to one, obtain a pop filter as well (a light shield between the microphone and the speaker). Next, we say a few words about stimulus triggers and logging your experiment. In our opinion, it is better to store as much information as possible, since this helps in solving issues that may come up later on. As a rule of thumb, careful planning before data collection always saves time in the analysis phase. It is thus crucial to ensure that your triggered and logged timepoints are precisely the ones you need.
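To make the audio-editing steps above more concrete, here is a minimal sketch of how leading silence, onset/offset ramps, and a simple RMS-based intensity match could be applied with general-purpose Python tools (NumPy and SoundFile). The file names, silence threshold, ramp length, and target level are placeholder values; dedicated audio software offers the same operations, and matching perceived loudness may require more elaborate weighting than plain RMS.

# Sketch of basic audio preparation with NumPy + SoundFile: trim leading
# silence, apply 5 ms linear onset/offset ramps, and match RMS intensity.
# The silence threshold, target RMS, and file names are placeholder values.
import numpy as np
import soundfile as sf

def prepare(in_path, out_path, silence_thresh=0.01, ramp_ms=5, target_rms=0.05):
    signal, fs = sf.read(in_path)
    if signal.ndim > 1:                      # use the first channel if stereo
        signal = signal[:, 0]
    # Trim leading silence (everything before the first sample above threshold)
    above = np.flatnonzero(np.abs(signal) > silence_thresh)
    if above.size:
        signal = signal[above[0]:]
    # Linear fade-in and fade-out ramps
    n_ramp = int(fs * ramp_ms / 1000)
    ramp = np.linspace(0.0, 1.0, n_ramp)
    signal[:n_ramp] *= ramp
    signal[-n_ramp:] *= ramp[::-1]
    # Scale to the target RMS level
    rms = np.sqrt(np.mean(signal ** 2))
    signal = signal * (target_rms / rms)
    sf.write(out_path, signal, fs)

prepare('word_original.wav', 'word_prepared.wav')   # hypothetical file names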


If there is a systematic delay in stimulus triggering (a gap between the trigger sent from the stimulus computer to the EEG/MEG recording equipment and the actual stimulus onset), then you should at least be aware of it and take it into account during data analysis. We also recommend that a lab engineer or technician routinely check that the delay remains constant. In addition to routine checks, measurements of trigger jitter (the variance of the delay) should always be performed after any changes in the lab settings. Trigger jitter can easily cause unwanted disturbance to brain responses, and the jitter issue is particularly harmful if you have not measured and minimized it. Even if your stimulus software is able to send triggers accurately synchronized with the timepoint it assumes to be the stimulus onset, this timepoint may not match the actual stimulus onset. With visual stimuli, it first takes some time for the computer to prepare the stimuli in the video card. Thereafter, the system may wait for the next frame, and some processing may also take place in the display itself. Only then are your stimuli presented on the display. However, even then the onset slope varies depending on the display model and the panel technology it uses, meaning that the visual stimulus is not visible at full brightness immediately after onset but fades in over roughly 5 to 30 ms. This onset slope is always present, and often you cannot do more than test several monitors to find the optimal one. Similarly, there is an offset slope: a stimulus does not return to dark immediately but follows a fade-out curve. The easiest way to measure trigger jitter as well as onset and offset slopes is to build a measurement setup with a sufficiently fast photosensitive sensor. These onset and offset characteristics are sometimes the reason why expensive research-grade displays may be worth the investment. Box 4 summarizes our recommendations presented above.

Box 4. Recommendations for Stimulus Preparation
Be meticulous in your stimulus preparation: make sure that your stimuli differ from each other only in the desired stimulus characteristics and variables.
Accuracy of stimulus triggering is critical especially in electrophysiological recordings, due to their high temporal resolution. On non-optimized stimulus computers, jitter in both inter-stimulus intervals and stimulus-trigger asynchrony can easily be in the same time range as the neural responses of interest, that is, tens of milliseconds, and can thus ruin your responses.
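If you log both the trigger timestamps and the photodiode-detected onsets of the same test events, quantifying the delay and its jitter is straightforward. The sketch below uses hypothetical numbers purely to illustrate the calculation.

# Sketch of quantifying stimulus-trigger delay and jitter from a photodiode test:
# compare trigger timestamps with the photodiode-detected onsets of the same events.
# The two arrays are hypothetical measurements (in milliseconds).
import numpy as np

trigger_times = np.array([1000.0, 2000.0, 3000.0, 4000.0, 5000.0])      # from the trigger channel
photodiode_onsets = np.array([1012.3, 2011.8, 3012.9, 4011.5, 5013.1])  # from the photodiode channel

delays = photodiode_onsets - trigger_times
print(f'Mean delay: {delays.mean():.1f} ms')        # a constant delay can be corrected in analysis
print(f'Jitter (SD): {delays.std(ddof=1):.2f} ms')  # jitter should stay well below your effects' timescale
print(f'Range: {delays.min():.1f} to {delays.max():.1f} ms')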


2.4 Running Experiments


When you are ready to run your experiments with actual participants, it is important to keep notes during the recordings; these will help with issues that may need to be solved afterward. It is also essential to minimize the possibility of connecting personal identification information with the recorded data. Hence, we recommend creating pseudo-identification numbers for all participants and using this pseudo-ID in all data, log files, and test documents. Such pseudo-ID numbers can be created in advance, prior to the actual measurements. A participant's name should be used only in the informed consent documents, and these should be stored in a separate place. To ensure the privacy of the participants, do not insert even the pseudo-ID into documents that do not require IDs. You can carefully and securely store a single mapping table (on paper or as an MS Excel sheet) linking participant names and pseudo-IDs, so you can match a pseudo-ID to a participant if needed (e.g., if you discover during the data-analysis phase that one of the participants has to be excluded). You can then also anonymize your data by deleting this mapping file (assuming that no other information in your data allows identification of your participants; pay extra attention to data security if you have full-head anatomical MR images). Describe this procedure in your ethics application and follow your local institute's instructions. Remember to provide sufficient breaks for your participants. Keep the length of the recording session reasonable (a maximum of 1.5 h per session) and much shorter with child participants. Keeping your participants alert will ensure their attention to the task and will minimize unwanted disturbances in your data (e.g., alpha waves caused by fatigue, whose amplitudes can easily be much larger than the ERP responses of interest). Fatigue also affects the cognitive performance of the participants. Hence, it is important to randomize the order of the blocks for each participant, to avoid systematic fatigue or movement artifacts in some experimental conditions, which may disturb your neural and behavioral data. Invest in a good armchair so that your participants can sit comfortably throughout the experiment. Offer refreshments when needed and offer several breaks between the experimental blocks.
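Below is a small sketch of how pseudo-IDs, a separately stored mapping table, and a reproducible per-participant block order might be generated in advance. The names, file name, and ID scheme are placeholders; the mapping file must be stored securely and apart from the data, as described above.

# Sketch of creating pseudo-IDs, a mapping table, and a per-participant block order.
# Names, file name, and ID scheme are placeholders; store the mapping securely,
# apart from the recorded data.
import csv
import random

participants = ['Jane Doe', 'John Smith']        # hypothetical names
blocks = ['block_A', 'block_B', 'block_C']

with open('id_mapping_SECURE.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'pseudo_id'])
    for i, name in enumerate(participants, start=1):
        pseudo_id = f'sub-{i:03d}'               # use only this ID in data and log files
        writer.writerow([name, pseudo_id])
        order = random.Random(i).sample(blocks, len(blocks))  # reproducible block order
        print(pseudo_id, '->', order)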

3 Setting Up a Psychophysiology Laboratory

3.1 Data Recording Infrastructure

Recently, lower prices of laboratory equipment have enabled many smaller labs to purchase high-quality research facilities. However, there is also equipment on the market that does not meet the standards of a high-quality psychophysiology lab, particularly one focusing on language research. As with the strategic planning of your stimulus setup, it is important to make a strategic plan for developing, running, and maintaining the recording instrumentation. This will help you decide which features are essential and which are less important to include in the lab.


There are fewer manufacturers of MEG equipment than of EEG equipment and, hence, fewer options are available. However, the features built around the basic MEG infrastructure allow for more customization (such as different behavioral response instrumentation, auditory and visual stimulation setups, and simultaneous EEG and eye-tracking data acquisition). In our opinion, MEG-compatible (or built-in) EEG equipment is highly useful to purchase together with MEG equipment. It allows you to measure EEG and MEG simultaneously, offering the possibility of using more advanced neural source modeling techniques, since EEG lead fields differ from those of both MEG gradiometers and magnetometers and can thus offer complementary information. When planning EEG facilities, we recommend considering at least the number of channels, mobility, and electromagnetic noise shielding features. In many studies on the neurocognition of language, 16, 32, or 64 channels are sufficient; not every study needs 128 or 256 EEG channels. In other words, high-quality science can be done with fewer than 64 channels, and a study using over a hundred channels is not automatically better. What matters is whether (and how) you use the advantage of having better spatial information. High-resolution EEG with up to 256 channels is advantageous in neural source modeling and in analysis techniques such as independent component analysis (ICA). The latter, a so-called blind source separation technique, can also be used for data cleaning. The disadvantages of multi-channel EEG are larger lab expenses, bulkier equipment, and slower (and less comfortable) preparation of the EEG recordings. In many cases, it is optimal to have equipment that allows using a different number of channels for different kinds of experiments. With respect to mobility, consider whether you plan to record EEG outside the lab in so-called naturalistic settings. The highest-quality lab equipment with a large number of channels is often bigger and heavier and includes many modules with a lot of wires. This makes transporting the equipment to a new recording venue for each recording session rather inconvenient. The recommended options are either to have separate lab and mobile equipment with different characteristics or to purchase semi-portable equipment. Furthermore, different equipment has different noise shielding features, such as active shielding (the wires between an electrode and the amplifier are actively shielded), active electrodes (an instrumentation buffer integrated into each electrode), the length of the electrode wires (the longer the wires, the more noise they can pick up), as well as the distance between the participant and the AD conversion unit. Regarding the latter, in some equipment this distance equals the length of the electrode wires, whereas in other equipment digital conversion does not take place in a so-called headbox close to the participant but in an amplifier situated farther away. In such a case, the signal between the headbox and the amplifier can still be rather vulnerable to noise.


If your lab has a low-electrical-noise environment, such as an electrically shielded room, these features have less power to improve the signal-to-noise ratio, as the shielded room already handles most of the electrical noise and thus shields your measurement. In varying and naturalistic environments in particular, however, these features are more crucial. Another important issue to consider while building an EEG lab is the triggering possibilities. Some small ambulatory devices or clinical neurophysiology devices have poor triggering interfaces and may even make ERP recordings impossible. A sufficient input port for EEG recordings aiming at ERP analyses is one for an 8-bit or 16-bit TTL signal (i.e., the port to which the stimulus computer's trigger output is connected). In addition to EEG electrodes, you might need to record data from additional sensors. Consider how many bipolar electromyogram (EMG) inputs are needed in addition to the common-referenced EEG inputs. For instance, EMG inputs are typically used for two bipolar electro-oculograms (EOG), that is, vertical and horizontal EOGs, to optimally record eye movements and blinks. Other potential needs for EMG inputs are electrocardiograms (ECG), facial EMG for autonomic responses (cf. Chaps. 11 and 19), as well as muscle tonus for sleep recordings, to distinguish rapid eye movement (REM) sleep from other sleep stages and the waking period. Four bipolar inputs (eight electrodes altogether) are typically sufficient to cover most needs. If your needs change, some amplifiers allow you to upgrade your system with more channels or inputs for different additional sensors. Next, we will say a few words about amplitude resolution. For the highest-level scientific purposes, EEG equipment should have a 22–24-bit AD converter. With such a converter, the smallest recordable changes in the EEG signal are in the range of a few tens of nanovolts, yet the dynamic range is large enough for even larger-scale artifacts and signal changes, so that the amplifier remains in its functional (dynamic) range and does not saturate. In the cheapest or "consumer" brain-computer interface (BCI) EEG equipment, the resolution is sometimes too poor for scientific ERP studies. Another problem with the cheapest EEG equipment is that the dynamic range of the converter is so small that heavy filtering (signal dampening or smoothing) is performed before the AD conversion. Quite often this filtering harms your ERP responses, particularly slow language-related ERP components such as the N400 and P600.
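As a back-of-the-envelope illustration of why bit depth matters, the size of one quantization step is simply the amplifier's input range divided by 2 raised to the number of bits. The +/-400 mV input range below is an assumed example value; check your own amplifier's specifications.

# Back-of-the-envelope amplitude resolution of an AD converter:
# step size (per quantization level) = input range / 2**bits.
# The +/-400 mV input range is an assumed example value.
input_range_uV = 800_000           # +/-400 mV expressed in microvolts
for bits in (16, 24):
    step_nV = input_range_uV / 2 ** bits * 1000   # one quantization step in nanovolts
    print(f'{bits}-bit converter: {step_nV:.0f} nV per step')
# Over this range, a 24-bit converter resolves steps of a few tens of nanovolts,
# whereas 16 bits gives steps of roughly 12 microvolts.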


To run the recordings smoothly, you should make participant preparation as fast and participant-friendly as possible. Children, elderly participants, and clinical groups benefit from gentle electrode preparation, which may lead to less tension or movement during the recordings. The so-called traditional passive electrodes with electrode gel or paste are the most demanding in this sense. Active electrodes with electrode gel are faster to prepare, and (gentle) scratching of the participant's scalp is often unnecessary (see below for more explanation). Nevertheless, electrode gel application is still somewhat time-consuming. Electrodes with saline solution pads are the fastest and easiest to apply. However, saline-pad electrodes carry a higher risk of varying signal quality due to movement, and, during long recordings in particular, the pads may dry, leading to a poorer electrode connection and hence a poorer signal-to-noise ratio (SNR). In this sense, active electrodes are better, but they are pricier. In addition to electrode preparation, you should pay close attention to how comparable your signal quality is at the beginning and at the end of the recording session. For instance, you might need to compare brain responses at the beginning and at the end of the session in a language learning experiment. In such studies, it is crucial for the SNR to remain the same throughout the recording session. As discussed above, saline-pad electrodes in particular do not always meet this criterion. If the electrodes are not attached permanently to the EEG caps, as is often the case with active electrodes, it is good to purchase EEG caps of several different sizes and only a few sets of (rather expensive) electrodes. This will prolong the life cycle of the EEG caps, as you can always use a cap of an appropriate size. You will then not have to force too small a cap onto a participant, as stretching will eventually damage the cap and it will lose its shape. In caps that are too large (or that have lost their original shape), some electrodes will be loosely connected to the scalp and cause bad or varying signal quality. Generally, any lab should have a variety of EEG cap sizes and more copies of the sizes that are used most frequently; otherwise, you may have to use a wet EEG cap when testing several participants in a row. See Fig. 1 for a schematic view of a modular EEG cap/electrode system. Furthermore, if an EEG system is actively used, individual electrodes and wires break quite often. It is therefore important to know whether single electrodes can be replaced in your lab or whether the whole electrode set needs to be sent to the manufacturer. Even if the replacement of a single electrode is inexpensive, it always means that one of the electrode sets cannot be used for several weeks. Hence, we recommend having a sufficient number of spare electrode sets to avoid interruptions in the EEG recordings. You should also make a simple life-cycle cost estimate for your EEG equipment, for instance, taking into account the prices of replacing electrodes and caps. After such an analysis, you may find that expensive electrodes turn out to be cheaper in the long run, since you can then replace and renew electrode sets independently of the caps. In order to obtain a proper EEG signal, most EEG systems require the use of electrode gels or pastes.


Fig. 1 Schematic overview of a modular EEG/cap system

The choice of an appropriate gel or paste depends both on your EEG system and on your research needs. There is a plethora of available options and, hence, it is not straightforward to choose an optimal electrode gel. Typically, the EEG system manufacturer can recommend certain gels or pastes, but it is beneficial to learn about other possibilities and to test different products to find the ones that are optimal for your own research purposes. When choosing an electrode gel or paste, you need to consider, for example, the properties of the gel, the ease of gel or paste removal, and the stability of the contact impedance during your recordings. A smoothly running, low-viscosity gel is usually faster to apply than a hard and sticky one, and the difference between them becomes more significant the more electrodes you have to prepare (i.e., the longer the preparation times). However, a smoothly running gel tends to leak and does not maintain a good electrode contact unless the electrode holder in the cap, or the opening in a circular electrode, is tight enough to keep the gel in. On the other hand, the removal of thick pastes is time-consuming and requires force when washing the electrodes and caps. This may cause physical damage and shorten the life cycle of caps and electrodes. Thick pastes are also inconvenient for the participants, who need to wash their hair after the experiment. When the gel or paste has been properly applied and the EEG measurement has started, the next task is to verify the stability of the contact impedance during the recording.


The contact impedance may change due to, for instance, drying of the gel or paste, or due to movement of the electrodes caused by movement of the participant. If the recording session is short ( 70 Hz), which are linked to the reduced engagement of semantic composition processes in idioms.

2.2 EEG Indexes for Mind-Reading Pragmatics

Irony
Several EEG studies on irony compared the processing of the same sentence embedded in either a literal or an ironic context. P600 effects are the most prominent across studies and are found in all works [e.g., 72–75] except those focusing on prosodic differences outside of context [76, 77]. Interestingly, P600 effects could be detected across communicative styles and task demands [72], with or without ironic cues [75], in positive, negative, or unusual contexts [78, 79], and when using emojis [80]. Such positive effects are often long-lasting and are sometimes taken to involve the LPC [78, 80, 81]. Conversely, N400 effects are present only in a limited set of cases, for instance, when irony is used in a positive context (ironic praise) [78, 82, 83], as when commenting What a sad prize! after winning a huge sum of money, and in more "unfamiliar" forms of irony [79]. It is also not uncommon for irony to induce even earlier effects in the ERPs, often involving the P200 component [e.g., 73]. Two studies in the literature on irony [74, 81] investigated the time–frequency domain of the EEG. A common finding concerns a power decrease in the alpha range, pointing to the idea that ironic sentences require the recruitment of more cognitive resources than literal sentences. A power increase in the theta and gamma ranges was observed in Spotorno et al. [81].

Humor
Concerning humor, several studies tested the reading of canned verbal jokes, and most of them involved a comparison between humorous and non-humorous sentences or dialogues (When I asked the bartender for something cold and full of rum, he recommended his wife versus ... he recommended his daiquiri) [84].


Compared to straightforward conditions, the comprehension of humorous passages shows a complex pattern of ERP effects, very often involving two or more different components. The effect encountered first has a negative polarity: it is taken to involve the N400 [e.g., 84–87] or the Left Anterior Negativity (LAN) [e.g., 84, 88, 89]. After the initial negativity, the majority of studies reported positive effects on the ERP response, ranging from relatively early effects on the P600 (500 to roughly 750 ms) [88, 90] to later differences associated with the LPC [86, 88]. Studies on humor processing have rarely explored the time–frequency domain of the EEG. A single study focused on verbal materials and observed a power decrease in the beta range of the EEG [88]. For visual stimuli, the involvement of several frequency ranges [91] or a specific role for alpha oscillations and creative cognition in humor [92] has been reported.

2.2.1 Functional Interpretation

Studies on irony and humor undoubtedly show a major involvement of the P600 and the LPC components. While the role of the P600 reported for metaphors and idioms is debated, the positivity for irony and humor is consistently interpreted as reflecting the inferential activity required to reverse sentence meaning in irony or to resolve the humorous clash. On the one hand, this is in line with the theoretical description of irony and (to some extent) humor as involving the ability to account for the speaker's intentions, with the final interpretation in both cases being very distant from the literal message (in irony) or from the expectations fed by the joke setup (in humor). On the other hand, it also adds to the idea that the cognitive mechanisms underlying the P600 deal with meaning, especially when the sentence interpretation must be revised and/or the reader/listener must derive the implicature (as proposed for metaphor). The N400 findings confirm the view that humor, and to a limited extent irony, involve a difficulty in connecting the incoming words with the expectations set up by the previous context. In irony, this happens only when it is used in an unusual way (see unfamiliar irony [79] or ironic praise [78]); in humor, the N400 may be involved in detecting the message incongruity that is fundamental for the humorous effect, and a similar role may be played by the Left Anterior Negativity, which typically reflects incongruity detection in the syntactic domain [e.g., 93]. Concerning humor, the observed multiphasic ERP response (LAN/N400, P600, and LPC) captures the classic distinction between the different processing steps in humor comprehension [94]: incongruity detection (LAN/N400), resolution (P600), and finally an elaboration/appreciation stage (LPC) [95].


The investigation of the time–frequency domain of the EEG in mind-reading pragmatics further adds that irony is cognitively engaging (alpha suppression [74, 81]) and that, after the resolution of a humorous clash, the mental representation of the discourse built so far is abandoned (beta suppression in [88]), favoring the transition to the third step of elaboration and appreciation.

3 Context, Discourse, and Conversation

3.1 Decomposing Contextual Factors

The idea that contextual information, especially information from the previous linguistic context, plays a major role in language comprehension has been widely accepted since behavioral work on the processing of ambiguous words [e.g., 96, 97]. Already then it was clear that contextual information sets up rather specific expectations about upcoming words, and these expectations ease the comprehension of congruent words. This line of research largely benefited from the ERP methodology, and especially from the discovery of the N400 component [98], which paved the way for studying the effects of several contextual factors on comprehension and made the notion of the cloze probability of a word [99] very popular. Cloze probability is the probability that a particular word is used to complete a truncated sentence; it is robustly (and negatively) correlated with the size of the N400: words that are less predictable in the sentential context elicit larger N400s [14]. However, the unitary notion of cloze probability must not make us forget that many different factors contribute to making a word more or less predictable. From a pragmatic and psychological perspective, the notion of context is multifaceted, and both linguistic (e.g., sentence-level expectations, discourse information) and extralinguistic (individual knowledge or beliefs) factors together contribute to creating expectations about what comes next in the sentence. Considering linguistic factors, the information provided by the prior sentence or discourse context is a primary source of semantic expectations. Federmeier and Kutas [64] showed that expectations can be rather specific and that the N400 not only measures the distance between context and incoming input but can also be used to investigate the organization of semantic memory. In a typical experiment, the processing of strongly expected words (e.g., brain) in context (He was afraid that doing drugs would damage his...) is compared to the processing of words that are similarly less predictable but either share some semantic features with the expected word (the related word, e.g., mind) or do not (the unrelated word, e.g., reputation): the N400 is sensitive to this feature overlap, and unrelated words show a larger N400 than related words (similar contextual effects have been extensively studied and reviewed by Van Petten and Luka [20]).
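As a simple illustration of how cloze probability is obtained in practice, the sketch below computes it from hypothetical sentence-completion counts for the example frame above; in an actual norming study, the counts would come from a separate group of respondents.

# Computing cloze probability from (hypothetical) sentence-completion norms:
# the proportion of respondents who complete the sentence frame with a given word.
completions = {   # hypothetical counts for "He was afraid that doing drugs would damage his ..."
    'brain': 62, 'mind': 15, 'health': 12, 'reputation': 8, 'liver': 3,
}
total = sum(completions.values())
cloze = {word: count / total for word, count in completions.items()}
for word, p in sorted(cloze.items(), key=lambda kv: -kv[1]):
    print(f'{word}: cloze = {p:.2f}')   # lower cloze values generally go with larger N400 amplitudes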


One more aspect revealed by the behavior of the N400 is that many different kinds of information that go "beyond the sentence given," as Hagoort and Van Berkum [100] would say, are part of the context in which a communicative exchange occurs, and they affect the N400 just as local (sentence-level) sources of information do. Our knowledge about the speaker or about facts in the world affects the amplitude of the N400, which peaks when we hear a child's voice uttering Before sleep I always drink a glass of wine [101, 102] or when we read the sentence The Dutch trains are white knowing that they are yellow [103]. Recently, Troyer and Kutas [104] further found that the N400 is also sensitive to encyclopedic knowledge, showing that individuals who know Harry Potter's story better also show a larger N400 to sentences presenting wrong information about the characters. Discourse context is even capable of overriding semantic features: Nieuwland and Van Berkum [105] showed that an animacy violation such as The peanut was in love, which is expected to elicit an N400 effect when presented outside a context, does not elicit incongruity effects when the preceding context is fictive and describes a singing and dancing peanut. The N400 is also involved in processing statements that are inconsistent with one's moral attitude, as when a strict Christian is presented with I think euthanasia is an acceptable course of action [106]. N400 effects are often followed by later-occurring positive effects that are also sensitive to the context, especially when processing unexpected but plausible words: Van Petten and Luka [20] proposed that they reflect the attempt to repair or revise the sentence interpretation after predictions have turned out to be wrong. Moreover, the involvement of the Late Positive Potential (LPP), which is typically associated with the processing of emotion, has been reported when investigating morally objectionable statements [106–109] and insults [110]. Concerning the time–frequency domain of the EEG, words that are unexpected in context are typically associated with increased power in the theta band when comparing high- and low-cloze sentences [e.g., 111]. World knowledge violations are instead associated with increased power in the theta and gamma ranges of the EEG when compared to correct sentences [e.g., 112], but not in natural reading [113].

3.2 Linking Discourse

Linguistic information provided by the discourse context becomes part of the mental model of the discourse [e.g., 114]: forthcoming information must be integrated with it, by linking facts and characters to the different events described. One of the tools that most effectively allows speakers to maintain a coherent representation of the ongoing discourse is anaphora. Anaphoric expressions, such as pronouns or nouns connecting the subsequent occurrences of the same referent in a text, have been widely investigated with ERPs (reviewed in
Callahan [115]): these studies have shown that when semantic or grammatical features control agreement and make it impossible to link anaphoric expressions to an antecedent (The king prepared herself* for the dinner), a P600 is observed [116–118] attesting the “failure to agree”; conversely, N400 effects have been associated with the unexpected use of anaphoric expressions, as, for instance, for repeated names [119]. The case of ambiguous reference, that is, when there are more than one plausible antecedent for an anaphoric expression, is of special interest for pragmatics because it may require a further search in the mental model to eventually choose the appropriate referent. Referential ambiguity was found to be associated with a different component, named Nref, which has a negative polarity, is long-lasting and has an anterior distribution [120–123]. Effects on this component are also found when processing unbound pronouns with no explicit antecedents [124] or when co-indexation features are computed on the basis of stereotypical gender [116]. Important evidence on anaphoric expressions comes from the study of givenness: very often information about the antecedents of a referring expression is not explicitly given but can be inferred from the context. Burkhardt [125] recorded the ERPs while participants read sentences such as He said that the conductor was very impressive, preceded by three kinds of contexts: in one, the given condition, the referent was mentioned explicitly (Tobias visited a conductor in Berlin); in the second, the new condition, the referent was not mentioned (Tobias talked to Nina); and lastly, in the bridged condition, the relationship between the two sentences could be derived via pragmatic inference (Tobias visited a concert in Berlin). The results showed that the differences in the ERPs concerned a reduction of the N400 amplitude for given as compared to new referents, while a larger P600 characterized the difference between bridged and given conditions. On the basis of this and other findings [e.g., 126], Schumacher [127] proposed that the N400 and the P600 would reflect Discourse Linking and Discourse Update mechanisms, respectively: the N400 would be related to the availability of the antecedent for a referring expression, allowing to establish the link between different entities of the discourse; the P600 would be associated with the structural consequences of an update of the discourse representation, obtained through the correction, modification, or enrichment of such representation. Support for this view comes from the study of presupposition, which can be considered as a special case of referential expression, when reference is implicitly made to something that should, or is assumed, be common knowledge between interlocutors. Masia et al. [128] contrasted the processing of new information when “packaged” as presupposed (for instance, through a definite description: The migration was confirmed by a very recent article) or explicitly asserted as new (There was a migration,

confirmed by a very recent article) embedded in naturalistic discourse contexts (e.g., It is by now well established that the humankind is not pure. In fact, our DNA contains genetic information belonging to Neanderthals, who soon peopled Europe). Results showed that when new information is presupposed, a larger N400 is observed. In Domaneschi et al. [129], the same target sentence (e.g., Due to overstaffing problems, about a month ago the graphic designer was made redundant) was presented following a context in which no information was given about the referent and therefore the presupposition needed to be accommodated in the discourse representation (In Paolo’s office there are many employees) or following a context in which the antecedent was explicit and the presupposition “satisfied” (In Paolo’s office, there used to be a very bad-tempered graphic designer). Overall, the accommodation of presupposed information induced a larger N400 followed by a larger P600. These results fit with the account of Schumacher [127]: when the effort mainly consists in tracking new information given as presupposed [128], the N400 increases, whereas, when the effort requires the update of the discourse model to accommodate the presupposed new information and consequently correct or change the discourse representation, a P600 is observed [129]. 3.3

Conversation

Perhaps all research in psycho- and neurolinguistics should ultimately aim at describing language processing in a naturalistic environment [130], and arguably conversation is the most natural environment for the use of language. Studying conversation in a laboratory setting is challenging, as researchers must try to gain control over a range of confounding variables connected to the rich context in which conversation occurs. However, there are some interesting attempts, using, for instance, multiple modalities to realistically represent the large set of conversational features typical of communicative exchanges [e.g., 131]. One line of research in the domain of conversation concerns speech acts, that is, those actions that we perform using language, such as asking a question or giving an order [132, 133]. A number of studies have addressed the processing mechanisms underlying the two categories of speech acts: direct (when the structure of the utterance corresponds to the communicative intention) and indirect (when the speaker does not directly state the intended meaning of the utterance). The recognition of these acts has been studied using written [134], spoken [135], or audiovisual [131, 136] materials. For instance, Egorova et al. [131] investigated the processing of the same noun in contexts where the spoken word was used for two different direct speech acts, either naming or requesting. EEG differences between the two speech acts occurred as early as 120 ms, and similar results were found when naming and requesting were accompanied by gestures [136].


Using auditory materials, Gisladottir et al. [135] compared the ERPs elicited while listening to the same target sentence, I have a credit card, which could convey three more or less direct speech acts depending on the prior context: an answer, the most direct speech act (e.g., How are you going to pay for the ticket?); a pre-offer (e.g., I don't have any money to pay for the ticket); and a declination (e.g., I can lend you money for the ticket). Differences between conditions concerned the beginning of the utterance and the onset of the critical word but did not clearly involve a specific ERP component. Perhaps even more interesting were the results of Gisladottir et al. [137] in the time–frequency domain: a power decrease in the lower bound of the beta range of the EEG was associated with declinations, as compared to answers and pre-offers, immediately preceding the target sentence. Switching to indirect speech acts, Coulson and Lovett [134] compared the processing of the same utterance interpreted either as an indirect request or as a literal statement depending on the previous context: in a scenario where a married couple order soup, the wife utters My soup is too cold to eat either to the waiter (indirect request) or to her husband (literal context). The analysis of slow cortical potentials revealed differences between conditions starting from the second word of the sentence [see also proverbs in 67], suggesting that the conversational context is taken into account early, even in the absence of prosodic or gestural cues (in the written modality). Other relevant aspects of conversation have also been studied experimentally, although not extensively yet. One is the use of pregnant pauses in conversation [138, 139], which may be associated with larger late positive effects in the ERPs, due to the fact that "no" responses, which are usually preceded by longer pauses, are disaffiliative and induce an evaluative process (in line with morally objectionable statements [106] or insults [110]). Overall, the literature on the ERP correlates of conversation is very promising, although the effects do not easily map onto the traditional components of language processing, possibly because conversational processes themselves are less clearly time-locked to individual words (as in the obvious case of pauses [138]) and are affected by multiple factors (sentence content, prosody, gestures). One thing that seems to emerge consistently is that some conversational mechanisms, such as the recognition of speech acts, can occur very early. These early effects indicate that pragmatic factors are quickly accounted for and allow for a smooth and fast transition between conversational turns.


4 ERP Responses Related to Pragmatics in Clinical Conditions

In this section, we address an important yet poorly investigated topic, namely the EEG signature of impaired pragmatic processing. Pragmatic disorders can occur in a vast number of clinical populations, both neurological and psychiatric, affecting expressive as well as receptive skills [140]. In particular, there is EEG evidence of disrupted pragmatic processing in two of the domains described earlier, namely figurative language and discourse comprehension [141–144]. One of the conditions most affected by pragmatic impairment is schizophrenia, a mental illness characterized by a constellation of symptoms including cognitive [e.g., 145, 146] as well as linguistic deficits [e.g., 147, 148]. As one of the hallmarks of language disturbances in schizophrenia, the literature documents the so-called concretism, namely a difficulty in abstract thinking and in the interpretation of non-literal meanings [149]. For instance, a patient asked to explain the meaning of "The lawyer is a shark" might reply that "It means that the lawyer swims fast." Behavioral reports of impaired figurative language processing are supported by a number of studies evidencing altered EEG correlates of figurative language processing. In [150], participants were presented with idiomatic (dead beat), literal (vicious dog), and unrelated (square wind) word pairs during EEG recording. Whereas controls showed a graded N400 effect (Idiomatic < Literal < Unrelated), individuals with schizophrenia showed a larger N400 only for Unrelated vs. Literal conditions, suggesting that they may not benefit from the contextual cues that allow figurative multiword expressions to be processed smoothly. Two other studies on metaphor processing reported a reduced N400 in schizophrenia with respect to the control group. Importantly, however, the N400 reduction did not specifically affect the metaphoric condition, being present also for literal sentences [151, 152]. A diminished sensitivity of the N400 to contextual expectations is a consolidated finding in the literature [for a meta-analysis and a review, see 153, 154] and suggests that the most evident problem of language comprehension for individuals with schizophrenia is the difficulty of using contextual cues in processing discourse [e.g., 155, 156]. It is interesting to notice that individuals with Asperger Syndrome also seem to show a reduced N400 sensitivity compared to neurotypical individuals. The study by Gold et al. [157] showed that individuals with Asperger Syndrome exhibited a marked N400 for novel metaphorical pairs (compared to literal and conventional metaphoric word pairs) that was on a par with the N400 for unrelated word pairs (Conventional Metaphor = Literal < Unrelated = Novel Metaphor), a pattern that differed from the one observed for the control group (Conventional Metaphor = Literal = Novel Metaphor < Unrelated).


Other evidence comes from studies on irony and the P600. Del Goleto et al. [158] found differences between two groups of participants with high or low schizotypy (i.e., schizophrenia-like personality traits), reporting that only the low-schizotypal group showed the P600 effect to irony. Irony has also been investigated in relation to dysphoria, with mood-related alterations reported in the P600 [159]: specifically, in this study the differences between the group with dysphoria and the control group concerned the spatial distribution of the P600. Such a paucity of studies investigating pragmatic phenomena with EEG in the clinical domain raises the question as to whether EEG-based indexes can be used as pathological biomarkers. In this respect, there is still a lot of work to do, extending the investigation to other pathological conditions and focusing on a larger set of indexes.


5 Conclusions and Future Directions

In this chapter, we provided an extended overview of the electrophysiological correlates of language comprehension when it involves pragmatic processing. Across the domains of non-literal language, discourse, and conversation, we observed that two ERP components are engaged in the vast majority of the studies, namely the N400 and the P600. Pragmatic processes also seem to be associated with a set of long-lasting negativities, albeit these are less commonly observed. Figure 1 summarizes the main ERP effects for pragmatics. The value of the accumulated evidence is twofold. On the one hand, knowledge about the functional meaning of these components has been used to investigate key issues in pragmatics. On the other hand, the work reviewed here allows us to refine and extend the functional interpretation of the ERP components in general, which have rarely been described from a pragmatic perspective [3, 127, 160]. Starting with the first point, that is, the contribution of the ERP literature to solving pragmatic issues, one of the questions that has most dominated the field of experimental pragmatics is whether the access to non-literal meaning is direct [161] or indirect [37]. ERP research offers important evidence for settling this debate, although the answer may not be straightforward. The observed reduction of the N400 size supports a rather direct access to figurative meaning in the case of idioms [59, 61] and of metaphors, when they are conventional or when the context is supportive enough [44, 45, 52].


Fig. 1 ERP components in pragmatics. The figure displays the relation between the range of pragmatic phenomena described in the chapter and the ERP components. Negativity is plotted upwards, and reference to temporal and topographic information is provided on the time scale at the bottom and on the head models on the left, respectively. The position where the name of each pragmatic phenomenon is printed depends on the typical latency and direction (larger or smaller amplitude of each component) of the ERP effects associated with that phenomenon. The top row shows an ERP waveform recorded from anterior electrodes, where anterior negativities are typically found for literary metaphor, humor, and referential ambiguity (here we refer collectively to the Left Anterior Negativity, Sustained Anterior Negativity, and Nref). The second row shows an ERP waveform recorded from central electrodes, where the N400 is typically found for metaphors (novel and conventional), idioms, and contextual expectations. The third row shows an ERP waveform recorded from posterior electrodes, where the P600 and LPC are typically found for idioms, irony, metaphor in context, and humor


However, the larger N400 for novel metaphors [44, 45] and the fact that literal meanings can prime the figurative meaning of metaphors [55] suggest that more demanding processes related to the selection of the relevant conceptual properties of the "literal" vehicle are needed. Importantly, the study of irony and humor shows the involvement of the P600 component in non-literal interpretation [73, 81, 90], which points to a second, later route to indirectness, more clearly linked to inferential processes. Similar effects are observed also in lexical pragmatics phenomena, especially when metaphors are embedded in sentences [49, 52]. Globally, the results speak of two distinct indirect routes, one touching the meaning of the words to be intended figuratively (N400) and the other using inferences to derive an interpretation [52, 73], update [127], or integrate [13] the mental representation of the discourse (P600). With respect to the contribution of the ERP literature on pragmatics to the ERP literature on language in general, this mainly involves the functional interpretation of both the N400 and the P600. The widely accepted relationship between the N400 and retrieval from semantic memory [14] finds new and important support in the studies mentioned in this chapter. First, the N400 is sensitive to the relative degree of lexicalization of figurative expressions [45, 61], which is consistent with lexical frequency effects on single words [e.g., 162, 163]. Second, the N400 depends on contextual expectations also when processing figurative expressions [52, 59], as it does when processing literal language [14, 100]. Moreover, studies in pragmatics allow us to extend the classic view of the N400, offering a unique perspective: for instance, if we further scrutinize the role of the N400 in metaphor, a growing body of evidence suggests that meaning selection may involve further processes, such as analogical mapping, in which mental images play a role, as suggested by work on concrete versus abstract metaphors [46, 164–166]. Concerning the role of the P600, it is now widely accepted that this component is not only related to syntax [12, 13, 127, 167]. Studies of mind-reading pragmatics are indicative of inferential mechanisms underlying the P600 [81], and similarly, for lexical pragmatics phenomena, the short-lived P600 effects may reflect the effort spent to infer the sentence interpretation [52] or to perform "minor" updates of the sentence interpretation [127]. Collectively, the evidence reviewed here contributes greatly to fine-tuning the interpretation of the beyond-syntax P600, adding a pragmatic perspective on this component. Our hope is that this chapter will encourage future research in both directions: using ERP components to investigate pragmatic mechanisms and using pragmatic phenomena to elucidate the nature of ERP effects.


When planning future research, a number of further questions stand out and should receive primary attention:

(1) The definition of the entire set of ERP components associated with pragmatic processing
The cognitive mechanisms involved in pragmatic phenomena do not only impact the N400 and P600 (see Fig. 1). As illustrated in the sections above, long-lasting ERP effects with a negative polarity have been observed for ambiguous anaphors (the Nref [123]), for the processing of literary metaphors (sustained negativity [51]), and for humor processing (sustained LAN [84, 88]). We may argue that all these linguistic phenomena are linked to an additional search for information in the mental representation of the discourse, needed to choose the right referent for an ambiguous anaphor, to indulge in deriving an array of weak implicatures in poetry, or to search for a solution cracking the joke. However, it is not trivial to determine whether the three negative effects are due to the same or to different processing mechanisms, because they show different scalp distributions (more or less lateralized, although mainly with a frontal focus). Moreover, the literature features several positive ERP effects that are often interpreted with reference to the LPC component, but a shared definition of its functional role in pragmatics is still missing. For instance, the association of the LPC with the processing of jokes suggests that the underlying mechanisms may involve an elaborative and evaluative processing stage, on a par with evidence from the study of insults and moral transgressions, in which ERP differences affect the LPP [106, 109, 110]. Yet it is unclear how to distinguish between the LPC and P600 components, or whether it is possible to determine a canonical timing or scalp distribution for the pragmatic P600 at all. Finally, the early (i.e., before the N400) responses observed, for instance, in the case of speech acts [131, 136] point to the idea that the inventory of pragmatics-related ERP components could also include early effects, possibly linked to fast and automatic conversational mechanisms: this is certainly a domain where more testing is needed.

(2) The role of individual differences in pragmatics-related ERP effects
The interest in the study of individual differences to better describe language processing mechanisms has characterized research on language comprehension from its infancy, focusing on aspects such as memory [e.g., 168] and age [e.g., 169]. Notably, the evidence reviewed in this chapter shows that many individual factors may have an impact on the ERPs associated with pragmatic phenomena, including cognitive skills in the case of metaphor [47], specific knowledge about a story [104], but also moral values [106],
[116], and social skills in the case of humor [88] and scalars [170]. The inclusion of individual factors in the ERP analysis is, however, sporadic. A proper pragmatic framework should pay considerably more attention to individual variability, as these extralinguistic factors are an important part of a broad notion of context, as intended in pragmatics. (3) The functional meaning of EEG oscillations Concerning the time–frequency domain of the EEG in pragmatics, studies are very few and the results are not always consistent. The roles of power drops in the beta and in the gamma ranges seem of particular interest, since the former may be a consequence of major revision processes [88] and the latter may be associated with the idling of semantic composition mechanisms [59, 63] in favor of more pragmatic processes. Given the importance of oscillations to fully understand language processing [22, 24, 171], it is key that future studies in pragmatics more systematically include the exploration of the time–frequency domain of the EEG for favoring an integrated account of language comprehension, extending beyond syntax and semantics. Probably, answers to these open issues will come also from the study of a larger set of pragmatic phenomena than the selection considered in this chapter. Here, we have focused on non-literal language and discourse and conversation, both because they are theoretically salient in pragmatics and because they have been extensively studied with the EEG-based measures. However, pragmatic mechanisms are involved also in several other language phenomena, on which electrophysiological investigation is still scarce. A non-exhaustive list includes the processing of scalars [170, 172, 173], negation [174, 175], lies [176], and those phenomena at the interface with discourse factors, such as the questions under discussion (QUD) [177], or the interaction between prosody and information structure [178].
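To make this recommendation concrete, the following minimal sketch (not drawn from any of the studies reviewed above) illustrates how a time–frequency contrast between two pragmatic conditions could be computed in MNE-Python; the file name, condition labels, baseline window, and channel choice are hypothetical placeholders.

import numpy as np
import mne
from mne.time_frequency import tfr_morlet

# Hypothetical epochs file with events labeled "literal" and "joke"
epochs = mne.read_epochs("sub01_pragmatics-epo.fif")

freqs = np.arange(4, 41, 1)      # 4-40 Hz: theta through low gamma
n_cycles = freqs / 2.0           # wavelet width grows with frequency

tfr = {}
for cond in ("literal", "joke"):
    power = tfr_morlet(epochs[cond], freqs=freqs, n_cycles=n_cycles,
                       return_itc=False, average=True)
    power.apply_baseline(baseline=(-0.5, -0.1), mode="logratio")
    tfr[cond] = power

# Difference in baseline-corrected power (joke minus literal), e.g., beta band at Fz
diff = tfr["joke"].copy()
diff.data -= tfr["literal"].data
diff.plot(picks=["Fz"], fmin=13, fmax=30,
          title="Joke minus literal: beta-band power")

Such a contrast would, of course, need to be followed by appropriate statistics (e.g., cluster-based permutation tests across participants) before any beta or gamma effect is interpreted.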

Author Contribution VB conceived the structure of the chapter and the pragmatic framework presented in 1.1. Both authors contributed to retrieving and interpreting the content of the different sections. PC wrote the first draft of the manuscript; VB revised the text.

Conflict of Interest The authors declare that they have no conflict of interest.


References 1. Kutas M, Van Petten C (1994) Psycholinguistics electrified: event-related potential investigations. In: Gernsbacher MA (ed) Handbook of psycholinguistics. Academic Press, Massachusetts, pp 83–143 2. Kutas M, Van Petten C, Kluender R (2006) Psycholinguistics electrified II: 1994–2005. In: Traxler MJ, Gernsbacher MA (eds) Handbook of psycholinguistics, 2nd edn. Elsevier, New York, pp 659–724 3. Van Berkum JJ (2009) The neuropragmatics of ‘simple’ utterance comprehension: an ERP review. In: Breheny U, Sauerland R, Yatsushiro K (eds) Semantics and pragmatics: from experiment to theory. Palgrave-Macmillan, Basingstoke, pp 276–316 4. Levinson SC (1983) Pragmatics. Cambridge University Press, Cambridge. https://doi. org/10.1017/CBO9780511813313 5. Bar-Hillel M (1971) Pragmatics of natural languages. Springer, Netherlands https:// doi.org/10.1007/9789401017138 6. Mey J (1998) Pragmatics. In: Mey J (ed) Concise encyclopedia of pragmatics. Elsevier, Amsterdam, pp 716–737 7. Verschueren J, Verschueren J (1999) Understanding pragmatics. Arnold, London. https://books.google.it/books?id=pOsUxSlhBwC 8. Sperber D, Wilson D (2005) Pragmatics. In: Jackson F, Smith M (eds) The oxford handbook of contemporary philosophy. Oxford University Press, Oxford, pp 468–503 9. Horn LR, Ward G (1999) Pragmatics. In: Wilson RA, Keil FC (eds) The MIT encyclopedia of the cognitive sciences. MIT Press, Cambridge, pp 661–664 10. Allott N (2010) Key terms in pragmatics. Bloomsbury Academic, London. https:// books.google.it/books?id=qLkMbIiLG1AC 11. Luck SJ, Kappenman ES (2011) The oxford handbook of event-related potential components. Oxford University Press, Oxford. h t t p s : // do i . or g /1 0 . 1 0 93 / ox f o r d h b / 9780195374148.001.0001 12. Kuperberg GR (2007) Neural mechanisms of language comprehension: challenges to syntax. Brain Res 1146:23–49. https://doi. org/10.1016/j.brainres.2006.12.063 13. Brouwer H, Hoeks JC (2013) A time and place for language comprehension: mapping the N400 and the P600 to a minimal cortical network. Front Hum Neurosci 7:758. https://doi.org/10.3389/fnhum.2013. 00758

14. Kutas M, Federmeier KD (2011) Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol 62(1): 621–647. https://doi.org/10.1146/ annurev.psych.093008.131123 15. Lau EF, Phillips C, Poeppel D (2008) A cortical network for semantics: (de)constructing the N400. Nat Rev Neurosci 9(12):920–933. https://doi.org/10.1038/nrn2532 16. Osterhout L, Holcomb PJ (1992) Eventrelated brain potentials elicited by syntactic anomaly. J Mem Lang 31(6):785–806. https://doi.org/10.1016/0749-596X(92) 90039-Z 17. Kolk H, Chwilla D (2007) Late positivities in unusual situations. Brain Lang 100(3): 257–261. https://doi.org/10.1016/j. bandl.2006.07.006 18. Kuperberg GR, Sitnikova T, Caplan D, Holcomb PJ (2003) Electrophysiological distinctions in processing conceptual relationships within simple sentences. Brain Res Cogn Brain Res 17(1):117–129. https://doi. org/10.1016/S0926-6410(03)00086-7 19. Kim A, Osterhout L (2005) The independence of combinatory semantic processing: evidence from event-related potentials. J Mem Lang 52(2):205–225. https://doi. org/10.1016/j.jml.2004.10.002 20. Van Petten C, Luka BJ (2012) Prediction during language comprehension: benefits, costs, and ERP components. Int J Psychophysiol 83(2):176–190. https://doi.org/10.101 6/j.ijpsycho.2011.09.015 21. Pynte J, Besson M, Robichon F-H, Poli J (1996) The time-course of metaphor comprehension: an event-related potential study. Brain Lang 55(3):293–316. https://doi. org/10.1006/brln.1996.0107 22. Bastiaansen M, Mazaheri A, Jensen O (2012) Beyond ERPs: oscillatory neuronal dynamics. In: Luck S, Kappenman E (eds) Oxford handbook of event-related potential components. Oxford University Press, Oxford, pp 31–49 23. Lewis AG, Wang L, Bastiaansen M (2015) Fast oscillatory dynamics during language comprehension: unification versus maintenance and prediction? Brain Lang 148:51– 63. https://doi.org/10.1016/j.bandl.201 5.01.003 24. Meyer L (2018) The neural oscillations of speech processing and language comprehension: state of the art and emerging

Pragmatics Electrified mechanisms. Eur J Neurosci 48(7): 2609–2621. https://doi.org/10.1111/ejn. 13748 25. Prystauka Y, Lewis AG (2019) The power of neural oscillations to inform sentence comprehension: a linguistic perspective. Lang Linguist Compass 13(9):e12347. https://doi. org/10.1111/lnc3.12347 26. Weiss S, Mueller HM (2012) Too many betas do not spoil the broth: the role of beta brain oscillations in language processing. Front Psychol 3:201. https://doi.org/10.3389/ fpsyg.2012.00201 27. Giraud A-L, Poeppel D (2012) Cortical oscillations and speech processing: emerging computational principles and operations. Nat Neurosci 15(4):511–517. https://doi. org/10.1038/nn.3063 28. Nunberg G, Sag IA, Wasow T (1994) Idioms. Language 70(3):491–538. https://doi.org/ 10.1353/lan.1994.0007 29. Libben MR, Titone DA (2008) The multidetermined nature of idiom processing. Mem Cognit 36(6):1103–1121. https://doi. org/10.3758/MC.36.6.1103 30. Gibbs RW (2001) Proverbial themes we live by. Poetics 29(3):167–188. https://doi. org/10.1016/S0304-422X(01)00041-9 31. Glucksberg S (2003) The psycholinguistics of metaphor. Trends Cognit Sci 7(2):92–96. https://doi.org/10.1016/S1364-6613(02) 00040-2 32. Carston R (2012) Metaphor and the literal/ non-literal distinction. In: Allan K, Jaszczolt KM (eds) The Cambridge handbook of pragmatics. Cambridge University Press, Cambridge, pp 469–492. https://doi.org/10. 1017/CBO9781139022453.025 33. Bowdle BF, Gentner D (2005) The career of metaphor. Psychol Rev 112(1):193–216. h t t p s : // d o i . o r g / 1 0 . 1 0 3 7 / 0 0 3 3 - 2 95X.112.1.193 34. Carston R (2010) Lexical pragmatics, ad hoc concepts and metaphor: from a relevance theory perspective. Ital J Linguist 22(1): 1 5 3 – 1 8 0 . h t t p : // l i n g u i s t i c a . s n s . i t / RdL/22.1/carston.pdf 35. Wilson, D. (2003) Relevance and lexical pragmatics. Ital J Linguist 15:273–292. http:// w w w. i t a l i a n - j o u r n a l - l i n g u i s t i c s . c o m / wpcontent/uploads/03.Wilson.pdf 36. Wilson D, Carston R (2007) A unitary approach to lexical pragmatics: relevance, inference and ad hoc concepts. In: BurtonRoberts N (ed) Pragmatics. PalgraveMacmillan, Basingstoke, pp 230–259.


https://www.researchgate.net/publica tion/239542817 37. Grice HP (1975) Logic and conversation. In: Cole P, Morgan JS (eds) Syntax and Semantics, Volume III: Speech Acts. Academic Press, pp 41–58. https://doi.org/10.1163/ 9789004368811_003 38. Wilson D, Sperber D (1992) On verbal irony. L i n g u a 8 7 ( 1 ) : 5 3 – 7 6 . h t t p s : // d o i . org/10.1016/0024-3841(92)90025-E 39. Attardo S (1994) Linguistic theories of humor. De Gruyter Mouton, Berlin. https:// doi.org/10.1515/9783110219029 40. Arzouan Y, Goldstein A, Faust M (2007) Brainwaves are stethoscopes: ERP correlates of novel metaphor comprehension. Brain Res 1160:69–81. https://doi.org/10.1016/j. brainres.2007.05.034 41. Bonnaud V, Gil R, Ingrand P (2002) Metaphorical and nonmetaphorical links: a behavioral and ERP study in young and elderly adults. Neurophysiol Clin/Clin Neurophysiol 32(4):258–268. https://doi.org/10.1016/ S0987-7053(02)00307-6 42. Forga´cs B, Bardolph MD, Amsel BD, DeLong KA, Kutas M (2015) Metaphors are physical and abstract: ERPs to metaphorically modified nouns resemble ERPs to abstract language. Front Hum Neurosci 9:28. https://doi.org/10.3389/fnhum.2015. 00028 43. Forga´cs B (2020) An electrophysiological abstractness effect for metaphorical meaning making. Eneuro 7(5). https://doi.org/10. 1523/ENEURO.0052-20.2020 44. Goldstein A, Arzouan Y, Faust M (2012) Killing a novel metaphor and reviving a dead one: ERP correlates of metaphor conventionalization. Brain Lang 123(2):137–142. https:// doi.org/10.1016/j.bandl.2012.09.008 45. Lai VT, Curran T, Menn L (2009) Comprehending conventional and novel metaphors: an ERP study. Brain Res 1284:145–155. https://doi.org/10.1016/j.brainres.2009.0 5.088 46. Lai VT, Curran T (2013) ERP evidence for conceptual mappings and comparison processes during the comprehension of conventional and novel metaphors. Brain Lang 127(3):484–496. https://doi.org/10.1016/ j.bandl.2013.09.010 47. Kazmerski VA, Blasko DG, Dessalegn BG (2003) ERP and behavioral evidence of individual differences in metaphor comprehension. Mem Cognit 31(5):673–689. https:// doi.org/10.3758/BF03196107


48. Tartter VC, Gomes H, Dubrovsky B, Molholm S, Stewart RV (2002) Novel metaphors appear anomalous at least momentarily: evidence from N400. Brain Lang 80(3): 488–509. https://doi.org/10.1006/brln. 2001.2610 49. De Grauwe S, Swain A, Holcomb PJ, Ditman T, Kuperberg GR (2010) Electrophysiological insights into the processing of nominal metaphors. Neuropsychologia 48(7):1965–1984. https://doi.org/10.101 6/j.neuropsychologia.2010.03.017 50. Fondevila S, Aristei S, Sommer W, Jime´nezOrtega L, Casado P, Martın-Loeches M (2016) Counterintuitive religious ideas and metaphoric thinking: an event-related brain potential study. Cognit Sci 40(4):972–991. https://doi.org/10.1111/cogs.12263 51. Bambini V, Canal P, Resta D, Grimaldi M (2019) Time course and neurophysiological underpinnings of metaphor in literary context. Discourse Process 56(1):77–97. https://doi.org/10.1080/0163853X.201 7.1401876 52. Bambini V, Bertini C, Schaeken W, Stella A, Di Russo F (2016) Disentangling metaphor from context: an ERP study. Front Psychol 7: 559. https://doi.org/10.3389/fpsyg.2016. 00559 53. Coulson S, Van Petten C (2002) Conceptual integration and metaphor: an event-related potential study. Mem Cognit 30(6): 958–968. https://doi.org/10.3758/BF031 95780 54. Schmidt-Snoek GL, Drew AR, Barile EC, Agauas SJ (2015) Auditory and motion metaphors have different scalp distributions: an ERP study. Front Hum Neurosci 9:126. https://doi.org/10.3389/fnhum.2015. 00126 55. Weiland H, Bambini V, Schumacher PB (2014) The role of literal meaning in figurative language comprehension: evidence from masked priming ERP. Front Hum Neurosci 8: 583. https://doi.org/10.3389/fnhum. 2014.00583 56. Rataj K, Przekoracka-Krawczyk A, Van der Lubbe RH (2018) On understanding creative language: the late positive complex and novel metaphor comprehension. Brain Res 1678: 231–244. https://doi.org/10.1016/j. brainres.2017.10.030 57. Rutter B, Kro¨ger S, Hill H, Windmann S, Hermann C, Abraham A (2012) Can clouds dance? Part 2: An ERP investigation of passive conceptual expansion. Brain Cognit 80(3): 301–310. https://doi.org/10.1016/j. bandc.2012.08.003

58. Tang X, Qi S, Wang B, Jia X, Ren W (2017) The temporal dynamics underlying the comprehension of scientific metaphors and poetic metaphors. Brain Res 1655:33–40. https:// doi.org/10.1016/j.brainres.2016.11.005 59. Canal P, Pesciarelli F, Vespignani F, Molinaro N, Cacciari C (2017) Basic composition and enriched integration in idiom processing: an EEG study. J Exp Psychol: Learn Mem Cognit 43(6):928–943. https://doi. org/10.1037/xlm0000351 60. Moreno EM, Federmeier KD, Kutas M (2002) Switching languages, switching palabras (words): an electrophysiological study of code switching. Brain Lang 80(2):188–207. https://doi.org/10.1006/brln.2001.2588 61. Vespignani F, Canal P, Molinaro N, Fonda S, Cacciari C (2010) Predictive mechanisms in idiom comprehension. J Cognit Neurosci 22(8):1682–1700. https://doi.org/10. 1162/jocn.2009.21293 62. Molinaro N, Carreiras M (2010) Electrophysiological evidence of interaction between contextual expectation and semantic integration during the processing of collocations. Biol Psychol 83(3):176–190. https://doi. org/10.1016/j.biopsycho.2009.12.006 63. Rommers J, Dijkstra T, Bastiaansen M (2013) Context-dependent semantic processing in the human brain: evidence from idiom comprehension. J Cognit Neurosci 25(5): 762–776. https://doi.org/10.1162/jocn_a_ 00337 64. Federmeier KD, Kutas M (1999) A rose by any other name: long-term memory structure and sentence processing. J Mem Lang 41(4): 469–495. https://doi.org/10.1006/jmla. 1999.2660 65. Bianchi B, Shalom DE, Kamienkowski JE (2019) Predicting Known Sentences: neural basis of proverb reading using non-parametric statistical testing and mixedeffects models. Front Hum Neurosci 13:82. https://doi.org/10.3389/fnhum.2019. 00082 66. Cermolacce M, Scannella S, Fauge`re M, VionDury J, Besson M (2014) “All that glitters is not. . .” alone. Congruity effects in highly and less predictable sentence contexts. Neurophysiol Clin/Clin Neurophysiol 44(2):189–201. h t t p s : // d o i . o r g / 1 0 . 1 0 1 6 / j . neucli.2014.04.001 67. Ferretti TR, Schwint CA, Katz AN (2007) Electrophysiological and behavioral measures of the influence of literal and figurative contextual constraints on proverb comprehension. Brain Lang 101(1):38–49. https://doi. org/10.1016/j.bandl.2006.07.002

Pragmatics Electrified 68. Ferretti TR, Katz AN, Schwint CA, Patterson C, Pradzynski D (2020a) How discourse constraints influence neurolinguistic mechanisms during the comprehension of proverbs. Cognit Affect Behav Neurosci 20: 604–623. https://doi.org/10.3758/s1341 5-020-00790-9 69. Ferretti TR, Hall DC, Mansour F (2020b) Interpreting pragmatic markers following proverbs. Can J Exp Psychol/Revue Can Psychol Exp. https://doi.org/10.1037/ cep0000231 70. Schumacher PB (2011) The hepatitis called . . . Electrophysiological evidence for enriched composition. In: Meibauer J, Steinbach M (eds) Experimental pragmatics/semantics. John Benjamins Publishing Company, Amsterdam/Philadelphia, pp 199–219. https://books.google.it/books?id=Puw0 aPVU1AoC 71. Schumacher PB (2014) Content and context in incremental processing: “the ham sandwich” revisited. Philos Stud 168(1): 151–165. https://doi.org/10.1007/s110 98-013-0179-6 72. Regel S, Coulson S, Gunter TC (2010) The communicative style of a speaker can affect language comprehension? ERP evidence from the comprehension of irony. Brain Res 1311:121–135. https://doi.org/10.1016/j. brainres.2009.10.077 73. Regel S, Gunter TC, Friederici AD (2011) Isn’t it ironic? An electrophysiological exploration of figurative language processing. J Cognit Neurosci 23(2):277–293. https:// doi.org/10.1162/jocn.2010.21411 74. Regel S, Meyer L, Gunter TC (2014) Distinguishing neurocognitive processes reflected by P600 effects: evidence from ERPs and neural oscillations. PLoS One, 9(5):1–11. https://doi.org/10.1371/journal.pone. 0096840 75. Regel S, Gunter TC (2017) Don’t get me wrong: ERP evidence from cueing communicative intentions. Front Psychol 8:1465. https://doi.org/10.3389/fpsyg.2017. 01465 76. Gibson L, Atchley RA, Voyer D, Diener US, Gregersen S (2016) Detection of sarcastic speech: the role of the right hemisphere in ambiguity resolution. Laterality 21(4–6): 549–567. https://doi.org/10.1080/13 57650X.2015.1105246 77. Wickens S, Perry C (2015) What do you mean by that? An electrophysiological study of emotional and attitudinal prosody. PLoS One 10(7):1–24. https://doi.org/10.1371/jour nal.pone.0132947


78. Caffarra S, Michell E, Martin CD (2018) The impact of foreign accent on irony interpretation. PLoS One 13(8):1–13. https://doi. org/10.1371/journal.pone.0200939 79. Filik R, Leuthold H, Wallington K, Page J (2014) Testing theories of irony processing using eye-tracking and ERPs. J Exp Psychol: Learn Mem Cognit 40(3):811–828. https:// doi.org/10.1037/a0035658 80. Weissman B, Tanner D (2018) A strong wink between verbal and emoji-based irony: How the brain processes ironic emojis during language comprehension. PLoS One 13(8): 1–26. https://doi.org/10.1371/journal. pone.0201727 81. Spotorno N, Cheylus A, Van Der Henst J-B, Noveck IA (2013) What’s behind a P600? Integration operations during irony processing. PLoS One 8(6):1–10. https://doi.org/ 10.1371/journal.pone.0066839 82. Baptista NI, Manfredi M, Boggio PS (2018) Medial prefrontal cortex stimulation modulates irony processing as indexed by the N400. Soc Neurosci 13(4):495–510. https://doi.org/10.1080/17470919.201 7.1356744 83. Caillies S, Gobin P, Obert A, Terrien S, Coutte´ A, Iakimova G, Besche-Richard C (2019) Asymmetry of affect in verbal irony understanding: What about the N400 and P600 components? J Neurolinguist 51:268– 2 7 7 . h t t p s : // d o i . o r g / 1 0 . 1 0 1 6 / j . jneuroling.2019.04.004 84. Coulson S, Kutas M (2001) Getting it: human event-related brain response to jokes in good and poor comprehenders. Neurosci Lett 316(2):71–74. https://doi.org/10.101 6/S0304-3940(01)02387-4 85. Coulson S, Williams RF (2005) Hemispheric asymmetries and joke comprehension. Neuropsychol 43(1):128–141. https://doi. o r g / 1 0 . 1 0 1 6 / j . neuropsychologia.2004.03.015 86. Feng Y-J, Chan Y-C, Chen H-C (2014) Specialization of neural mechanisms underlying the three-stage model in humor processing: an ERP study. J Neurolinguist 32:59–70. h t t p s : // d o i . o r g / 1 0 . 1 0 1 6 / j . jneuroling.2014.08.007 87. Mayerhofer B, Schacht A (2015) From incoherence to mirth: neurocognitive processing of garden-path jokes. Front Psychol 6:550. https://doi.org/10.3389/fpsyg.2015. 00550 88. Canal P, Bischetti L, Di Paola S, Bertini C, Ricci I, Bambini V (2019) ‘Honey, shall I change the baby? – Well done, choose another


one’: ERP and time-frequency correlates of humor processing. Brain Cognit 132:41–55. https://doi.org/10.1016/j.bandc.201 9.02.001 89. Coulson S, Lovett C (2004) Handedness, hemispheric asymmetries, and joke comprehension. Cognit Brain Res 19(3):275–288. h t t p s : // d o i . o r g / 1 0 . 1 0 1 6 / j . cogbrainres.2003.11.015 90. Marinkovic K, Baldwin S, Courtney MG, Witzel T, Dale AM, Halgren E (2011) Right hemisphere has the last laugh: neural dynamics of joke appreciation. Cognit Affect Behav Neurosci 11(1):113–130. https://doi. org/10.3758/s13415-010-0017-7 91. Wang RW, Kuo H-C, Chuang S-W (2017) Humor drawings evoked temporal and spectral EEG processes. Soc Cognit Affect Neurosci 12(8):1359–1376. https://academic. oup.com/scan/articlepdf/12/8/1359/2 7104696/nsx054.pdf, https://doi.org/10. 1093/scan/nsx054 92. Perchtold-Stefan CM, Papousek I, Rominger C, Schertler M, Weiss EM, Fink A (2020) Humor comprehension and creative cognition: shared and distinct neurocognitive mechanisms as indicated by EEG alpha activity. NeuroImage 213:116695. https://doi. org/10.1016/j.neuroimage.2020.116695 93. Molinaro N, Barber HA, Carreiras M (2011) Grammatical agreement processing in reading: ERP findings and future directions. Cor tex 47(8):908–930. https://doi. org/10.1016/j.cortex.2011.02.019 94. Suls JM (1972) A two-stage model for the appreciation of jokes and cartoons: an information-processing analysis. In: Goldstein J, McGhee P (eds) The psychology of humor; theoretical perspectives and empirical issues. Academic Press, San Diego, pp 81–100. https://doi.org/10.1016/B978-012-288950-9.50010-9 95. Wyer RS, Collins JE (1992) A theory of humor elicitation. Psychol Rev 99(4): 663–688. https://doi.org/10.1037/0033-2 95X.99.4.663 96. Morton J (1969) Interaction of information in word recognition. Psychol Rev 76(2): 165–178. https://doi.org/10.1037/h002 7366 97. Swinney DA (1979) Lexical access during sentence comprehension: (re)consideration of context effects. J Verbal Learn Verbal Behav 18(6):645–659. https://doi.org/10.1016/ S0022-5371(79)90355-4 98. Kutas M, Hillyard SA (1980) Reading senseless sentences: brain potentials reflect semantic incongruity. Science 207(4427):203–205. https://doi.org/10.1126/science.7350657

99. Taylor WL (1953) “Cloze procedure”: a new tool for measuring readability. J Quart 30(4): 415–433. https://doi.org/10.1177/10 7769905303000401 100. Hagoort P, Van Berkum JJA (2007) Beyond the sentence given. Philos Trans R Soc B: Biol Sci 362(1481):801–811. https://doi.org/ 10.1098/rstb.2007.2089 101. Van Berkum JJA, van den Brink D, Tesink CMJY, Kos M, Hagoort P (2008) The neural integration of speaker and message. J Cognit Neurosci 20(4):580–591. https://doi.org/ 10.1162/jocn.2008.20054 102. White KR, Crites J, Stephen L, Taylor JH, Corral G (2009) Wait, what? Assessing stereotype incongruities using the N400 ERP component. Soc Cognit Affect Neurosci 4(2): 191–198. https://doi.org/10.1093/scan/ nsp004 103. Hagoort P, Hald L, Bastiaansen M, Petersson KM (2004) Integration of word meaning and world knowledge in language comprehension. Science 304(5669):438–441. https:// doi.org/10.1126/science.1095455 104. Troyer M, Kutas M (2020) Harry Potter and the chamber of what?: the impact of what individuals know on word processing during reading. Lang Cognit Neurosci 35(5): 641–657. https://doi.org/10.1080/23273 798.2018.1503309 105. Nieuwland MS, Van Berkum JJA (2006) When peanuts fall in love: N400 evidence for the power of discourse. J Cognit Neurosci 18(7):1098–1111. https://doi.org/10. 1162/jocn.2006.18.7.1098 106. Van Berkum JJA, Holleman B, Nieuwland M, Otten M, Murre J (2009) Right or wrong? The brain’s fast response to morally objectionable statements. Psycholog Sci 20(9): 1092–1099. https://doi.org/10.1111/j.14 67-9280.2009.02411.x 107. Foucart A, Moreno EM, Martin CD, Costa A (2015) Integration of moral values during L2 sentence processing. Acta Psycholog 162:1– 12. https://doi.org/10.1016/j.actpsy.201 5.09.009 108. Hundrieser M, Stahl J (2016) How attitude strength and information influence moral decision making: evidence from event-related potentials. Psychophysiology 53(5):678–688. https://doi.org/10.1111/psyp.12599 109. Lu J, Peng X, Liao C, Cui F (2019) The stereotype of professional roles influences neural responses to moral transgressions: ERP evidence. Biol Psychol 145:55–61. https://doi.org/10.1016/j.biopsycho.201 9.04.007

Pragmatics Electrified 110. Otten M, Mann L, Van Berkum JJA, Jonas KJ (2017) No laughing matter: how the presence of laughing witnesses changes the perception of insults. Soc Neurosci 12(2):182–193. https://doi.org/10.1080/17470919.201 6.1162194 111. Wang L, Zhu Z, Bastiaansen M (2012) Integration or predictability? A further specification of the functional role of gamma oscillations in language comprehension. Front Psychol 3:187. https://doi.org/10. 3389/fpsyg.2012.00187 112. Hald LA, Bastiaansen MC, Hagoort P (2006) EEG theta and gamma responses to semantic violations in online sentence processing. Brain Lang 96(1):90–105. https://doi.org/10.101 6/j.bandl.2005.06.007 113. Metzner P, von der Malsburg T, Vasishth S, Ro¨sler F (2015) Brain responses to world knowledge violations: a comparison of stimulus- and fixation-triggered event-related potentials and neural oscillations. J Cognit Neurosci 27(5):1017–1028. https://doi. org/10.1162/jocn_a_00731 114. van Dijk T, Kintsch W (1983) Strategies of discourse comprehension. Academic Press. https://books.google.it/books?id=xJsAAAAIAAJ 115. Callahan SM (2008) Processing anaphoric constructions: insights from electrophysiological studies. J Neurolinguist 21(3): 231–266. https://doi.org/10.1016/j. jneuroling.2007.10.002 116. Canal P, Garnham A, Oakhill J (2015) Beyond gender stereotypes in language comprehension: self sex-role descriptions affect the brain’s potentials associated with agreement processing. Front Psychol 6:1953. https://doi.org/10.3389/fpsyg.2015. 01953 117. Osterhout L, Bersick M, McLaughlin J (1997) Brain potentials reflect violations of gender stereotypes. Mem Cognit 25(3): 273–285. https://doi.org/10.3758/ BF03211283 118. Osterhout L, Mobley LA (1995) Eventrelated brain potentials elicited by failure to agree. J Mem Lang 34(6):739–773. https:// doi.org/10.1006/jmla.1995.1033 119. Swaab TY, Camblin CC, Gordon PC (2004) Electrophysiological evidence for reversed lexical repetition effects in language processing. J Cognit Neurosci 16(5):715–726. https://doi.org/10.1162/089892904970 744 120. Nieuwland MS, Van Berkum JJA (2006) Individual differences and contextual bias in


pronoun resolution: evidence from ERPs. Brain Res 1118(1):155–167. https://doi. org/10.1016/j.brainres.2006.08.022 121. Van Berkum JJA, Brown CM, Hagoort P (1999) Early referential context effects in sentence processing: evidence from event-related brain potentials. J Mem Lang 41(2): 147–182. https://doi.org/10.1006/jmla. 1999.2641 122. Van Berkum JJA, Brown CM, Hagoort P, Zwitserlood P (2003) Event-related brain potentials reflect discourse-referential ambiguity in spoken language comprehension. Psychophysiology 40(2):235–248. https:// doi.org/10.1111/1469-8986.00025 123. Van Berkum JJA, Koornneef AW, Otten M, Nieuwland MS (2007) Establishing reference in language comprehension: an electrophysiological perspective. Brain Res 1146:158– 1 7 1 . h t t p s : // d o i . o r g / 1 0 . 1 0 1 6 / j . brainres.2006.06.091 124. Nieuwland MS (2014) Who’s he? Eventrelated brain potentials and unbound pronouns. J Mem Lang 76:1–28. https://doi. org/10.1016/j.jml.2014.06.002 125. Burkhardt P (2006) Inferential bridging relations reveal distinct neural mechanisms: evidence from event-related brain potentials. Brain Lang 98(2):159–168. https://doi. org/10.1016/j.bandl.2006.04.005 126. Schumacher PB, Hung Y-C (2012) Positional influences on information packaging: insights from topological fields in German. J Mem L a n g 6 7 ( 2 ) : 2 9 5 – 3 1 0 . h t t p s : // d o i . org/10.1016/j.jml.2012.05.006 127. Schumacher PB (2012) Context in neurolinguistics. In: Finkbeiner R, Meibauer J, Schumacher PB (eds) What is a context?: linguistic approaches and challenges. John Benjamins Publishing, Amsterdam/Philadelphia, pp 33–53. https://books. google.it/books?id=gcl-N7FZPA4C&lpg= PA33&ots=nhHo0v3V1a&lr&hl=it&pg= PA33#v=onepage&q&f=false 128. Masia V, Canal P, Ricci I, Vallauri EL, Bambini V (2017) Presupposition of new information as a pragmatic garden path: evidence from event-related brain potentials. J Neurolinguist 42:31–48. https://doi.org/10.101 6/j.jneuroling.2016.11.005 129. Domaneschi F, Canal P, Masia V, Lombardi Vallauri E, Bambini V (2018) N400 and P600 modulation in presupposition accommodation: the effect of different trigger types. J Neurolinguist 45:13–35. https://doi. org/10.1016/j.jneuroling.2017.08.002


130. Hasson U, Egidi G, Marelli M, Willems RM (2018) Grounding the neurobiology of language in first principles: the necessity of non-language centric explanations for language comprehension. Cognition 180:135– 157. https://doi.org/10.1016/j.cogni tion.2018.06.018 131. Egorova N, Shtyrov Y, Pulvermu¨ller F (2013) Early and parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence. Front Hum Neurosci 7:86. https://doi.org/10.3389/fnhum. 2013.00086 132. Austin JL (1962) How to do things with words. Oxford University Press, Oxford. h t t p s : // b o o k s . g o o g l e . i t / b o o k s ? i d = XnRkQSTUpmgC 133. Searle J (1979) Expression and meaning: studies in the theory of speech acts. Cambridge University Press, Cambridge. https:// books.google.it/books?id=1WqLLMG1 XiIC 134. Coulson S, Lovett C (2010) Comprehension of non-conventional indirect requests: an event-related brain potential study. Ital J Linguist 22(1):107–124. http://www.italianjournal-linguistics.com/wpcontent/ uploads/coulsonlovett.pdf 135. Gisladottir RS, Chwilla DJ, Levinson SC (2015) Conversation electrified: ERP correlates of speech act recognition in underspecified utterances. PLoS One 10(3):1–24. https://doi.org/10.1371/journal.pone. 0120068 136. Tomasello R, Kim C, Dreyer FR, Grisoni L, Pulvermu¨ller F (2019) Neurophysiological evidence for rapid processing of verbal and gestural information in understanding communicative actions. Sci Rep 9(1):1–17. https://doi.org/10.1038/s41598-019-521 58-w 137. Gisladottir RS, Bo¨gels S, Levinson SC (2018) Oscillatory brain responses reflect anticipation during comprehension of speech acts in spoken dialog. Front Hum Neurosci 12:34. https://doi.org/10.3389/fnhum.2018. 00034 138. Bo¨gels S, Kendrick KH, Levinson SC (2015) Never say no. . . How the brain interprets the pregnant pause in conversation. PLoS One 10(12):1–15. https://doi.org/10.1371/jour nal.pone.0145474 139. Bo¨gels S, Kendrick KH, Levinson SC (2020) Conversational expectations get revised as response latencies unfold. Lang Cognit Neurosci 35(6):766–779. https://doi.org/10.10 80/23273798.2019.1590609

140. Cummings L (2017) Research in clinical pragmatics. Springer International Publishing, Cham. https://books.google.it/books? id=r5TZDQAAQBAJ 141. Cummings L (2014) Pragmatic disorders. Springer International Publishing, Cham 142. Thoma P, Daum I (2006) Neurocognitive mechanisms of figurative language processing–Evidence from clinical dysfunctions. Neurosci Biobehav Rev 30(8): 1182–1205. https://doi.org/https://doi. org/10.1016/j.neubiorev.2006.09.001 143. Martin I, McDonald S (2003) Weak coherence, no theory of mind, or executive dysfunction? Solving the puzzle of pragmatic language disorders. Brain Lang 85(3): 451–466. https://doi.org/10.1016/S0093934X(03)00070-1 144. Stemmer B (1999) Discourse Studies in neurologically impaired populations: a quest for action. Brain Lang 68(3):402–418. https:// doi.org/10.1006/brln.1999.2120 145. Bosia M, Bechi M, Bosinelli F, Politi E, Buonocore M, Spangaro M, Bianchi L, Cocchi F, Guglielmino C, Cavallaro R (2019) From cognitive and clinical substrates to functional profiles: disentangling heterogeneity in schizophrenia. Psychiatry Res 271: 446–453. https://doi.org/10.1016/j. psychres.2018.12.026 146. Harvey PD, Bowie CR, Friedman JI (2001) Cognition in schizophrenia. Curr Psychiatry Rep (3):423–428. https://doi.org/10.100 7/s11920-996-0038-7 147. Covington MA, He C, Brown C, Nac¸i L, McClain JT, Fjordbak BS, Semple J, Brown J (2005) Schizophrenia and the structure of language: the linguist’s view. Schizophr Res 77(1):85–98. https://doi.org/10.1016/j. schres.2005.01.016 148. Bambini V, Arcara G, Bechi M, Buonocore M, Cavallaro R, Bosia M (2016) The communicative impairment as a core feature of schizophrenia: frequency of pragmatic deficit, cognitive substrates, and relation with quality of life. Compr Psychiatry 71:106–120. https://doi.org/10.1016/j.comppsych.201 6.08.012 149. Bambini V, Arcara G, Bosinelli F, Buonocore M, Bechi M, Cavallaro R, Bosia M (2020) A leopard cannot change its spots: a novel pragmatic account of concretism in schizophrenia. Neuropsychologia 139: 107332. https://doi.org/10.1016/j. neuropsychologia.2020.107332 150. Strandburg RJ, Marsh JT, Brown WS, Asarnow RF, Guthrie D, Harper R, Yee CM,

Pragmatics Electrified Nuechterlein KH (1997) Event-related potential correlates of linguistic information processing in schizophrenics. Biol Psychiatry 42(7):596–608. https://doi.org/10.1016/ S0006-3223(96)00410-6 151. Iakimova G, Passerieux C, Laurent J-P, Hardy-Bayle M-C (2005) ERPs of metaphoric, literal, and incongruous semantic processing in schizophrenia. Psychophysiology 42(4):380–390. https://doi.org/10.1111/ j.1469-8986.2005.00303.x 152. Schneider S, Wagels L, Haeussinger FB, Fallgatter AJ, Ehlis A-C, Rapp AM (2015) Haemodynamic and electrophysiological markers of pragmatic language comprehension in schizophrenia. World J Biol Psychiatry 16(6):398–410. https://doi.org/10.3109/1 5622975.2015.1019359 153. Wang K, Cheung EFC, Gong Q, Chan RCK (2011) Semantic processing disturbance in patients with schizophrenia: a meta-analysis of the N400 component. PLoS One 6(10): 1–8. https://doi.org/10.1371/journal. pone.0025435 154. Hirano S, Spencer KM, Onitsuka T, Hirano Y (2020) Language-related neurophysiological deficits in schizophrenia [PMID: 31741393]. Clin EEG Neurosci 51(4):222–233. https:// doi.org/10.1177/1550059419886686 155. Swaab TY, Boudewyn MA, Long DL, Luck SJ, Kring AM, Ragland JD, Ranganath C, Lesh T, Niendam T, Solomon M, Mangun GR, Carter CS (2013) Spared and impaired spoken discourse processing in schizophrenia: effects of local and global language context. J Neurosci 33(39):15578–15587. https://doi. org/10.1523/JNEUROSCI.0965-13.2013 156. Boudewyn MA, Carter CS, Long DL, Traxler MJ, Lesh TA, Mangun GR, Swaab TY (2017) Language context processing deficits in schizophrenia: the role of attentional engagement. Neuropsychol 96:262–273. https:// doi.org/10.1016/j.neuropsychologia.201 7.01.024 157. Gold R, Faust M, Goldstein A (2010) Semantic integration during metaphor comprehension in Asperger syndrome. Brain Lang 113(3):124–134. https://doi.org/10.1016/ j.bandl.2010.03.002 158. Del Goleto S, Kostova M, Blanchet A (2016) Impaired context processing during irony comprehension in schizotypy: an ERPs study. Int J Psychophysiol 105:17–25. https://doi.org/10.1016/j.ijpsycho.201 6.04.009 159. Li X, Pesonen J, Haimi E, Wang H, Astikainen P (2020) Electrical brain activity and facial electromyography responses to irony in


dysphoric and non-dysphoric participants. Brain Lang 211:104861. https://doi. org/10.1016/j.bandl.2020.104861 160. Hoeks JC, Brouwer H (2014) Electrophysiological research on conversation and discourse. In: Holtgraves T (ed) The Oxford handbook of language and social psychology. Oxford University Press, Oxford, pp 365–386. https://books.google.it/books? id=nWf0AwAAQBAJ 161. Gibbs RW (1980) Spilling the beans on understanding and memory for idioms in conversation. Mem Cognit 8(2):149–156. https://doi.org/10.3758/BF03213418 162. Bentin S, McCarthy G, Wood CC (1985) Event-related potentials, lexical decision and semantic priming. Electroencephalogr Clin Neurophysiol 60(4):343–355. https://doi. org/10.1016/0013-4694(85)90008-2 163. Van Petten C, Kutas M (1990) Interactions between sentence context and word frequency in event-related brain potentials. Mem Cognition 18(4):380–393. https:// doi.org/10.3758/BF03197127 164. Lai VT, Howerton O, Desai RH (2019) Concrete processing of action metaphors: evidence from ERP. Brain Res 1714:202–209. https://doi.org/10.1016/j.brainres.201 9.03.005 165. Canal P, Bischetti L, Bertini C, Ricci I, Lecce S, Bambini V (2019) N400 differences between mental and physical metaphors: the role of theories of mind. Brain Cognit 161: 105879. https://doi.org/10.1016/j.bandc. 2022.105879 166. Canal P, Ranieri G, Bischetti L, Tonini E, Bertini C, Ricci I, Schaeken W, Bambini V (2020) Bridging concepts in different modalities: the N400 of verbal and multimodal metaphor processing. In: Conference Presentation at the 13th Conference for the Association for Researching and Applying Metaphor (RaAM), Virtual Conference, 18–21 June 2020. https://media.inn.no/ Mediasite/Channel/raam2020/watch/ 9f5c9dafa72d41b0a2ab78c4ca86e3341d 167. Bornkessel-Schlesewsky I, Schlesewsky M (2008) An alternative perspective on “semantic P600” effects in language comprehension. Brain Res Rev 59(1):55–73. https://doi. org/10.1016/j.brainresrev.2008.05.003 168. Daneman M, Carpenter PA (1980) Individual differences in working memory and reading. J Mem Lang 19(4):450–466. https://doi. org/10.1016/S0022-5371(80)90312-6 169. Holcomb PJ, Coffey SA, Neville HJ (1992) Visual and auditory sentence processing: a


developmental analysis using event-related brain potentials. Dev Neuropsychol 8(2–3): 203–241. https://doi.org/10.1080/ 87565649209540525 170. Nieuwland MS, Ditman T, Kuperberg GR (2010) On the incrementality of pragmatic processing: an ERP investigation of informativeness and pragmatic abilities. J Mem Lang 63(3):324–346. https://doi.org/10.1016/j. jml.2010.06.005 171. Martin AE (2020) A compositional neural architecture for language. J Cognit Neurosci 32(8):1407–1427. https://doi.org/10. 1162/jocn_a_01552 172. Noveck IA, Posada A (2003) Characterizing the time course of an implicature: an evoked potentials study. Brain Lang 85(2):203–210. https://doi.org/10.1016/S0093-934X(03) 00053-1 173. Spychalska M, Kontinen J, Werning M (2016) Investigating scalar implicatures in a truthvalue judgement task: evidence from eventrelated brain potentials. Lang Cognit Neurosci 31(6):817–840. https://doi.org/10.10 80/23273798.2016.1161806 174. Nieuwland MS, Kuperberg GR (2008) When the truth is not too hard to handle: an event-

related potential study on the pragmatics of negation. Psychol Sci 19(12):1213–1218. https://doi.org/10.1111/j.1467-9280.200 8.02226.x 175. Xiang M, Grove J, Giannakidou A (2016) Semantic and pragmatic processes in the comprehension of negation: an event related potential study of negative polarity sensitivity. J Neurolinguist 38:71–88. https://doi. org/10.1016/j.jneuroling.2015.11.001 176. Moreno EM, Casado P, Martın-Loeches M (2016) Tell me sweet little lies: an eventrelated potentials study on the processing of social lies. Cognit Affect Behav Neurosci 16(4):616–625. https://doi.org/10.3758/ s13415-016-0418-3 177. Delogu F, Jachmann T, Staudte M, Vespignani F, Molinaro N (2020) Discourse expectations are sensitive to the question under discussion: evidence from ERPs. Discourse Process 57(2):122–140. https://doi. org/10.1080/0163853X.2019.1575140 178. Dimitrova DV, Stowe LA, Redeker G, Hoeks JC (2012) Less is not more: neural responses to missing and superfluous accents in context. J Cognit Neurosci 24(12):2400–2418. https://doi.org/10.1162/jocn_a_00302

Chapter 19

Electrophysiology of Non-Literal Language

Vicky Tzuyin Lai, Ryan Hubbard, Li-Chuan Ku, and Valeria Pfeifer

Abstract

This chapter reviews the electrophysiological research on the four most commonly used figurative language types: metaphor, idioms, irony, and jokes. For metaphor, we focus on two issues: the incremental comprehension of metaphors and the role of metaphor in embodied cognition. In terms of comprehension, advances have been made regarding how meanings are selected, mapped, and suppressed when concepts collide. In terms of embodiment, current debates center on the involvement of sensory-motor systems in abstract concepts through metaphors. For idioms, we review literature investigating how factors such as the predictability or decomposability of an idiom influence the degree to which the idiom is processed holistically or compositionally; the current view posits that idioms may be processed in both ways. For irony, we summarize research on differences between spoken and written irony, as well as more recent efforts to investigate written irony in the context of computer-mediated communication. While many factors affect earlier stages of processing, irony has a robust neural correlate at a later stage. For verbal jokes, we review stage-wise models, as well as joke types and individual differences. Stage-wise models explain how and when the incongruity in jokes is detected and resolved by readers to obtain a mirth experience, and how this process is modulated by different joke types, such as phonological jokes (puns) and semantic (mental) jokes. In terms of individual differences, joke processing is highly dependent on socio-pragmatic abilities and personality traits. We conclude this chapter with a summary of the commonalities and differences across these types of figurative language, their electrophysiological correlates, and future directions.

Key words Figurative language, Metaphor, Idiom, Irony, Jokes, ERP, EEG

1 Introduction

People do not always mean what they say. When someone says “I am dead tired” or “I’d like some rocky road ice cream,” they are not literally dead, and they would like to eat chocolate ice cream mixed with nuts and marshmallows. Daily language is filled with non-literal expressions at various levels, lexical and beyond. This chapter focuses on four types of non-literal language: metaphor, idiom, verbal irony, and verbal humor (jokes). Metaphor is a figure of speech that compares two entities/properties from two different semantic domains that nevertheless share some similarities, and an idiom is a group of words arranged in a relatively fixed order and with a phrasal meaning that is relatively frozen. Irony in verbal form (verbal irony) is an utterance whose literal content is the opposite of what the speaker intends, and a joke (verbal humor) is a narrative deliberately told for amusement. These four types are included here simply because a good amount of electrophysiological data is available for discussion.

Electrophysiological studies of non-literal language have experienced a baby boom in the past decade. A quick search on Google Scholar using the search term “electrophysiology of non-literal language” showed that while there were only 300 papers between 2000 and 2010, there were 700 papers between 2010 and 2020. While these numbers are relatively small compared to “electrophysiology of language” (77,100 papers between 2000 and 2020), they reflect clear progress. Investigation of the neural correlates of non-literal language has become more sophisticated, due not only to better-defined topics and research questions but also to advances in methodology. Most of the studies used the event-related potential (ERP) method, where electrical activity on the readers’ or listeners’ scalp is recorded, amplified, and time-locked to the onset of specific words in the literal and non-literal phrases. Non-literal language typically leads to an enhanced N400 and P600/LPC, and sometimes an early P200 (e.g., irony) and/or a late and sustained positivity (e.g., jokes). As will be seen throughout this chapter, the functional interpretations of these ERP components are non-trivial. On the one hand, the interpretations of these ERP components should cohere with language ERPs in general, establishing basic mechanisms underlying both literal and non-literal language. On the other hand, the interpretations should account for specific representations and mechanisms unique to non-literal language. Notably, some groundbreaking insights have been gained through the increased use of other electrophysiological methods, such as time-frequency analysis (TFA) of EEG data, transcranial magnetic stimulation (TMS), and transcranial direct current stimulation (tDCS).

Each section starts with a classic or typical study to familiarize the readers with the issues involved and the paradigms employed. Then, we take a deeper dive into finer-grained issues that have developed out of older findings over the past decade. In metaphor research, there is a surge of studies on metaphor novelty and embodiment. In idiom research, there is an increased focus on compositionality and predictability. In irony research, emotional processes and the cuing of irony, for example with emojis, are taking center stage. In joke research, variation across joke types and individual differences have gained traction. For each of these issues, we discuss how knowledge has been advanced through electrophysiology.
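As a purely illustrative aside (not part of the original text), the time-locking logic described above can be sketched in a few lines of MNE-Python; the file name, trigger codes, channel names, and time windows below are hypothetical and do not correspond to any specific study reviewed in this chapter.

import mne

# Hypothetical raw EEG file with trigger codes marking critical-word onsets
raw = mne.io.read_raw_fif("sub01_raw.fif", preload=True)
raw.filter(0.1, 30.0)  # a typical band-pass for ERP analyses

event_id = {"literal": 10, "metaphor": 20}   # hypothetical trigger codes
events = mne.find_events(raw)

epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=-0.2, tmax=1.0, baseline=(-0.2, 0.0),
                    reject=dict(eeg=100e-6), preload=True)

# Condition averages (ERPs) and the metaphor-minus-literal difference wave
evoked_literal = epochs["literal"].average()
evoked_metaphor = epochs["metaphor"].average()
difference = mne.combine_evoked([evoked_metaphor, evoked_literal],
                                weights=[1, -1])

# Mean difference amplitude in an N400-like window over centro-parietal sites
n400 = difference.copy().pick(["Cz", "CPz", "Pz"]).crop(0.3, 0.5)
print("Mean N400 effect (microvolts):", n400.data.mean() * 1e6)

In practice, per-condition trial counts, artifact correction, and the choice of reference would all need attention, but the sketch captures the core logic: segment around critical-word onsets, average within condition, and compare conditions in a component-typical time window.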

2 Metaphors

For centuries, metaphor was viewed as a figure of speech that allowed writers and poets to talk about one thing through another, for example, Juliet is the sun. While it maintains its poetic function, contemporary cognitive scientists [1, 2] have broadened the scope of metaphor to include conceptual metaphors. In this view, metaphoric expressions are surface realizations of an underlying conceptual metaphor. For instance, “That was a lot of info to digest” is the surface realization of a mapping “understanding is digesting” in the conceptual metaphor “Ideas are food.” Mappings are systematic and productive. In this example, related mappings under the same conceptual metaphor include “thinking is preparing food” and “communication is feeding.” According to this broad definition, daily language is packed with metaphors: In spoken language, speakers use 5.9 non-literal expressions, including metaphors, per minute in free discourse [3]. In written language, metaphors occur once for every 24 words [4] and appear in 6.7% of sentences even in primary school reading materials [5]. In a more recent count [6] following conceptual metaphor theory, Steen and colleagues reported that metaphors make up 20% of discourse. Due to their frequency of use, metaphors play a key role in understanding human cognition, including how word meanings are interrelated, how concepts are organized, and how language is processed in the brain and in the body.

2.1 Metaphors and Literal Language

Are metaphors special? The most studied question in metaphor research is whether metaphor is comprehended differently compared to literal language. Numerous behavioral studies have investigated this question. The majority reported similar reading times for literal language and metaphors in context, suggesting a similarity in processing between the two [7–10]. These data support direct access models, in which metaphorical meaning is directly accessed, just like literal meaning [11, 12]. However, these data do not support the classic, literal-first models [13, 14], in which literal meaning must be computed first and refuted before metaphoric meaning is derived. Electrophysiological measures such as event-related potentials (ERPs) introduced a paradigm shift and challenged the assumption that equivalent reading times between two conditions imply that metaphoric and literal language are processed in the same way. As Coulson and Van Petten [15] elegantly put it, while lifting a 10-pound weight may take the same amount of time as lifting a 5-pound weight, the former recruits more resources. Likewise, while metaphors in context are read as fast as literal language, computing metaphoric meaning could be more resource-intensive than computing literal meaning.


Indeed, it turned out that metaphoric language is more cognitively taxing than literal language. In a seminal study, Coulson and Van Petten [15] found that metaphors (e.g., He knows that power is a strong intoxicant) elicited a larger N400 and a larger late positivity component (LPC) than literals (e.g., He knows that whiskey is a strong intoxicant) at posterior sites. These two effects suggest at the very least that metaphors are more effortful to process than literal language. The authors further suggested that the N400 effect likely reflects complexity in metaphoric language, not metaphoricity. This was supported by a “literal-mapping” condition in their study. The literal-mapping sentences (e.g., He has used cough syrup as an intoxicant) were designed to invite readers to go through the same meaning-making process as they do for metaphors: to retrieve extra semantic features from (various) conceptual domains and to blend those retrieved features into a coherent readout. Importantly, the literal-mapping sentences are complex literals, not metaphoric. The literal-mapping condition elicited an N400 intermediate between the metaphors and the literals, without an LPC effect. The authors suggested that it is the mapping of features between conceptual domains (metaphoric or not), rather than metaphoricity, that drives the observed N400. But we speculate that, because the literal-mapping N400 was smaller than the metaphoric N400, both complexity and metaphoricity contributed to the N400. The functional interpretation of the LPC was less clear, largely due to the scarcity of the LPC in language research in general at that time. The authors noted, however, that the LPC is likely specific to metaphoricity, indexing the successful retrieval of relevant features which serve as bridges to the eventual metaphoric interpretation. We will discuss the functional interpretation of the LPC further, along with other study findings, below.

2.2 Conventional and Novel Metaphors

In the ~15 years following Coulson and Van Petten [15], researchers took an interest in investigating how novelty modulates metaphor comprehension [16–21]. Most studies examined either conventional or novel metaphors, but two series of studies compared them directly within-study. Lai and colleagues examined metaphors embedded in sentential contexts, for example, conventional metaphoric expressions (e.g., Every point of my argument was attacked) and novel ones (e.g., Every second of my time was attacked) [17]. Here, conventionality was operationally defined in terms of subjective familiarity ratings and interpretability ratings. Conventional metaphors were familiar and interpretable, and novel metaphors were less familiar but still interpretable. Control conditions of literal (e.g., Every soldier in the frontline was attacked) and anomalous sentences were included. In addition, the cloze probability, that is, the likelihood of a sentence-final word being the expected continuation of the sentence fragment, was matched between conditions. They found that even though conventional metaphoric expressions were rated as familiar and as interpretable as the literals, they still elicited a larger N400 (320–440 ms) than the literals at central-posterior sites. Novel metaphoric expressions were rated as less familiar and less interpretable than both the conventional metaphors and the literals, and showed a sustained N400 effect from 320 to 560 ms, also at central-posterior sites. In other words, the conventional metaphor N400 differed from the novel metaphor N400 in terms of latency. These results indicated that, first, there is something special about conventional metaphors: compared to the equally conventionalized literals, they do require extra meaning retrieval and/or integration. Second, novelty matters: it takes longer to extract and/or integrate features between concepts that have not been paired before. These interpretations are further supported by a follow-up study [20] and are consistent with the Career of Metaphor theory postulated by Gentner and colleagues [22]: Understanding novel metaphors is a comparison process in which features between the conceptual domains in a metaphor need to be retrieved, aligned, imported, and integrated. Through repeated use, novel metaphors become conventionalized, and part of these sub-processes becomes facilitated and less effortful. In another direct comparison between conventional and novel metaphors, conventionality was revealed in the N400 amplitude (rather than timing) difference (cf. [17]). Arzouan and colleagues [16] examined two-word expressions in Hebrew, including conventional metaphors (e.g., transparent intention), novel metaphors drawn from poetry texts (e.g., conscience storm), semantically related words (e.g., burning fire), and unrelated words. These authors measured conventionality by instructing participants to rate how conventional a given expression was, instead of breaking down conventionality into multiple factors such as familiarity and interpretability (cf. [17]). They found graded N400 (350–450 ms) amplitudes, smallest for the literally related pairs, followed by the conventional metaphors, the novel metaphors, and the unrelated pairs. According to the authors, the graded N400s resulted from the additive contribution of three factors: semantic relatedness, conventionality, and meaningfulness. The semantically related word pairs benefited from all three factors, the conventional metaphors benefited from the two factors of conventionality and meaningfulness, and the novel metaphors benefited from only one factor, meaningfulness. Moreover, there is a difference in the scalp distributions of the N400 effects: the processing of novel metaphors and the unrelated words was slightly right-lateralized, suggesting a potential right-hemisphere contribution. Additionally, instead of an LPC (cf. [15]), Arzouan and colleagues observed a late negativity from 550 to 800 ms, larger for novel metaphors than for literals, which the authors suggested is a manifestation of the readers’ further attempts to integrate meaning in a non-literal context (cf. [19]).
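As an aside for readers unfamiliar with the cloze measure mentioned above: a word’s cloze probability is simply the proportion of norming participants who produce that word when asked to complete the sentence fragment. The following toy computation (with invented norming responses) illustrates the arithmetic.

from collections import Counter

# Invented norming responses for the fragment "Every point of my argument was ..."
responses = ["attacked", "attacked", "refuted", "attacked", "wrong",
             "challenged", "attacked", "rejected", "attacked", "criticized"]

counts = Counter(responses)
cloze = {word: n / len(responses) for word, n in counts.items()}

print(cloze["attacked"])  # 0.5: half of the norming sample produced the critical word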


One issue that stood out from the abovementioned studies is that all metaphors elicited N400s, but not all elicited late effects [21]. Among those that did, some reported a late positivity (LPC) effect [15, 18] and others reported a late negativity (LN) effect [16, 23]. One possible explanation is that the late effects may be driven by the grammatical form that metaphor takes, such as predicate vs. nominal metaphors. Predicate metaphors are words in metaphoric context, such as “his life took a sharp turn,” and nominal metaphors are metaphors in the form of A-is-B, such as “life is a journey” or “time is money.” Note that nominal metaphors are not necessarily conceptual metaphors (cf. [1]). Nominal metaphors such as “a woman’s waist is an hour glass” is a one-shot metaphor—it is not a conceptual metaphor because the mappings associated with this metaphor, if any, are neither systematic nor productive. This grammatical form explanation was promising, but did not quite hold, because within the studies that tested nominal metaphors, there was still inconsistency in the directionality of the late effects [18, 21, 23]. De Grauwe et al. [18] examined nominal metaphors but did not found LN. Familiar nominal metaphors (Unemployment is a plague) elicited a central-posterior N400 effect from 325 to 400 ms and an LPC effect starting at 550 ms relative to the literals (Cholera is a plague). They suggested that the N400 reflects a temporarily accessed literal meaning which is deemed anomalous momentarily in the metaphoric context, and that the LPC reflects a conflict between an implausible sentence meaning based on the temporarily accessed literal meaning and the plausible sentence meaning based on the intended metaphoric meaning. These interpretations are coherent with ERP correlates in language semantics research in general, where the N400 has been associated with semantic anomalies [24], and the LPC has been associated with semantic conflict/re-analysis [25]. In Schneider et al. [21], novel nominal metaphors (The 100-year-old man is an oak) showed an N400 effect from 300 to 500 ms relative to the literals (The 100-year-old man is a doter), which the authors associated with meaning activation and integration during novel mappings (cf. [15, 17]). However, Schneider and colleagues [21] reported no late effects, which suggest that the nominal form of metaphor, or novelty for that matter, is not necessary for eliciting late effects, LPC, or LN. Tang and colleagues [23] examined novel scientific metaphors (Chromosomes are sisters) and novel poetic metaphors (Life is a bubble), along with conventional metaphors (Language is a bridge) and literal categorization statements. All metaphors elicited larger posterior N400s from 350 to 450 ms and larger posterior LNs from 550 to 850 ms than the literal statements. In particular, scientific metaphors showed the LN effect. The authors suggested that this LN reflects a secondary integration effort, consistent with the interpretation by Arzouan and colleagues, who tested two-word metaphors. Specifically, the


authors speculated that this secondary integration is pragmatic inferencing, very likely the kind of inferencing that readers must make for learning new (scientific or poetic) knowledge.

In summary, conventional metaphors are processed differently from novel metaphors, as reflected in the N400 amplitudes [16] and the N400 timing [17]. Most researchers associate such N400 effects with retrieving meanings of the concepts involved in a metaphor and integrating the retrieved meanings across the concepts. A second interpretation is that the N400 is associated with the insuppressible literal meaning of the critical word, which feels anomalous to readers in a metaphoric context. Further research is needed to tease apart these two interpretations. After the N400 time frame, some metaphors require secondary meaning processing, as reflected by an LPC or LN. The interpretations of the late effects are much debated. Based on the abovementioned findings, here we speculate that the LPC reflects the interaction and integration of the two processing streams, literal and metaphoric. The LN likely reflects a continuation of the meaning-making processes underlying the N400 time window, sustained by factors such as stimulus complexity or processing goals. Future studies are needed to verify these speculations.

2.3 Metaphor and Embodiment

In recent years, embodiment researchers have started investigating whether metaphor is a way for abstract concepts to be embodied and grounded in concrete resources [26]. In general, embodied theories of language posit that the processing of conceptual knowledge cued by language relies on the sensory-motor regions of the brain [27, 28]. Supporting embodiment, there is a large body of evidence across various modalities such as action, vision, audition, and emotion ([29], but see [30] for criticisms). However, these studies typically tested concrete literal language, such as the action word kick or the olfactory word cinnamon. It was not clear if and how abstract language such as life or time is embodied. Metaphors are brought into this debate because abstract concepts can be reasoned about and grounded through concrete concepts metaphorically (e.g., life is a journey, time is money). Imaging studies have provided some support for this. For instance, reading tactile metaphors (e.g., a rough day) recruits sensory regions responsive to touch [31], reading action metaphors (e.g., grasp an idea) recruits motor-related regions [32], and reading time metaphors also recruits motor-related regions [33]. One shortcoming is that hemodynamic responses are relatively slow (4–6 s) and lack the temporal precision needed to tease apart whether sensory-motor activations are necessary for language comprehension or merely epiphenomenal, happening in parallel with or after language comprehension [34].


Given the fine temporal resolution of ERP, several ERP studies have interrogated the embodiment issue by examining how metaphoric meaning impacts early sensory (visual) perception. Zonolie and colleagues [35] considered the conceptual metaphor “power is up” and tested whether thinking of power makes people look up, inducing a shift of attention to the upper or lower visual field. In each trial, participants first did a power decision task on words denoting powerful or powerless people (e.g., king or servant). The emotional valence of the powerful and powerless words was matched through a pretest. Then, the participants did a letter identification task, identifying a letter that appeared on the top or the bottom half of the computer screen. The congruent trials (king followed by a letter appearing on the top half of the screen) elicited a larger N1 than incongruent trials (king followed by a letter at the bottom). Early visual ERP components such as the N1 index shifts in spatial attention, which, according to the authors, were caused by the concrete spatial resources metaphorically involved in the semantics of power, providing support for the grounding of semantics. This is consistent with ample evidence that literal sight-related words are embodied. For instance, Amsel and colleagues [36] showed that color word perception requires the involvement of color-related sub-regions in the visual cortex. These studies provided strong support for the necessity of sensory cortex in the semantics (or pre-semantics) of both literal and metaphoric language. Another clever way to test metaphor embodiment is to take away or occupy motor resources and see whether metaphoric meaning can still be computed. Reilly and colleagues [37] used transcranial magnetic stimulation (TMS), a non-invasive focal magnetic pulse applied to the scalp, to disrupt activity in the underlying motor cortex. Specifically, they applied single-pulse TMS to the hand-related motor area at 150 ms, 300 ms, or 450 ms after the onset of the critical words in metaphoric and literal contexts (e.g., The girl . . . threw the ball vs. The team . . . threw the game). In the 300 ms condition, both metaphoric and literal sentences were read more slowly than the abstract control, suggesting a causal role of the motor cortex in both types of processing at this timing. A second, non-invasive way to manipulate motor resources is to occupy the motor cortex by having participants move. Bardolph and Coulson [38] pre-activated participants’ motor systems by having them move marbles up and down before reading. The participants were then presented with words with high and low spatial attributes, both literally (e.g., ascend, descend) and metaphorically (e.g., inspire, defeat). For literal words, incongruency between the arm movement and the spatial attribute elicited a larger early negativity effect (200–300 ms) than congruency, indexing literal embodiment. For metaphoric words, incongruency did not affect ERPs until after 500 ms. The finding that literal embodiment is earlier than metaphoric embodiment challenges the necessity of the


immediate involvement of sensory-motor cortex during all types of language comprehension. One potential caveat is that the motor area impacted by moving marbles up and down is hand-related, but the content of the metaphoric words (inspire) is not specific to the hand. The abovementioned studies tested embodiment by examining interactions between language and vision, and between language and motion. Within the domain of language, Lai and colleagues used ERP to probe if and when motor semantics is activated during the comprehension of metaphoric expressions [39]. In their design, they contrasted action verbs in a metaphoric context (The church bent the rules) with two kinds of literal control sentences: literal-concrete (The bodyguard bent the rod) and literal-abstract (The church altered the rules). This is different from past ERP studies, where metaphors were almost always contrasted only with a literal-concrete control, not with a literal-abstract control. Their literal-concrete vs. literal-abstract contrast revealed a sentence-level concreteness N400 effect from 200 to 400 ms, frontally distributed, similar to the word-level concreteness effect [40, 41]. They viewed this as a way of “localizing” the ERP correlates of motor feature activation in sentences, which mimics fMRI studies where areas involved in motor semantics are localized by contrasting concrete and abstract action sentences. Critically, the metaphoric vs. literal-abstract contrast revealed a long and widespread metaphoric effect (Fig. 1), which appeared to consist of at least two underlying processes: One was comparable to the frontal concreteness effect, and the other was comparable to the metaphoric effect revealed in the metaphoric vs. literal-concrete contrast (cf. the same contrast in all past studies reviewed in Subheading 2.2). Thus, assuming the frontal concreteness effect is embodiment-specific, the very same sensory-motor recruitment seen in literal-concrete language takes place for comprehending metaphoric expressions at the same early timing. Ultimately, Lai et al. [39] argued that metaphoric meaning starts out as part of a non-specific sensorimotor simulation of the action content in these metaphoric expressions, and then gets further processed as the context unfolds.

In summary, metaphor plays an important role in the embodiment of abstract concepts [26], and the ERP method provides the fine temporal resolution needed for testing the necessity of embodiment. Research so far has shown that metaphor use impacts processing in the visual cortex early on, at ~100 ms [35], and that suppressing or pre-activating motor cortex affects metaphor comprehension [37, 38]. Finally, one recent study showed that the metaphor N400 observed in past studies consists of an initial concreteness effect, providing further support for metaphor embodiment [39].


Fig. 1 Grand-averaged ERPs in the metaphor (MET), literal-abstract (LA), and literal-concrete (LC) conditions at the frontal and posterior sites (left panel). Scalp distributions of the N400 effects from 200 to 500 ms based on the difference waves between LC and LA (sentence concreteness effect), MET and LC (metaphoric N400 effect consistent with past studies), and MET and LA (metaphoric N400 effect) (right panel). (Adapted from Lai et al. [39])
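
The logic of these contrasts can be illustrated with a minimal MNE-Python sketch that builds the three difference waves (LC minus LA, MET minus LC, MET minus LA) and plots their scalp distributions averaged over 200–500 ms. This is a rough illustration rather than the analysis pipeline of Lai et al. [39]; the epochs file and condition labels are hypothetical placeholders.

# Hedged sketch: difference waves and topographies for a three-condition design
# (MET, LA, LC). File and condition names are invented placeholders.
import mne

epochs = mne.read_epochs("sub-01_metaphor-epo.fif")
evokeds = {cond: epochs[cond].average() for cond in ("MET", "LA", "LC")}

contrasts = {
    "concreteness (LC minus LA)": mne.combine_evoked([evokeds["LC"], evokeds["LA"]], weights=[1, -1]),
    "metaphoric (MET minus LC)": mne.combine_evoked([evokeds["MET"], evokeds["LC"]], weights=[1, -1]),
    "metaphoric (MET minus LA)": mne.combine_evoked([evokeds["MET"], evokeds["LA"]], weights=[1, -1]),
}

for name, diff in contrasts.items():
    # Topography of the mean difference, averaged over a 300-ms window centered at 350 ms
    fig = diff.plot_topomap(times=[0.35], average=0.3, show=False)
    fig.suptitle(name)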

3 Idioms

Idioms provide an interesting testing ground for understanding non-literal language processing in the brain. Unlike literal language, idioms are phrases or multi-word expressions in which the conventionally agreed-upon figurative meaning of the phrase is mostly unrelated to the literal meanings of the individual words that make up the phrase. Idioms vary in how transparent the mapping from word form to phrase meaning is, a property termed decomposability. For instance, the phrase “slept like a baby,” meaning to sleep very well, is fairly transparent in its mapping, and thus has high decomposability, as the word “slept” refers to sleeping in the literal meaning but also relates to the figurative meaning. In contrast, the phrase “break the ice,” meaning to do something to start a conversation, is less transparent or more “opaque,” in that the literal meanings of the constituent words, or words that make up the idiomatic phrase, have little to do with starting a conversation, and thus it has low decomposability. Idioms differ from metaphors in that metaphors specifically compare or relate concepts (“Juliet is the sun”), whereas idioms typically consist of a group of words that


together convey one particular meaning, which is not necessarily comparative in nature (as in “break the ice”). In general, compositional processing of the constituent words of a metaphor will likely shed light on its figurative meaning, as clues can be derived from the comparison being made. In contrast, some degree of prior knowledge may be required to understand the figurative meaning of more opaque idioms, as the figurative meaning is difficult to reach through compositional processing of the constituent words of the phrase. Thus, the brain may require unique processing strategies to create and retrieve this non-literal meaning. For instance, the brain may not need to activate the literal meanings of the words that make up the idiom to process an idiom; however, these activation processes may happen obligatorily. Examining how neural processing differs when individuals comprehend literal vs. idiomatic language can provide insight into what mechanisms are engaged by the brain to understand non-literal language.

Theories of idiom processing differ on whether idioms are processed holistically or compositionally during comprehension. Non-compositional theories posit that idiom meanings are stored and retrieved from memory as wholes [42], and potentially stored separately in their own lexicon [43]. According to the non-compositional position, little to no processing of the idiom’s constituent words would occur, as encountering the idiom would lead to direct retrieval of the idiom’s meaning from memory. Alternatively, the compositional view holds that idioms are not a unique class of linguistic stimuli (as posited by non-compositional theories), and that the semantics of the individual words are necessary to understand the figurative meaning of the idiom [44]. According to this view, processing of idiomatic and literal phrases would look nearly identical in the brain, as compositional processing would occur in both cases. Finally, hybrid models incorporate aspects of compositional and non-compositional processing based on the features of the idiom [45]. For instance, if an idiom is highly familiar and predictable, its meaning may be directly activated from memory, while an unfamiliar idiom may require more compositional processing similar to literal language processing [46]. Recording signals of brain activity while individuals read idioms with varying properties has allowed researchers to provide evidence for and adjudicate between these theories of processing.

3.1 EEG and MEG Studies of Idiomatic Language

In one of the earliest ERP studies of idioms by Moreno et al. [47], participants were presented with idioms and literal phrases that ended with expected words or synonyms (e.g., A dog is a man’s best friend/buddy). ERPs were measured at the final word of the idiomatic or literal phrase, and analyses were focused on the N400, an ERP component related to semantic processing of information. Synonyms elicited larger N400 responses compared to expected words in both literal and idiomatic contexts, suggesting that similar


semantic processing of individual words still occurs when reading idioms as when reading literal phrases. This result was replicated in another study using a different set of stimuli [48]. However, different results were found by Rommers et al. [49], who presented participants with idiomatic and literal phrases that ended with expected, related, or unrelated words (e.g., A dog is a man’s best friend/buddy/fish). The typical N400 pattern (expected < related < unrelated) was observed in literal contexts, but no N400 differences were found in idiomatic contexts, suggesting compositional operations were “switched off” for idioms. This difference in results across studies may have been due to differences in the stimuli used in the experiments. Rommers et al. specifically included only familiar but opaque idioms, whereas Moreno et al. [47] did not specify the decomposability of their idioms. It is possible that compositional processing is not engaged, or is engaged to a lesser degree, when idioms are more opaque. Comparing ERPs elicited by literal and idiomatic expressions has also led to mixed results. In one study, N400s elicited by final words in idiomatic phrases were smaller when the idiom was more familiar, but final words in highly familiar idioms did not differ from final words in literal expressions in N400 amplitude [50]. Another study reported larger N400 amplitudes to idiomatic phrases than to literal phrases [51], while Rommers et al. [49] found that expected endings elicited smaller N400s in idiomatic contexts than in literal contexts. One issue with measuring at the final word position in an idiom phrase is that readers might have recognized that the phrase is an idiom before reaching the final word. Analyzing EEG at the recognition point (the point at which a reader realizes that they are reading an idiom) seems more precise in pinning down when idiom-specific processing is engaged. Vespignani et al. [52] found smaller N400s following the recognition point of an idiom compared to a literal phrase. Finally, Canal et al. [53] found no differences in N400 amplitudes between literal and idiomatic expressions, but found larger frontal positivities for idioms compared to literals. In summary, some studies have found that semantic processing of words differs between idiomatic and literal contexts, as indexed by differences in N400 amplitudes when processing those words, while others have found no differences between these linguistic contexts. The mixed nature of these results potentially supports a hybrid model of non-literal language processing, in that the mechanisms recruited by the brain to comprehend idiomatic language may depend on characteristics of the encountered stimuli. Recent work from our lab has examined how predictability and idiom decomposability separately influence the neural processing of idiomatic language [54]. We found that, when predictability (measured by cloze probability) was high, N400 differences between final words of idioms and literals emerged, with idioms eliciting smaller N400s, but this difference


Fig. 2 (a) Grand-averaged ERPs for idioms (red lines) and literals (blue lines) in predictable and higher-cloze items (left) and in unpredictable and lower-cloze items (right). (b) Topographies of the effects obtained by subtracting the idiom from the literal conditions (left two topographies) and scalp distributions of the effects obtained by subtracting the literal from the idiom conditions (right two topographies). (Adapted from Hubbard et al. [54])

was abolished when predictability was low, though a later difference emerged (Fig. 2). Future work should consider how characteristics of the linguistic stimuli, such as predictability and familiarity, as well as the recognition point of the idiom, influence the observed dynamics. One method to identify whether an idiom’s constituent words are activated or contribute to the overall sentence meaning is to examine brain activation in response to action words embedded in idioms. Numerous studies have demonstrated that cortical areas involved in sensorimotor processes are also engaged when individuals read action words involving body movements; for instance, reading the word “kick” leads to activation of neurons in the motor strip involved in control of the feet [55]. However, in the idiom “kick the bucket,” the figurative meaning of the phrase, to die, may override or obviate the activation of sensorimotor areas. The literal meaning of the word “kick” in the idiomatic phrase is largely unrelated to the figurative meaning, and thus activating this meaning may not be required to understand the phrase. In one experiment, participants read idiomatic and literal phrases with arm and leg movement–related words (e.g., “picked her brain,” “kicked the bucket”) while MEG was recorded [56]. Not only did MEG signals diverge between idiomatic and literal phrases, but motor-related activity for action words was also found for both idioms and literal phrases. This suggests that some aspects of compositional processing are still engaged when processing figurative language,


but that this compositional processing occurs in parallel with activation of the figurative meaning, in line with a hybrid theory of processing. Recording EEG during language processing also allows researchers to characterize how activity in different frequency bands relates to ongoing processing, and how this processing differs in literal and non-literal language. Currently, only two published studies have examined oscillatory mechanisms involved in the processing of idioms. Rommers et al. [49] examined changes in power following expected, related, and unrelated completions to literal and idiomatic phrases. Only expected completions to literal phrases elicited a burst of power in the gamma band (approximately 60–80 Hz) that significantly differed from completions to idioms. Similarly, Canal et al. [53] found increased gamma band activity for literal sentence contexts compared to idiomatic sentence contexts. Activity in the gamma band has been associated with integration of incoming semantic information with the discourse-level context, and is disrupted when unexpected semantic events occur [57]. Thus, one interpretation of these results is that idioms do not require the same degree of semantic integration as literal language, since the constituent words do not require compositional processing. However, idiomatic language may also differ in how predictable it is, and recent studies have linked gamma band activity with predictive processing. Specifically, unexpected words may lead to differences in gamma power due to a mismatch between the prediction and the encountered stimulus [58]. Indeed, work from our lab [54] found that gamma band activity was correlated with both the predictability and the decomposability of an idiom, but the timing and topography of these effects differed, potentially suggesting that gamma activity indexes both prediction and semantic integration processes, but that these processes rely on different neural generators. Further studies will be necessary to tease apart how gamma band activity relates to prediction and integration, and how these two mechanisms differ for idiomatic and literal language processing. In sum, both ERP and oscillatory analyses have revealed that compositional processing by means of semantic integration may differ between idiomatic and literal expressions, but that the degree to which this difference manifests may depend on the predictability of the constituent words, as well as the decomposability of the idiomatic phrase.

3.2 Brain Stimulation Studies of Idiomatic Language

The previous studies suggest that, in some cases, compositional processing may not be engaged by idioms to the same degree as literal language, and thus processing of idioms may to some extent be “easier.” However, neuropsychological studies suggest the opposite: Patients with particular clinical conditions, such as schizophrenia [59] and aphasia [60], display greater difficulty with processing idioms compared to processing literal language. In


accordance with hybrid models of processing, idiom processing may lead to co-activation of literal and figurative meanings, and thus require greater prefrontally driven cognitive control to select the contextually appropriate meaning and resolve interference [61]. The prefrontal cortex, including the dorsolateral and ventrolateral prefrontal cortices, is critically involved in cognitive control processes, and neural function in the prefrontal cortex is impaired in many neuropsychological patients, leading to difficulty with cognitive control. The hypothesis that follows from this work is that when cognitive control is compromised, as is the case in schizophrenic patients, greater interference from the literal meaning is encountered when processing the figurative meaning of idioms, leading to impaired comprehension. Recently, brain stimulation technology has allowed researchers to test this cognitive control hypothesis in healthy individuals. By using repetitive transcranial magnetic stimulation (rTMS), a temporary lesion to a specific cortical area can be induced in participants while they are processing language stimuli. Researchers have used this technology to target the prefrontal cortex during language processing, and thus examined how temporarily impairing cognitive control can influence language comprehension. In studies using a picture-matching paradigm, in which participants selected a picture that matched the meaning of an idiomatic or literal sentence, rTMS stimulation of the dorsolateral prefrontal cortex [62, 63], as well as the ventrolateral prefrontal cortex [64], led to specific impairment of idiomatic comprehension. Similarly, transcranial direct current stimulation (tDCS) can be used to increase the excitability of neuronal activity in a particular cortical region, providing an alternative method to causally link neural activity in a brain area to a specific cognitive process. Studies applying anodal tDCS of dorsolateral prefrontal cortex during idiom and literal comprehension have shown that tDCS of the left hemisphere led to selective improvement in figurative comprehension, whereas stimulation of the right hemisphere produced benefits to literal comprehension [65, 66]. While these results suggest differential involvement of the two hemispheres, it is also possible that they reflect cross-hemispheric inhibition or facilitation of homologous areas. These results suggest that idiom comprehension requires recruitment of additional processes compared to literal language processing, including cognitive control, in order to accurately interpret the figurative meaning of the phrase. In summary, electrophysiological research on idiom comprehension has demonstrated that some aspects of compositional processing, or processing of the constituent words of the phrase, may still be engaged. Semantic processing differs between idiomatic and literal language, as indexed by N400 amplitudes and gamma band activity, but this processing difference may be dependent on characteristics of the idiom, such as how familiar or predictable it


is. Additionally, cognitive control mechanisms may be engaged when processing idioms to select between the literal and figurative meanings of words. To further understand how these factors influence what neural mechanisms are engaged and when, future studies on idiom processing should vary these factors separately and examine how this modulates brain responses.
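
As a point of reference, predictability in such designs is usually quantified with a cloze norming task: a separate group of participants completes each sentence frame, and the cloze probability of a word is the proportion of respondents who produced it. The short sketch below illustrates that computation on hypothetical norming data; the item labels, responses, and column names are invented for illustration and are not taken from [54].

# Hypothetical cloze norming data: each row is one participant's completion of one
# sentence frame. Cloze probability of a word = proportion of respondents who
# produced that word for that frame.
import pandas as pd

norming = pd.DataFrame({
    "item":     ["break_the", "break_the", "break_the", "slept_like_a", "slept_like_a"],
    "response": ["ice",       "ice",       "news",      "baby",         "log"],
})

# Proportion of each response per item
cloze = (norming.groupby("item")["response"]
                .value_counts(normalize=True)
                .rename("cloze_probability")
                .reset_index())
print(cloze)

# Cloze probability of the intended idiom-final word for each item (hypothetical targets)
targets = {"break_the": "ice", "slept_like_a": "baby"}
for item, word in targets.items():
    match = cloze[(cloze["item"] == item) & (cloze["response"] == word)]
    p = float(match["cloze_probability"].iloc[0]) if not match.empty else 0.0
    print(f"{item} -> {word}: cloze = {p:.2f}")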

4 Irony

Irony is a form of figurative language that is characterized by a strong contrast between what is said and what is meant. In most cases, irony even intends to express the opposite of what is true. For example, if an acrobat is performing a handstand and falls over, a comment like “How elegant!” could be interpreted as verbal irony, indicating unmet expectations about the acrobat’s performance. By contrast, if the same statement were uttered after a successful performance, it could be intended as a compliment. Verbal irony has prompted a variety of research, including attempts to identify the neural response associated with irony perception. Different from other types of figurative language, understanding irony involves mentalizing: interpreting the speaker’s intention, which requires understanding the speaker’s mental state. It has been shown that irony, but not other forms of figurative language, leads to activation in brain areas implicated in Theory of Mind and social cognition abilities, such as the medial prefrontal cortex (mPFC), the anterior cingulate cortex and medial frontal gyrus, and the right anterior superior frontal gyrus [67]. Similarly, irony understanding is impaired in individuals with conditions associated with impaired Theory of Mind abilities, such as autism spectrum disorder (e.g., [68]) and schizophrenia (e.g., [69]), and in children who have not yet developed full Theory of Mind abilities (e.g., [70]).

4.1 Verbal Irony, Prosody, and Context

Intuitively, prosody should be a prominent feature in identifying and understanding spoken irony. But research has shown that its role depends on the amount of context provided. One of the first ERP studies of irony tested the role of prosody in spoken irony comprehension [71]. The authors used short context stories that biased toward either an ironic or a literal interpretation of the target utterance, which was pronounced with either ironic or non-ironic prosody. Compared to non-ironic prosody, the ironic prosody had an overall longer duration, lower intensity, and a higher pitch at the sentence beginning that lowered toward the end. They found no effects of ironic prosody on behavioral performance or ERPs, but an enhanced P600 for all ironic compared to literal statements. They associated their P600 effect with comprehension processes at a conceptual or pragmatic level as well as the retrieval and integration of general world knowledge. More generally, they


suggested that the P600 indexes the involvement of additional neurocognitive processes in figurative language processing. In the written version of the experiment, they additionally obtained a significant P200 effect. According to the authors, the presence of the P200 in written irony and its absence in auditory irony indicate either that written irony is processed more deeply than auditory irony or that the P200 reflects properties associated with the visual modality. In contrast to Regel and colleagues [71], a recent study demonstrated that prosody does impact the comprehension of ironic statements in isolation, when no context is present. Caillies et al. [72] contrasted sincere and ironic prosody of statements with positive or negative adjectives but provided no context for the statements. Participants therefore had to rely on prosodic cues alone to decide, using yes/no answers, whether the speaker “thought what he/she said.” Behavioral results indicated that negative sentences with ironic prosody (interpreted by the authors as praise) were harder to understand than positive sentences with ironic prosody (interpreted as criticism). In ERPs, negative sentences with an ironic prosody (= ironic praise) led to an increased N400 compared to negative sentences with a sincere prosody, suggesting that ironic praise was somewhat unexpected. In contrast, positive sentences with a sincere prosody elicited a larger N400 than positive sentences with an ironic prosody (= ironic criticism), suggesting that ironic criticism was not as unexpected as ironic praise. The P600 was sensitive to ironic criticism, but not ironic praise, which the authors interpreted as an index of the speaker’s disappointment about a situation. Thus, prosody can affect irony perception if no context is given. The functional interpretations of the P600 in irony studies vary across studies. As discussed above, while Regel and colleagues associated it with pragmatic processes that integrate linguistic and general world knowledge, Caillies and colleagues [72] associated it with disappointment and negative affect. Consistent with Regel and colleagues’ [71] interpretation, Filik et al. [73], who examined familiar and unfamiliar irony, also reported enhanced P600s for irony, regardless of familiarity. Filik and colleagues suggested that the P600 reflects an ongoing competition between the ironic and literal meaning of a word. In our opinion, competition of word meanings [73] and drawing on general world knowledge to integrate the word meaning with context [71] are very much two sides of the same coin: real-world knowledge adjudicates between word meanings in competition. In contrast to such cognitive, conceptual-semantic explanations, Caillies and colleagues offered an emotional explanation, associating the P600 with negative feelings. Relatedly, recent work in our lab investigated the role of emotional context in written irony comprehension. Specifically, in Pfeifer and Lai [74], context stories described an unpleasant event which was


highly negative or mildly negative, and the critical utterance that followed was either ironic or literal. When asked about the emotional state of the speaker, participants indicated that they perceived speakers using irony to be in a less negative emotional state compared to speakers using literal language, regardless of context emotionality. This suggests that irony can index a more downregulated mental state. In the ERPs, irony and literal language were dissociated early on, as a negativity effect at ~100 ms, regardless of context, indicating the processing of negative emotion. In addition, irony elicited an enhanced P600 compared to literal statements, but only in the highly negative contexts, suggesting that highly negative contexts accentuate the competition between ironic and literal meaning. Thus, the authors demonstrated that contextual factors, such as emotionality, can affect irony comprehension.

4.2 The Roles of Extralinguistic Information and Speaker Attributes in Irony

Irony also appears frequently in computer-mediated communication (CMC), such as text messaging. In CMC, ironic intent can be highlighted through additional cues such as emojis [75] or quotation mark symbols [76]. Weissman and Tanner [75] investigated whether the wink emoji provides reliable cues for irony detection. They combined text-message-like statements, such as The cake she made was terrible, with a smiling emoji, a frowning emoji, or a wink emoji. Statements were either positive or negative, so that the emoji would either be a match (positive statement with smiling emoji), a mismatch (negative sentence with a smiling emoji), or ironic. Participants rated the speaker’s opinion of the cake. Over the course of three experiments, both the P200 and P600 components observed at the emoji were reliably enhanced for the wink emoji compared to the matching emojis, suggesting an irony-like interpretation of the preceding statement. This effect was also present in participants’ ratings of the speaker’s opinion, which showed that following the wink emoji, participants were more likely to indicate that the cake was good, which is the opposite of the meaning conveyed in the text. Weissman and Tanner [75] therefore established the wink emoji as a reliable marker for verbal irony in CMC. Similarly, Regel et al. [76] investigated the effect of quotation marks as cues on irony processing. They contrasted statements (That’s fantastic) with the same statements using quotation marks (That’s “fantastic”) following contexts that biased toward an ironic or a literal interpretation. Irony compared to literal statements elicited a P600 effect, with or without quotation marks. Interestingly, irony cued by quotation marks additionally elicited an enhanced P300 compared to uncued irony. Their results suggest that cued irony might initiate the processing of ironic meaning immediately after attention was allocated to it, possibly facilitating access to ironic intentions. However, the sustained positivity at P600 also indicates


that advanced pragmatic processing in the P600 time window is still necessary, even when irony is cued. Perception of irony also differs depending on how it is said and who is saying it. For instance, non-native speakers with an accent may give listeners the impression that their proficiency is lower and that their ironic statements are unintended. Caffarra et al. [77] investigated the influence of speaker accent on spoken irony perception. Ironic and literal comments were preceded by positive and negative contexts, and the comments were uttered by either a native or a non-native speaker of Spanish. Notably, prosody for ironic and literal utterances was the same. Overall, they found increased P600 amplitudes for ironic compared to literal sentences. However, ERPs dissociated as early as 150 ms for the native accent in positive contexts, but not in negative contexts, replicating [72] and suggesting early integration of contextual information only in positive contexts. The authors suggested that their early effect at ~150 ms is essentially an early N400, appearing earlier due to the spoken presentation of the stimuli. No such early N400 difference between positive and negative contexts emerged in the non-native condition, but an irony-related P600 was present, suggesting that irony from non-native speakers was still processed as ironic. The authors suggested that listeners’ initial interpretation of native language is more constrained and less open compared to non-native language interpretation. Comparable questions of speaker identity were investigated by Regel et al. [78], who manipulated the communicative styles of speakers, namely, how frequently they used irony. In their study, using written stimuli, one speaker used irony in about 70% of their utterances, while a contrasting character used irony in only about 30% of the cases. In a second session, the amount of irony and literal language was kept equal for both characters. In the ERP results, an increased N400 for irony was observed for the less-ironic speaker compared to the more-ironic speaker, suggesting that the communicative style of a speaker affected the processing of irony. These effects carried over to the second session, where both speakers used equal amounts of ironic and literal utterances. A second important finding was that, across both sessions, the communicative style of a speaker affected ERP responses as early as 200 ms, indexed by larger P200 amplitudes for utterances congruent with (i.e., more frequent for) a speaker’s communicative style. This P200 might reflect the implicit categorization of the utterance as congruent with a communicative style and illustrates early effects of pragmatic knowledge on language processing.

4.3 ERP Correlates of Irony

Across studies with spoken and written irony, as well as in CMC, irony reliably elicits a P600 effect with centro-parietal distribution, suggesting similar neuro-computations in all modalities (Fig. 3). It appears that irony is primarily modulated by contextual cues that


Fig. 3 Example of averaged brainwaves (top) and scalp distributions (bottom) of the irony-related P600 effect. (Adapted from Fields and Kuperberg [83]. Shared under Creative Commons Attribution 4.0. https://doi.org/10.1371/journal.pone.0096840.g003)

highlight the contrast between what is being said and what is true [74]. If no context is available, cues such as quotation marks [76], the wink emoji [75], or prosody [72] can prompt the interpretation of a statement as expressing the opposite of what was conveyed with words. With context, prosody plays a minor role (cf. [75]). Speaker attributes, such as a non-native accent [77] or a communicative style [78], do not affect the P600 in response to irony, but instead affect early components, suggesting relatively immediate integration of extralinguistic information into the processing stream. Thus, the P600 is a reliable marker of irony processing in both auditory and visual domains. Results for other ERP components are less clear. Some studies found enhanced P200 effects for irony compared to literal language (e.g., [71, 75, 78]), while others did not (e.g., [72, 73, 77]). When found, such P200 effects have been associated with early recognition of ironic meaning [71, 78]. However, the absence of a correlation between the P200 amplitude and behavioral responses in Weissman and Tanner conflicts with the interpretation of early recognition. Instead, they associated their P200 effect with enhanced visual processing of emojis [75]. These results point to the possibility that the P200 results may be explained by spoken vs. visual presentation modalities. One study [71] compared the same materials presented visually and auditorily and found that the P200 was present only in the visual modality. However, the same authors [78] did observe a P200 effect in the auditory modality in a different study, using


auditory presentation only. Thus, more research is needed to identify the effect of presentation modalities on the P200 component. Even more unclear is the role of the N400 in irony processing. Initially, it seems to be sensitive to less expected forms of irony, such as unfamiliar irony [73], incongruency with a speaker’s communicative style [78], and ironic praise [72]. Further evidence for the involvement of the N400 in early stages of irony processing comes from Baptista and colleagues [79]. They investigated the role of the medial prefrontal cortex (mPFC) in irony processing using tDCS and subsequent EEG recordings during an irony comprehension task. The mPFC is implicated in Theory of Mind abilities and mentalizing operations [67], which are crucial for irony comprehension. After stimulation, participants viewed context sentences and pictures on the screen, followed by a critical utterance that was either literal or ironic, and either a compliment or a criticism. Their task was to decide whether the sentence was literal or ironic. ERPs showed that the N400 was enhanced for irony in the cathodal (activation) and the sham (control) group, but absent in the anodal (inhibition) group, suggesting that the inhibition of mPFC impacted irony processing at the N400, rather than at the P600. In the behavioral results, all groups showed high accuracy, and there was no difference between groups, suggesting that irony was still fully processed. Therefore, the role of the N400 in irony processing remains debated.

4.4 Irony and Brain Oscillations

Brain oscillations have been used to investigate the cognitive processes underlying irony comprehension and could disambiguate some of the ERP effects. The few studies published so far have conducted time-frequency analysis (TFA) and compared the commonalities and differences between oscillatory effects and ERP components. Spotorno and colleagues [80] used minor context alterations to bias the interpretation of a target utterance (e.g., We gave a superb performance) to be literal or ironic. ERP results showed no difference in the N400, but an enhanced P600 for irony compared to the literal interpretation. In the TFA, there were three effects: First, a significant difference in power within the gamma band (31–35 Hz) between 280 and 400 ms. Second, a difference within the theta band (4–7 Hz) between 500 and 700 ms. Third, a significant power increase in the alpha band (8–12 Hz) between 400 and 700 ms, distributed over right frontal areas and accompanied by a simultaneous decrease of power over left parietal areas. The authors suggested that the early changes in gamma band activity reflect an early integration between the linguistic code and contextual cues. Given its temporal proximity to the N400 time window, they speculated that this early gamma is related to the integration processes indexed by the N400. However, because no N400 was present in the ERPs, the authors suggested that TFA may be more sensitive


than ERP analysis. The results in the theta band were associated with increased memory load, and the alpha band desynchronization was associated with the effortful integration of multiple streams of information during irony processing. Overall, the TFA results in [80] support the notion that the P600 indexes several neurocognitive processes. Regel et al. [81] compared the irony-related P600 effect (irony minus literal) with a syntactic P600 effect (syntactically incorrect minus correct). These two effects were found to be similar in terms of waveforms and time windows, though they exhibited topography differences: The syntactic P600 was more widespread, and the irony P600 was more constrained (Fig. 3). In the TFA, more differences emerged. First, a significant decrease in alpha activity in the 300–500 ms time window was present in the irony vs. literal contrast, but not in the syntactically incorrect vs. correct contrast. Second, a difference in theta band power in the 500–900 ms window was present for both contrasts, with theta power being higher for the syntactic contrast than for the irony contrast. The authors suggested that these differences in underlying oscillations reflect stronger neural desynchronization in early processing stages for pragmatic ambiguity (i.e., irony) compared to syntactic violations and, therefore, different neural processes underlying the generation of the later syntactic P600 and the irony-related P600. Finally, Akimoto and colleagues [82] used MEG to investigate alpha band–related changes in neural activity following irony. They were specifically interested in the role of the right anterior temporal lobe (rATL) in the generation of the irony-related P600. The rATL is hypothesized to play a significant role in the retrieval and representation of social information, which is required for understanding the mental states of others. They presented participants with context stories describing events that were either completed successfully or resulted in a failure, followed by a critical sentence that could be interpreted as literal or ironic. They used the second person singular pronoun “you” for the critical utterance. Results revealed event-related desynchronizations within the alpha band in the 600–900 ms time window (cf. [80]) in several regions, including bilateral ATL. Critically, while the alpha desynchronization was stronger in the right ATL for irony compared to literal sentences, no significant P600 effect was found, suggesting that the ATL is not directly involved in P600 generation. However, because all statements were self-relevant, the P600 time window may have been confounded with self-relevance (e.g., [83]), which may have led to an equal increase in P600 for literal sentences, obscuring the presence of a P600 for irony. Thus, the relationship between irony and self-relevance remains unclear. Summarizing the work on oscillations observed in response to irony, it appears that both theta band power and alpha band power are involved in irony processing and provide room for future investigations of the neural underpinnings of irony.
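
For orientation, the sketch below shows a generic Morlet-wavelet time-frequency analysis and a band-limited power contrast (irony vs. literal) in MNE-Python. It is a minimal illustration rather than the pipeline of [80–82]; the epochs file, condition labels, baseline, and the specific band and window are hypothetical placeholders chosen only to echo the ranges reported above.

# Hedged sketch: Morlet-wavelet TFA and an alpha-band power contrast.
# File names, condition labels, and parameters are illustrative placeholders.
import numpy as np
import mne
from mne.time_frequency import tfr_morlet

epochs = mne.read_epochs("sub-01_irony-epo.fif")
freqs = np.arange(4, 41, 1)      # 4-40 Hz; extend upward if gamma is of interest
n_cycles = freqs / 2.0           # frequency-dependent wavelet length

power = {}
for cond in ("irony", "literal"):
    power[cond] = tfr_morlet(epochs[cond], freqs=freqs, n_cycles=n_cycles,
                             return_itc=False, average=True, decim=2)
    power[cond].apply_baseline(baseline=(-0.4, -0.1), mode="logratio")

def band_window_mean(tfr, fmin, fmax, tmin, tmax):
    """Mean baseline-corrected power in a frequency band and time window."""
    cropped = tfr.copy().crop(tmin=tmin, tmax=tmax)
    fmask = (cropped.freqs >= fmin) & (cropped.freqs <= fmax)
    return cropped.data[:, fmask, :].mean()

# Alpha-band (8-12 Hz) power difference in a 400-700 ms window, averaged over channels
alpha_diff = (band_window_mean(power["irony"], 8, 12, 0.4, 0.7)
              - band_window_mean(power["literal"], 8, 12, 0.4, 0.7))
print(f"Irony minus literal alpha power (log-ratio), 400-700 ms: {alpha_diff:.3f}")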

4.5 Irony Summary

Taken together, the following picture emerges: the P600 appears to be a reliable marker for irony processing. This P600 is associated with the integration of information from several sources, including memory and contextual sources. Even without context, the P600 is elicited if prosody or visual cues such as quotation marks or emojis indicate ironic meaning. The P600 is unaffected by presentation modality and emerges in both written and spoken irony. Furthermore, its emergence is insensitive to contextual factors such as speaker accent, a speaker’s communicative style, and emotional context. All of these affect ERP responses early on and are integrated rapidly, suggesting that a variety of information is processed simultaneously during irony comprehension. This is also reflected in the EEG oscillations, where irony affects gamma oscillations in earlier stages and alpha and theta oscillations at later stages. Finally, while ERP analyses do not typically show N400s, combined tDCS and EEG analyses indicate that ironic meaning is to some extent activated prior to the P600 time frame. These analyses might offer means of studying subtle differences in irony processing beyond the interpretation of ERPs. One open question concerns the distinction between irony and sarcasm. According to Lee and Katz [84], sarcasm is a form of irony that must involve a hurtful intention, such as ridicule. However, attempts to define or dissociate the two forms experimentally have so far been unsuccessful, suggesting that the distinction between the two may vary across individuals and cultures. Another open question concerns individual differences, which remain largely unexplored in irony comprehension, though some research [74, 82] suggests a relationship between autistic traits in the general population and irony processing.

5 Jokes

Verbal jokes are a higher-level form of language play [85], involving a set of cognitive processes that detect ambiguity at the punch line and resolve it by going back to reprocess the setup sentence. For instance, consider the following dialogue taken from Raskin [86]: “Is the doctor at home?” the patient asked in his bronchial whisper. “No,” the doctor’s young and pretty wife whispered in reply. “Come right in.” The words doctor, patient, and bronchial in the setup sentences lead readers to build up the assumption that the man was seeking medical advice. However, the response from the doctor’s wife in the punch line reveals the man’s true intention. It is the inconsistency between the expectation and the ending that leads readers to a sense of surprise. To resolve the incongruity, readers need to integrate the new information with the interpretation established from the context to generate possible alternative interpretations. This shifting of mental sets (a.k.a. frames, scripts, schemas, or situations) finally results in a feeling of mirth or amusement.


Different from the other forms of figurative language discussed in this chapter, jokes are a type of verbal humor that involves unexpectedness, surprise, and a need for appreciation, intended to produce the emotional effect of a comic experience. Similar to figurative language, joke comprehension relies on the generation of inferences drawing upon contexts and knowledge. Research on joke comprehension and appreciation thus centers on theories of incongruity detection, incongruity resolution [87], and further elaboration leading to a feeling of mirth or amusement [88–90].

5.1 Electrophysiology of Joke Processing

Behavioral, EEG, and MEG studies have suggested a stage-wise model of verbal joke comprehension within the first second of processing [91–97]. Event-related potential studies of jokes typically contrast sentence stimuli with joke endings against those with non-joke endings. One of the most commonly seen ERP effects is the N400, which peaks around 300–500 ms with centro-parietal, sometimes anterior, distributions [92–98]. Jokes usually elicit greater N400s than non-jokes (Fig. 4), reflecting an expectancy violation at the punch line. Following the detection of incongruity in jokes, a late positive component (LPC) is often seen at centro-parietal sites. This P600-like, posterior positivity often peaks between 500 and 900 ms, and is typically larger for jokes than non-jokes [91, 93–95, 97] (Fig. 4). Source analyses showed that the underlying neural sources of such LPC effects are in the anterior cingulate cortex, the right temporal-parietal regions, the anterior medial prefrontal cortex, and the right dorsolateral prefrontal cortex [93, 95, 97]. These regions are broadly linked to conflict monitoring and enhanced information-processing demands, such as syntactic-semantic mismatches and access to divergent, alternative word meanings. Researchers have thus associated this late positivity effect with incongruity resolution in jokes, which comprises a first sub-stage of breaking mental sets and forming new associations, and a second sub-stage of semantic re-analysis to build up a coherent discourse. These first two stages of joke comprehension were also supported by behavioral data: Higher subjective ratings of surprise for jokes were associated with larger N400 effects, while better comprehensibility of jokes was linked to greater P600-like effects [92, 99]. After the stage of incongruity resolution, a sustained positivity can sometimes be found in response to jokes compared with non-jokes [91, 93, 94, 96]. This sustained positivity often peaks between 700 and 1500 ms, and it is associated with further elaboration of stimulus evaluation, or greater attentional capture by emotionally salient stimuli [100]. Despite the overlap with P600 effects, source analyses revealed that the sustained activity came from different neural generators, such as the middle frontal gyrus and the fusiform gyrus, which may imply an affective appreciation stage in reading jokes [93]. However, due to highly


Fig. 4 (a) Grand-averaged ERPs to jokes and non-jokes at the Fz, Cz, and Pz electrodes. Jokes elicited a larger N400 and P600-like posterior positivity than non-jokes. (b) Topographies of the joke effects were obtained by subtracting the mean amplitudes of non-jokes from jokes in the 350–500 and 500–1000 ms time frames. (Adapted from Ku et al. [99])

individualized experiences in joke comprehension, it is difficult to pinpoint the exact moment of getting a joke. This further blurs the line between the inferences made to comprehend a joke and those made to appreciate it. Recently, Mayerhofer and Schacht [96] found that participants showed a larger anterior sustained positivity (700–1000 ms), along with larger pupil diameters, in response to joke than to non-joke endings, which was argued to reflect emotional processes. In contrast, by analyzing EEG time-frequency decompositions, Canal et al. [91] found a larger sustained positivity (700–1100 ms), along with a decrease in beta band (12–20 Hz) oscillatory power in response to jokes compared with non-jokes in the 600–900 ms time frame. As beta oscillations have been implicated in the maintenance of the current sensorimotor or cognitive state, the authors argued that this decrease may reflect a stage of discarding the expectation built up from the setup and reaching an alternative script through inference. Future studies are thus needed to dissociate the cognitive and affective components in this elaboration stage, for example, by comparing funny and unfunny sentences that contain similar levels of resolvable incongruity.
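
Condition contrasts like those above (joke vs. non-joke ERP amplitudes or band power) are commonly evaluated across participants with nonparametric cluster-based permutation tests. The sketch below illustrates the approach with MNE-Python on hypothetical subject-level difference data; the array is a random placeholder standing in for real per-participant averages.

# Hedged sketch: cluster-based permutation test on per-subject difference data
# (e.g., joke minus non-joke amplitudes over time, averaged over a channel region
# of interest). Values are random placeholders, so no real effect is expected.
import numpy as np
from mne.stats import permutation_cluster_1samp_test

rng = np.random.default_rng(0)
n_subjects, n_times = 24, 200
diff = rng.normal(0.0, 1.0, size=(n_subjects, n_times))  # placeholder difference waves

t_obs, clusters, cluster_pv, h0 = permutation_cluster_1samp_test(
    diff, n_permutations=1000, tail=0, seed=0)

n_sig = sum(p < 0.05 for p in cluster_pv)
print(f"{len(clusters)} cluster(s) found, {n_sig} significant at p < 0.05")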


5.2 Different Types of Jokes

Electrophysiological studies on joke processing can be coarsely divided into two categories: semantic/mental jokes, and phonological jokes such as puns, which in addition to semantics involve phonetic and phonological elements. Semantic jokes are characterized by incongruity generated from lexical-semantic knowledge and/or world knowledge. In contrast, puns are a playful use of words that manifests itself in one phonetic form (or two very similar ones) but conveys two different meanings. Specifically, puns activate and maintain two interpretations of a (string of) word(s) via shared phonological features, whereas semantic jokes build only on the incongruity of semantic meanings and require resolution by establishing a coherent semantic relationship or situational model [101]. The two joke types differ in their degrees of semantic incongruity and in the effort needed to re-build global semantic coherence. For instance, Marinkovic et al. [95] recorded simultaneous EEG and MEG while participants read funny (e.g., What is a boxer’s favorite beverage? Punch), congruous but not funny (e.g., What might astronauts wear to keep themselves warm? Jackets), and incongruous/nonsensical question-answer type riddles (e.g., What’s the best way to pass a math test? Cloudy). Unlike previous studies, their joke materials contained a large portion of puns that shared semantic associations between the setups and punch lines, for example, spaceman and astronauts in What do you call a crazy spaceman? An astronaut, and phonological similarity, for example, executioner and axe in How do you become an executioner? Just axe. Funny punch lines elicited the smallest N400/N400m (350–550 ms), which originated from the left anteroventral temporal lobe. The authors argued that, due to the surface congruity of the punch lines with the preceding setups, these activations reflect facilitation of initial lexical-semantic analysis as a result of word association. These seemingly coherent yet factually incongruous punch-words within the context led to a smaller N400/N400m. Instead, ambiguity detection was postponed to a later stage (i.e., 700–1150 ms), when the analysis of alternative meanings took place, reflected by an enhanced late positivity generated in the left frontomedial area. Not all studies reported an N400 effect. Mayerhofer and Schacht [96] examined garden-path jokes, in which readers first arrive at an incorrect dominant interpretation, but subsequently find a hidden, correct joke interpretation: “Mummy, I just turned 14 years. May I please, finally, be allowed to wear a bra and make up.”—“No, you are not. And eat up your soup, my son!” Counterintuitively, the garden-path jokes did not elicit a significant N400 effect compared to coherent non-jokes (e.g., my girl instead of son in the example above). An N400 effect did emerge, however, when an additional incoherent non-joke condition (e.g., my father in the example above) was included. The authors argued that the lack of N400 effects reflects ease of semantic integration, possibly due to the absence of incoherent non-jokes. In contrast, Fillipova et al. recently used the same type of


joke materials and found significantly larger N400 effects for jokes than for coherent non-jokes [102]. By manipulating the setup sentences rather than the punch lines of the jokes, Shibata et al. [97] found only an enhanced P200 (200–300 ms) and an LPC (500–800 ms) for semantic jokes relative to non-jokes. A joke example is: A woman who’d finally landed her first boyfriend proudly boasted about her achievement to a close friend. “I’m getting begged practically every day to marry him!” “That’s amazing! Are you serious? Wait–Who’s doing the asking?” “It’s my parents.” A corresponding non-joke differed from the joke only in the third sentence, that is, “So what kind of a married couple would you want to be like?” The P200 effect was associated with an early influence of contexts that evoke a feeling of unexpectedness. However, it remains to be debated whether the P200 effect truly reflects incongruity detection or merely enhanced attention to visual features in jokes. Similarly, Canal et al. [91] reported null N400 effects. In their study, the joke materials did not contain outright semantic violations: The shopkeeper speaks with a client: “The umbrella costs 30 euros.” And the client asks: “And what can I get for less than that?” And the shopkeeper: “You can get the rain if you wish.” The semantic jokes elicited an earlier LAN (left anterior negativity) effect from 300 to 500 ms, and a sustained LAN and posterior positivity from 500 to 700 ms, compared to non-jokes. They argued that the early LAN effect reflects incongruity detection in jokes rather than syntactic anomalies such as the grammatical agreement violations implicated in previous language studies, while the later LAN effect indicates the search for an alternative interpretation to solve the ambiguity in the joke [103]. Notably, it is the size of the sustained LAN effect that correlated with participants’ surprise ratings for the jokes. Although the distinction between the early effects for jokes was obscure in the abovementioned studies, these studies suggest that the ERP correlates of incongruity detection largely depend on experimental context, joke structure or type (i.e., semantic or phonological), and individual differences in joke processing.

5.3 Individual Differences

Joke processing is also influenced by extra-sentential factors including individual comprehension ability, social skills, and personality traits [91, 92, 104]. One of the classic studies is Coulson and Kutas [92], in which good and poor comprehenders were presented with one-liner jokes and non-jokes of high and low sentence constraint. Sentence constraint was defined as the probability of the most likely sentence-final completion. An example of a high-constraint one-liner joke is She read so much about the bad effects of smoking she decided she’d have to give up reading, and an example of a low-constraint one is Statistics indicate that Americans spend 80 million a year on games of chance, mostly weddings.


In good comprehenders, high-constraint jokes elicited a larger N400 (300–500 ms) and a late posterior positivity (500–900 ms) than non-jokes, while low-constraint jokes did not show such N400 effects, but a late frontal positivity instead. The authors argued that these high-constraint joke endings led to an enhanced N400 due to the activation of a more diverse set of frames than non-joke endings. In contrast, both the low-constraint joke and non-joke endings activated a similar set of frames, thus attenuating the N400 effect. Importantly, the subsequent posterior positivity reflects frame-level expectation violation only in the good comprehenders, whereas the late frontal positivity indicates an orienting reaction that resembles a novelty P3. Additionally, good comprehenders often showed a sustained left anterior negativity (sustained LAN; around 500–900 ms) to jokes compared with non-jokes [91–93, 105]. This sustained LAN may imply a slightly different functional significance from the late posterior positivity, namely the search for possible alternative scripts to resolve the incongruity in the joke [91]. A question these studies leave less clear, however, is how an individual’s comprehension ability relates to their appreciation of jokes: does one need to fully comprehend a joke in order to feel amused by it? To explore individual differences in joke comprehension, Canal et al. [91] examined the effects of participants’ socio-pragmatic and working memory abilities on joke processing, assessed by Autism-spectrum Quotient (AQ) scores and a sentence-span task, respectively. Participants with higher AQ scores, and thus less developed social skills, showed larger LANs (300–500 ms) to jokes, possibly reflecting greater effort in detecting the incongruity from the contextual cues. In contrast, the reduced LAN in more socially inclined participants suggested greater accommodation of the incongruity in the joke materials. Participants’ working memory ability did not correlate with any joke-related ERP effect, which contradicts the hypothesis that working memory ability is related to LPC/P600 effects. These results emphasize, for the first time, the link between joke comprehension and Theory of Mind, as the latter often shows an inverse relationship with autistic traits. Recently, Ku et al. found that personality traits including extraversion, openness to experience, and agreeableness could predict the size of the joke-related LPC effect, indicating different degrees of allocation of cognitive resources for resolving the incongruity in jokes [104]. Specifically, more extraverted, closed-minded, and agreeable people showed a larger LPC to jokes. Knowing the relationship between these personality traits or social skills and joke comprehension is crucial, as it lays the foundation for exploring why the same joke may be appreciated by one person but not another (e.g., a bad/cheesy joke), and why individuals prefer different joke types (i.e., incongruity-resolution jokes as discussed so far vs. nonsense jokes in which the incongruity cannot be fully resolved, e.g., Two elephants were taking a bath. One said, “Please pass the soap.” The other replied, “No soap, radio.”).
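In practice, such brain–behavior links are established by correlating one value per participant (e.g., an AQ score) with a per-participant ERP effect (e.g., the joke-minus-non-joke mean amplitude in the LAN window). The following is a minimal, hypothetical sketch of that step; it is not the pipeline of any study cited here, and the numbers and variable names are invented for illustration.

```python
# Minimal sketch (not the authors' pipeline): correlating an individual-difference
# measure (e.g., an AQ score) with a per-participant ERP effect (e.g., the mean
# joke-minus-non-joke LAN amplitude, 300-500 ms at left anterior electrodes).
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: one value per participant.
aq_scores = np.array([12, 18, 25, 9, 31, 22, 15, 27, 20, 14])       # questionnaire scores
lan_effect_uv = np.array([-1.2, -2.0, -3.1, -0.4, -3.8, -2.5,       # joke minus non-joke mean
                          -1.0, -3.0, -2.2, -0.9])                  # amplitude in microvolts

# A rank-based correlation is often preferred for questionnaire data,
# since the scores need not be normally distributed.
rho, p_value = spearmanr(aq_scores, lan_effect_uv)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```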


To sum up, joke processing involves stage-like processes starting with incongruity detection, followed by incongruity resolution, and finally further elaboration that yields amusement, or laughter. However, the precise timing of each stage can depend greatly on joke types and contextual influences. These variations may lead to parallel processes in joke processing, such as an overlap between the stages of ambiguity detection and global coherence integration, marked by a simultaneous pattern of brain activity. To further elucidate the causal relationship between the underlying mechanisms and the brain activity of joke processing, Yankovitz and Mashal recently used offline anodal transcranial direct current stimulation (tDCS) to stimulate the left inferior frontal gyrus (IFG), a region repeatedly implicated in ambiguity resolution, for 20 min right before participants read semantic jokes (e.g., The beggar asked money from the ice cream seller, but the seller’s response was cold) [106]. Observing no effect of stimulation on performance in a subsequent semantic judgment task (i.e., deciding whether the meaning of the sentence-final words makes sense), the authors argued that joke processing involves complex processes that may not be underpinned by a single cerebral region. As joke processing is closely tied to pragmatics, future studies should consider its multifaceted processes and, more importantly, individual differences, such as culture, personality, sex, age, and empathy/intelligence quotient, in joke processing [107].

6 Conclusion

Electrophysiological research on non-literal language has provided new evidence for old theories and created exciting new questions. In the case of metaphor research, the traditional indirect vs. direct access debate has been largely reframed. Current research effort has branched out to different metaphor types (e.g., conventional vs. novel, nominal vs. predicate, poetic vs. scientific), the timing of metaphor embodiment, and most recently, literary metaphors [108]. In idiom research, electrophysiological evidence has shed light on the traditional debate on idiom compositionality, demonstrating that individual words in an idiom are to some extent still compositionally processed. Evidence from the latest EEG time-frequency analyses points to the importance of gamma band oscillations and raises new questions about the role of prediction in idiom processing. In the field of irony, while most researchers consider the late effect an index of integration between linguistic semantics and real-world knowledge, some have begun to question whether emotional factors play a role. More recent research addresses multimodal irony, such as emojis in text messages. In jokes, perhaps the most important contribution from electrophysiological research is the establishment of a stage-wise processing model: incongruity detection, resolution, and/or elaboration.


Current research is focusing on joke variations and individual differences. As stated in the introduction, the functional interpretations of the ERP components observed in non-literal language studies are non-trivial. In very broad strokes, the processing of conceptual metaphor mainly occurs at the lexical level during the N400 time frame, though contextual factors such as unfamiliarity and literariness can additionally modulate N400 amplitudes or recruit the cognitive processes underlying the P600. Consistent with the interpretation that the N400 indexes lexical retrieval in literal language studies in general, the metaphoricity N400 likely reflects the retrieval of more (concrete) meanings as well as the other side of the same coin—contextual integration, which includes the enhancement and suppression of irrelevant, non-metaphoric meanings. The ERP components of idiom processing are mostly reported in the N400 time frame, but the extent to which such idiomaticity N400 effects are driven by item predictability remains to be tested. Additional factors such as idiom opaqueness also shift the effect toward the P600, which reflects compositional semantic mechanisms beyond the lexical-semantic level in idioms. In other words, the idiomaticity P600, when found, reflects re-analysis and re-combination of the meanings of the individual lexical building blocks. These findings suggest that non-literal language such as conceptual metaphors and idioms, though traditionally treated as language pragmatics, is central to cognitive semantics. In contrast, verbal irony and verbal humor can be viewed as lying in the realm where cognition and emotion meet. Verbal irony conveys negative emotion pragmatically, and verbal humor conveys positive emotion indirectly. ERP effects of verbal irony primarily take place in the P600 time frame, which can be considered the surface manifestation of a combination of semantic re-analysis, emotional processing, and/or mental state processing. The humor component of verbal humor, after the semantic processing indexed by the N400, is revealed in a late or sustained positivity, indexing semantic re-analysis and emotional appreciation of humor. These findings shed light on how language and emotion interface. One limitation is that researchers to date still have not quite pinned down the commonalities and differences between literal and non-literal language. While both literal and non-literal language utilize similar semantic and syntactic mechanisms, they differ in multiple dimensions. One dimension that has not received much attention is the speaker and discourse goal of figuration. According to a survey based on hundreds of native speakers [109], the discourse goals of various types of figurative language are as follows: the goals for metaphor are to clarify intended messages and to add interest; the goals for idioms are to be conventional, to be humorous, and also to clarify intended messages; and the goals for irony are to express negative emotion and to be humorous.


The identification and integration of this type of speaker goal likely take place downstream of processing, and unfortunately multiple processes occur concurrently in the later time frame of ERPs. New analyses are needed to tease apart later processes, as has been shown in a study that employed time-frequency analysis and dissociated the irony P600 from the syntactic P600 [81]. In conclusion, non-literal language provides a perfect arena for interactions among language, cognition, and emotion. Metaphors and idioms have informed us about the interaction between language, concepts, and cognition, and irony and jokes have illuminated the interaction between language and emotion. Future research will benefit from taking into consideration more cognitive factors (e.g., cognitive control) and social-emotional factors (e.g., mentalizing) in understanding the uniqueness of non-literal language.

References

1. Lakoff G, Johnson M (1999) Philosophy in the flesh: the embodied mind and its challenge to western thought. University of Chicago Press, Chicago 2. Gibbs RW Jr (1996) Why many concepts are metaphorical. Cognition 61(3):309–319 3. Pollio HR et al (1977) Psychology and the poetics of growth: figurative language in psychology, psychotherapy, and education. Erlbaum, Hillsdale 4. Graesser AC, Long DL, Mio JS (1989) What are the cognitive and conceptual components of humorous text? Poetics 18(1–2):143–163 5. Nippold MA (1991) Evaluating and enhancing idiom comprehension in language-disordered students. Lang Speech Hear Serv Sch 22(3):100–106 6. Steen GJ et al (2010) A method for linguistic metaphor identification: from MIP to MIPVU. John Benjamins, Amsterdam 7. Gerring RJ, Healy AF (1983) Dual processes in metaphor understanding: comprehension and appreciation. J Exp Psychol Learn Mem Cogn 9(4):667–675 8. Glucksberg S, Gildea P, Bookin HB (1982) On understanding nonliteral speech: can people ignore metaphors? J Verbal Learn Verbal Behav 21(1):85–98 9. Keysar B (1989) On the functional equivalence of literal and metaphorical interpretations in discourse. J Mem Lang 28(4): 375–385 10. Blasko DG, Connine CM (1993) Effects of familiarity and aptness on metaphor processing. J Exp Psychol Learn Mem Cogn 19(2): 295–308 11. Gibbs RW Jr (1994) The poetics of mind: figurative thought, language, and

understanding. Cambridge University Press, New York 12. Glucksberg S (2003) The psycholinguistics of metaphor. Trends Cogn Sci 7(2):92–96 13. Grice HP (1975) Logic and conversation. In: Cole P, Morgan JL (eds) Syntax and semantics 3: speech acts, 1st edn. Academic Press, New York, pp 41–58 14. Searle J (1979) Expression and meaning. Cambridge University Press, Cambridge 15. Coulson S, Van Petten C (2002) Conceptual integration and metaphor: an event-related potential study. Mem Cogn 30(6):958–968 16. Arzouan Y, Goldstein A, Faust M (2007) Brainwaves are stethoscopes: ERP correlates of novel metaphor comprehension. Brain Res 1160:69–81 17. Lai VT, Curran T, Menn L (2009) Comprehending conventional and novel metaphors: an ERP study. Brain Res 1284:145–155 18. De Grauwe S et al (2010) Electrophysiological insights into the processing of nominal metaphors. Neuropsychologia 48(7): 1965–1984 19. Goldstein A, Arzouan Y, Faust M (2012) Killing a novel metaphor and reviving a dead one: ERP correlates of metaphor conventionalization. Brain Lang 123(2):137–142 20. Lai VT, Curran T (2013) ERP evidence for conceptual mappings and comparison processes during the comprehension of conventional and novel metaphors. Brain Lang 127(3):484–496 21. Schneider S et al (2014) Beyond the N400: complementary access to early neural correlates of novel metaphor comprehension using combined electrophysiological and haemodynamic measurements. Cortex 53:45–59


22. Bowdle BF, Gentner D (2005) The career of metaphor. Psychol Rev 112(1):193–216 23. Tang X et al (2017) The temporal dynamics underlying the comprehension of scientific metaphors and poetic metaphors. Brain Res 1655:33–40 24. Kutas M, Federmeier KD (2011) Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol 62:621–647 25. Kuperberg GR (2007) Neural mechanisms of language comprehension: challenges to syntax. Brain Res 1146:23–49 26. Jamrozik A et al (2016) Metaphor: bridging embodiment to abstraction. Psychon Bull Rev 23(4):1080–1089 27. Barsalou LW (2008) Grounded cognition. Annu Rev Psychol 59:617–645 28. Gallese V, Lakoff G (2005) The brain’s concepts: the role of the sensory-motor system in conceptual knowledge. Cogn Neuropsychol 22(3–4):455–479 29. Binder JR, Desai RH (2011) The neurobiology of semantic memory. Trends Cogn Sci 15(11):527–536 30. Leshinskaya A, Caramazza A (2016) For a cognitive neuroscience of concepts: moving beyond the grounding issue. Psychon Bull Rev 23(4):991–1001 31. Citron FM, Goldberg AE (2014) Metaphorical sentences are more emotionally engaging than their literal counterparts. J Cogn Neurosci 26(11):2585–2595 32. Desai RH et al (2013) A piece of the action: modulation of sensory-motor regions by action idioms and metaphors. NeuroImage 83:862–869 33. Lai VT, Desai RH (2016) The grounding of temporal metaphors. Cortex 76:43–50 34. Mahon BZ, Caramazza A (2008) A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. J Physiol Paris 102(1–3):59–70 35. Zanolie K et al (2012) Mighty metaphors: Behavioral and ERP evidence that power shifts attention on a vertical dimension. Brain Cogn 78(1):50–58 36. Amsel BD, Urbach TP, Kutas M (2014) Empirically grounding grounded cognition: the case of color. Neuroimage 99:149–157 37. Reilly M, Howerton O, Desai RH (2019) Time-course of motor involvement in literal and metaphoric action sentence processing: a TMS study. Front Psychol 10:371 38. Bardolph M, Coulson S (2014) How vertical hand movements impact brain activity elicited

by literally and metaphorically related words: an ERP study of embodied metaphor. Front Hum Neurosci 8:1031 39. Lai VT, Howerton O, Desai RH (2019) Concrete processing of action metaphors: evidence from ERP. Brain Res 1714:202–209 40. Barber HA et al (2013) Concreteness in word processing: ERP and behavioral effects in a lexical decision task. Brain Lang 125(1): 47–53 41. West WC, Holcomb PJ (2000) Imaginal, semantic, and surface-level processing of concrete and abstract words: an electrophysiological investigation. J Cogn Neurosci 12(6): 1024–1037 42. Swinney DA, Cutler A (1979) The access and processing of idiomatic expressions. J Verbal Learn Verbal Behav 18(5):523–534 43. Bobrow SA, Bell SM (1973) On catching on to idiomatic expressions. Mem Cognit 1(3): 343–346 44. Gibbs RW Jr, Nayak NP (1989) Psycholinguistic studies on the syntactic behavior of idioms. Cogn Psychol 21(1):100–138 45. Libben MR, Titone DA (2008) The multidetermined nature of idiom processing. Mem Cognit 36(6):1103–1121 46. Cacciari C, Tabossi P (1988) The comprehension of idioms. J Mem Lang 27(6):668–683 47. Moreno EM, Federmeier KD, Kutas M (2002) Switching languages, switching palabras (words): an electrophysiological study of code switching. Brain Lang 80(2):188–207 48. Liu Y, Li P, Shu H et al (2010) Structure and meaning in Chinese: an ERP study of idioms. J Neurolinguist 23(6):615–630 49. Rommers J, Dijkstra T, Bastiaansen M (2013) Context-dependent semantic processing in the human brain: evidence from idiom comprehension. J Cogn Neurosci 25(5):762–776 50. Laurent JP, Denhie`res G, Passerieux C et al (2006) On understanding idiomatic language: the salience hypothesis assessed by ERPs. Brain Res 1068(1):151–160 51. Proverbio AM, Crotti N, Zani A et al (2009) The role of left and right hemispheres in the comprehension of idiomatic language: an electrical neuroimaging study. BMC Neurosci 10(1):116 52. Vespignani F, Canal P, Molinaro N et al (2010) Predictive mechanisms in idiom comprehension. J Cogn Neurosci 22(8): 1682–1700 53. Canal P, Pesciarelli F, Vespignani F et al (2017) Basic composition and enriched

integration in idiom processing: an EEG study. J Exp Psychol Learn 43(6):928 54. Hubbard RJ, Bulkes N, Lai VT (2023) Separable neural components of literality, predictability, and decomposability contribute to compositional language processing. Psychophysiology:e14269 55. Hauk O, Johnsrude I, Pulvermüller F (2004) Somatotopic representation of action words in human motor and premotor cortex. Neuron 41(2):301–307 56. Boulenger V, Shtyrov Y, Pulvermüller F (2012) When do you grasp the idea? MEG evidence for instantaneous idiom understanding. NeuroImage 59(4):3502–3513 57. Hald LA, Bastiaansen MC, Hagoort P (2006) EEG theta and gamma responses to semantic violations in online sentence processing. Brain Lang 96(1):90–105 58. Wang L, Hagoort P, Jensen O (2018) Language prediction is reflected by coupling between frontal gamma and posterior alpha oscillations. J Cogn Neurosci 30(3):432–447 59. Titone D, Holzman PS, Levy DL (2002) Idiom processing in schizophrenia: literal implausibility saves the day for idiom priming. J Abnorm Psychol 111(2):313 60. Papagno C, Caporali A (2007) Testing idiom comprehension in aphasic patients: the effects of task and idiom type. Brain Lang 100(2): 208–220 61. Botvinick MM, Braver TS, Barch DM et al (2001) Conflict monitoring and cognitive control. Psychol Rev 108(3):624 62. Fogliata A, Rizzo S, Reati F et al (2007) The time course of idiom processing. Neuropsychologia 45(14):3215–3222 63. Rizzo S, Sandrini M, Papagno C (2007) The dorsolateral prefrontal cortex in idiom interpretation: an rTMS study. Brain Res Bull 71(5):523–528 64. Häuser KI, Titone DA, Baum SR (2016) The role of the ventro-lateral prefrontal cortex in idiom comprehension: an rTMS study. Neuropsychologia 91:360–370 65. Sela T, Ivry RB, Lavidor M (2012) Prefrontal control during a semantic decision task that involves idiom comprehension: a transcranial direct current stimulation study. Neuropsychologia 50(9):2271–2280 66. Mitchell RL, Vidaki K, Lavidor M (2016) The role of left and right dorsolateral prefrontal cortex in semantic processing: a transcranial direct current stimulation study. Neuropsychologia 91:480–489 67. Bohrn IC, Altmann U, Jacobs AM (2012) Looking at the brains behind figurative


language—a quantitative meta-analysis of neuroimaging studies on metaphor, idiom, and irony processing. Neuropsychologia 50(11):2669–2683 68. Martin I, McDonald S (2004) An exploration of causes of non-literal language problems in individuals with Asperger syndrome. J Autism Dev Disord 34(3):311–328 69. Mitchley NJ et al (1998) Comprehension of irony in schizophrenia. Cogn Neuropsychiatry 3(2):127–138 70. Pexman PM, Glenwright M (2007) How do typically developing children grasp the meaning of verbal irony? J Neurolinguistics 20(2): 178–196 71. Regel S, Coulson S, Gunter TC (2010) The communicative style of a speaker can affect language comprehension? ERP evidence from the comprehension of irony. Brain Res 1311:121–135 72. Caillies S et al (2019) Asymmetry of affect in verbal irony understanding: what about the N400 and P600 components? J Neurolinguistics 51:268–277 73. Filik R et al (2014) Testing theories of irony processing using eye-tracking and ERPs. J Exp Psychol Learn Mem Cogn 40(3):811 74. Pfeifer VA, Lai VT (2021) The comprehension of irony in high and low emotional contexts. Can J Exp Psychol 75(2):120–125 75. Weissman B, Tanner D (2018) A strong wink between verbal and emoji-based irony: how the brain processes ironic emojis during language comprehension. PLoS One 13(8): e0201727 76. Regel S, Gunter TC (2017) Don’t get me wrong: ERP evidence from cueing communicative intentions. Front Psychol 8:1465 77. Caffarra S et al (2019) When is irony influenced by communicative constraints? ERP evidence supporting interactive models. Eur J Neurosci 50(10):3566–3577 78. Regel S, Gunter TC, Friederici AD (2011) Isn’t it ironic? An electrophysiological exploration of figurative language processing. J Cogn Neurosci 23(2):277–293 79. Baptista NI, Manfredi M, Boggio PS (2018) Medial prefrontal cortex stimulation modulates irony processing as indexed by the N400. Soc Neurosci 13(4):495–510 80. Spotorno N et al (2013) What’s behind a P600? Integration operations during irony processing. PLoS One 8(6):e66839 81. Regel S, Meyer L, Gunter TC (2014) Distinguishing neurocognitive processes reflected by P600 effects: evidence from ERPs and neural oscillations. PLoS One 9(5):e96840


82. Akimoto Y et al (2017) Alpha band eventrelated desynchronization underlying social situational context processing during irony comprehension: a magnetoencephalography source localization study. Brain Lang 175: 42–46 83. Fields EC, Kuperberg GR (2012) It’s all about you: an ERP study of emotion and self-relevance in discourse. NeuroImage 62(1):562–574 84. Lee CJ, Katz AN (1998) The differential role of ridicule in sarcasm and irony. Metaphor Symb 13(1):1–15 85. Attardo S (2017) The Routledge handbook of language and humor. Taylor & Francis, New York 86. Raskin V (1985) Semantic mechanisms of humor. D. Reidel, Dordrecht 87. Suls JM (1972) A two-stage model for the appreciation of jokes and cartoons: an information-processing analysis. In: Goldstein JH (ed) The psychology of humor: theoretical perspectives and empirical issues, 1st edn. Academic Press, Massachusetts, pp 81–100 88. Ventis L (2015) Thinking fast and slow in the experience of humor. Humor 28(3):351–373 89. Ruch W, Hehl FJ (1998) A two-mode model of humor appreciation: its relation to aesthetic appreciation and simplicity-complexity of personality. In: Ruch W (ed) The sense of humor: explorations of a personality characteristic, 1st edn. Mouton de Gruyter, Berlin, pp 109–142 90. Wyer RS, Collins JE (1992) A theory of humor elicitation. Psychol Rev 99(4): 663–688 91. Canal P et al (2019) ‘Honey, shall I change the baby?–well done, choose another one’: ERP and time-frequency correlates of humor processing. Brain Cogn 132:41–55 92. Coulson S, Kutas M (2001) Getting it: human event-related brain response to jokes in good and poor comprehenders. Neurosci Lett 316(2):71–74 93. Du X et al (2013) Differentiation of stages in joke comprehension: evidence from an ERP study. Int J Psychol 48(2):149–157 94. Feng YJ, Chan YC, Chen HC (2014) Specialization of neural mechanisms underlying the three-stage model in humor processing: an ERP study. J Neurolinguistics 32:59–70 95. Marinkovic K et al (2011) Right hemisphere has the last laugh: neural dynamics of joke

appreciation. Cogn Affect Behav Neurosci 11(1):113–130 96. Mayerhofer B, Schacht A (2015) From incoherence to mirth: neuro-cognitive processing of garden-path jokes. Front Psychol 6:550 97. Shibata M et al (2017) Time course and localization of brain activity in humor comprehension: an ERP/sLORETA study. Brain Res 1657:215–222 98. Coulson S, Williams RF (2005) Hemispheric asymmetries and joke comprehension. Neuropsychologia 43(1):128–141 99. Ku LC et al (2017) A re-visit of three-stage humor processing with readers’ surprise, comprehension, and funniness ratings: an ERP study. J Neurolinguistics 42(162):49–62 100. Cuthbert BN et al (2000) Brain potentials in affective picture processing: covariation with autonomic arousal and affective report. Biol Psychol 52(2):95–111 101. Hempelmann CF, Samson AC (2007) Visual puns and verbal puns: descriptive or false analogy? In: Attardo S, Popa D (eds) New approaches to the linguistics of humor, 1st edn. Dunarea de Jos, Galati, pp 180–196 102. Filippova MG, Shcherbakova OV, Shtyrov YY (2020) It is not what you think it is: Erp correlates of verbal and non-verbal ambiguity processing. Neurosci Behav Physiol:1–9 103. Molinaro N, Barber HA, Carreiras M (2011) Grammatical agreement processing in reading: ERP findings and future directions. Cortex 47(8):908–930 104. Ku LC, Chang YT, Chen HC (2020) How do extraverts process jokes? An event-related potential study on humor processing. Brain Cogn 141:105553 105. Coulson S, Lovett C (2004) Handedness, hemispheric asymmetries, and joke comprehension. Cogn Brain Res 19(3):275–288 106. Yankovitz B, Mashal N (2020) Can brain stimulation improve semantic joke comprehension? J Cogn Psychol:1–12 107. Vrticka P, Black JM, Reiss AL (2013) The neural basis of humour processing. Nat Rev Neurosci 14(12):860–868 108. Bambini V, Resta D, Grimaldi M (2019) Time course and neurophysiological underpinnings of metaphor in literary context. Discourse Process 56(1):77–97 109. Roberts RM, Kreuz RJ (1994) Why do people use figurative language? Psychol Sci 5(3): 159–163

Chapter 20
Neurological Evidence of the Phonological Nature of Tones
Amedeo De Dominicis

Abstract
This chapter is a survey of the methodology used in the experiments concerning the tone–brain relationship. Its intention is therefore to highlight the critical issues in these experiments and to propose some suggestions. The theme of the neurological bases of tones is of specific linguistic interest because (level and contour) tones are distinctive in tonal languages and because intonation in both tonal and non-tonal languages has been accounted for by assuming the hypothesis of their phonological status both as targets and as sequences of their combination. Nevertheless, according to the neurological literature, the evidence for the phonological nature of prosody, and particularly of intonation, is controversial. Moreover, an even more intricate question concerns the abstract nature of the variables in any experiment: in order to correctly test the correlation between two variables, both should be measurable, that is, not abstract. Thus, this review raises some questions. Is an abstract linguistic tone a phonological constituent both in tonal and non-tonal languages? And what does “abstract” mean in linguistic terms? In this chapter, we assume that phonological constituents are interpreted as categorical processes, both in production and in perception. According to this special definition, they can take the form of neural states or processes.

Key words Linguistic tone, Neurolinguistics, Abstractness, Intonation, Experimental methodology

Abbreviations

ABR   Auditory brainstem response (or EPs)
EEG   Electroencephalogram
EPs   Auditory evoked potentials
ERP   Event-related potentials
H     High tone
L     Low tone
LHD   Left-hemisphere damage
MEG   Magnetoencephalography
MMF   Mismatch field
MMN   Mismatch negativity
NBD   Non-brain-damaged
RHD   Right-hemisphere damage


1 Introduction

What exactly is phonology is a question of long-standing debate. Generally speaking, language disciplines postulate that human linguistic behavior requires a scientific explanation. Therefore, they postulate the existence of a linguistic competence, innate or learned, ensuring the “link” between the performances of each speaker. Phonology is part of this competence. But this is only an epistemological assumption. Of course, no one has ever “seen” either phonology or syntax, or, more generally, linguistic competence. Lacking experimental evidence, phonological categories therefore remain merely an epistemological assumption, an “abstract” entity without a confident empirical basis. One possible hypothesis is that linguistic competence—and therefore also phonological categories—corresponds to neurobiological processes. Recently, neural “imaging” techniques have made it possible to “see” brain activity related to the execution of given human behaviors. This has enabled cause-and-effect relationships to be hypothesized, that is, behavior x is caused by neural process y. In practice, however, there are quite a few problems. First, neural “imaging” was born as a medical tool and measures quantitative facts, such as the differential of glucose or oxygen inflow destined for different areas of neurons: where more “food” reaches the neurons, there the neural locus of the observed behavior is inferred. But here we are faced with a leap: qualitative deductions are drawn from quantitative facts. Such a “jump” assumes that the so-called categorical thresholds have been identified, as in all so-called critical phenomena. Yet the experimental protocol of these clinical tests does not include the preliminary identification of such critical thresholds. Second, “observing a linguistic behavior” in the subject under experiment is not easy. Above all, it is difficult to induce a given linguistic behavior without involuntarily inducing other undesirable linguistic variants, and these “unwanted” behaviors contaminate the results of the experiment. Generally speaking, several limitations arise when imaging techniques are simply transferred to neurolinguistics. Let us imagine a fictional example in order to understand the limits of applying these medical diagnostic techniques to neurolinguistic research. Suppose we are looking at a map of a city on our computer screen. Let us also suppose that on this screen we want to track the movements of vehicles using a Global Positioning System (GPS), by means of which we can derive the vehicles’ movements and destinations and nothing else. Now, within this methodological framework, suppose we add a new objective: to find the location of, for example, the grocery stores on the map. The imaging model will fail, because it will be unable to distinguish the “function” of each vehicle; it will identify only the number and direction of the vehicles.


Essentially, there is no proof that neurological evidence is also phonological evidence. In particular, this statement holds when we deal with “abstract” categories, that is, with an experimental variable that cannot be measured during the experiment.1 In addition, phonological categories—such as tonal targets—are by definition “abstract.” Despite these perplexities, we will attempt to determine whether the neurolinguistic literature offers reliable evidence for the neural bases of the categories of phonological tone. We will carry out a methodological survey. At the end of this survey, we will argue that the main problem with experiments in neurolinguistics is that we need less abstractness in linguistics, and in particular that the experimental variables should be measured before the neurolinguistic experiment begins.

2 Organization of this Chapter

This chapter will develop the following steps. First, it surveys the main claims regarding the uses of the category of tone in linguistics, according to different models (Subheading 3). Next (Subheading 4), we will argue that, in the neurological tonotopic literature, the evidence for the neural bases of intonation is controversial, heterogeneous, and ambiguous. In particular, in terms of tones and intonation, the literature offers neurological evidence supporting the activation of the left hemisphere, but also neurological evidence supporting the activation of both hemispheres (Subheading 4.1); one also finds data accounting for the absence of brain correlates of intonation (Subheading 4.2), or for the dependence of intonation on linguistic performance as well as on the linguistic system (Subheading 4.3), or experimental results in favor of the innateness of intonation as well as in favor of the learned nature of intonation processing (Subheading 4.4). Next, this chapter reviews an experimental study aiming at identifying the neural correlates of the categorical perception of intonation. The study analyzes its findings using two statistical models, but the results are contradictory (Subheading 4.5). This discrepancy shows that the neural correlates of human language processing are activated by intonation in terms of opposition or variation among levels of tone, but not in terms of absolute levels (H/L tones).

1 For instance, in neurological experiments on tones one often finds a reference to—as yet undefined—“affective,” or “emotional,” or paralinguistic tones (as opposed to lexical or linguistic tones), but no definition is provided for what counts as affective or emotional, and this is a clear demonstration of the abstractness of this experimental variable.


In Subheading 5, this chapter reviews some neurological studies that use tonochronic methods. They measure the latency of the brain response to a tonal stimulus.2 In neurology, this latency is considered a measure of the attention of the subject. However, how to equate the measure of the listener’s attention with phonological evidence remains controversial. In Subheading 6, we discuss the notion of abstractness in linguistics before moving on to our conclusion (Subheading 7) and to some suggestions for a possible neuro-experiment on tones (Subheading 8).

3 Tones in Linguistics

Linguistics deals with tones because tonal languages use tonal oppositions to implement the distinctive function. In intonational languages, tones have been adopted in linguistic analysis in order to represent intonation as a sequence of discrete categories. In tonal languages, tones convey lexical information and are described as events or targets pushing the pitch up or down (statically or dynamically). These events are named high (H) or low (L), or rising or falling, tones. In non-tonal languages, intonation refers to pitch patterns in an utterance that convey non-lexical meanings. However, all linguistic models of intonation refer to what we may term special “events” pushing the F0 contour up or down. These events are coded as high (H) or low (L) tones and combinations thereof. Thus, the main difference between tone and intonation is the function of tones: lexical versus non-lexical (or post-lexical).3 As for intonation, some models directly map F0 contours or events to phonological categories or communicative functions;4

2 In previous studies, this technique succeeded in demonstrating the neural categorization of vowels: high vowels elicited a larger amplitude (i.e., larger latency) of EEG waves than did non-high vowels. 3 Of course, lexical tones and intonation belong to different domains of linguistics. Nevertheless, from a neurological standpoint, as Llanos [1: 11] observes, even in tonal languages “the transformation of continuous auditory inputs to linguistically relevant categories also operates at the suprasegmental or prosodic level.” In fact, one can also find Autosegmental-Metrical accounts for tone languages (e.g., see the many contributions in [2]). 4 According to the Autosegmental-Metrical (AM) or ToBI [3–6] model, intonation consists of F0 maxima and minima connected by linear or sagging interpolation, with only a single layer of events: pitch accents and boundary tones, each directly controlled. Thus, English intonation consists of linearly concatenated pitch accents known as tones. AM phonology ascribes the phonological nature of tones to three components: a grammar of phrasal tunes, a metrical grid, and rules of tunes-text association [3: 10–11, 236]. According to the IPO model [7, 8], intonation consists of linearly concatenated, stylized local pitch contours. According to the Tilt model [9], intonation consists of sporadic F0 events that are assumed to be linguistically meaningful, and F0 between these events results from interpolation.


other models (articulation-oriented models) simulate the articulatory processes underlying the generation of F0 contours.5 The former models try to capture surface forms directly; they try to map them onto communicative functions or phonological categories; and their basic assumption is that speakers are able to control surface F0 contours directly. The latter models try to simulate the articulatory processes underlying the generation of surface F0 contours; their parameters often have some articulatory connotations; and their basic assumption is that perception and linguistic specifications play only a partial role in determining F0 trajectories. Thus, it seems reasonable to raise the issue of the neurological encoding of these “events,” both in tonal and non-tonal languages. According to Pierrehumbert [3: 150], in intonational languages the tonal organization is arranged into pitch accents, and this organization appears to be lacking in tonal languages; hence, it plays no role in tonal implementation. Moreover, intonational languages appear to use pitch range expressively within the phrase to an extent unparalleled in tonal languages. Nevertheless, both tonal and intonational languages can be described and accounted for using a complementary autosegmental string, where tonal properties are housed. Tones are thus different linguistic objects in the two types of language. Whether the tones of the one and of the other can be assumed to refer to the same neurological process is a question to be investigated by means of the neurological literature. However, this literature has produced controversial results, and this incoherence is a problem for assuming tone as an abstract category for both intonational and tonal languages. This brings us back to the question of abstractness in linguistic categories. The problem is that it is difficult to define the “abstract” nature of a linguistic item, especially because—by definition—it is difficult to measure, to test, what is not concrete. In the case of linguistic tones, one should understand what is concrete, that is, measurable, in a tone before putting an “abstract” tone under experimental investigation, as we will see below.

4 Neurological Evidence

Two hypotheses have been formulated in order to account for the mapping of acoustic structures onto neurons: tonotopy and tonochrony. The so-called tonotopic principle [12] claims that the acoustic frequencies map directly onto clusters of neurons within the auditory cortex, thanks to the specific sensitivity of nerve cells to the spectral properties of sounds, by a selective activation process that begins early in the cochlear neurons regularly positioned along the basilar membrane [13, 14]. As for the tonochrony principle, it has been suggested that latency of evoked responses may be a supplementary dimension for object encoding in the auditory system. Roberts and Poeppel [15] demonstrated that there is a frequency dependence of latencies separate from stimulus intensity. We will account for tonochrony in Subheading 5.

5 In the Command-Response (Fujisaki) model [10, 11], F0 is forced to deviate from the baseline by phrase commands and accent commands, and then asymptotically returns to the baseline; F0 deviations due to the two kinds of commands are added together logarithmically, forming surface F0.
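For concreteness, the command-response formulation just described can be written out numerically. The following is a minimal sketch of the standard textbook form of the Fujisaki model (log-F0 as a baseline plus phrase and accent responses); the parameter values are arbitrary illustrations and are not taken from this chapter or from [10, 11].

```python
# Minimal sketch of the Fujisaki command-response model: surface log-F0 is the
# baseline plus phrase-command and accent-command responses added logarithmically.
import numpy as np

def phrase_response(t, alpha=2.0):
    # Impulse response of the phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha*t) for t >= 0.
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * t), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    # Step response of the accent control mechanism: Ga(t) = min(1 - (1 + beta*t)*exp(-beta*t), gamma) for t >= 0.
    return np.where(t >= 0, np.minimum(1.0 - (1.0 + beta * t) * np.exp(-beta * t), gamma), 0.0)

def fujisaki_f0(t, fb=100.0, phrase_cmds=(), accent_cmds=()):
    # F0 contour (Hz) from a baseline fb, phrase commands (Ap, T0), and accent commands (Aa, T1, T2).
    log_f0 = np.full_like(t, np.log(fb))
    for ap, t0 in phrase_cmds:                       # impulse-like phrase commands
        log_f0 += ap * phrase_response(t - t0)
    for aa, t1, t2 in accent_cmds:                   # pedestal-like accent commands
        log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return np.exp(log_f0)

t = np.linspace(0.0, 2.0, 400)                       # a 2-s utterance, arbitrary
f0 = fujisaki_f0(t, fb=110.0,
                 phrase_cmds=[(0.5, 0.0)],           # one phrase command at utterance onset
                 accent_cmds=[(0.4, 0.3, 0.7), (0.3, 1.1, 1.5)])
print(f"F0 range: {f0.min():.1f}-{f0.max():.1f} Hz")
```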


4.1 Left Versus Right Versus Both Hemispheres

In the neurological literature, different types of evidence support the left or the right lateralization of the brain processes underlying tone production/perception, and some studies have also found evidence for a location in both hemispheres. The neurological model of language processing by Damasio and Damasio [16, 17] locates the neural site of the linguistic categories in a wide region of the left hemisphere. The lexical tones of tonal languages do activate a neural circuit located in this region; conversely, intonation (in non-tonal languages) activates neural regions that are heterogeneous and much more extended than the sites of language processing. Similarly, Neville [18] has shown that the left hemisphere is activated not only by auditory stimuli but also by visual stimuli that have linguistic significance. In particular, they claim that paralinguistic tones (intonation) and lexical tones (in tonal languages) always activate the left frontal lobe, in particular the lateral frontal region (Broca’s area) and the posterior superior temporal region (Wernicke’s area). Conversely, other researchers do not confirm the left location of the neural bases of intonation. For instance, Gandour [19] compared pitch perception of linguistic and non-linguistic auditory stimuli in native speakers of two tone languages (Chinese and Thai) and of a non-tone language (English). Only the Thai group showed significant activation in the left frontal operculum; significant activation in the anterior insular region was found for the English and Chinese groups, but not for the Thai group. These differential patterns of brain activation across language groups and tasks support the view that pitch patterns are processed at higher cortical levels in a top-down manner according to their linguistic function in a particular language.6

6 According to Albouy [20], the brain asymmetry for speech and music emerges from domain-specific neural networks, with speech tied to the left auditory cortex. They found that degradation of temporal information impaired speech recognition but not melody recognition, whereas degradation of spectral information impaired melody recognition but not speech recognition. Functional magnetic resonance imaging data revealed a right–left asymmetry for speech and music: classification of speech content occurred exclusively in the left auditory cortex, whereas classification of melodic content occurred only in the right auditory cortex.


Moreover, other studies support the role of both hemispheres in the acoustic processing of language tones. The results obtained by Xi [21] provide strong neurophysiological evidence in support of the categorical perception of lexical tones in Mandarin Chinese. In particular, they show that the regions engaged in Chinese lexical tone reading are bilateral, that is, located in both hemispheres. Similarly, Kwok [22] demonstrates that bilateral frontal, parietal, motor, and cingulate regions are engaged in Mandarin Chinese lexical tone reading, and that temporal regions are not involved in lexical tone processing in reading comprehension. Once again concerning the site (left or right hemisphere) of intonation processing, van der Burght [23] carried out a functional magnetic resonance imaging (fMRI) experiment. The results show that the lateralization of intonation processing depends on its role in syntactic processing: activity in the inferior frontal gyrus (IFG) was lateralized to the left hemisphere when intonation was the only source of information available to understand the sentence.

4.2 Lexical Versus Non-Lexical Tones


Many studies demonstrate the neural irrelevance of non-lexical tones. In reviewing the relevant scientific literature, we often found that non-lexical tones are equated with “affective” tones, although this equation is by no means appropriate. We also found that results on tonal and non-tonal languages are discussed as comparable: from a neurological standpoint, this extension is reasonable, at least in order to verify a possible common neural encoding, as noted in Subheading 3. The right hemisphere has often been claimed to be a potential neural locus for at least some aspects of intonation, such as what is inadequately called “affective” prosody, and people with right-hemisphere damage (RHD) have often been reported to show impairments in this domain.7 This phenomenon has been investigated primarily in terms of perception, rarely in terms of production, and more rarely still using acoustic analysis.8 According to the (scarce) literature reporting the acoustic features of prosodic production in RHD, no strong evidence indicates that the prosodic productions of people with RHD are substantially different from those of NBD (non-brain-damaged) people, when measured in terms of acoustic features. At most, the acoustic features of productions by people with RHD do differ (slightly) from

7 In the later twentieth century, attention turned to the right hemisphere as a potential neural locus of at least some aspects of prosody. Ross, in particular [24–27], described several patients whose prosody became “monotone” following an RHD (damage to the right hemisphere), and proposed that “affective” prosody might be a dominant language function of the right hemisphere. More recently, the right hemisphere has come to be widely associated with prosody, and with “emotional” or “affective” prosody in particular [28–30]. Unfortunately, in the literature that we analyzed the whole hemisphere is considered as the unit of analysis, and more fine-grained distinctions are not always adopted. 8 For example, [31–33].


those of subjects with NBD and LHD (left-hemisphere damage) in F0 variation and pause duration. Prosody type (“emotional” vs. linguistic9) had very little effect. Currently available data show only a weak effect of RHD on prosody production.10 In the literature, these features are compared between individuals with RHD and non-brain-damaged (NBD) control groups and, when possible, the results are compared to participants with LHD. The results of these acoustic studies reveal some minor differences in prosody production in people with RHD when compared with the NBD control group. These results, which are listed below, concern phonetic rather than phonological facts. Pitch variation is reduced in people with RHD, although the effect is small; variation in pitch is essentially the same for people with RHD and LHD: in fact, variation is slightly reduced for people with RHD, suggesting that the effect may be due to issues surrounding brain damage in general, and not those specific to damage to the RH. Pause duration is indeed affected in RHD: not only do participants with RHD produce shorter pauses than NBD participants, but their pauses are also shorter than LHD participants. This compression of syllables in RHD is due to damage to the right hemisphere. These results demonstrate the neural (and phonological) irrelevance of the non-lexical tones, and they correspond to findings by Wang [46] and (already mentioned in Subheading 4.1) by Gandour et al. [19] and van der Burght et al. [23] on tonal languages. The study by Wang et al. [46] investigates the hemispheric lateralization of Mandarin tones and demonstrates that the left hemisphere is activated only for speakers of tone languages. Four groups of listeners were examined: native Mandarin listeners, English–Mandarin bilinguals, Norwegian listeners with experience with Norwegian tone, and American listeners with no tone experience. Tone pairs were dichotically presented and listeners identified which tone they heard in each ear. For the Mandarin listeners, 57% of the total errors occurred in the left ear, indicating a right-ear (left-hemisphere) advantage. The English–Mandarin bilinguals exhibited nativelike patterns, with 56% left-ear errors. However, no ear advantage was found for the Norwegian or American listeners (48% and 47% left-ear errors, respectively). Results indicate the left-hemisphere dominance of Mandarin tone by native and

9 In the analyzed literature, many authors distinguish between linguistic prosody (e.g., using intonation to distinguish between noun-noun compounds and noun phrases) and “emotional” or “affective” prosody. As we observed in footnote 1, no definition is given for what is considered “emotional” or “affective.” 10 In the studies analyzed, the elementary acoustic features argued to support prosody are only fundamental frequency (F0) variability, pause duration, syllable duration, and intensity duration. The literature measures these acoustic features, hypothetically relating them to right-hemisphere damage [28, 34–45]. Unfortunately, in the majority of these studies, the size of the sample population is very small: the analyses include only ten or fewer patients.
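As an illustration of how the elementary features listed in footnote 10 can be quantified, the sketch below extracts an F0 track and a crude pause estimate from a single recording using Parselmouth, a Python interface to Praat. It is not the procedure of the studies reviewed above; the file name, the thresholds, and the equation of unvoiced stretches with pauses are simplifying assumptions.

```python
# Minimal sketch (not from the reviewed studies): quantifying F0 variability and
# pausing in one recording with Parselmouth (a Python interface to Praat).
import parselmouth

snd = parselmouth.Sound("speaker_utterance.wav")   # hypothetical file name
pitch = snd.to_pitch(time_step=0.01)               # F0 track sampled every 10 ms
f0 = pitch.selected_array['frequency']             # 0 Hz marks unvoiced frames
voiced = f0[f0 > 0]

f0_mean, f0_sd = voiced.mean(), voiced.std()       # F0 variability as the SD over voiced frames

# Crude pause estimate: runs of unvoiced frames longer than 200 ms
# (this conflates unvoiced consonants with silence, hence "crude").
min_pause_frames = int(0.2 / 0.01)
pauses, run = [], 0
for is_silent in (f0 == 0):
    if is_silent:
        run += 1
    else:
        if run >= min_pause_frames:
            pauses.append(run * 0.01)               # pause duration in seconds
        run = 0
if run >= min_pause_frames:
    pauses.append(run * 0.01)

print(f"mean F0 = {f0_mean:.1f} Hz, F0 SD = {f0_sd:.1f} Hz, pauses (s) = {pauses}")
```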


proficient bilingual listeners, whereas non-native listeners show no evidence of lateralization, regardless of their familiarity with lexical tone.

4.3 Performance Versus System (Phonetics Versus Phonology)


One more unclear and debated issue in the experimental studies exploring the neurological bases of intonation is the question of whether the pitch contour is driven by linguistic performance or by the linguistic system. Inouchi [47] supports the conclusion that the target tones depend on the neural bases of the linguistic system rather than of performance. Another study based on the magnetic mismatch field (MMF) found that shortened-vowel duration changes and level-to-falling pitch changes in Japanese words elicited a prominent MMF in both hemispheres for both native and non-native speakers [48]. Their 2002 study [47] investigates whether shortened duration changes and level-to-falling pitch changes in non-speech (tones) would elicit a more prominent MMF component than lengthened duration changes and falling-to-level pitch changes, respectively. Stimuli included three computer-synthesized tones with varying duration or frequency modulation: (1) short duration and level pitch; (2) long duration and level pitch; and (3) long duration and falling pitch. Magnetoencephalography (MEG) responses were recorded using a dual 37-channel gradiometer system. The results show that a prominent MMF component was generated by long-to-short duration changes and level-to-falling pitch changes in each hemisphere for both Japanese and American subjects. The component peaked at around 100 ms after change onset for duration changes and 170 ms for pitch changes. The MMF component in tones, as in words, was particularly sensitive to duration shortening and pitch falling. In short, duration shortening and pitch falling are particularly salient cues for pre-attentive auditory change detection in each hemisphere. On the other hand, findings by Chandrasekaran [49] assert that pitch contours are not driven by linguistic categories. To assess the domain specificity of experience-dependent pitch representation, they evaluated the mismatch negativity (MMN) and discrimination judgments of English musicians, English non-musicians, and native Chinese for pitch contours presented in a non-speech context using a passive oddball paradigm.11

11 The oddball paradigm is a commonly used task for cognitive and attention measurement in ERP (event-related potential) studies. In this study, two visual stimuli, a box and a sphere, each 5 cm in size, were designed as the standard and target stimuli, respectively. The presentation duration of each trial, either the standard (box) or target (sphere) trial, was 500 ms, with an intertrial interval (ITI) between two consecutive trials of 500 ms. The participants were instructed to press “0” for a target stimulus and not to respond to a standard stimulus. Further, the reaction time and correct target detections of each participant were recorded. Two types of error were expected: false alarms (i.e., pressing the key when a standard stimulus was shown) and omissions (i.e., failing to press the key when a target stimulus appeared).
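To make the trial structure in footnote 11 concrete, the following sketch generates an oddball trial sequence and scores responses into hits, omissions, and false alarms. The stimulus and ITI durations follow the footnote; the number of trials, the standard-to-target ratio, and the simulated responses are illustrative assumptions, not details of the published experiment.

```python
# Minimal sketch (illustrative, not the published experiment code): building an
# oddball trial sequence and scoring responses into hits, omissions, and false alarms.
import random

STIM_MS, ITI_MS = 500, 500            # durations taken from the footnote
N_TRIALS = 100                        # assumed trial count (not stated in the text)
P_TARGET = 0.2                        # assumed standard-to-target ratio (not stated in the text)

random.seed(1)
trials = ['target' if random.random() < P_TARGET else 'standard' for _ in range(N_TRIALS)]

def score(trials, responses):
    """responses[i] is True if the participant pressed '0' on trial i."""
    hits = sum(1 for t, r in zip(trials, responses) if t == 'target' and r)
    omissions = sum(1 for t, r in zip(trials, responses) if t == 'target' and not r)
    false_alarms = sum(1 for t, r in zip(trials, responses) if t == 'standard' and r)
    return hits, omissions, false_alarms

# Hypothetical responses: a fairly attentive participant who errs on 5% of trials.
responses = [(t == 'target') if random.random() > 0.05 else (t != 'target') for t in trials]
print(score(trials, responses), f"trial length = {STIM_MS + ITI_MS} ms")
```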


Stimuli consisted of homologues of the Mandarin high rising (T2) and high level (T1) tones, and a linear rising ramp (T2L). One condition involved a between-category contrast (T1/T2), and the other a within-category contrast (T2L/T2). Irrespective of condition, musicians and Chinese listeners showed larger MMN responses than non-musicians, with those of the Chinese larger than those of the musicians. The Chinese, however, were less accurate than non-natives in overt discrimination of T2L and T2. Taken together, these findings suggest that experience-dependent effects on pitch contours are domain-general and not driven by linguistic categories. Unfortunately, another paper [50] arrives at the opposite conclusion, that is, that the hemispheric specialization of auditory processing of tones is sensitive to language-specific factors. For the authors, precisely what kind of neural mechanisms underlie functional asymmetries in speech processing remains controversial. While some studies support speech-specific circuits, others suggest that lateralization is dictated by the relative computational demands of complex auditory signals in the spectral or time domains. To examine how the brain processes linguistically relevant spectral and temporal information, a functional magnetic resonance imaging study was conducted using Thai speech, in which spectral processing associated with lexical tones and temporal processing associated with vowel length can be differentiated. Ten Thai and ten Chinese subjects were asked to perform discrimination judgments of pitch and timing patterns presented in the same auditory stimuli under two different conditions: speech (Thai) and non-speech (hums). Under the speech condition, the tasks required judging Thai tones (T) and vowel length (VL); under the non-speech condition, homologous pitch contours (P) and duration patterns (D). A remaining task required listening passively to non-speech hums (L). Only the Thai group showed activation in the left inferior prefrontal cortex in speech minus non-speech contrasts for spectral (T vs. P) and temporal (VL vs. D) cues. Thai and Chinese groups, however, exhibited similar fronto-parietal activation patterns in non-speech hums minus passive listening contrasts for spectral (P vs. L) and temporal (D vs. L) cues. According to these results, it appears that lower-level specialization for acoustic cues in the spectral and temporal domains cannot be generalized to abstract higher-order levels of phonological processing. Regardless of the neural mechanisms underlying low-level auditory processing, the authors’ findings clearly indicate that hemispheric specialization is sensitive to language-specific factors. Another specific question concerns the nature of the perception of intonation: in particular, whether the categorical perception of tones is learned or innate.


4.4 Learned Versus Innate

In terms of the categorical perception of tones, the results of an interesting paper by Burnham and Jones [51] indicate that the categorical perception of tone is to some extent learned. Categorical perception occurs when a physical continuum is perceived discontinuously. In speech, a vast body of research has shown that consonants are perceived categorically whereas vowels are perceived continuously. The other major phonetic unit of speech, lexical tone, has been relatively neglected in categorical perception studies. In the paper by Burnham and Jones [51], tonal language (Thai) and non-tonal language (Australian English) listeners were tested for their categorical perception of three artificial tone continua (mid-rise, mid-fall, rise-fall) in four different contexts, namely, speech, filtered speech, music, and sine waves. Thai listeners were found to perceive the speech continua significantly more categorically than Australian English listeners, but for the non-speech continua perception was equivalent for the two language groups. However, this dependence of pitch discrimination on training contrasts with the results of other studies, such as Jakoby et al. [52], who demonstrate that training does not improve pitch discrimination and that, as a consequence, pitch discrimination belongs to human linguistic competence. They explored the impact of perceptual training (auditory frequency discrimination) in a carefully controlled intensive training experiment. The results were straightforward: no transfer was found to untrained tasks that rely on pitch discrimination, or to linguistic tasks that showed pre-training correlations. In terms of the possible categorical versus continuous nature of intonation perception, in Subheading 4.5 we discuss the results of a study by a group of linguists and neurologists.
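Categorical perception of this kind is usually diagnosed by fitting an identification function over the continuum and looking for a steep boundary (together with a discrimination peak at that boundary). The sketch below fits a logistic identification curve; the continuum steps and response proportions are invented for illustration and are not data from Burnham and Jones [51].

```python
# Minimal sketch: fitting a logistic identification function over a tone continuum.
# A steep slope (large k) at the category boundary is one hallmark of categorical perception.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Proportion of 'rising tone' responses as a function of continuum step x."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8)                                          # 7-step continuum (hypothetical)
p_rise = np.array([0.02, 0.05, 0.10, 0.45, 0.88, 0.95, 0.98])    # invented identification data

(x0, k), _ = curve_fit(logistic, steps, p_rise, p0=[4.0, 1.0])
print(f"category boundary near step {x0:.2f}, slope k = {k:.2f}")
```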

4.5 Categorical Versus Continuous

A paper by Post et al. [53] raises the question of how to deal with the neural correlates of the categorical perception of intonation. By means of resynthesis in Praat, single-word utterances (neutral toponyms) with different tonal contours were created. Pairs of stimuli were presented to the participants in order to create tonal oppositions (gradual fall, sharp fall, gradual rise, sharp rise), and participants were asked to detect any difference in pitch contour between the stimuli within each pair and to indicate whether they were the same or different. Participants were told to distinguish between “linguistic” categories such as question/statement and between “paralinguistic” categories such as anger/surprise [53: 260]. The results of the experiments were analyzed according to both a non-parametric and a parametric statistical model. Interestingly, the two models yield different results. According to the non-parametric model, “linguistic” intonation activates both hemispheres; “paralinguistic” intonation also activates both hemispheres, but only the superior temporal gyrus. According to the parametric model, “paralinguistic” intonation activates only the right hemisphere. In the study by Post et al. [53], participants were cued to make a forced-choice identification response evaluating the interrogativity (linguistic responses) or the surprise (paralinguistic responses) signaled by synthesized stimuli (vocal hums): a non-parametric test was used because subjects were asked to answer the question “Is the F0 curve you hear an expression of a question or of a surprise?” and the answers were sorted on an ordinal numerical scale.

However, the problem with this paper’s approach is that “interrogation” and “surprise” (or “anger”) are not linguistically comparable classifiers. The authors’ idea was that the first referred to the tone’s linguistic dimension, while the second referred to its paralinguistic dimension. In reality, they are profoundly heterogeneous and hence incompatible. In fact, “interrogation” is a word that classifies a state (specifically, a grammatically coded state), while “surprise” is the final state of a complex “narrative event” that contains not a simple word (“surprise”), but rather a narrative sequence: there is a fact/state x with respect to which the speaker (y) and his possible listener (z) are interested; there is an expectation (t) of x or y, or both; and then there is an event (w) that contradicts t.12 As one can see, the test fails because the “interrogation” and the “surprise” classifiers are not comparable. Moreover, there is another problem, concerning the notion of abstractness, in the case of these categories: if these categories were not abstract, then we should be able to increase the degree of “surprise” or “question” in tandem with the rise of F0: but is it so? Does “more F0” correspond to “more surprise” or “more questioning”? And what would that mean? How would one measure “more surprise”?

That there is a discrepancy between the results of the parametric and non-parametric statistical designs is not surprising. The identification task on the so-called “paralinguistic” intonation asked participants what they thought of the words “surprise” or “anger,” and the results then provided information about the participants and their linguistic “ideologies,” not about the stimuli. In particular, the discrepancy between the parametric and non-parametric analysis of the results for this identification task shows that the neural correlates of human language processing are activated by intonation in terms of opposition or variation between levels of tone, but not in terms of absolute levels (H/L tones).

12 The same goes for “anger,” as analyzed by Greimas [54].


In their experiment, linguistically interpreted stimuli activated a widespread network of sites including, as hypothesized, superior and medial temporal areas bilaterally as well as a small cluster in the left inferior frontal gyrus overlapping with Broca’s area. The experiment confirmed that “linguistic” and “paralinguistic” intonation are differentially processed: responses in the linguistic condition were compatible with categorical perception, while those in the paralinguistic condition were typical for continua that are perceived gradiently. On the other hand, the results of the analysis based on a non-parametric design—compared with the different results of the parametric analysis—show that it is not the tonal categories themselves but the discrimination between the tonal categories that belongs to the universal characteristics of human language processing. In other words, the neural correlates of human language processing are activated by intonation in terms of oppositions or variation between levels of tone (and in terms of variation of the form of the F0 contour) but not in terms of absolute levels (e.g., H tone or L tone).

In particular, the analyses based on a non-parametric design consisted of a baseline subtraction analysis, which was carried out to identify activation for intelligible speech in the experiment (“words”) as distinct from speech-like auditory input (“hums”); it revealed an activation of the speech processing system typically observed in auditory linguistic experiments involving higher-order phonological processing of speech, comprising large areas of activation in bilateral auditory temporal areas as well as clusters of activation in the left inferior frontal cortex overlapping with Broca’s area, in the left cerebellum, and in the right putamen. As for intelligible speech, the analyses revealed widespread activations in the superior and medial temporal gyri bilaterally for both conditions, but with more activation under the linguistic condition, especially in the left hemisphere, extending further to the anterior and posterior regions of the superior and middle temporal gyri, and including the left inferior frontal gyrus, the perisylvian cortical areas, and the parietal regions as well as the putamen. Under the “paralinguistic” condition, activations were restricted to the superior temporal gyrus bilaterally.

On the other hand, the analyses based on a parametric design explore the difference between the linguistic and paralinguistic functions when variation in form is factored out. By means of these analyses, as seen earlier, a wider network of activations was observed under the linguistic condition than under the paralinguistic condition, with bilateral middle temporal and right superior temporal activations, and parietal regions encompassing, on the left, an area at the interface between the temporal and parietal lobes and the supramarginal gyrus, and, on the right, the angular gyrus as well as a small cluster in the cerebellum. Under the “paralinguistic” condition, only right-hemispheric activations were found, in the inferior frontal gyrus.


Thus, the non-parametric model shows that intonation activates a neural network in both hemispheres when intonation is “linguistic.” Conversely, when intonation is “paralinguistic,” the neural activation is always bilateral but restricted to the superior temporal gyrus. The results of the parametric model show that “paralinguistic” intonation activates only the regions in the right hemisphere.
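
The divergence between the two designs becomes easier to see when spelled out as the regressors they put into the general linear model: a subtraction-style analysis asks only whether a stimulus of a given type occurred, whereas a parametric analysis additionally weights each trial by a continuous stimulus property. The sketch below is not the pipeline used by Post et al. [53]; it is a schematic illustration with hypothetical onsets, a hypothetical "F0 excursion" modulator, and a rough difference-of-gammas approximation of the hemodynamic response:

```python
import numpy as np
from scipy.stats import gamma

tr, n_scans = 2.0, 200                               # hypothetical acquisition parameters
t_hires = np.arange(0, n_scans * tr, 0.1)            # high-resolution time grid (s)
onsets = np.arange(10.0, 380.0, 20.0)                # one stimulus every 20 s (illustrative)
f0_excursion = np.random.default_rng(0).uniform(1, 6, onsets.size)  # hypothetical per-trial modulator

def hrf(t):
    # Rough difference-of-gammas approximation of the canonical hemodynamic response
    return gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)

def regressor(weights):
    """Place per-trial weights at onsets, convolve with the HRF, resample to the TR grid."""
    stick = np.zeros_like(t_hires)
    stick[(onsets / 0.1).astype(int)] = weights
    conv = np.convolve(stick, hrf(np.arange(0, 32, 0.1)))[: t_hires.size]
    return conv[:: int(tr / 0.1)]

categorical = regressor(np.ones(onsets.size))                   # subtraction-style: stimulus vs. baseline
parametric = regressor(f0_excursion - f0_excursion.mean())      # parametric: graded F0 excursion per trial
design = np.column_stack([categorical, parametric, np.ones(n_scans)])
print(design.shape)   # (200, 3): two regressors of interest plus a constant
```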

5 Tonochrony

In addition to the tonotopic methods, the latency of evoked responses may form a supplementary dimension for object encoding in the auditory system. There is a frequency dependence of latencies separate from stimulus intensity [15]. Furthermore, recent animal data have shown that the precision of temporally based neural representations declines from the periphery to the cortical regions, thus entailing different encoding strategies for slow and fast acoustic modulations [55]. This temporal mechanism of auditory encoding is known as the tonochrony principle: the latency of auditorily evoked components appears to be sensitive to some stimulus properties. The tonochronic methods measure the latency, that is, the “newness” of the acoustic stimulus, since the rapid responsiveness of the neurons prevents repetitive firing.13 Intrinsic brain rhythms are modified by attention to external events. This attention inhibits the so-called alpha rhythms, that is, more or less regular electric oscillations at a frequency of about 10 Hz. The electroencephalogram (EEG) and event-related potentials (ERPs) are generally used to measure this latency of the brain’s electric waves when attention is focused on a specific external stimulus. In situations where the individual must pay particular attention to a signal, these electrically negative components become larger. Conversely, if the individual is not paying attention, the component is smaller.

13 “The biophysical properties of neurons determine how synaptic currents are converted to voltage changes and over how long a time synaptic inputs are integrated. Octopus and bushy cells in the ventral cochlear nucleus are able to respond with exceptionally rapid and precisely timed synaptic potentials. These neurons have a prominent, low-voltage-activated K+ conductance that confers a low input resistance and rapid responsiveness and prevents repetitive firing” [56: 687].
14 It is a non-invasive method of monitoring brain processes in real time and thus can be used to obtain electrophysiological evidence of brain functioning and psychological processes. Experimental psychologists and neuroscientists have discovered a variety of stimuli that elicit reliable and reproducible ERPs in participants in these studies. The latency measure is thought to reflect the speed of communication in the brain or the cortical processing time of information. ERP is currently interpreted as indicating a superior cognitive response to unexpected and/or cognitively salient stimuli.


The tonochronic methods are the electroencephalogram (EEG), event-related potentials (ERPs)14 (including the mismatch negativity—MMN—or mismatch field—MMF),15 and the auditory evoked potentials (EPs) technique (or ABR = auditory brainstem response).16

MMN is the method used in [57]. The authors used the auditory MMN ERP to test the native perception of within-category and across-category intonational contrasts between statement and question interpretations in Catalan. To this end, the authors mention a series of studies that have already shown that acoustic contrasts that cross a phonemic boundary lead to larger MMN responses than comparable acoustic contrasts that do not. They also observe that, in this literature, studies on tonal languages have successfully explored experience-dependent effects on the automatic processing of phonologically contrastive pitch [57: 844]. Thus, according to these studies, both in tonal and in intonational languages, acoustic tonal contrasts trigger the same MMN responses. This means, therefore, that both in tonal and in intonational languages, the tone generates the same tonochronic responses. We have already reviewed studies on the use of neurological methods to investigate the neural tonotopy of tones, and these studies on tonotopy provide different results: the neural sites for lexical tones in tone languages differ from the ones for intonation in intonational languages. This finding deserves an explanation. Tonotopy and tonochrony give different results because MMN detects and measures the latency,17 that is, the attention of the speaker/listener toward the stimulus, not intonationally based phonological distinctions indicating different meanings.18 In other words, [57] state that their results demonstrate the categorical perception of tone contrasts, but not the correlation between this categorization and a specific content counterpart: “The present experiment design does not allow us to draw any conclusions regarding the specific neural network supporting the across-category intonation contrasts observed here as enhanced MMNs, and therefore, we can only speculate” [57: 851].

ERP is the method used in [58]. This study used an EEG-ERP experiment to investigate the brain correlates of tonal focalization and of the informational contrast between New and Background.

15 It is a component of the event-related potential (ERP) to an odd stimulus in a sequence of stimuli.
16 This technique enables one to study the conditions and efficiency of the acoustic nerve. EPs are recorded by means of electrodes on the head to measure the brain’s electrical activity related to certain stimuli. The most important measurement parameter is the latency between stimulus and potential. This technique does not serve to identify the cerebral locus where the neural base of a certain acoustic stimulus is located.
17 The amplitude of the MMN components is investigated to predict the rate of deviance between standard and deviant stimuli.
18 “The abovementioned MMN results and its magneto-encephalographic counterpart on intonational discourse contrasts could be interpreted as detections of acoustic changes in the stimuli and remain far from signaling intonationally based phonological distinctions indicating different meanings” [57: 844].


The starting point is the prior literature, according to which it is not acoustic highlighting alone that leads to the perception of an element’s prominence, but also expectations derived from the listener’s knowledge about the linguistic structure of a language. In short, measuring these expectations requires an appropriate technique, and this technique is precisely EEG-ERP. In particular, four types of focus are analyzed: First Occurrence Focus (FOF), Second Occurrence Focus (SOF), Quasi Second Occurrence Focus (Quasi-SOF), and Background (BG). In addition, two types of information status are studied: New and Given. The authors state that:

The crucial role of newness (or, more generally, “information status”) for speech processing also becomes obvious when comparing Quasi-SOF with the other conditions. Quasi-SOF triggers an ERP effect over posterior regions (negativity between 400 and 650 ms after target word onset) which is more pronounced than for BG and SOF but less pronounced than for FOF. The reason for this intermediate status may lie in the fact that the Quasi-SOF target words are lexically new but at the same time referentially given. BG and SOF items are both referentially and lexically given. [58: 22]

Thus, the authors conclude that the brain’s response depends on both lexical and tonal information. In other words, this analysis accounts not for the neural effects of tones on the brain, but rather for the sum of tonal and lexical effects of focalization and information status on the brain. Thus, this paper cannot be considered a neurological demonstration of a possible phonological status of tones or intonation: it is a good investigation of the neural correlates of the dynamics of textual information (newness) and of focalization.

EEG is the method used in [1]. English and Chinese subjects had to listen to a read text and answer questions that required categorizing the intonational “events” contained in the text itself. This study, in which the analysis of tones is correlated with the electroencephalogram (EEG), found that the optimal neural discrimination between pitch accent categories emerged at latencies between 100 and 200 ms. Thus, it captures how selective attention influences the neural processing of pitch accent categories. In short, it refers to the tonochrony principle.19

19 “Interestingly, while native English speakers exhibited more robust processing than native Chinese speakers at latencies shorter than 200 ms, Chinese native speakers exhibited more robust processing than native English speakers at latencies longer than 200 ms. Because the neural discrimination of pitch accents was still better at shorter latencies in both groups, group differences at longer latencies could be the result of additional top-down processing recruitment by native Chinese listeners, presumably to compensate for perceptual difficulties associated with the processing of non-native pitch accent categories at shorter latencies” [1: 10]. This means that non-native speakers (in particular, the Chinese listening to stimuli in English) must wait longer and pay more “attention” to non-native stimuli. In other words, by measuring latency, we primarily measure the subjects’ attention.


We decided not to count these tonochronic approaches as evidence because we cannot directly assume that every linguistic object on which the speaker/listener focuses his/her (selective) attention is therefore a phonological unit. We wish to explain this point. Speakers and listeners do not pay attention to, and thus do not recognize, all phonetic units or all phonological units. For instance, there exist allophones (i.e., non-phonological units), and we can see them on a spectrogram, even though neither speaker nor listener pays attention to them or has a mental representation of them. For instance, in Italian the word anche (“also”) is realized as [ˈaŋke], where the nasal consonant assimilates to the following velar obstruent; but as the allophone [ŋ] is not a phoneme of Italian, speakers/listeners have neither a phonological category for it, nor a phonetic awareness of it. Likewise, in some American English varieties [59: 277], the diphthongs [aɪ] and [ɑ·ɪ] are allophones, since they are in complementary distribution: the former occurs before unvoiced consonants, the latter before voiced consonants. Moreover, in an intervocalic context, the phonological opposition /t/~/d/ is neutralized and is realized as a flap [ɾ] (e.g., write [ˈraɪt] and ride [ˈrɑ·ɪd], but writer [ˈraɪɾər] and rider [ˈrɑ·ɪɾər]). The only phonetic difference between writer and rider concerns the vowels, but this difference is not phonologically pertinent, since these diphthongs are allophones; the phonological opposition /t/~/d/ should draw the attention of the speaker/listener, but it is phonetically canceled. Thus, the speaker’s (and listener’s) attention is focused on a phonetic event and not on a phonological opposition. In short, we observe that paying attention to a stimulus is not unequivocal proof of the phonological status of that stimulus.
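
To make explicit how little a tonochronic measure commits to phonology, it helps to see what it actually returns: a latency and an amplitude read off a deviant-minus-standard difference wave. Below is a minimal sketch using simulated data (the sampling rate, search window, and simulated MMN at ~170 ms are assumptions for illustration, not values from any of the studies discussed):

```python
import numpy as np

fs = 1000                                   # assumed sampling rate (Hz)
t = np.arange(-0.1, 0.5, 1 / fs)            # epoch: -100 to 500 ms

# Simulated averaged ERPs at one fronto-central electrode (in microvolts)
rng = np.random.default_rng(1)
standard = rng.normal(0.0, 0.2, t.size)
deviant = standard - 1.5 * np.exp(-((t - 0.17) ** 2) / (2 * 0.03**2))   # extra negativity around 170 ms

difference = deviant - standard             # the MMN is defined on the difference wave
window = (t >= 0.10) & (t <= 0.25)          # typical MMN search window (assumed)
i = np.argmin(difference[window])           # MMN is a negativity, so take the minimum
print(f"MMN peak: {difference[window][i]:.2f} µV at {t[window][i] * 1000:.0f} ms")
```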

6 Abstractness

Abstractness is of course the key property that allows linguistics to account for several different empirical facts. From this standpoint, it is not only highly recommended but also required. If a given category is abstract, then it is also empirically useful because it accounts for many linguistic facts. However, we are not criticizing here the concept of (abstract) “competence,” which, as such, is a specific example of the indisputable role of every scientific model; (abstract) models are obviously as necessary in linguistics as in any other scientific discipline. Rather, we refer to how the categories of a model must be organized if that model is to be subjected to experimental tests of a neurological kind. In this case, those categories must be homogeneous with the categories used in neurological experimental models.


Thus, if a model of the mind works by measuring (increasing or decreasing) quantities of glucose, or hemoglobin (or any other quantity), then the categories of linguistic models must also be compatible, that is, structured as measurable continua. Otherwise, if on varying a neurological category x (e.g., an increase in glucose to neuron x) one cannot match a corresponding and proportional variation of a language category y (e.g., the perception of a tone as higher, or even better, the categorical perception of a tone such as H or L), then it would be impossible to establish a causal nexus between x and y. A possible example of this kind of measurable category is voice onset time (VOT), as it was structured into three quantitative thresholds by Cho and Ladefoged [60], these thresholds being correlated with three phonological categories: voiced, unaspirated unvoiced, and aspirated unvoiced consonants.
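
As a toy illustration of what such a measurable category looks like in practice, the sketch below maps a measured VOT value onto one of the three phonological categories; the cut-off values are placeholders chosen for illustration only and are not the language-specific values reported in [60]:

```python
def vot_category(vot_ms: float) -> str:
    """Map a measured voice onset time (ms) to a phonological category.

    The boundaries below are illustrative placeholders, not the
    language-specific values reported by Cho and Ladefoged [60].
    """
    if vot_ms < 0:            # voicing starts before the stop release (voicing lead)
        return "voiced"
    elif vot_ms < 35:         # short positive lag
        return "unaspirated unvoiced"
    else:                     # long positive lag
        return "aspirated unvoiced"

for vot in (-90, 15, 80):     # hypothetical measurements in milliseconds
    print(f"{vot:>4} ms -> {vot_category(vot)}")
```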

7 Discussion

In this contribution, we reviewed the neurophysiological methods currently being applied to questions in the field of linguistic tone. After scrutinizing neurophysiological data obtained using different methodologies, we are able to critically assess the current state of neurolinguistic experiments from the standpoint of their methodological issues. We have methodologically evaluated the neurolinguistic literature on the relationship between intonation and the brain, and have pointed out that the assumption of a neurological basis for linguistic tone is not yet supported by reliable evidence. The results of the review are conflicting. As we have shown, one controversial point is the question of the localization of the neural processes activated by linguistic tone in both hemispheres or in one of them: one finds neurological evidence supporting the activation of the left hemisphere, but also neurological evidence supporting the activation of both hemispheres (Subheading 4.1). Another point of contention is the question of the brain correlates of intonation: some data point to the absence of brain correlates of intonation (Subheading 4.2). The question of whether intonation depends on performance or competence remains to be clarified: some data support the dependence of intonation on linguistic performance, but others its dependence on the linguistic system (Subheading 4.3). Moreover, some experimental results support the innateness of intonation processing, while others, on the contrary, support its learned nature (Subheading 4.4). Finally, no conclusive results are found even regarding the question of the categorical nature of intonation: some evidence supports the existence of neural correlates of categorical perception of intonation, but these correlates are activated in terms of oppositions among levels of tone, not in terms of absolute levels (Subheading 4.5).


Accordingly, the neural understanding of linguistic tone raises fundamental questions. Actually, neural and linguistic entities appear not to be directly commensurable. Both the tonotopic and the tonochronic principles fail to account for the phonological tone representation theorized by linguists. It is clear that we cannot immediately reduce linguistic categories to neural processes. Linguistics and neuroscience adopt different primitives and epistemologies. Bridging the gap demands taking a common path. In particular, a neurolinguistic experimental test requires a common epistemological and methodological framework for both approaches. Indeed, some of these inconsistent results depend on inhomogeneous methodological choices. For instance, ERP (and tonochronic) methods primarily measure the attention of the participant to a stimulus rather than its neural categorization (Subheading 5), whereas, conversely, tonotopic methods measure the metabolic “hunger” of a given participant’s brain area during the encoding of a given stimulus, in the hope that this localizes its categorization. However, we believe that it is not methodologically correct to present an axiomatic choice as evidence to be handed over to other disciplines. In other words, before beginning an experimental search for neural correlates, one should understand what one is looking for. Moreover, particularly in the case of a linguistic theory of tone, one should begin by defining what is “concrete” (not only abstract), hence measurable, in a linguistic tone. The notion of abstractness in linguistics (in the sense described in Subheading 6) is perhaps the primary obstacle to be overcome in order to test the possible neurological basis of its units. The experimental variables cannot be abstract, as this risks invalidating the experiment’s results. In Subheading 8, we suggest a possible solution.

8 Conclusions and Further Suggestions

Some papers reported here state that the intonational marks of focus are phonological. Nevertheless, to assess whether this statement is true, one must verify not only that speakers use them (description of the object), but also that, if speakers do not use them or use different ones, the perception of focus disappears or changes location (prediction model). This passage from description to prediction can be accomplished by having recourse to speech synthesis and a subsequent perceptual experiment. Still, there would remain the problem of establishing a metric (what does more or less “interrogativity,” more or less “surprise,” mean?). In this regard, one could neurolinguistically analyze the “error” deliberately introduced into a synthetic voice that uses a tonal characteristic different from the one elicited in the observation of speakers. In this case, the analysis should use the ERP method.


For example, let us suppose speakers of a language x produce the focus by means of a pitch accent, let us call it H. The synthetic voice could instead produce a sentence with tone L on the constituent in focus. At this point, the sentence should be presented to n listeners belonging to language x and their ERPs observed. The experimental expectation is that the tonal “error” should trigger a brain response diverging from what was expected. The connection between explanations of errors and theories of normal performance is very strong within production studies. In terms of linguistic understanding, on the other hand, the concept of error has proven its usefulness mainly in the field of neurolinguistics. Neuroimaging studies, in fact, often make use of experimental paradigms that have recourse to the comparison between a control condition (which is assumed to require “normal” information processing) and one or more conditions that contain violations or anomalies at the phonological, semantic, or syntactic level. The assumption behind these experiments is that processing violations results in increased activations in the same brain areas involved in normal processing, since the anomaly requires more attention and resources to integrate. Furthermore, differences in responses to different types of anomalies are assumed to reflect differences in the processing of different linguistic components. This experimental approach has been used in particular to isolate the neural correlates of syntax and semantics (e.g., [61–63]), and these results have also been validated by electrophysiological techniques, such as event-related potentials (ERPs), which enable one to detect the temporal course of cerebral electrical activity [64].

References

1. Llanos F et al (2021) The neural processing of pitch accents in continuous speech. Neuropsychologia 158:1–12, Article 107883 2. Jun S-A (ed) (2005) Prosodic typology: the phonology of intonation and phrasing. Oxford University Press, Oxford 3. Pierrehumbert JB (1980) The phonology and phonetics of English intonation. MIT dissertation (Published 1988, UILC, Bloomington) 4. Beckman ME, Pierrehumbert J (1986) Intonational structure in English and Japanese. Phonol Yearb 3:255–310 5. Silverman KEA et al (1992) ToBI: a standard for labeling English prosody. In: Proceedings of the second international conference on spoken language processing (ICSLP 92). Banff, Alberta, Canada, pp 867–870. ISCA Archive http://www.isca-speech.org/archive/icslp_1992 6. Ladd DR (2008) Intonational phonology, 2nd edn. Cambridge University Press, Cambridge

7. ‘t Hart J (1979) Explorations in automatic stylization of F0 curves. IPO Annu Prog Rep 14:61–65 8. ‘t Hart J, Collier R, Cohen A (1990) A perceptual study of intonation. Cambridge studies in speech science and communication. Cambridge University Press, Cambridge 9. Taylor P (2000) Analysis and synthesis of intonation using the Tilt model. J Acoust Soc Am 107(3):1697–1714 10. Fujisaki H (1983) Dynamic characteristics of voice fundamental frequency in speech and singing. In: MacNeilage PF (ed) The production of speech. Springer-Verlag, New York, pp 39–55 11. Fujisaki H, Ohno S, Wang C (1998) A command-response model for F0 contour generation in multilingual speech synthesis. In: The third ESCA/COCOSDA workshop (ETRW) on speech synthesis. Jenolan Caves

House, Blue Mountains, NSW, Australia, November 26–29, 1998, pp 299–304 12. Romani GL, Williamson SJ, Kaufman L (1982) Tonotopic organization of the human auditory cortex. Science 216:1339–1340 13. Moerel M, De Martino F, Formisano E (2014) An anatomical and functional topography of human auditory cortical areas. Front Neurosci 8:1–14, Article 225 14. Saenz M, Langers DRM (2014) Tonotopic mapping of human auditory cortex. Hear Res 307:42–52 15. Roberts TP, Poeppel D (1996) Latency of auditory evoked M100 as a function of tone frequency. Neuroreport 7:1138–1140 16. Damasio AR, Damasio H (1992) Brain and language. Sci Am 267(3):88–95 17. Damasio AR, Damasio H (2000) Language and the brain. In: Emmorey K, Lane HL (eds) The signs of language revisited: an anthology to honor Ursula Bellugi and Edward Klima. Lawrence Erlbaum, Mahwah, pp 405–416 18. Neville HJ et al (1997) Neural systems mediating American sign language: effects of sensory experience and age of acquisition. Brain Lang 57(3):285–308 19. Gandour J et al (2000) A crosslinguistic PET study of tone perception. J Cogn Neurosci 12(1):207–222 20. Albouy P et al (2020) Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367(6481):1043–1047. https://doi.org/10.1126/science.aaz3468 21. Xi J et al (2010) Categorical perception of lexical tones in Chinese revealed by mismatch negativity. Neuroscience 170(1):223–231 22. Kwok V et al (2019) Neural correlates and functional connectivity of lexical tone processing in reading. Brain Lang 196:104662. https://doi.org/10.1016/j.bandl.2019.104662 23. van der Burght CL et al (2019) Intonation guides sentence processing in the left inferior frontal gyrus. Cortex 117:122–134, ISSN 0010-9452. https://doi.org/10.1016/j.cortex.2019.02.011 24. Gorelick PB, Ross ED (1987) The aprosodias: further functional-anatomical evidence for the organisation of affective language in the right hemisphere. J Neurol Neurosurg Psychiatry 50(5):553–560 25. Ross ED et al (1981) How the brain integrates affective and propositional language into a unified behavioral function: hypothesis based on clinicoanatomic evidence. Arch Neurol 38(12):745–748


26. Ross ED, Mesulam M-M (1979) Dominant language functions of the right hemisphere: prosody and emotional gesturing. Arch Neurol 36(3):144–148 27. Ross ED, Thompson RD, Yenkosky J (1997) Lateralization of affective prosody in brain and the callosal integration of hemispheric language functions. Brain Lang 56(1):27–54 28. Guranski K, Podemski R (2015) Emotional prosody expression in acoustic analysis in patients with right hemisphere ischemic stroke. Neurol Neurochir Pol 49(2):113–120 29. Patel S et al (2018) Right hemisphere regions critical for expression of emotion through prosody. Front Neurol 9:224. https://doi.org/10.3389/fneur.2018.00224. eCollection 2018 30. Leung JH et al (2017) Affective speech prosody perception and production in stroke patients with left-hemispheric damage and healthy controls. Brain Lang 166:19–28 31. Jiam NT et al (2017) Voice emotion perception and production in cochlear implant users. Hear Res 352:30–39 32. Moriarty PM et al (2018) Comparing theory, consensus, and perception to the acoustics of emotional speech. J Acoust Soc Am 144(3):1841–1841 33. Stoop TB et al (2018) Children’s ratings of vocal emotion intensity depend on the emotion spoken and speaker familiarity but not acoustic parameters. J Acoust Soc Am 144(3):1965–1966 34. Emmorey KD (1987) The neurological substrates for prosodic aspects of speech. Brain Lang 30(2):305–320 35. Ryalls J, Joanette Y, Feldman L (1987) An acoustic comparison of normal and right-hemisphere-damaged speech prosody. Cortex 23(4):685–694 36. Behrens SJ (1989) Characterizing sentence intonation in a right hemisphere-damaged population. Brain Lang 37(2):181–200 37. Ouellette GP, Baum SR (1994) Acoustic analysis of prosodic cues in left- and right-hemisphere-damaged patients. Aphasiology 8(3):257–283 38. Gandour J et al (1995) Speech prosody in affective contexts in Thai patients with right hemisphere lesions. Brain Lang 51(3):422–443 39. Balan A, Gandour J (1999) Effect of sentence length on the production of linguistic stress by left- and right-hemisphere-damaged patients. Brain Lang 67(2):73–94 40. Baum SR et al (2001) Using prosody to resolve temporary syntactic ambiguities in speech


production: acoustic data on brain-damaged speakers. Clin Linguist Phon 15(6):441–456 41. Gandour J, Baum SR (2001) Production of stress retraction by left- and right-hemisphere-damaged patients. Brain Lang 79(3):482–494 42. Hird K, Kirsner K (2003) The effect of right cerebral hemisphere damage on collaborative planning in conversation: an analysis of intentional structure. Clin Linguist Phon 17(4–5):309–315 43. Ross ED, Monnot M (2008) Neurology of affective prosody and its functional-anatomic organization in right hemisphere. Brain Lang 104(1):51–74 44. Yang S-Y, Van Lancker Sidtis D (2016) Production of Korean idiomatic utterances following left- and right-hemisphere damage: acoustic studies. J Speech Lang Hear Res 59(2):267–280 45. Wright A et al (2018) Selective impairments in components of affective prosody in neurologically impaired individuals. Brain Cogn 124:29–36 46. Wang Y et al (2004) The role of linguistic experience in the hemispheric processing of lexical tone. Appl Psycholinguist 25(3):449–466 47. Inouchi M et al (2002) Neuromagnetic auditory cortex responses to duration and pitch changes in tones: cross-linguistic comparisons of human subjects in directions of acoustic changes. Neurosci Lett 331(2):138–142 48. Inouchi M et al (2003) Magnetic mismatch fields elicited by vowel duration and pitch changes in Japanese words in humans: comparison between native- and non-speakers of Japanese. Neurosci Lett 353(3):165–168 49. Chandrasekaran B, Ananthanarayan K, Gandour JT (2009) Relative influence of musical and linguistic experience on early cortical processing of pitch contours. Brain Lang 108(1):1–9 50. Gandour J et al (2002) A cross-linguistic fMRI study of spectral and temporal cues underlying phonological processing. J Cogn Neurosci 14(7):1076–1087 51. Burnham D, Jones C (2002) Categorical perception of lexical tone by tonal and non-tonal language speakers. In: Proceedings of the 9th Australian international conference on Speech Science & Technology. Australian Speech

Science & Technology Association Inc, pp 515–520 52. Jakoby H et al (2019) Auditory frequency discrimination is correlated with linguistic skills, but its training does not improve them or other pitch discrimination tasks. J Exp Psychol Gen 148(11):1953–1971 53. Post B et al (2015) Categories and gradience in intonation: a functional Magnetic Resonance Imaging study. In: Romero J, Riera M (eds) The phonetics/phonology interface: sounds, representations, methodologies. John Benjamins, Amsterdam, pp 259–284 54. Greimas AJ (1981) De la cole`re. E´tude de se´mantique lexicale. Actes se´miotiques. Documents du GRS-l 27:9–27 (now in Du sens II. Essais se´miotiques. Seuil, Paris, 1983) 55. Wang X (2007) Neural coding strategies in auditory cortex. Hear Res 229:81–93 56. Kandel ER et al (eds) (2013) Principles of neural science, 5th edn. McGraw-Hill, New York 57. Borra`s-Comes J et al (2012) Specific neural traces for intonational discourse categories as revealed by human-evoked potentials. J Cogn Neurosci 24(4):843–853 58. Baumann S, Schumacher PB (2020) The incremental processing of focus, givenness and prosodic prominence. Glossa 5(1):1–30 59. Chomsky N, Miller GA (2001) Introduction to the formal analysis of natural languages. In: Kreidler CW (ed) Phonology. Critical concepts, vol 1. Routledge, London and New York, pp 238–286 60. Cho T, Ladefoged P (1999) Variation and universals in VOT: evidence from 18 languages. J Phon 27:207–229 61. Embick D et al (2000) A syntactic specialization for Broca’s area. Proc Natl Acad Sci USA 97(11):6150–6154 62. Ni W et al (2000) An event-related neuroimaging study distinguishing form and content in sentence processing. J Cogn Neurosci 12(1): 120–133 63. Moro A et al (2001) Syntax and the brain: disentangling grammar by selective anomalies. NeuroImage 13(1):110–118 64. Friederici AD (2002) Towards a neural basis of auditory sentence processing. Trends Cogn Sci 6(2):78–84

Chapter 21

Neurophysiological Underpinnings of Prosody

Silke Paulmann

Abstract

Prosody, that is, meaningful patterns in intonation, rhythm, stress, and tone, impacts on a large body of other language operations, and it is likely one of the most undervalued and possibly understudied language-related functions. Drawing primarily upon evidence from event-related brain potentials (ERPs), but also referring to neural oscillation activity where possible, this chapter offers a concise review of the electrophysiological responses underlying prosody and of how prosody impinges on other language functions, summarizes the effects of task demands and listener characteristics, and ends by sketching future directions of travel for the field of studying how prosody is processed in the brain.

Key words: Emotional prosody, Tone of voice, Affective prosody, Social intonation, ERPs, Neural oscillations

1 Introduction

The opinion that “texting is a brilliant way to miscommunicate with others” is an excellent example of how important prosody—that is, the intonation pattern, stress, pace, rhythm, and tune of our speech—really is. When modulating features on the auditory-vocal dimension, for example, by raising or lowering the fundamental frequency we speak in, using a loud or quiet voice, speaking fast or slow, adding or removing pauses, and coloring our voice with a harsh or clear quality, we help others interpret the message we are trying to convey (Table 1 lists commonly measured acoustic features). Naturally, when texting, these features (i.e., perceived pitch, loudness, speech rate, voice quality) are missing and, even when trying to mimic them (e.g., using all capital letters in text to shout at someone), they lose some of their power in the transition from speech to text.


Table 1 Most commonly used acoustic features to describe prosodic realization of stimuli

Pitch
- Fundamental frequency (F0): Perceived as pitch, F0 measures reflect the vibration rate (opening/closing) of the vocal folds. Can be measured as mean, maximum, minimum, standard deviation, and range. Most often measured in Hertz (Hz) or semitones.
- Jitter: Refers to small frequency-related perturbations that are caused by fluctuations of the opening and closing vocal folds. The pitch perturbations are often perceived as roughness. Usually measured as a percentage contained within a signal.

Intensity
- Intensity: Refers to the energy present in the signal. It is perceived as loudness and linked to the effort used by the speaker to produce the signal. Usually measured in decibels (dB).
- Shimmer: Refers to amplitude (loudness) perturbations, most often perceived as roughness. Frequently measured as a percentage contained within a speech signal.

Voice quality
- Formant bandwidth: Describes the frequency ranges amplified in formants. It affects the listener’s vowel quality perception.
- Harmonics-to-noise ratio (HNR): Ratio between signal power in periodic and noise parts of the speech segment; can signal breathiness.
- High-frequency energy: Indicates the proportion of energy present in specific frequency regions (e.g., above/below cut-off points). An increase in high-frequency energy leads to the perception of a harsher, less soft voice.

Tempo
- Speech rate: Indicates how fast or slow a speaker talks; can be measured as units (e.g., words, syllables) per duration.
- Duration: Overall length of the signal, usually including pauses (silence) in speech.

Given the power these cues have, it is not surprising that the study of prosody is no longer limited to linguists: psychologists, computer scientists, musicians, teacher trainers, computational modelers, and others have taken up the task of unraveling the contribution prosody makes to social interaction. The fast-growing body of literature is, at times, difficult to keep track of. Prosody serves many functions in social interactions. It can help to differentiate between linguistic acts (e.g., statement vs. question), convey lexical meaning (e.g., CONtent vs. conTENT), and help parse information (e.g., end-of-sentence intonation). However, it also makes what some researchers have called an “affective” contribution, as it allows listeners to infer how the speaker feels (e.g., a lowered voice paired with low pitch might indicate sadness) or what his/her attitude is toward certain things (e.g., hesitantly saying “this plane is safe” will convey quite the opposite meaning). In short, prosody is a complex but powerful tool in social interactions that often fulfills multiple communicative functions at once. This chapter surveys research on the neurophysiological mechanisms underlying prosody’s linguistic and non-linguistic, affective functions.


It will be of particular interest to illuminate the time-course linked to affective and non-affective prosody processing, given that findings might not only help to shed light on the debate about whether the two functions are independent or interdependent, but would also allow further specification of the multi-layered nature of prosody processing, advancing brain-based models of prosody.

1.1 Electrophysiological Markers of Linguistic Prosody Processing

The advantage of using electrophysiological methodologies to track the brain’s response to prosodic events is clear: speech unfolds over time (and so do prosodic realizations), and methods that allow monitoring responses on a millisecond-by-millisecond basis provide the opportunity to fully sketch when listeners attend to and make use of available information. Indeed, prosody markers as indexed through event-related brain potentials (ERPs, as measured with the electroencephalograph [EEG]) have identified multiple time-points in processing that are linked to different prosodic processes. It seems that listeners engage in online prosody processing as soon as speech is presented to them; for instance, [1] reported that the so-called closure positive shift (CPS), a marker of prosodic boundaries, is elicited quickly after the offset of a pre-boundary word. Listeners are argued to use information such as pre-boundary syllable lengthening to infer how to best syntactically parse sentences (e.g., [2]). The importance of this process is reflected in evidence suggesting that listeners engage in it even when they have no idea what the speaker is saying, for example, when listening to de-lexicalized speech [3], but also when reading silently themselves (e.g., [4]). In other words, prosodic phrasing is crucial in a variety of circumstances and helps establish syntactic structure. Foti and Roberts [5] demonstrated that listeners also quickly (~100 ms) anticipate turn-taking mechanisms in conversations. Specifically, they showed that the so-called stimulus preceding negativity (SPN) is longer, or more sustained, in instances where delayed turn-taking occurred between conversation partners. That is, listeners overhearing a conversation anticipated when the next speaker was about to say something; when this prediction was not met, more continued processing of this moment resulted in a prolonged SPN. Li et al. [6] have explored how listeners react to prosodic boundary violations. They report a fronto-centrally distributed negative ERP in response to prosodic prominence manipulations. Shortly (~270 ms) after the encounter of an unexpected prominence cue, listeners seem to engage in enhanced processing of manipulated materials, suggesting that expectations around prosody are quickly built up. Indeed, Böcker et al. [7] report elicitation of the only slightly later occurring N325 in response to extracting metrical stress, once more outlining that different features of prosody are used by listeners at slightly different time-points.
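
In practice, tracking these millisecond-scale responses amounts to epoching the continuous EEG around the relevant prosodic events and averaging. The sketch below uses MNE-Python; the file name, trigger code, channel selection, and the 400–800 ms window used here for a CPS-like measure are illustrative assumptions, not parameters from the studies cited above:

```python
import mne

# Hypothetical continuous EEG recording with event triggers (file name is illustrative)
raw = mne.io.read_raw_fif("sub-01_prosody_raw.fif", preload=True)
raw.filter(l_freq=0.1, h_freq=30.0)                  # typical band-pass for slow ERP components

events = mne.find_events(raw)                         # assumes a stimulus channel with trigger pulses
epochs = mne.Epochs(raw, events, event_id={"boundary": 1},   # trigger 1 = pre-boundary word offset (assumed)
                    tmin=-0.2, tmax=1.2, baseline=(None, 0), preload=True)

evoked = epochs["boundary"].average()                 # ERP time-locked to the word offset

# Mean amplitude over centro-parietal channels in an illustrative 400-800 ms window
cps = evoked.copy().pick(["Cz", "Pz"]).crop(tmin=0.4, tmax=0.8)
print(cps.data.mean() * 1e6, "µV")
```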


In line with the assumption that expectations around prosodic realizations are quickly built up by listeners and that these can be used in a top-down fashion (e.g., [8]), [9] report that violating such expectancies (e.g., starting out a sentence as a statement but ending it as a question) elicits a frontally distributed prosodic expectancy positivity (PEP). Within roughly 600 ms after the onset of prosodic violations, listeners respond to unexpected, abrupt changes in prosodic contour. Similarly, [10] investigated responses to prosodic incongruities of sentences’ final words; they report a centro-posterior P600 effect, also suggesting that prosodic incongruities are processed around this time-frame. The effects of prosody on syntax processing can be demonstrated not only at the sentence level. For instance, [11] showed that lexical prosody has an effect on how listeners decompose German compound words, again illustrating the impact prosody has on other language functions, in this case morphological processing. Similarly, [12] report that speakers’ intonation use helps listeners predict upcoming grammatical structures. In their study, they manipulated how word accents on stems can cue upcoming suffixes. Results revealed an early left anterior negativity (ELAN) followed by a P600 in response to incongruent stem and suffix pairs. Similar to [11], these results emphasize how prosody can impact on morphological processes. Finally, another late positive ERP component (P800) was reported by [13] in response to linguistic prosodic expectancy violations. Specifically, they cross-spliced statements and questions and asked participants to focus on either prosodic or lexical-semantic attributes of the stimuli. The P800 emerged in response to expectancy-violating materials only when the spotlight was put on prosodic features of the stimuli, but not when participants attended to lexical-semantic features. Collectively, these findings reveal that listeners detect deviances from expected contours very quickly, suggesting that they use this information during online speech comprehension. Indeed, converging evidence for the importance of change detection comes from studies investigating oscillatory neural activity. In broad terms, the frequency of oscillations is directly dependent on the number of simultaneously activated neurons: the higher the number of neurons activated, the slower the frequency of oscillations. Changes in processing (e.g., sensory stimulation) can lead to event-related desynchronization (ERD) or event-related synchronization (ERS; see, e.g., [14]). Auditory change detection (e.g., oddball paradigms where a standard stimulus is presented only to be intermittently replaced by a deviant) has been reported to lead to increased ERS in the theta band, an effect amplified by stimulus salience (i.e., the more salient the acoustic difference, the stronger the ERS; [15]). One question that remains to be explored in future studies is how much attention listeners have to pay to prosodic aspects of speech in order to maximize the “gain” from its contribution.


While some studies (e.g., [9, 10]) indicate that prosodic aspects are processed even when listeners are not instructed to pay special attention to them, others (e.g., [13, 15]) seem to suggest that the listeners’ focus needs to be on prosody during the task in order for discrepancies to be detected. Either way, the combined results highlight the contribution of prosody when interfacing with other language functions, including semantics and syntax. It should be noted that [16] have argued that the impact prosody has on syntactic processes may depend on which other cues are available at the time of processing. In their study on argument integration, they explored the interaction between prosody and verb information. Results revealed that for materials containing ambiguous verbs, prosody use is crucial to help with argument integration processes. In contrast, when speakers use unambiguous verbs, listeners make only limited use of prosody for argument integration. Crucially, the time point at which prosody is used to help in this process differs depending on what other linguistic cues are available (i.e., early effects are found when ambiguous verbs are used as stimuli, and slightly later effects are reported for unambiguous verbs). Overall, the combined results nicely showcase how multi-faceted linguistic prosody processing is. Given the differing temporal dynamics associated with different functions of linguistic prosody processing, it will be exciting to further explore the differing neural mechanisms linked to each of these functions in future research. Moving away from linguistic prosody processing, the next section will focus on how affective attributes of prosody are processed online.

1.2 Electrophysiological Markers of Non-linguistic Affective Prosody Processing

Similar to linguistic prosody processing, the time-course associated with emotional and attitudinal prosody processing has been further illuminated over the past two decades. Different ERP components have been linked to different sub-tasks of emotional prosody processing (cf. [17, 18]). Initially, acoustic features that may form part of signaling specific emotional intentions, such as pitch height or loudness, need to be extracted rapidly. A negative ERP component peaking at around 100 ms after stimulus onset, the N100, is linked to the extraction of these cues [19]. The following component, the fronto-centrally distributed P200, seems to be responsive to arousal [20] as well as emotional (e.g., [21–23]) features of speech. Even when speech is not attended to at all, the time-course associated with early emotional prosody processing seems to remain stable. Specifically, a mismatch negativity (MMN) component with a similar time stamp has been argued to reflect emotional category change detection under pre-attentive processing conditions (e.g., [24–26]). Both MMN and P200 seem to be linked to what is known as emotional salience detection (e.g., [17, 18, 21]).


This idea receives support from a recent study [27] reporting larger MMN amplitudes in response to French vowels expressing fear as opposed to the same vowels expressing happiness or sadness. Similar results have been reported for Chinese pseudo-syllables spoken in emotional or neutral prosody. Thus, the most salient stimuli are automatically attended to within a very short burst of time (i.e., less than 200 ms after auditory onset). While the direction of effects for the MMN is always the same, that is, we find a more pronounced MMN for deviant stimuli reflecting enhanced attention allocation, the picture is more diverse for P200 findings. Part of this can be attributed to the fact that the component reflects both continued sensory stimulation and the attending to and processing of emotional connotation (cf. [28]). In other words, the component is linked to a combination of important sub-processes. Hence, it may not come as a surprise that some studies report neutral stimuli to elicit the strongest P200 amplitude (e.g., [21]), while others report more enhanced P200 components for emotional stimuli when compared to neutral ones (e.g., [29]). As prosodic realizations of materials differed across studies (and some used real sentences, while others used pseudo-sentences), it seems clear that the modulation of the component is closely linked to stimulus features such as pitch [30], loudness level [31], and possibly voice quality (see, e.g., [32] for evidence that timbral features of speech convey emotional quality). Yet, (emotional) meaning evaluation is likely to contribute to processing at this early time point, too. For instance, [33] report emotional priming effects for voice snippets of 200 ms duration. That is, they found that listeners can extract emotional meaning from very short voice samples, which then prime the processing of subsequently presented facial expressions (also see below). Similar behavioral findings are reported by [34], suggesting a rapid time-course underlying emotional vocal expression recognition. Thus, leaving the direction of amplitude modulations aside, the crucial take-home message from these studies seems to be that different emotional intonation patterns can be distinguished from a neutral tone of voice (e.g., [21–23, 35]) as well as from one another [20]. That is, research suggests that listeners can infer how others feel irrespective of how closely their speech may match a “prototypical” configuration of acoustic cues. In addition to ERP and behavioral findings suggesting that emotionally intoned stimuli capture the attention of listeners within 200 ms, there is also evidence from oscillation studies. For example, [36] report stronger delta synchronization within the time window of 200–400 ms when participants listen to angry-sounding words compared to neutral-sounding words. They interpret their data as reflecting immediate attention allocation to emotional stimuli [36]. Collectively, findings currently suggest that listeners can consult a large repertoire of similar, but not necessarily identical, acoustic cue combinations for different speakers.


These combinations are then used to first establish whether the speaker is conveying an emotion, motivation, attitude, or similar social intention (as opposed to a neutral one); the signal then undergoes a more thorough evaluation to establish how the speaker is really feeling (see paragraph below). How exactly this process is modulated in the brain awaits further testing and will be exciting for future studies to show. Following the early appraisal and potential tagging of emotional speech, more thorough and elaborated processes linked to affective prosody processing have been revealed. Specifically, processes related to meaning evaluation and assessment of situational context cues seem to unfold within roughly 300–500 ms after speech onset. A number of different ERP components have been linked to such processes, including the N300 (e.g., [37]), P300 (e.g., [38]), or the N400 (e.g., [33, 39–41]). Interestingly, the interpretations of the associated functions of each component are quite similar and can overlap; however, differences in stimulus presentation and task focus seem to determine which component is triggered. For example, [37] explored how congruent and incongruent prosody impacts on emotional exclamation (e.g., “Wow!”, “Oooh”) processing and report enhanced N300 amplitudes in response to incongruous exclamations. In contrast, [41] used emotional words (e.g., “happy,” “angry”) in a similar paradigm and reported enhanced N400 components in response to incongruencies. Thus, while both studies showed that listeners not only extract emotional prosodic meaning quickly but also try to integrate it with lexical information, incongruencies in short exclamations seem to be processed more rapidly than incongruencies in words. Based on these results, one can hypothesize that the salience of the incongruency plays a role in determining the speed of processing. Leaving specificities aside, the combined results highlight how quickly emotional prosody is integrated with other language functions, that is, within 300–400 ms after speech onset. This speed of processing then raises the question of how much information listeners really need in order to infer the emotional connotation of speech. If salience is detected within 200 ms after speech onset, and prosody and semantics are integrated within 300–400 ms after speech onset, how much information do listeners need to have received before knowing how the speaker feels? As mentioned previously, [33] explored this issue in a sentence fragment priming paradigm. They played 200 or 400 ms long snippets of speech followed by either matching or mismatching facial expressions. While both short and longer snippets led to priming effects on facial expressions, the 200 ms long primes led to reversed as opposed to traditional priming effects. This led the authors to believe that listeners use information as it unfolds and that even short fragments of speech can convey emotionally meaningful information; however, the reversed effects also highlighted that potentially ambiguous or hard-to-“read” information requires additional resources to help with processing (cf. [33]).
study in which participants were presented with different emotional segments lasting between 100 ms (shortest segment) and 700 ms (longest segment), [34] highlighted that listeners may need different amounts of information for different emotions. They therefore argued that the recognition of different basic emotions follows different time courses. This is in line with the idea that recognition will depend on the clarity (and distinctness) of the emotional signal. Thus, the findings nicely lend support to the assumption that the extraction of acoustic cues (as evidenced in the MMN or P200, see above) is more than just salience detection: it paves the way for building up an emotional context, and the extracted information is used in subsequent, more in-depth processing of emotional speech. This idea receives further support from studies applying frequency-band analyses. For example, [36] report delta synchronization between 200 and 800 ms for emotional words, suggesting sustained attention to emotional intonation. A somewhat shorter-lasting effect was found in the theta band, which increased between 200 and 600 ms after stimulus onset. The authors interpret this latter effect as reflecting enhanced working memory engagement. Crucially, the effects that they report for their different time windows of interest depend on whether participants attended to the stimuli or merely heard them passively, suggesting that oscillatory activity patterns are influenced by participants’ attention [36]. In the ERP literature, the exhaustive, ongoing evaluation of emotional prosodic information has been linked to later-occurring negativities [29] and the late positive complex (LPC; [20, 23, 42]). It is assumed that the early analysis reflected in early ERP components directly guides attention toward stimulus features that facilitate the “proper” recognition of the expressed emotion. For instance, [23] explored whether individual sub-processes linked to emotional prosody evaluation are independent of each other. Their results demonstrated that the early appraisal of emotional significance (P200) can directly affect amplitude modulations of later ERP components (LPC). In other words, the continuous analysis of different emotionally relevant voice features, as reflected in late ERP components such as the LPC, sustains the evaluation of the signal and in turn helps ensure that listeners respond appropriately in terms of social behavior (e.g., fight/flight). It is likely that this continuous scanning for variations in the acoustic signal leads to facilitated processing of emotional stimuli. In fact, [42] report stronger LPC effects for more primitive emotional vocalizations such as growls (reflecting anger) when compared to angry pseudo-speech. This suggests that anthropologically relevant vocalizations undergo more persistent analysis in the service of socially appropriate action. To the best of our knowledge, auditory studies looking directly at the effects of emotional re-appraisal on LPC modulations have yet to be conducted; however, findings from the visual literature (e.g., face and word perception) suggest that the LPC is
indeed susceptible to re-appraisal processes. For instance, LPC modulations were reduced for negative stimuli that were accompanied by neutral descriptors as opposed to the same stimuli being accompanied by negative, reinforcing descriptions [43]. Thus, it will be exciting to see if similar re-appraisal effects can be found for emotional prosody. Finally, researchers have also explored how quickly listeners detect changes in voice patterns. In this line of research, a number of studies have revealed that listeners can rapidly detect transitions from neutral prosody to emotional prosody. Specifically, studies have consistently shown modulations of the posteriorly distributed prosodic expectancy positivity (PEP) when listeners were unexpectedly presented with changes in intonation. For instance, [9] presented listeners with sentences that were initially spoken in a neutral prosody but that were cross-spliced onto sentence endings spoken in angry prosody. Results showed that violations of the expected build-up of an emotional representation were detected within ~470 ms after the splicing point. Interestingly, when prosody is paired with emotional lexical-semantic content, the PEP is superseded by an N400-like negativity, reflecting the need to also re-evaluate semantic expectancies [44]. This pattern nicely highlights how emotional prosody and emotional semantics interact during sentence comprehension, once more emphasizing the role that prosody plays across a number of different language functions. The latency and distribution of the PEP, which has been replicated in different languages (e.g., English, German, Chinese), seem to depend on stimulus properties and task focus (cf. [9]). Similar task-dependent findings are reported for changes in EEG power. For example, [45] report theta-band synchronization in a time window of 100–600 ms after the splicing point irrespective of task focus; however, beta desynchronization in a later time window (400–750 ms) was only found when participants explicitly attended to emotional attributes of the stimuli. Yet, despite these task-related differences, collective results suggest that the emotional prosody deviance detection mechanism is routinely triggered in listeners [44, 46, 47].
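To make the logic of such cross-splicing analyses concrete, the sketch below shows how one might epoch EEG relative to the splicing point and contrast expected versus unexpected prosody, both as an ERP difference wave and as theta-band power. It is a minimal illustration in MNE-Python rather than the pipeline of the studies cited above; the file name, trigger codes, and channel pick are hypothetical.

import mne
import numpy as np

# Hypothetical continuous EEG recording of a cross-splicing experiment
raw = mne.io.read_raw_brainvision("cross_spliced_prosody.vhdr", preload=True)
raw.filter(l_freq=0.1, h_freq=30.0)  # conventional band-pass for ERP analyses

events, _ = mne.events_from_annotations(raw)
event_id = {"neutral_to_neutral": 11, "neutral_to_angry": 21}  # hypothetical triggers placed at the splice

# Time-lock epochs to the splicing point rather than to sentence onset
epochs = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=1.0,
                    baseline=(-0.2, 0.0), preload=True)

# ERP contrast: unexpected minus expected prosody (a PEP-like posterior positivity)
diff = mne.combine_evoked([epochs["neutral_to_angry"].average(),
                           epochs["neutral_to_neutral"].average()], weights=[1, -1])
diff.plot_joint(times=[0.47])  # around the ~470 ms effect reported in [9]

# Theta-band (4-7 Hz) power after the splicing point, cf. the 100-600 ms effect in [45]
freqs = np.arange(4.0, 8.0)
power = mne.time_frequency.tfr_morlet(epochs["neutral_to_angry"], freqs=freqs,
                                      n_cycles=freqs / 2.0, return_itc=False)
power.plot(picks="Cz", baseline=(-0.2, 0.0), mode="logratio")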

1.3 Electrophysiological Markers of Social Prosody Processing

While there has been much interest in how emotional or linguistic prosody processing is linked to electrophysiological responses, fewer studies have explored other types of communicative intentions. One exception is a study by [48], who examined ERP correlates of sarcasm processing. Similar to the affective prosody literature, these authors report an enhanced P200 component in response to a sarcastic tone of voice when compared to neutral prosody, followed by an enhanced LPC. Interestingly, no differences between sarcasm and anger were found for these components, suggesting similar processing mechanisms for these two types of prosody. Other investigators have explored whether listeners
evaluate the confidence level of speakers. For instance, [49] showed that confident and non-confident sounding voices could be differentiated as early as 200 ms after speech onset, again followed by later, more in-depth evaluations of prosodic information. Similar results have been reported for motivational prosody processing. When comparing ERPs in response to autonomy-supportive, pressuring, and neutral sounding voices, [46] found that listeners quickly detect whether a speaker was providing support or trying to pressure them. In addition to the P200 differentiation, the authors also confirm that motivational and neutral prosody are further evaluated at later processing stages (LPC). Thus, combined, these studies seem to suggest that listeners not only detect so-called basic emotions at a rapid speed (cf. above) but also try to evaluate a speaker’s social intentions and state of mind as early as possible. This idea is in line with the hypothesis that the P200 predominantly reflects a tagging process that determines how vocal signals are going to be processed subsequently. Yet, other studies have failed to report such early effects for more subtle variations within prosodic markers. For example, [50] show that responses to speakers who expressed “white lies” as opposed to sincere claims were not different at early stages of processing (i.e., no effect at the P200 or N400) and only started to diverge around 600 ms after stimulus onset. It can be hypothesized that the lack of early differentiation is linked to the complexity of the signal. Specifically, the evaluation underlying the processing of “sugar-coated” responses might be too complex to allow for early differentiation; that is, the speech signal needs to fully unfold before the listener is able to detect the insincerity of the speaker. This is supported by evidence from irony processing, which reports similar late P600 effects when listeners have to integrate content and voice information to arrive at the speaker’s insincerity [51]. In contrast, emotional and motivational prosody might be easier to detect because of their clearly defined voice characteristics.

1.4 Influencing Factors and Characteristics

1.4.1 Task Focus

As indicated in the previous paragraphs (and as will become even more apparent in the following sections), (emotional) speech processing is subject to a number of influencing variables. For instance, it has become clear that processing mechanisms may differ depending on task focus. Collectively, results seem to suggest that prosodic aspects are processed even when not specifically attended to (e.g., [10, 21, 26]). However, asking listeners to specifically focus on prosodic attributes naturally leads to more attentive, or vigilant, processing (cf. [46]). Task instruction differences also seem to encourage different processing mechanisms in female and male listeners as well as in different age groups (see below), indicating that careful attention should be paid to this factor when designing experiments that aim to describe how prosody is processed in the brain.


1.4.2 Sex

One topic that has received comparatively much attention is the influence of sex on emotional prosody processing. For instance, [41] presented female and male listeners with words that were either spoken in a matching (e.g., “happy” spoken in a happy way) or mismatching prosody (e.g., “happy” spoken in an angry voice). Listeners were asked to focus either on the emotional quality of the tone of voice or on the emotional quality of the word content. While behavioral effects were comparable between female and male participants, the N400 ERP component was elicited only in female participants, and only in the task that specifically focused on the emotional quality of the word content. No such sex differences were found when judging the prosody of the words. In a study looking at pre-attentive processing of emotional prosody [26], both sexes demonstrated pre-attentive change detection, that is, both male and female listeners showed MMN responses to deviant stimuli; however, female listeners showed larger effects in response to emotional as opposed to neutral sounding deviants, an effect not found for male participants (i.e., their MMN response was comparable between neutral and emotional stimuli). Results were interpreted as indicating that women allocate more attention toward emotional attributes than men do. Indeed, [49] report differences between female and male listeners when evaluating confidence cues as signaled through prosody, thereby also suggesting different degrees of sensitivity to suprasegmental features of speech in male and female listeners. Future studies should explore the underlying reasons why women (at least at times) seem to show a more pronounced sensitivity toward emotional qualities of prosodic stimuli. As part of these investigations, it will be exciting to determine how sex hormones affect emotional prosody processing. Natural fluctuations of hormones during the menstrual cycle have already been shown to affect other language-related processes (e.g., the lateralization of language). In this context, it has been argued that bottom-up processes might be more prone to hormonal influences than top-down processes (see [52]). Thus, it is reasonable to hypothesize that different sub-processes of emotional prosody perception will be differentially affected by hormonal changes.

1.4.3 Age

Some studies have investigated age differences in emotional prosody processing. The majority of findings looking at early developmental stages suggest that neurophysiological correlates of emotional prosody processing are often comparable between infants and adults [53] as well as between children and adults [54]. For instance, [53] presented 7-month-old infants with simple words spoken in a happy, angry, or neutral tone of voice. Results revealed that, similar to adults, infants quickly differentiate between different prosody patterns. However, comparisons between angry
and neutral prosody were reflected in a negative ERP component, while differences between happy and neutral prosody occurred slightly later, as indicated by a positive slow wave in response to happy expressions. A similar early focus on “negative” sounding prosody in infants has also been reported in eye-tracking studies (e.g., [55]), suggesting that infants quickly develop the ability to allocate attention to potentially threatening situations as indexed through voice cues. Thus, emotional prosody processing abilities seem to be fine-tuned very early in life, and neurophysiological correlates are often comparable between very young and adult listeners. Whether the same holds true for later stages of development requires further investigation, as few studies have compared ERP responses to emotional prosody between young children and adults. However, [56] report similar N400 amplitude modulations in response to negative compared to neutral prosody in 6- to 11-year-olds and adults. Yet, the same group also presents behavioral results which suggest that emotional prosody recognition, as tested with forced-choice paradigms, continues to develop into late childhood [54]. At the other end of the spectrum, behavioral studies have indicated that aging listeners are less accurate at recognizing basic emotions from speech and voice cues (e.g., [57]). Similar to the “late” developmental literature, studies comparing neurophysiological responses in young and older listeners are still very rare. Zinchenko et al. [58] asked participants to view short video clips of speakers producing “A” or “O” in an angry or neutral tone of voice. The facial and vocal information were either congruent or incongruent. Two different tasks were used, one that focused on the content (i.e., participants had to say which vowel was produced, termed “cognitive conflict” by the authors) and one that focused on the emotional quality of the prosody (i.e., participants had to say which emotion had been expressed, termed “emotional conflict”). For all effects of interest (N100, P200, N200), younger and older adults showed similar enhanced ERPs, suggesting that online processing mechanisms related to emotional conflict resolution remain relatively stable during aging. Similarly, [59] showed that early (P200) and late (following component) responses to emotional prosody were comparable between a healthy aging (65+ years) and a university student population. Yet, interestingly, in the same study, emotional prosody recognition accuracy rates in the elderly were significantly reduced, suggesting that older participants might have difficulties in appropriately using the acoustic input to categorize emotions at later processing stages. In sum, these findings suggest that online processing mechanisms might be comparable across different age groups, while behavioral judgments relying on different task strategies and processing mechanisms might differ between participants of different ages.

2 Culture and Language Background

Again, while there is a steadily growing literature on how emotional prosody recognition is modulated by language and cultural differences (e.g., [60–62]), few studies have investigated this issue as part of determining the neurophysiological system underlying emotional prosody. One exception is a study by [63]. They presented native English speakers from the United States and Canada with congruent (e.g., happy/happy) and incongruent (e.g., happy/angry) voice-face pairs and compared their neurophysiological responses to those of Mandarin Chinese speakers. Results revealed a larger N400 component for North American participants than for Chinese participants in response to incongruent pairs, but only when task instructions asked them to focus on the prosody. The authors take these results to suggest that Chinese speakers experienced less interference from facial cues than English speakers. It could be argued that speakers of tonal languages need to focus less on prosodic cues than speakers of non-tonal languages, or that they process these cues more automatically, leading them to be less susceptible to interference from other information channels. However, another study from the same group suggests that tonal language speakers can be influenced by task-irrelevant prosodic information, as the authors report larger MMN amplitudes to deviant trials that contained information from both face and voice channels as opposed to only facial information. The same difference was not found in English speakers [64]. Thus, combined, it seems as if task focus again plays a determining role in these studies. In the first study, participants were instructed to focus on prosody, and Mandarin speakers showed smaller N400 interference effects from incongruent vocal cues. In the second study, listeners were asked to attend to the visual information, and Mandarin listeners exhibited larger MMN responses when this information was paired with vocal information. Based on these combined results, the authors argued that tonal language users might be more sensitive to prosodic cues (cf. [64]).

2.1 Future Perspectives and Challenges

By and large, the studies reviewed in this chapter confirm that linguistic and affective prosody processing forms an important component of language processing. There is evidence that prosody contributes to a range of other language functions, often by “paving the way” for facilitated processing. For instance, the supporting role of prosody in disambiguating syntactic structure building, word meaning, or turn-taking is nicely documented in the literature. The goal of future research should be to integrate these findings with current brain-based models that describe the relations between different functions. Thus, special focus should be put on how prosody can be included in the architecture of language models. Technical advances now allow for the combination of different
recording technologies, providing researchers with multi-modal data. Specifically, combining electrophysiological recordings with anatomical measures offers the potential to devise more fine-tuned brain-based models of emotional prosody. This type of work has already been conducted in the auditory domain by combining EEG and DTI (e.g., [65]), but it has yet to be transferred to studies on emotional prosody processing. In addition, it will be exciting to see how future research will embrace the opportunity to further characterize the neuro-cognitive architecture underlying more social components of prosody. At this point, heavy emphasis is placed on investigations of linguistic and emotional prosody. However, many additional social-communicative functions are conveyed through prosody (e.g., motivation, attitude, and gratification), and future research should aim to explore how the brain processes such a variety of intentions in such a rapid manner. As part of this challenge, researchers will also have to address how much attention listeners need to devote to prosodic processing in order to benefit from its potentially facilitating language function. Outside the laboratory, listeners rarely focus on prosodic attributes in the way they might be instructed to in the lab. Thus, it will be of critical importance to define prosodic contributions to language processing in “real-life” situations. Indeed, in line with technical progress, research that tackles “naturalistic paradigms” has started to emerge (see, e.g., [66] for a demonstration of how using mixed-effects models rather than traditional ANOVA-based EEG analyses can help the field design experiments with stronger ecological validity). For example, [67] have nicely shown that it is feasible to test participants with controlled linguistic stimuli using a virtual reality setup that provides a rich three-dimensional context. In a similar vein, the majority of current studies are limited to investigating one listener at a time; however, this leaves out the dynamics that interpersonal dimensions can introduce. This can, for example, be addressed by the concurrent acquisition of signals from multiple participants (“hyperscanning,” see [68]). Applying more naturalistic, multi-modal approaches will also help delineate how different prosodic functions relate to one another and what contribution individual characteristics such as listeners’ sex, age, language background, or personality traits make to this process.
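As a pointer to what such analyses can look like in practice, the following sketch fits a linear mixed-effects model to hypothetical single-trial mean amplitudes, in the spirit of the move away from ANOVA-based averaging advocated in [66]. The data file, column names, and model terms are illustrative assumptions, not taken from any of the studies discussed in this chapter.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per trial, with the mean voltage in a
# component window (e.g., the P200) plus the design factors for that trial
trials = pd.read_csv("single_trial_amplitudes.csv")  # columns: subject, item, prosody, task, amplitude

# Random intercepts and prosody slopes per participant; fully crossed random
# effects for items would require a variance-components specification here,
# or a package such as lme4 in R
model = smf.mixedlm("amplitude ~ prosody * task", data=trials,
                    groups=trials["subject"], re_formula="~prosody")
result = model.fit(method="lbfgs")
print(result.summary())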

References

1. Steinhauer K, Alter K, Friederici AD (1999) Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nat Neurosci 2:191–196 2. Pauker E et al (2011) Effects of cooperating and conflicting prosody in spoken English garden path sentences: ERP evidence for the

boundary deletion hypothesis. J Cogn Neurosci 23:2731–2751 3. Steinhauer K, Friederici AD (2001) Prosodic boundaries, comma rules, and brain responses: the closure positive shift in ERPs as a universal marker for prosodic phrasing in listeners and readers. J Psycholinguist Res 30:267–295

4. Hwang H, Steinhauer K (2011) Phrase length matters: the interplay between implicit prosody and syntax in Korean “garden path” sentences. J Cogn Neurosci 23:3555–3575 5. Foti D, Roberts F (2016) The neural dynamics of speech perception: dissociable networks for processing linguistic content and monitoring speaker turn-taking. Brain Lang 157:63–71 6. Li X, Chen Y, Yang Y (2011) Immediate integration of different types of prosodic information during on-line spoken language comprehension: an ERP study. Brain Res 1386:139–152 7. Böcker KB et al (1999) An ERP correlate of metrical stress in spoken word recognition. Psychophysiology 36:706–720 8. Mietz A et al (2008) Inadequate and infrequent are not alike: ERPs to deviant prosodic patterns in spoken sentence comprehension. Brain Lang 104:159–169 9. Paulmann S, Jessen S, Kotz SA (2012) It’s special the way you say it: an ERP investigation on the temporal dynamics of two types of prosody. Neuropsychologia 50:1609–1620 10. Eckstein K, Friederici AD (2006) It’s early: event-related potential evidence for initial interaction of syntax and prosody in speech comprehension. J Cogn Neurosci 18:1696–1711 11. Köster D (2014) Prosody in parsing morphologically complex words: neurophysiological evidence. Cogn Neuropsychol 31:147–163 12. Söderström P, Horne M, Roll M (2017) Stem tones pre-activate suffixes in the brain. J Psycholinguist Res 46:271–280 13. Astésano C, Besson M, Alter K (2004) Brain potentials during semantic and prosodic processing in French. Cogn Brain Res 18:172–184 14. Pfurtscheller G (2001) Functional brain imaging based on ERD/ERS. Vis Res 41:1257–1260 15. Cacace AT, McFarland DJ (2003) Spectral dynamics of electroencephalographic activity during auditory information processing. Hear Res 176:25–41 16. Augurzky P, Kotchoubey B (2016) Prosodic phrasing in the presence of unambiguous verb information – ERP evidence from German. Neuropsychologia 81:31–49 17. Schirmer A, Kotz SA (2006) Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn Sci 10:24–30 18. Kotz SA, Paulmann S (2011) Emotion, language, and the brain. Lang Ling Compass 5:108–125
19. Näätänen R, Picton T (1987) The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology 24:375–425 20. Paulmann S, Bleichner M, Kotz SA (2013) Valence, arousal, and task effects in emotional prosody processing. Front Psychol 4:345 21. Paulmann S, Kotz SA (2008) Early emotional prosody perception based on different speaker voices. Neuroreport 19:209–213 22. Pinheiro AP et al (2014) Abnormalities in the processing of emotional prosody from single words in schizophrenia. Schizophr Res 152:235–241 23. Schirmer A et al (2013) Vocal emotions influence verbal memory: neural correlates and interindividual differences. Cogn Affect Behav Neurosci 13:80–93 24. Goydke KN et al (2004) Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity. Cogn Brain Res 21:351–359 25. Jiang A, Yang J, Yang Y (2014) MMN responses during implicit processing of changes in emotional prosody: an ERP study using Chinese pseudo-syllables. Cogn Neurodyn 8:499–508 26. Schirmer A, Striano T, Friederici AD (2005) Sex differences in the preattentive processing of vocal emotional expressions. Neuroreport 16:635–639 27. Carminati M, Fiori-Duharcourt N, Isel F (2018) Neurophysiological differentiation between preattentive and attentive processing of emotional expressions on French vowels. Biol Psychol 132:55–63 28. Paulmann S (2015) The neurocognition of prosody. In: Hickok G, Small SL (eds) Neurobiology of language. Elsevier (Academic Press), pp 1109–1120 29. Paulmann S, Uskul AK (2017) Early and late brain signatures of emotional prosody among individuals with high versus low power. Psychophysiology 54:555–565 30. Pantev C et al (1996) Binaural fusion and the representation of virtual pitch in the human auditory cortex. Hear Res 100:164–170 31. Picton TW et al (1977) Evoked potential audiometry. J Otolaryngol 6:90–119 32. Liu X et al (2018) Emotional connotations of musical instrument timbre in comparison with emotional speech prosody: evidence from acoustics and event-related potentials. Front Psychol 9:737 33. Paulmann S, Pell MD (2010) Contextual influences of emotional speech prosody on face
processing: how much is enough? Cogn Affect Behav Neurosci 10:230–242 34. Castiajo P, Pinheiro AP (2019) Decoding emotions from nonverbal vocalizations: how much voice signal is enough? Motiv Emot 43:803– 813 35. Steber S et al (2020) Uncovering electrophysiological and vascular signatures of implicit emotional prosody. Sci Rep 10:1–14 36. Del Giudice R et al (2016) The voice of anger: oscillatory EEG responses to emotional prosody. PLoS One 11:e0159429 37. Bostanov V, Kotchoubey B (2004) Recognition of affective prosody: continuous wavelet measures of event-related brain potentials to emotional exclamations. Psychophysiology 41:259–268 38. Wambacq IJ, Jerger JF (2004) Processing of affective prosody and lexical-semantics in spoken utterances as differentiated by eventrelated potentials. Cogn Brain Res 20:427–437 39. Schirmer A, Kotz SA, Friederici AD (2002) Sex differentiates the role of emotional prosody during word processing. Cogn Brain Res 14: 228–233 40. Schirmer A, Kotz SA, Friederici AD (2005) On the role of attention for the processing of emotions in speech: sex differences revisited. Cogn Brain Res 24:442–452 41. Schirmer A, Kotz SA (2003) ERP evidence for a sex-specific Stroop effect in emotional speech. J Cogn Neurosci 15:1135–1148 42. Pell MD et al (2015) Preferential decoding of emotion from human non-linguistic vocalizations versus speech prosody. Biol Psychol 111: 14–25 43. Foti D, Hajcak G (2008) Deconstructing reappraisal: descriptions preceding arousing pictures modulate the subsequent neural response. J Cogn Neurosci 20:977–988 44. Kotz SA, Paulmann S (2007) When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res 1151:107–118 45. Chen X et al (2015) EEG oscillations reflect task effects for the change detection in vocal emotion. Cogn Neurodyn 9:351–358 46. Paulmann S, Weinstein N, Zougkou K (2019) Now listen to this! Evidence from a crossspliced experimental design contrasting pressuring and supportive communications. Neuropsychologia 124:192–201 47. Chen X et al (2011) Event-related potential correlates of the expectancy violation effect during emotional prosody processing. Biol Psychol 86:158–167

48. Wickens S, Perry C (2015) What do you mean by that?! An electrophysiological study of emotional and attitudinal prosody. PLoS One 10: e0132947 49. Jiang X, Pell MD (2015) On how the brain decodes vocal cues about speaker confidence. Cortex 66:9–34 50. Rigoulot S, Fish K, Pell MD (2014) Neural correlates of inferring speaker sincerity from white lies: an event-related potential source localization study. Brain Res 1565:48–62 51. Regel S, Coulson S, Gunter TC (2010) The communicative style of a speaker can affect language comprehension? ERP evidence from the comprehension of irony. Brain Res 1311: 121–135 52. Hodgetts S, Weis S, Hausmann M (2015) Sex hormones affect language lateralisation but not cognitive control in normally cycling women. Horm Behav 74:194–200 53. Grossmann T, Striano T, Friederici AD (2005) Infants’ electric brain responses to emotional prosody. Neuroreport 16:1825–1828 54. Chronaki G et al (2015) The development of emotion recognition from facial expressions and non-linguistic vocalizations during childhood. Br J Dev Psychol 33:218–236 55. Gerson S et al (2019) Infants attend longer to controlling versus supportive directive speech. J Exp Child Psychol 187:104654 56. Chronaki G et al (2012) Isolating N400 as neural marker of vocal anger processing in 6–11-year old children. Dev Cogn Neurosci 2:268–276 57. Paulmann S, Pell MD, Kotz SA (2008) How aging affects the recognition of emotional speech. Brain Lang 104:262–269 58. Zinchenko A et al (2017) Positive emotion impedes emotional but not cognitive conflict processing. Cogn Affect Behav Neurosci 17: 665–677 59. Paulmann S, Harmsworth C, Russo R (2015) Emotional prosody perception in healthy ageing – evidence from ERPs and recognition rates. SAN conference, Boston, 23–25 April 60. Scherer KR, Banse R, Wallbott HG (2001) Emotion inferences from vocal expression correlate across languages and cultures. J CrossCult Psychol 32:76–92 61. Pell MD et al (2009) Recognizing emotions in a foreign language. J Nonverbal Behav 33: 107–120 62. Paulmann S, Uskul AK (2014) Cross-cultural emotional prosody recognition: evidence from Chinese and British listeners. Cogn Emot 28: 230–244

63. Liu P, Rigoulot S, Pell MD (2015) Culture modulates the brain response to human expressions of emotion: electrophysiological evidence. Neuropsychologia 67:1–13 64. Liu P, Rigoulot S, Pell MD (2015) Cultural differences in on-line sensitivity to emotional voices: comparing East and West. Front Hum Neurosci 9:311 65. Steinmann S et al (2018) The role of functional and structural interhemispheric auditory connectivity for language lateralization – a combined EEG and DTI study. Sci Rep 8:1–12 66. Alday PM, Schlesewsky M, Bornkessel-Schlesewsky I (2019) M/EEG analysis of
naturalistic stories: a review from speech to language processing. Lang Cogn Neurosci 34: 457–473 67. Tromp J, Peeters D, Meyer AS, Hagoort P (2018) The combined use of virtual reality and EEG to study language processing in naturalistic environments. Behav Res Methods 50: 862–869 68. Babiloni F, Astolfi L (2014) Social neuroscience and hyperscanning techniques: past, present and future. Neurosci Biobehav Rev 44:76– 93

Chapter 22

Using Facial EMG to Track Emotion During Language Comprehension: Past, Present, and Future

Jos J. A. van Berkum, Marijn Struiksma, and Björn ’t Hart

Abstract

Beyond recognizing words, parsing sentences, building situation models, and other cognitive accomplishments, language comprehension always involves some degree of emotion too, with or without awareness. Language excites, bores, or otherwise moves us, and studying how it does so is crucial. This chapter examines the potential of facial electromyography (EMG) to study language-elicited emotion. After discussing the limitations of self-report measures, we examine various other tools to tap into emotion, and then zoom in on the electrophysiological recording of facial muscle activity. Surveying psycholinguistics, communication science, and other fields, we provide an exhaustive qualitative review of the relevant facial EMG research to date, exploring 55 affective comprehension experiments with single words, phrases, sentences, or larger pieces of discourse. We discuss the outcomes of this research, and evaluate the various practices, biases, and omissions in the field. We also present the fALC model, a new conceptual model that lays out the various potential sources of facial EMG activity during language comprehension. Our review suggests that facial EMG recording is a powerful tool for exploring the conscious as well as unconscious aspects of affective language comprehension. However, we also think it is time to take on a bit more complexity in this research field, by for example considering the possibility that multiple active generators can simultaneously contribute to an emotional facial expression, by studying how the communicator’s stance and social intention can give rise to emotion, and by studying facial expressions not just as indexes of inner states, but also as social tools that enrich everyday verbal interactions.

Key words Review, EMG, Facial electromyography, Psycholinguistics, Communication science, Emotion, Psychophysiology, Simulation, Mimicry, Evaluation

1 Introduction

One of the most interesting features of us humans is that we can communicate through language. We do so seemingly effortlessly, for many, many hours a day, as we, for example, engage in conversation, check our social media, read a book, or listen to the news. The speed of language processing is astounding, with an average reading speed of 3–5 words per second for adults, depending on the language at hand [1], and an estimated average speaking—and hence also listening—speed that is only slightly lower and that can
go up dramatically, depending on circumstances [2]. Psycholinguists interested in language comprehension have long realized that to understand this rapid and intricately orchestrated skill, they need measures that can keep track of reading or listening as language input unfolds. It is for this reason that the research field makes heavy use of temporally sensitive measures like self-paced reading, eye tracking, electroencephalography (EEG), and magnetoencephalography (MEG). The use of such measures has contributed a lot to our understanding of the many cognitive aspects of language comprehension that psycholinguistics has traditionally been interested in, like word recognition, syntactic parsing, or situation model construction (see [3–5], and this volume, for reviews). However, language researchers have recently also begun to ask questions about the relationship between language and emotion (e.g., [6–14]). This new focus in language research raises an important question: what measurement tools should we use to keep track of the emotional aspects of language comprehension as the words are coming in? How can we measure, during reading or listening, the emotional impact of, say, a piece of gossip, a good joke, a political slogan, a medical information leaflet, or an engaging bit of narrative fiction? In this chapter, we address exactly this question. After examining the various options for measuring emotion during language comprehension, we focus on one particularly promising method: facial electromyography, the electrophysiological recording of facial muscle activation. We explain the method, provide an exhaustive review of the language-relevant facial EMG research to date, present a conceptual model for future EMG research in this domain, and discuss the challenges and opportunities ahead. Our focus in all of this will be largely conceptual; for a methods-oriented review of facial electromyography in the language sciences and beyond, see [42] in this volume.

2 Measuring Emotion

2.1 Why Self-Report Is Not Enough

At first sight, emotion seems to be about how people feel. Emotion is therefore often measured by asking people about this conscious experience, either via an open-ended “how do you feel” question, or, more commonly, via a constrained self-report format. For example, researchers can ask participants in their study to rate the presence and intensity of specific emotions, such as joy, shame, or compassion (e.g., via the Geneva Emotion Wheel [15]). They can also ask participants to rate their conscious experience on one or more emotion-relevant dimensions. An often used dimension is subjective valence, the degree to which an emotional experience, or the elicitor of that experience, is seen as positive or negative
(cf. the smiley-based valence scale used to assess customer satisfaction). Ratings can also be collected on two orthogonal dimensions, often crossing subjective valence and subjective arousal, the experienced degree of bodily activation (e.g., SAM [16]), but sometimes also crossing experienced positivity and experienced negativity (the Affect Matrix [17]). Furthermore, researchers can systematically ask people to report on various other things, such as what triggered the emotion [18], or the amount of activity they feel in different parts of the body [19]. These various self-report methods can all help explore how people feel. However, they are not particularly useful for tracking emotion as linguistic input unfolds, for two very different reasons. On the practical side, researchers simply cannot ask for a self-report at each word or sentence, at least not without seriously disrupting the processes under study. The second and much more fundamental reason is that, just like with other domains of the human mind, many aspects of emotion are simply inaccessible to consciousness, at least initially [13, 20–24]. In everyday life, people use the terms “emotion” and “feeling” in interchangeable ways. However, research has shown that emotion can sometimes do all of its work “behind the scenes,” nudging us without us being aware of the stimulus or of what it brings about (e.g., [25, 26]). Furthermore, even in the case where conscious experience or “feeling” does arise, there is a lot more to the emotion at hand than what people have conscious access to—feeling is usually just the tip of the iceberg, and important motivational, cognitive, physiological, and expressive aspects of an emotion will inevitably escape self-report. How to tap into those? Motivational and cognitive aspects Because emotions always come with particular motivations (“action tendencies” [27]), and can also, for example, affect the speed and focus of cognitive processing (e.g., via emotion-induced attention), familiar behavioral measures developed within cognitive psychology and psycholinguistics, such as two-choice response time, self-paced reading, and eye tracking can in principle be used to pick up on these aspects of language-induced emotion, independent of whether those are accessible to consciousness or not. For example, it is possible to detect very subtle approach or avoidance tendencies when reading words with strong positive or negative valence [28, 29]. Also, because motivational and cognitive aspects of emotion require neural computation, brain measures such as EEG, MEG, and fMRI, can also help track these aspects of emotional responding during language processing, independent of whether people are aware of them or not (e.g., [30]; see [7, 31] for reviews).


Physiological and expressive aspects One highly convenient feature of emotions, however, is that they are “fundamentally embodied” [32], or, more poetically, “play out in the theatre of the body” [33], in two interesting ways. First, because emotions usually prepare the body to act (e.g., to approach or avoid, to explore, embrace, or hide), they come with various rapid physiological changes that prepare for and support the intended actions, such as changes in heart rate, sweating, or hormone levels. Although brain measures might be able to pick up on the neural control or neural feedback associated with these physiological changes, the changes themselves can be measured as well (see [34], for a review of the utility of recording cardiovascular, respiratory, skin conductance, and other physiological parameters). Moreover, they can be measured not just when actually facing, say, a hairy spider, or a loved one, but also when reading or hearing about them (e.g., [35]; see [36, 37] for reviews). Second, emotions are often expressed in the face, in the voice, and in the movements or posture of the rest of the body (see [38] for review), sometimes deliberately, and sometimes involuntarily. Such expressions probabilistically inform others about how we evaluate things and what state we are in, as well as about what we might do, right now, or in the future [39, 40]. Responding to an insult with an angry face and voice, for example, informs the offender and/or others that we probably dislike seeing our rights or properties infringed upon, that we may well strike back in some way or the other, and that it is therefore probably wise to back off. Responding to a sad story of a friend with a compassionate face and voice tells him or her that we are probably concerned with his or her fate, and that we might well be willing to help. And responding to a mistake with a facial expression and posture of embarrassment or shame informs others that we probably agree we messed up, feel bad about it, and are somehow committed to not let it happen again. Although the specific ingredients of emotional expressions need not all have evolved for communication originally ([41]; see also [42]), it is clear that like other primates, we express our emotions to others, and that doing so plays a crucial role in solving the many problems of social life [43]. Again, brain measures might be able to pick up on the required neural control. But just like the physiological embodiment of emotion, the expressive embodiment itself can also be measured. One way to do so is by recording facial EMG.

2.2 Measuring Emotion via Facial Electromyography

Facial expressions are generated by contracting and/or relaxing specific subsets of the ~30 muscles in the human face. Facial muscles are not only involved in expressing emotion, but also needed for other things, such as eating and drinking, improving visual acuity, and speaking. During non-invasive facial electromyography,
Fig. 1 EMG electrode locations for measuring activity in several muscles that play a role in facial expressions. (Figure courtesy of Anton van Boxtel (see also [44]))

all such emotional and/or non-emotional facial activity can be recorded with high temporal resolution, by means of small electrodes attached to the skin over particular target muscles. Figure 1 displays standard electrode positions for recording EMG from specific facial muscles, including, for example, the corrugator supercilii (involved in frowning), the levator labii superioris (involved in wrinkling the nose to express disgust), and the zygomaticus major and orbicularis oculi (involved in smiling; see [42, 44, 45]). Facial EMG has been used to study many things, including the facial correlates of physical and cognitive effort (e.g., [46–49]), and the patterns of lip movement during articulation (e.g., [50, 51]). However, building on earlier work that used facial EMG to track covert articulation during reading, Schwartz and colleagues (e.g., [52]) were the first to use EMG to keep track of emotion: when participants were asked to imagine happy, sad, and angry situations, the researchers observed increased corrugator activity during sad and angry emotional imagery as well as decreased corrugator
activity during happy emotional imagery. Importantly, Schwartz and colleagues also videotaped the participants while they were engaging in emotional imagery, and reported that the patterns of activity recorded via EMG “were not readily detected on the overt face.” In line with physiological knowledge about the imperfect coupling of facial muscle activation and visible skin movement (see [53]), this showed that facial EMG can track very subtle, non-overt changes in emotional state. Following up on this clinical work, Cacioppo and Petty [54] pioneered the use of facial EMG to track emotional state in a verbal communication setting, as part of a larger social-psychological research program that explored attitude-relevant processing. In the critical study (experiment 2), undergraduate participants listened to pro- and counter-attitudinal messages (e.g., on alcohol use) while EMG was recorded over the corrugator to assess the degree of frowning, and over the zygomaticus and depressor anguli oris (upper and lower cheek muscles) to assess the degree of smiling (to track covert pronunciation, the mentalis also was recorded). In contrast to Schwartz et al. [52], who related their EMG findings to specific emotions, Cacioppo and Petty looked for EMG traces of emotional valence, the degree to which different stimuli elicited positive or negative emotion (which participants may or may not be aware of). Their results suggested that both the corrugator and the zygomaticus were sensitive to message valence, in the expected direction: increased corrugator activity to negative stimuli, and increased zygomaticus activity to positive stimuli. Several followup studies confirmed that facial EMG over these and other facial muscles could help assess the valence of language-induced emotional states, and do so even when no overt changes were visible in the face (see [55] for review). As such, this early work paved the way for other attempts to track language-elicited emotion by means of facial EMG. In the next section, we review that work.
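For readers unfamiliar with the signal itself, the sketch below illustrates a conventional facial EMG processing chain: band-pass filtering, full-wave rectification, smoothing into an envelope, and expressing activity in a post-stimulus window relative to a pre-stimulus baseline. The parameter values and the usage example are generic placeholders rather than the settings of any study reviewed here; see [42] in this volume for a methods-oriented treatment.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def emg_envelope(raw_uv, fs, band=(20.0, 400.0), smooth_ms=50.0):
    """Band-pass filter, rectify, and smooth a single-channel EMG trace (in microvolts)."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, raw_uv)       # remove drift, movement artifact, and high-frequency noise
    rectified = np.abs(filtered)              # full-wave rectification
    win = max(1, int(fs * smooth_ms / 1000.0))
    return np.convolve(rectified, np.ones(win) / win, mode="same")  # moving-average envelope

def valence_response(envelope, fs, stim_onset_s, baseline_s=0.5, window_s=1.0):
    """Mean envelope in a post-stimulus window, expressed as change from the pre-stimulus baseline."""
    onset = int(stim_onset_s * fs)
    baseline = envelope[onset - int(baseline_s * fs):onset].mean()
    post = envelope[onset:onset + int(window_s * fs)].mean()
    return post - baseline

# Hypothetical usage: corrugator activity should increase for negative messages,
# zygomaticus activity for positive ones
fs = 1000.0
corrugator_trace = np.random.randn(int(10 * fs))  # stand-in for a recorded corrugator channel
response = valence_response(emg_envelope(corrugator_trace, fs), fs, stim_onset_s=5.0)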

3 A Review of Facial EMG Research on Affective Language Comprehension

To explore the potential of facial EMG for our purpose, comprehension studies that explicitly asked about the language-emotion interface within the field of psycholinguistics are obviously relevant. However, as illustrated by the pioneering work by Cacioppo and Petty [54], psychologists interested in the neighboring field of communication have also been using facial EMG to track emotion elicited by language, albeit without foregrounding a language processing question. We therefore queried the literature to find emotion-oriented facial EMG research with a psycholinguistic or (verbal)
communication focus.1 Our interest was in examining the types of questions addressed, the procedures by which they were addressed (e.g., which electrodes were measured, and why), and the characteristic findings obtained. As can be seen in Table 1, our literature query turned up 55 experiments reported in 50 publications, mostly in journal articles, but also, occasionally, in published dissertations. We grouped these experiments on the basis of whether they involved single isolated words, phrases and sentences, or multi-sentence discourse—a standard psycholinguistic criterion that is convenient for now, but to which we will briefly return later. Facial EMG can, in principle, also be useful when assessing the role of emotion during language production [14]. However, echoing a general bias toward comprehension in psycholinguistics, the interest in how emotion affects production (e.g., [56]) lags far behind the interest in how it affects comprehension. Also, speaking involves facial muscle activation as part of articulation, which greatly complicates the use of EMG to keep track of speaker emotion. Therefore, while psycholinguists interested in language production have used EMG to study articulation (e.g., [50, 51]), they have to our knowledge not used it to tap into emotional factors— even in stuttering research, where emotion has clear relevance, EMG recording focuses on articulation only (e.g., [57, 58]). Our review of studies that used EMG to track emotion during language comprehension does not cover facial EMG studies that use verbal materials as a vehicle but whose description does not satisfy the language- and communication-oriented keyword searches in this review (e.g., research with “vignettes” in social or moral psychology). We also do not cover media-psychological EMG research involving cinematic narrative (e.g., [59]) or advertisements with emotionally toned non-verbal materials (e.g., music [60]), work that includes language but does not manipulate verbal materials in ways that are useful to those interested in language

1

Psycholinguistically framed research was obtained by querying Scopus with: TITLE-ABS-KEY (language OR discourse OR text OR story OR stories OR narrative OR conversation OR sentence OR phrase OR word OR lexical OR speech OR prosody) AND TITLE-ABS-KEY (comprehension OR recognition OR perception OR interpretation OR reception OR processing) AND TITLE-ABS-KEY (“facial EMG” OR electromyography OR zygomaticus OR corrugator OR frontalis OR levator OR orbicularis OR depressor OR mentalis) AND TITLE-ABS-KEY (emotion OR valence OR affect OR sentiment) Research on communication was obtained by querying Scopus with: TITLE-ABS-KEY (communication) AND TITLE-ABS-KEY (“facial EMG” OR electromyography OR zygomaticus OR corrugator OR frontalis OR levator OR orbicularis OR depressor OR mentalis) AND TITLE-ABS-KEY (emotion OR valence OR affect OR sentiment) The resulting entries were screened for whether they explored facial EMG responses to emotionally relevant linguistic stimuli presented to non-impaired adult participants, and complemented with ad-hoc leads derived from the obtained papers that also passed the above screening. All queries were conducted on May 4, 2019, without setting any date ranges.


Table 1 The 55 EMG studies discussed in our literature review, with, for each study, whether the corrugator supercilii (CS) and zygomaticus major (ZM) were sensitive to a language-driven emotion manipulation (yes, no, undecidable, or not measured/evaluated (–)), whether one of the two muscles was sensitive while the other was not (CS > ZM: CS sensitive, ZM not sensitive; CS < ZM: CS not sensitive, ZM sensitive; CS = ZM: CS and ZM both sensitive, or both not sensitive), and whether any other facial muscles were recorded from. The undecidable case is a study that used a CS-ZM composite measure (see Table 2 for summary statistics)

ZM emotionsensitive?

Recorded Comparison other muscles

Arias, Belin, & Aucouturier (2018) [90] Yes

Yes

CS = ZM

Arndt, Allen, & Greenberg (2001) [70] Yes



Bartholow, Fabiani, Gratton, & Bettencourt (2001) [96]



Study

CS emotionsensitive?

Yes

Baumeister, Foroni, Conrad, Rumiati, & Yes Winkielman (2017) [72]

Yes

Bayer, Sommer, & Schacht (2010) [81] Yes



Borgeat, Elie, Chaloult, & Chabot (1984) [68]



Cacioppo & Petty (1979, study 2) [54] Yes

CS = ZM

– Yes

Frontalis CS = ZM

Depressor anguli oris, mentalis

Depressor anguli oris

Cikara & Fiske (2012, study 1) [86]



Yes

Durso, Geldbach, & Corballis (2012) [116]

Yes

No

CS > ZM

Fiacconi & Owen (2015) [101]

Yes

Yes

CS = ZM

Fino, Menegatti, Avenanti, & Rubini (2016) [84]

Yes

Yes

CS = ZM

Fino, Menegatti, Avenanti, & Rubini (2019) [87]

Yes

Yes

CS = ZM

Fino (2014, study 3) [88]

Yes

No

CS > ZM

Foroni (2015) [83]



Yes

Foroni & Semin (2009) [67]

Yes

Yes

Foroni & Semin (2013) [82]



Yes

Gavaruzzi, Sario, Giandomenico, Rumiati, Polato, De Lazarri, & Lotto (2018, study 2) [102]

Yes



Hietanen, Surakka, & Linnankoski (1998) [76]

Yes



Kätsyri, Ravaja, & Salminen (2012) [105]

No

Yes

Orbicularis oculi

CS = ZM

Orbicularis oculi CS < ZM

Orbicularis oculi (continued)


Table 1 (continued) CS emotionsensitive?

ZM emotionsensitive?

Recorded Comparison other muscles

Krumhuber, Tsankova, & Kappas (2016) [97]

No

No

CS = ZM

Künecke, Sommer, Schacht, & Palazova (2015) [73]

Yes

No

CS > ZM

Kunkel (2018, study 2, control experiment) [66]

Yes

No

CS > ZM

Levator labii superioris

Kunkel (2018, study 2, discourse experiment) [66]

Yes

Yes

CS = ZM

Levator labii superioris

Kunkel (2018, study 3, exp 2, control experiment) [66]

Yes

No

CS > ZM

Kunkel (2018, study 3, exp 2, discourse No experiment) [66]

No

CS = ZM

Larsen, Norris, & Cacioppo (2003) [17] Yes

Yes

CS = ZM

Larsen, Norris, McGraw, Hawkley, & Cacioppo (2009, study 4) [65]

Yes

Yes

CS = ZM

Larson, Hazlett, Chaparro, & Picard (2007) [115]

Yes

No

CS > ZM

Lee & Potter (2018) [110]

Yes

Yes

CS = ZM

Leshner, Bolls, Gardner, Moore, & Kreuter (2018) [103]

Yes



Levy, Harmon-Jones, & Harmon-Jones Yes (2018, study 2) [93]



Study

Levator labii superioris

Orbicularis oculi

Livingstone, Forde Thompson, & Russo Undecidable Undecidable (2009) [91] Lucas, Sa´nchez‐Adam, Vila, & Guerra (2019) [35]

Yes

Yes

CS = ZM

Magnée, Stekelenburg, Kemner, & de Gelder (2007, study 1) [92]

Yes

Yes

CS = ZM

Morriseau, Mermillod, Eymond, van der No Henst, & Noveck (2017) [112]

Yes

CS < ZM

Neumann, Hess, Schulz, & Alpers (2005, study 1) [74]

Yes

Yes

CS = ZM

Neumann, Hess, Schulz, & Alpers (2005, study 2) [74]

Yes

No

CS > ZM

Niedenthal, Winkielman, Mondillon, & Yes Vermeulen (2009, study 1) [71]

Yes

CS = ZM

Orbicularis oculi

Frontalis, orbicularis oculi

Orbicularis oculi, levator labii superioris (continued)


Table 1 (continued) ZM emotionsensitive?

Recorded Comparison other muscles

Niedenthal, Winkielman, Mondillon, & Yes Vermeulen (2009, study 2) [71]

Yes

CS = ZM

Ravaja, Aula, Falco, Laaksonen, Salminen, & Ainamo (2015) [106]

No

Yes

CS < ZM

Ravaja, Kallinen, Saari, & Keltikangas-Järvinen (2004) [107]

Yes

No

CS > ZM

Orbicularis oculi

Ravaja, Saari, Kallinen, & Laarni (2006) Yes [108]

No

CS > ZM

Orbicularis oculi

’t Hart, Struiksma, van Boxtel, & van Berkum (2018) [98]

Yes



’t Hart, Struiksma, van Boxtel, & van Berkum (2019) [99]

Yes



’t Hart, Struiksma, van Boxtel, & van Berkum (2021) [89]

Yes



Study

CS emotionsensitive?

Orbicularis oculi, levator labii superioris

Thomson, Mackenzie, Leuthold, & Filik Yes (2016) [100]

Yes

CS = ZM

Topolinsky, Likowski, Weyers, & Strack Yes (2009) [95]

Yes

CS = ZM

Frontalis

Topolinsky & Strack (2015) [94]

Yes

No

CS > ZM

Frontalis

Tuisku, Ilves, Lylykangas, Surakka, Ainasoja, Rytövuori, & Ruohonen (2018) [104]

Yes



Van Leeuwen (2017, chapter 5) [113]

Yes



Wassiliwizky, Koelsch, Wagner, Jacobsen, & Menninghaus (2017, study 1) [114]

Yes

No

CS > ZM

Weis & Herbert (2017) [85]

Yes

Yes

CS = ZM

Wexler, Warrenburg, Schwartz, & Janer Yes (1992) [69]

No

CS > ZM

Wise, Kim, & Kim (2009) [109]

Yes



Zhu & Suzuki (2018) [75]

Yes

Yes

CS = ZM


processing. Finally, we ignore EMG studies that, in the wake of the pioneering emotional imagery EMG work by Schwartz et al. [52], explored the emotional consequences of thinking about linguistically presented materials after reading or hearing them, of imagining reading or listening to a text, or of imagining agreeing or disagreeing while reading or listening to a text (see, e.g., [61–64]).

3.1 Single-Word Studies

3.1.1 Manipulations of Lexical Emotional Contents

Following up on the early Cacioppo and Petty research [54], as well as on work that had observed increased lip EMG to words when these were evaluated on valence [46, 47], Larsen et al. [17] were the first to directly examine word-elicited positive and negative emotion by means of EMG. Participants read a series of words that spanned a pre-established broad unpleasant-pleasant valence rating continuum, while EMG was recorded over the corrugator and the zygomaticus muscles. The same participants also saw a series of pictures and heard a series of sounds (e.g., animal noises, alarms, laughter) and, after each stimulus, rated how positive and negative they felt about it. When average EMG responses were plotted against the (ranked) average valence ratings provided by participants, a very important result emerged: for pictures, sounds, and words, corrugator EMG tracked subjective stimulus valence in a virtually linear way, with stimuli that had been rated more negatively leading to stronger corrugator activity—see Fig. 2a for the word and picture results. The zygomaticus EMG response was more complex: for pictures and sounds, it tracked subjective stimulus valence in a more U-shaped way (elevated zygomaticus activity for positive stimuli, but also for very negative stimuli), whereas for words, the zygomaticus was not reliably sensitive to subjective valence. These results, which were corroborated by a later reanalysis on absolute (rather than ranked) valence ratings ([65]: study 4), suggested three important things: (a) corrugator EMG is a useful index of the valence of word-induced emotion, (b) corrugator valence effects to words, although smaller, are qualitatively similar to those obtained with pictures and sounds of concrete objects or events, and (c) relative to the corrugator, the zygomaticus is somewhat less suitable for tracking the valence of word-elicited emotion.

Other research also points in this direction. For example, in two recent EMG studies ([66] control experiments of studies 2 and 3), corrugator and zygomaticus activity was recorded to written single words and pictures, focusing on negative versus neutral valence, and using two different secondary tasks. Whereas zygomaticus activity did not change in response to valence, corrugator activity was significantly higher to negative compared to neutral words, both when participants were asked neutral questions about the words after each trial (study 3; e.g., did the word have three syllables? was the animal depicted a sheep?) and when participants were asked to rate the degree to which the same words emotionally moved them (study 2; in the latter case, the additionally measured levator labii superioris was also sensitive to valence). Interestingly, across the two studies, picture-induced effects displayed the same pattern of results. These findings converge with the Larsen et al. [17] results and also show that word-elicited emotional responding does not critically depend on the use of an emotion-focused secondary task.

The latter idea also received support from a study in which participants were merely asked to read single words, without any additional task, while corrugator and zygomaticus EMG was recorded ([67], see Fig. 2b). Compared to words associated with positive emotions (e.g., "smiling," "funny"), words associated with negative emotions (e.g., "frowning," "annoying") elicited a rapid relative increase in corrugator activity, as well as a rapid relative decrease in zygomaticus activity, both clearly within 750 ms after word presentation. In another word reading study without a secondary task ([35], see Fig. 2c), seeing the names of loved ones led to increased zygomaticus activity and reduced corrugator activity, relative to seeing control names of unknown or (neutrally rated) familiar persons. Of course, we cannot exclude that facial behavior in these studies was in part affected by perceived implicit task demands, an issue to which we will return later. However, in several EMG studies with masked stimuli, words that were acoustically hidden in other stimuli to such an extent that participants did not consciously detect their presence also generated reliable EMG responses, visible in frontalis muscle EMG with sex-related words [68], as well as in corrugator (but not zygomaticus) EMG for other types of valenced words ([69]; see [70] for a related observation when using written words). Such findings cannot easily be ascribed to task demands.

As might be expected, task and stimulus parameters do sometimes influence language-induced corrugator and zygomaticus EMG. For example, in contrast to studies where task effects did not emerge ([66] control experiments of studies 2 and 3), another study [71] did generate a task effect: valence-dependent responses to concrete and abstract written words were obtained in corrugator and zygomaticus EMG (as well as in orbicularis oculi and levator labii superioris EMG) when participants were asked to judge whether the referent of each word was associated with an emotion, but not when they were asked to judge whether the word was printed in upper- or lower-case. Using the same emotion-focused judgment task, another study [72] revealed a native language advantage: a valence-dependent corrugator effect only emerged to written words in one's native language (L1) and not to words in a familiar foreign language (L2). Although a valence-dependent zygomaticus effect emerged in both L1 and L2, the effect emerged later and ended sooner in L2. Finally, one study [73] revealed a concreteness effect: the corrugator was sensitive to the valence of written words in the case of concrete words, but not in the case of abstract words—the zygomaticus did not show any effects in this study. The source of these various task and stimulus type effects, as well as their coming and going across studies, is as yet unknown.

An interestingly different approach is illustrated in a study where participants were asked to indicate whether written words were positive or negative by deliberately activating the corrugator or zygomaticus, and to do so as rapidly as possible [74]. Relative to participants instructed to use the corrugator for negative valence and the zygomaticus for positive valence, participants who were instructed to (incongruently) use the corrugator for positive valence and the zygomaticus for negative valence were some 100–150 ms slower in contracting their corrugator or zygomaticus, with the interference emerging at around 500 ms after word presentation. This rapid Stroop-like interference effect indicates that participants simply cannot stop words from automatically and rapidly activating valence-congruent facial muscles. Also, when the participants of a follow-up study were simply asked to contract one particular muscle as soon as a word appeared on the screen, regardless of valence (e.g., always the zygomaticus in the first half of the experiment, and always the corrugator in the second), the corrugator muscle was again contracted more slowly to positive than to negative words (no such congruency effect was obtained for the zygomaticus). In all, and consistent with the masked word results, this suggests that words can involuntarily trigger valence-congruent activation of the corrugator and, in one study, the zygomaticus. In related written word research [75], a valence-dependent EMG response did not only emerge much earlier in the corrugator than in the zygomaticus, but this early response was also very difficult to suppress, again testifying to a degree of rapid automaticity.

Fig. 2 Example corrugator and zygomaticus results from three facial EMG studies with single words. Panel (a): corrugator and zygomaticus responses to words ordered along a valence rank dimension (left = most negative, right = most positive; each dot represents a word with a particular valence rank; inset shows the same data for pictures). Panel (b): corrugator and zygomaticus responses as a function of time to emotion adjectives and emotion expression verbs of positive or negative valence. Panel (c): corrugator and zygomaticus responses as a function of time to names of loved ones, compared to neutral familiar and unfamiliar names. Note the rapid differential EMG responses in Panels (b) and (c), and, in Panel (a), the different ways in which corrugator and zygomaticus activity track stimulus valence. See text for further explanation. (Panels (a–c) reprinted with permission from Refs. [17, 67, 35], respectively)

3.1.2 Manipulations of Prosodic Emotional Contents

All studies discussed so far focused on the impact of word meaning. However, words can also be spoken with a particular emotional prosody. In an EMG study that explored the impact of this [76], participants listened to the single word "Sarah" spoken with an angry, neutral, or mildly positive prosody, while EMG over the corrugator and the orbicularis oculi was measured to obtain an index of negative and positive emotion, respectively. Compared to angry as well as neutral prosody, positive prosody led to reduced corrugator EMG. Positive prosody also reliably increased orbicularis oculi EMG relative to neutral prosody, but, unexpectedly, angry prosody did so too. Therefore, although clear effects of emotional prosody could be observed in corrugator EMG, the orbicularis oculi results are complex. We return to prosody when examining EMG research with phrases and sentences.

In all, facial EMG research with single spoken or written words2 suggests that word-elicited emotional effects can emerge involuntarily and very rapidly, in both input modalities, and in ways that are consistent with the idea that we communicate our affective response to the world around us by means of such things as a frown or a smile. Of course, although single-word paradigms provide conceptually interesting methodological options (e.g., masked presentation, a comparison to pictures of objects, or other forms of precise stimulus control), everyday language use goes well beyond presenting single words to other people. So what about the use of EMG to track emotional responses to bigger chunks of language? In the remainder of this review, we first examine comprehension research with phrases and sentences that are not part of a wider discourse, and then turn to research with larger discourse stimuli.

2 EMG studies can also examine the emotional impact of single words by measuring EMG to other, subsequently presented non-language stimuli for which the words serve as context. For example, there is evidence that the very rapid mimicry of emotional facial expressions measured through facial EMG, emerging in the signal already at around 300 ms after face presentation, can be modulated by subliminally presented congruent or incongruent word primes [77]. Emotionally relevant words can also modulate the affective EMG responses to odors [78], as well as the EMG-measured startle blink response to loud sound bursts (using the orbicularis oculi muscle, see [35, 79, 80]).

3.2 Phrase and Sentence Studies

3.2.1 Manipulations of Lexical Emotional Contents

In one of the first EMG studies focusing on the emotional aspects of sentence comprehension [81], participants were asked to read negative and neutral sentences word by word as their corrugator activity was being measured, and to make a semantic correctness judgment after each sentence. The corrugator EMG elicited by a range of negative and neutral sentence-final verbs was larger for very negatively rated verbs than for neutrally rated verbs, and very rapidly so (significant in the 300–600 ms after verb onset). The negatively valenced verbs involved in this comparison also had higher arousal ratings than the neutral verbs, but in a separate comparison, low- and high-arousal words that were matched on valence did not differ in corrugator activity. This suggests that, at least in this study, the corrugator effects of subjective valence are not due to a subjective arousal confound. Pursuing a wider question about embodied language processing, several EMG studies followed up on earlier single-word research [67] by placing the same written emotion expression verbs (e.g., "frowning," "smiling") in a minimal sentential context, again without an additional secondary task. Focusing on the zygomaticus only, one of these studies [82] revealed higher zygomaticus
activity to sentences like "I am smiling" than to sentences like "I am frowning," already significant around 400–600 ms after (whole) phrase presentation onset. This finding resembles the original single-word result. A similar but delayed and somewhat weaker pattern emerged with proficient L2 speakers [83], comparable to the attenuated EMG effects for L2 speakers when reading single words [72]. In related research [84], corrugator and zygomaticus EMG was recorded to examine the impact of minimal emotional expression or emotional state sentences (e.g., "Mario smiles," "Mario enjoys"), in a paradigm in which participants also had to provide a sentence likeability rating. Corrugator activity was significantly higher to descriptions of negative states and actions than to descriptions of positive states and actions already from 300 to 600 ms onward, and zygomaticus activity was significantly higher to descriptions of positive states and actions than to descriptions of negative states and actions from 900 to 1200 ms onward. Finally, in a study that compared first- and second-person perspectives [85], verb valence affected the corrugator and zygomaticus in the expected direction in second-person phrases ("your fear," "your joy"), but not in first-person phrases ("my fear," "my joy"), a puzzling result that the authors related to the social embedding of language, and that requires further examination.

Sentences also allow for other interesting manipulations, such as negation. Two studies [82, 83] explored the impact of negation in L1 and L2, respectively. In L1, zygomaticus activity was higher to "I am smiling" than to "I am not smiling," from about 200–400 ms after sentence presentation onset, revealing surprisingly rapid effects of the negation. Furthermore, this negation effect disappeared in L2, which echoes the native language advantage in facial EMG sensitivity reported before [72] for single words. Without going into the details here, the full pattern of results obtained in these negation studies [82, 83] does not indicate that negation simply inhibits word-elicited muscle activity; as also shown by the complex pattern of EMG results obtained in another study [85] with phrases like "no fear" or "no joy," the impact of negation seems to be much more complex. We return to this later in this chapter.

Another option afforded by sentences is to manipulate the identity of the agent involved. In a study on enjoyment over other people's misfortune (Schadenfreude) that paired sentences like "got soaked by a taxi driving through a puddle" with pictures triggering various types of person stereotypes [86], negative event descriptions increased zygomaticus activation only for pictures suggesting an enviable, "cold and competent" person. Furthermore, in a study involving names of well-known ingroup or outgroup politicians (e.g., Italian versions of "Biden smiles" or "Trump smiles" [87]), the corrugator and zygomaticus strongly responded to positive and negative expression or state descriptions involving ingroup
politicians, but less so, and sometimes not at all, to similar expressions involving outgroup politicians. In a related study ([88]: study 3), comparable results were obtained for the corrugator, with somewhat more ambiguous findings for the zygomaticus and the orbicularis oculi. Finally, in a study where participants read sentences like "Mark is furious when ..." or "Mark is happy when ..." without an additional task [89], corrugator activity increased at "furious" compared to "happy" when a previous picture (and accompanying verbal label) had characterized Mark as a good person, but not when it had characterized him as a bad person. In line with the negation and perspective studies discussed before, all this suggests that facial EMG is sensitive to much more than just local word meaning, and that combinatorial meaning and affiliation-based evaluation matter as well.

3.2.2 Manipulations of Prosodic Emotional Contents

Three spoken-sentence studies have used corrugator and zygomaticus EMG to explore the impact of affective prosody. In one study [90], participants were asked to rate the “smiliness” of sentences whose prosody had been digitally modified to convey the speaker was smiling, neutral, or “unsmiling,” while their EMG was being measured. From about 800 ms after speech onset, corrugator activity was significantly higher to “unsmiling” prosody than to smiling prosody, and from about 1100 ms, zygomaticus activity was higher to smiling prosody than to “unsmiling” prosody. In a study on emotional singing [91], corrugator and zygomaticus EMG was recorded as musically trained participants listened to particular sentences (e.g., “Grass is green in summertime”) sung with a happy, sad, or neutral intention. A composite EMG measure revealed emotion-congruent responses in facial EMG as sung sentences were heard. In the third study ([92]: study 1), participants saw a fearful or happy face while hearing a neutral prepositional phrase spoken with a fearful or happy prosody. Relative to adding a happy voice, adding a fearful voice to a face increased corrugator activity when the face was a fearful one, but not when it was a happy one. Also, relative to adding a fearful voice, adding a happy voice to a face increased zygomaticus activity when the face was a happy one, but not when it was a fearful one. This suggests that when the additional emotional prosody channel is consistent with the facial expression shown, the corrugator and zygomaticus can faithfully track the added affective load.

3.2.3 Manipulations of Processing Complexity

In addition to exploring the impact of emotional contents, facial EMG can also be used to study the potentially affective impact of processing complexity, emerging when sentences stop making sense or express very implausible meanings. In one of the studies already reviewed [81], semantic anomalies of the type typically explored in N400 research did not increase corrugator activity. However, in a study on the corrugator-indexed affective consequences of inconsistency [93], such anomalies did increase corrugator activity, in the 1000–2000 ms after critical word onset. Furthermore, in research on the corrugator, zygomaticus, and frontalis effects of surprise [94], surprising trivia statements increased corrugator activity in a 0–6000 ms latency range, relative to unsurprising controls. Also, a fluency-oriented study of corrugator, zygomaticus, and frontalis activity to semantically coherent or incoherent word triplets [95] showed that, compared to coherent triplets like salt—deep—foam (all sea-related), viewing incoherent triplets like dream—ball—book led to significantly higher corrugator and frontalis activity, as well as lower zygomaticus activity, in the 1500–3000 ms after critical word onset. These various corrugator effects perhaps all reflect non-emotional effort ([49]; see also [42]). However, because processing dysfluency signals a problem, these effects might also reflect negative emotional valence.

In all, and in line with single-word research, EMG studies with spoken or written sentences reveal consistent and very rapid valence-dependent facial EMG responses, with the earliest reported significant effects emerging already around half a second after the relevant content has been presented. Also, facial EMG effects induced by words such as "smile" or "frown" can be modulated by the specific sentential context, as in the case of negation, perspective, and character properties. Although such modulation has sometimes led to complex data patterns, particularly in the case of negation, it does reveal that there is more to language-driven facial EMG than just embodied local word meaning. We will see similar things as we examine, in the next section, the extant facial EMG research with multi-sentence written and spoken discourse.

3.3 Discourse Studies

In addition to the pioneering Cacioppo and Petty study [54] on pro- and counter-attitudinal texts (see Subheading 2.2), our literature search generated three clusters of EMG studies involving multi-sentence discourse: studies involving moral and other social transgressions, media-psychological questions, or various other topics.

3.3.1 Moral and Other Social Norm Manipulations

Morality is a rich source of human emotion, and several research teams have examined facial EMG responses to morally loaded narratives. In a pioneering study [96] on the corrugator effects of social unexpectedness, participants read stories that first induced a strong positive or negative trait inference for some character, and then continued with a word-by-word description of trait-consistent or -inconsistent positive or negative character behavior. Relative to descriptions of good behavior, descriptions of bad behavior already led to significantly higher corrugator activity in the 100–300 ms after the critical word, with an additional corrugator activity increase if the bad behavior was trait-inconsistent. The zygomaticus data were deemed suspect, and therefore not analyzed.


In a study on the impact of social norm transgressions in stories [97], EMG activity was recorded in the corrugator and the levator labii (with zygomaticus recording included only to control for smiling artifacts). Participants read texts that did or did not involve a social norm violation (e.g., refusing to shake an extended hand, using somebody else's toothbrush). Relative to neutral ones, stories with social norm transgressions elicited stronger activity in the levator labii, but not in the corrugator. The absence of a corrugator effect might be due to the fact that, in contrast to the preceding study [96] as well as the studies discussed below, this study imposed multiple secondary tasks (subsequent imagery, and three ratings), a possibly effortful situation that could have driven the corrugator to its upper activity range throughout the study. Also different from other transgression studies, EMG was averaged over 60 s of text reading, much of which did not involve critically different descriptions of people and events—this can lead to phasic effects being "washed out."

In the above research, participants knew they would be required to recall or rate the stories after reading. However, in two reading studies that did not impose any other task ([98, 99], see Fig. 3a), narrative descriptions of bad character behavior (e.g., "Mark accelerates through the puddle on purpose to create a big splash and soak the pedestrian") consistently increased corrugator activity within a second after presentation, relative to descriptions of good character behavior (e.g., "Mark slows down to avoid the puddle, making sure he doesn't soak the pedestrian"). Furthermore, when bad or good events subsequently befell the same character, corrugator activity rapidly increased at negative state descriptions (e.g., "Mark is frustrated because ..." followed by a reason) compared to positive state descriptions (e.g., "Mark is happy because ...") in cases where the character had displayed good behavior before, but not when he or she had displayed morally bad behavior. This apparent insensitivity to the state of bad characters was observed in both studies, and also replicated in a third, sentential study ([89], see Subheading 3.2). In line with the results of [96] as well as the other sentence work discussed before [86, 87] ([88]: study 3), this character-dependent corrugator effect reveals that verbal expressions with "frustrated" or "happy" do not merely drive corrugator activity as a function of word-induced embodied language processing—an issue to which we return in Subheading 4.

Two final transgression studies (reported in [66]) compared facial EMG activity as participants read descriptions of immoral versus moral actions embedded in wider stories. Without an additional task that focused attention on emotion ([66]: study 3), the corrugator and zygomaticus did not respond to moral transgressions. However, when readers also had to rate the degree to which the story "had moved them" after every trial ([66]: study 2), descriptions of morally objectionable actions did lead to higher
activity in the corrugator and levator labii, as well as, unexpectedly, in the zygomaticus. The latter effect may be related to other observations that strong negative content can sometimes increase activity of this muscle (e.g., [17]). The two studies considered together illustrate the potential impact of a secondary task, with the same materials. Note, though, that as shown by two other studies already discussed ([98, 99], see Fig. 3a), facial EMG can be sensitive to descriptions of moral transgressions without a secondary task, in a replicable way. Future research will need to determine why these various passive reading studies led to different results.

3.3.2 Media Research Manipulations

Most of the moral transgression studies were conducted with psycholinguistic questions in mind, as such often focusing on the details of word-by-word processing. However, in the field of media psychology, various researchers have used facial EMG to assess the emotional impact of different types of (verbal) messages, or of the same messages conveyed through different media. For example, several health communication studies have reported complex valence-dependent corrugator effects that vary as a function of particular types of patient narratives [102, 103], or as a function of different styles of veterinarian communication [104]. Other media-psychological EMG studies have explored the processing of valenced news texts presented in different formats [105–109], sometimes reporting increased corrugator activity to negative as opposed to positive news [107, 108], as well as increased zygomaticus and/or orbicularis oculi activity to positive as opposed to negative news [105, 107]. Interestingly, in one study with corrugator and zygomaticus recording [106], higher zygomaticus activity was elicited by positive news in comparison to negative news when the news involved companies with a good reputation, but when news concerned companies with a bad reputation, negative news elicited higher zygomaticus activity. This pattern of results is reminiscent of that of various other studies discussed before [86, 87, 98, 99] ([88, 89]: study 3), and it reveals that it is not (just) the text as such that elicits particular facial EMG effects. In the same study [106], negative comments posted on-line to the abovementioned news messages elicited a corrugator activity increase, but no zygomaticus activity change, as compared to positive on-line comments. Finally, in a media study on spoken radio advertisements [110], negative, neutral, and positive words embedded in these advertisements elicited corrugator differences that faithfully tracked word valence and emerged within 500 ms after spoken word onset. Effects also emerged in the zygomaticus, but they did not faithfully track the valence dimension, with positive words eliciting the highest, neutral words eliciting the lowest, and negative words leading to an intermediate level of zygomaticus activity (cf. [17]; see also [66]: study 2, discourse experiment).


Fig. 3 Example EMG results elicited with discourse materials. Panel (a): corrugator EMG as people read morally loaded stories in which the main character first did something moral or immoral and subsequently experienced something good or bad; the signals reveal large corrugator increases to immoral behavior, as well as an apparent insensitivity to the subsequent emotional state of immoral characters. Panel (b): corrugator EMG to sentences that did or did not end with a :P emoticon. Panel (c): corrugator and zygomaticus EMG to the punchline of spoken jokes, compared to non-joke controls. (Panel (a) is a different rendering of the data reported in Refs. [98] (upper part) and [99] (lower part). Panels (b, c) reprinted with permission from Refs. [100, 101], respectively)


3.3.3 Other Manipulations in Discourse

A recent study [100] explored the combined processing of irony and emoticons by asking participants to read texts with a final sentence that praised or criticized somebody in a literal or ironic way (the latter derivable from context), and that was or was not accompanied by a tongue-face emoticon (:P). The irony-induced EMG effects were complex and require conceptual replication as well as somewhat more detailed interpretation than provided in the paper. However, the simple presence of an emoticon generally reduced corrugator activity and increased zygomaticus activity (see Fig. 3b for the corrugator), possibly as a sign of higher enjoyment in emoticon-containing communication, or, alternatively, of some form of affective resonance with the communicator stance being signaled.

Two studies used facial EMG to explore the emotional impact of jokes. In the first study [101], participants heard jokes which they subsequently rated on funniness. Relative to the final words of non-joke control stories mixed with the jokes in the same session, joke-final "punch line" words very rapidly increased zygomaticus activity and reduced corrugator activity, with clear effects emerging well within a second of critical spoken word onset (see Fig. 3c).3 In a second study [112], corrugator, zygomaticus, orbicularis oculi, and frontalis EMG was measured as people heard and rated jokes told by speakers of the same or different political affiliation. Joke-elicited EMG was not sensitive to speaker status during the joke, but within 4–5 s after the joke had ended, the zygomaticus and orbicularis oculi were more activated if the joke had been told by an ingroup speaker than by an outgroup speaker, and activity in these smiling-related muscles correlated strongly with the funniness ratings.

3 Strikingly, follow-up research [111] in which the same materials were played to two vegetative state (coma-like) patients revealed qualitatively comparable, if substantially weaker, zygomaticus and corrugator responses in one of the patients.

One study used corrugator EMG to explore the affective consequences of speech overlap in conversation ([113]: Chapter 5). Participants overheard synthetic spoken dialogues between a man and a woman that sounded like muffled conversation coming through a room wall. Specific words could not be identified, but people could hear who was speaking as well as his or her specific intonation and rhythm. After each mini-dialogue, participants were asked to rate the degree of affiliation heard in the conversation. Speakers whose response overlapped with the previous speaker were not only rated as less affiliative than speakers who "waited their turn," but overlapping speech also reliably increased corrugator activity. Whether this corrugator effect reflects increased listening effort for more complex stimuli, negative emotion to less fluent processing, or negative emotion to overheard interruptions remains to be explored. Moreover, whereas speakers whose speech rhythm was "out of sync" also elicited lower affiliation ratings, this did not increase corrugator EMG.

Three remaining studies explored other aspects of language comprehension. In a study on the emotional impact of spoken poetry [114], those fragments of poetry that evoked chills or goosebumps also strongly and consistently increased corrugator (but not zygomaticus) activity, possibly reflecting the "bittersweet" mixed-emotion state of being moved. In a study on the aesthetic effects of text layout [115], poorer typographic designs led to elevated corrugator activity, but not to effects in zygomaticus or orbicularis oculi activity. Finally, in a study where the corrugator, zygomaticus, and depressor anguli oris were recorded to explore the facial correlates of confusion during story reading [116], corrugator and depressor anguli oris EMG contained information that better predicted confusion than self-reports did, testifying to the fact that EMG can pick up on responses that are not consciously accessible.

3.4 Some General Observations

Although the number of language- or communication-relevant EMG experiments found, 55 in all, is somewhat lower than we had expected, it is clear that since the pioneering Cacioppo and Petty study published in 1979 [54], language-elicited emotion has been tracked by means of facial EMG to address a wide variety of research questions, in psycholinguistics, media psychology, and other fields. Stepping back from the individual studies, what general observations can we make? We discuss a few trends in the findings as well as in the conceptual and methodological approach.

Few replications and many different types of analysis

We must begin with an important methodological observation. Most of these studies can be seen as conceptual replications of each other in the sense that they all demonstrate the utility of facial EMG to assess language-driven emotion. However, there have as yet been virtually no attempts to systematically replicate specific EMG findings in the reviewed domain of inquiry. Also, the studies reviewed differ greatly in how the data were analyzed: in terms of baselining (width of interval, subtraction, or % change), in statistical testing (a single latency range or a series, the boundaries chosen, t-tests, simple condition-by-time ANOVAs, or growth curve analysis), in whether tests compared post-stimulus levels to the pre-stimulus baseline or to a post-stimulus control condition, and in whether raw, difference, or composite scores were entered into statistical testing. None of this is unrepresentative of current psychophysiological research, where there is often no single best way to analyze the data. However, published results obtained under conditions of many experimenter degrees of freedom probably contain a
non-trivial proportion of false positives (see, e.g., [117] for the case of EEG). Therefore, as in other domains, future EMG work on language-induced emotion should greatly benefit from preregistration and systematic replication.

A valence approach to emotion

In virtually all of the EMG studies reviewed, emotion is framed in terms of valence, with researchers characterizing their stimuli and/or the emotional responses to those stimuli as positive or negative. In line with this framing, most studies recorded activity in the corrugator supercilii and/or the zygomaticus major, the two facial muscles broadly perceived to be the most useful ones for tracking emotional valence (see Table 1). Some valence-framed studies focused on just one of the two muscles (e.g., [82, 83, 98, 99]), and other studies sometimes also recorded from additional muscles deemed sensitive to positive or negative valence, such as the orbicularis oculi (involved in smiling), or the levator labii superioris (involved in disgust). Very few studies classified their stimuli as potentially evoking specific emotions and recorded from specific facial muscles to assess those emotions (e.g., the zygomaticus for Schadenfreude, or the levator labii superioris for disgust, see [86]). To some extent, this may result from a dimension-oriented theoretical perspective on emotions. However, we suspect that the predominance of corrugator and zygomaticus recording mainly reflects the impact of influential early EMG research (such as [54] or [17]), and the fact that there is simply less knowledge, and more debate, on how to tap into specific emotions by means of EMG (cf. [38, 118]).

Rapid valence effects

Of the studies that reported onset-relevant statistics, several clearly indicate that the effects of valence can emerge in the EMG record within only a few hundred milliseconds after critical language input. This holds for single words (e.g., [67]), for words embedded in a sentence (e.g., [81, 84]), and for words embedded in a discourse (e.g., [96]). Several studies that provided time-course information without reporting onset statistics numerically confirm that valence-dependent EMG effects can emerge rapidly in the signal (e.g., see Figs. 2b, c and 3a–c). These observations are consistent with what we know about the speed of language processing and emotional responding, and, perhaps more interestingly, show that such rapid effects do not just emerge for depictions and sounds of concrete objects, situations, and events out there in the world, but also for arbitrary signs used to communicate about those things. It is too early to say if language- or other symbol-elicited EMG effects lag behind those of non-symbolic stimulus types (see [66] control experiments of studies 2 and 3, for first observations).
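To make the consequences of the scoring choices mentioned above (baseline subtraction versus percent change, and the choice of a latency window) concrete, the minimal Python sketch below is purely illustrative: the numbers are simulated and the variable names, sampling rate, and window boundaries are our own assumptions rather than parameters taken from any of the studies reviewed. It scores the same corrugator epoch in two common ways and extracts the mean response in an early post-stimulus window of the kind reported in this section; because the two scores live on different scales, results computed in these different ways are not directly comparable across studies.

```python
import numpy as np

# Illustrative corrugator epoch: 1 s pre-stimulus baseline plus 3 s post-stimulus,
# sampled at 1000 Hz. In a real pipeline this would be rectified, smoothed EMG.
fs = 1000                                   # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
epoch = rng.normal(10.0, 1.0, 4 * fs)       # arbitrary amplitude units
epoch[fs:] += 2.0                           # simulated post-stimulus activity increase

baseline = epoch[:fs].mean()                # mean activity in the 1 s before stimulus onset

# Scoring choice 1: baseline subtraction (change expressed in raw signal units)
subtracted = epoch - baseline

# Scoring choice 2: percent change relative to the baseline level
percent_change = 100.0 * (epoch - baseline) / baseline

# Mean response in an early post-stimulus latency window (here 300-600 ms,
# mirroring the kind of window reported for rapid valence effects above)
win = slice(fs + int(0.3 * fs), fs + int(0.6 * fs))
print(f"baseline-subtraction score: {subtracted[win].mean():.2f} (raw units)")
print(f"percent-change score:       {percent_change[win].mean():.2f} %")
```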


Table 2 Summary table indicating how often, in all 55 studies examined, the corrugator supercilii (CS) and/or zygomaticus major (ZM) was or was not sensitive to a language-driven emotion manipulation (upper part), and how often, in the 37 studies that simultaneously examined both muscles, both were sensitive to a language-driven emotion manipulation, or just one of the muscles was (lower part)

All studies (55)
Muscle is sensitive to language-driven emotion manipulation: CS 45 (82%), ZM 26 (47%)
Muscle is not sensitive to language-driven emotion manipulation: CS 5 (9%), ZM 14 (25%)
Undecidable: CS 1 (2%), ZM 1 (2%)
Muscle is not measured or not evaluated: CS 4 (7%), ZM 14 (25%)
Total for this muscle: CS 55 (100%), ZM 55 (100%)

Only studies measuring both CS and ZM (37)
Both muscles are sensitive to language-driven emotion manipulation: CS 22 (59%), ZM 22 (59%)
Only this muscle is sensitive to language-driven emotion manipulation: CS 12 (32%), ZM 3 (8%)
Total for this muscle: CS 34 (92%), ZM 25 (68%)

Absolute counts are studies. Percentages are rounded, so may not exactly add up
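The short sketch below illustrates how summary counts of this kind can be tallied from a per-study coding; it is a hypothetical example in which the study list merely stands in for the actual coding of the 55 studies, so the printed numbers will not match Table 2.

```python
# Hypothetical per-study coding: for each study, whether the corrugator (CS) and
# zygomaticus (ZM) were sensitive to the emotion manipulation (True), not
# sensitive (False), or not measured/undecidable (None).
studies = [
    {"CS": True, "ZM": True},
    {"CS": True, "ZM": False},
    {"CS": True, "ZM": None},
    {"CS": False, "ZM": True},
]

def tally(studies, muscle):
    """Count sensitivity outcomes for one muscle, with rounded percentages."""
    n = len(studies)
    sensitive = sum(1 for s in studies if s[muscle] is True)
    insensitive = sum(1 for s in studies if s[muscle] is False)
    other = n - sensitive - insensitive
    return {
        "sensitive": (sensitive, round(100 * sensitive / n)),
        "not sensitive": (insensitive, round(100 * insensitive / n)),
        "not measured or undecidable": (other, round(100 * other / n)),
    }

for muscle in ("CS", "ZM"):
    print(muscle, tally(studies, muscle))
```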

Corrugator-dominated effects

Our review also indicates that although the corrugator and the zygomaticus are the muscles most often used to assess the impact of language-driven emotion manipulations (see Table 1), the corrugator is, on average, more sensitive, at least with the particular verbal materials and tasks used here (see Table 2): of the 37 studies recording from both muscles, 92% reported emotion-related effects in the corrugator, while only 68% reported such effects in the zygomaticus (with some unexpected reversals of effect direction in the latter). The reasons for this differential sensitivity have been discussed in detail elsewhere (e.g., [17, 42]). For one, different muscle types are involved: whereas the zygomaticus is a phasic muscle that is often inactive and that can in those cases only increase its activity, the corrugator is a tonic muscle that is always "on," and that can as such easily increase as well as decrease its activity (see, e.g., Fig. 3c and [35, 66], for clear evidence). Also, whereas the evidence suggests that the corrugator tracks valence in a relatively monotonic way, with increasing activity as one moves from strongly positive via neutral stimuli to strongly negative stimuli, zygomaticus activity can be increased by both positive and very negative stimuli, relative to neutral stimuli ([66]: study 2, discourse experiment; [17, 110]), presumably because very negative stimuli can lead to, for example, wry smiles, grins of embarrassment, or disgust-induced raising of
the cheeks.4 This non-monotonic (and non-linear [65]) behavior of the zygomaticus need not be problematic when contrasting positive and neutral stimuli only, but it may well get in the way when contrasting either of those to negative stimuli.

4 Relatedly, smiles may serve a number of different functions, such as displaying enjoyment, affiliative intentions, or dominance (see [119], and references therein).

Task effects as well as automaticity

There is some evidence that secondary tasks, imposed on participants above and beyond the primary task of reading or listening to language, can have an impact on language-driven emotional EMG responses. This is not surprising, as such tasks can easily change the focus of participants. The more important question is whether the observation of language-driven emotional EMG responses critically depends on the presence of a secondary task. As we have seen in this review, the answer is no. For example, consistent and replicable EMG effects have been observed in studies where participants had no other task than to read ([89, 98, 99]), and consistent EMG effects have also been obtained in situations where participants were supposed to deliberately use another muscle [74] or were not aware of the stimulus at all [68, 69]. Such observations echo EMG results with other types of stimuli (e.g., [25, 26]), and testify to a very important property of facial EMG: the electromyographic recording of facial muscle activity picks up not only on consciously controlled facial expressions, but also on involuntarily generated ones.

Reading out involuntary emotion

Although this is usually not made explicit, the studies reviewed share an important assumption, which is that, at least in the research context at hand, facial EMG primarily indexes inner emotional states that are automatically generated. For example, we found no studies that used facial EMG to study the deliberate generation of a facial display as a social tool used in, for example, conversation. As such, the current language-relevant facial EMG research is closer to a basic-emotions perspective on emotion (e.g., [120]) than, say, to a social constructionist or "behavioral ecology" perspective on emotion (e.g., [121]). We offer this observation as a fact rather than a verdict. In the minimal social context of a laboratory study where stimuli are presented to solitary individuals for reading or listening only, with an experimenter who is trying to be as absent as possible, it is in fact not unlikely that the facial responses picked up via EMG are indexing relatively involuntary inner emotional states. And since much of human behavior is generated involuntarily, rather than under conscious control, such results can be highly relevant to understanding affective language use. At the same time, and
echoing a more general "passive comprehension" bias in psycholinguistics, language- and communication-oriented facial EMG research has yet to begin to explore emotion in socially richer arenas of language use, where both involuntary facial expressions and deliberate, strategic facial displays play an important role.

Tracking immediate emotion only

As might be expected from studies with a time-sensitive biosignal measure, all of the research reviewed focused on relatively immediate emotional responses, usually unfolding within half a second to a few seconds after the relevant language materials have been presented. Understanding real-time responding is important, for theoretical as well as practical reasons, and such responses may also be correlated with later, more stable emotional evaluations (e.g., as part of attitudes). However, there is no principled reason why facial EMG recording cannot also be used to actually probe those more stable emotional results (e.g., via the delayed presentation of single "probe" words, or phrases, that relate to an attitude object). As with real-time data, the advantage would be that our knowledge of emotional impact is not limited to what participants can report in an attitude or valence rating survey, but is assessed in a broader fashion.

Complexity in what drives emotion

The EMG evidence reviewed suggests that language can very rapidly elicit emotion during comprehension, in at least partly automatic ways. However, the triggers for such emotion are complex. For convenience, we have organized our review in terms of familiar chunks of language structure: a single isolated word, a single isolated phrase or sentence, and a multi-sentence discourse. But merely concluding that all three can elicit emotion is rather superficial. After all, we know from psycholinguistics and pragmatics that language comprehension is a very complex process involving the (near-)simultaneous integration of many different features, at various levels of analysis. Also, emotional state or emotional expression descriptions (e.g., "happiness," "frowning") probably elicit emotion in a different way from how other words (e.g., "weekend," "dying," or the name of a loved one) elicit emotion. Although the various facial EMG studies discussed all have their own local theoretical context, what is currently missing is a coherent theoretical framework that makes explicit the many fundamentally different ways in which language can give rise to emotional facial expressions during comprehension. In the next section, we provide such a theory.

4 The fALC Model

The model we present is a version of the Affective Language Comprehension or ALC model [12, 13], slightly modified and extended to better conceptualize the potential sources of facial EMG effects (hence the "f"). The fALC model, schematically depicted in Fig. 4, combines broadly accepted psycholinguistic and pragmatic ideas on what is needed to comprehend language (e.g., [122, 123]) with knowledge of how emotion and its facial expression can be triggered, automatically or deliberately, without or with conscious awareness. In the model, we draw various distinctions that we consider to be critical to understanding language-driven facial EMG effects. We first discuss these distinctions (Subheadings 4.1, 4.2, and 4.3), and then briefly conceptualize some of the empirical EMG findings in terms of the model (Subheading 4.4).

Fig. 4 The fALC model, a theoretical framework for understanding and predicting facial EMG effects during affective language comprehension. The various cognitive processes drawn in the left part of the figure can influence each other, and this may also hold for the various evaluations drawn to the right of them (arrows omitted for simplicity). Potential facial feedback loops are not explicitly marked. Further thoughts can be triggered rapidly by representations generated by each of the other cognitive components of language comprehension. S = impact of emotion simulation, M = impact of emotional mimicry, E = impact of emotion-based evaluation, O = impact of other factors. See text for further explanation.


4.1 Language Comprehension at Multiple Levels


In everyday language use, understanding an utterance like "Mark is angry" requires processing at several fundamentally different levels, described briefly below (see [12, 13], for details), and illustrated in the left half of Fig. 4.

1. Recognizing and parsing the signs: To understand "Mark is angry," you will need to retrieve relevant information stored in long-term memory for the words "Mark," "is," and "angry," and combine that information with other relevant cues (e.g., linguistic prosody, a declarative, interrogative, or imperative sentence form) in accordance with the grammar of English. The result, sometimes referred to as "timeless sentence meaning," specifies that, in some to-be-determined context, time, and place, an agent exists whose name is Mark and whose current state includes anger. What this means in the current context is to be determined at another level.

2. Modeling the situation referred to by the communicator: For successful comprehension, you also need to work out which specific (real or fictional) "Mark" the communicator is talking about, and somehow model this particular person as being in a state of anger now, as part of a larger model of the situation under discussion (situation model; [124]). In most instances of everyday language use, a lot will already be in place in that model. For example, if "Mark is angry" is not the first utterance in a text or conversation, you may already have modeled where Mark is, who else is around and what you know about them, what has happened before, and what social norms are in place. Situation models constructed during communication are extremely rich, and individual utterances usually only add or modify bits of the model.

3. Modeling the communicator, which involves at least two important aspects:

3a. Inferring the communicator's stance: To understand "Mark is angry" in everyday language use, it is important to model details about the communicator too. For example, when somebody utters "Mark is angry" in a conversation, it is not just important to work out what he or she is referring to, but also what his or her stance (feeling, attitude, orientation) on the matter is. Here, facial, vocal, and other bodily cues matter a lot. For example, the affective prosody with which "Mark is angry" is produced can easily betray the speaker's stance (although not necessarily what the stance is about). Inferring stance can also be crucial when comprehending written language (e.g., a WhatsApp message, an internet blog, an advertisement, or a political statement), where, in the absence of facial, vocal, and other bodily cues, other cues can carry the relevant
information (e.g., reply speed, capital letters, exclamation marks, emoji).

3b. Inferring the communicator's social intention: Knowing which Mark is angry and how the speaker feels about this is usually not enough, though, as there is often a deeper question to be answered [123]: what is it that the communicator wants you to do, know, or feel, by producing this utterance in this particular way? Is the intention to just let you know Mark is angry, share his or her feelings over this, or persuade you to do something? For example, does the speaker want you to share in amazement over this fact and as such pull you closer, or does he or she want to tell you that you messed up? Answering these questions is not just important when conducting a spoken or texted conversation, but also in cases such as overhearing a conversation or listening to a lecture, as well as, for example, when reading an advertisement, political statement, or blog.

4. Further thinking: Finally, there are countless things that the communicator did not intend to convey but that you will nevertheless rapidly infer from the signs, the situation model, and/or the communicator's stance and social intention, consciously or unconsciously. For example, the use of overly complex words ("Mark is antagonized") can signal professional deformation or a disregard for the addressee, the situation referred to can reveal a misanalysis on the part of the communicator, and, say, a contemptuous tone of voice and social intention to gossip can betray other things, such as that you are talking to an arrogant person, or to one who is compulsively trying to speak badly about people. Although perhaps not strictly part of language comprehension, these further thoughts can emerge rapidly, as you are reading or listening to the utterance at hand.

4.2 Emotion-Based Evaluation

The second critical part of the fALC model is the assumption that all of these various representations retrieved or constructed during language comprehension can trigger emotion, as internal stimuli appraised as relevant to the concerns of the comprehender. For example, at the sign level, you may have an unconscious positive or negative response to the name "Mark," not because of what it refers to now, but simply because the name itself has a negative connotation for you, for whatever other conscious or unconscious reason. At the situation model level, learning that this particular Mark is angry can trigger all sorts of emotional responses, depending, in part, on your relationship to him (partner, child, friend, competitor, etc.). At the stance level, hearing compassion, contempt, anger, or glee in a speaker's voice may elicit all sorts of emotion, either with or without you being aware of it. At the social
intention level, inferring that the speaker is saying things about Mark to get closer to you may flatter or worry you. And as for further thinking, realizing that the speaker is kind, or not such a good friend after all, can also elicit rapid emotion. The various strong or subtle emotional responses elicited as you are evaluating the utterance at all these levels can all express themselves in such things as corrugator or zygomaticus activity, and presumably do so simultaneously. In addition, and in line with research on fluency (e.g., [125]), the model captures the assumption that the ease or difficulty of processing can elicit evaluative emotion during language comprehension, conceptually—but not necessarily empirically—independent of what that processing is about. Think about negative emotion to words that are hard to pronounce, to sentences that are hard to parse, or to texts that are poorly composed or hard to read or hear (e.g., because of poor typography or too much background noise).

4.3 Emotion as Simulation

In the above routes from language to emotion, the comprehender's emotion systems are used for their original purpose, evaluation, in ways that are comparable to how directly perceived objects, situations, and events, as well as the fluency of one's own actions, can elicit emotion. However, language is special because it can also refer to emotional states and expressions and can recruit the comprehender's emotion systems as part of doing so. Embodied language processing studies have indicated that reading or hearing a word can lead to a simulation of concrete experiences with what the word denotes or refers to, that is, the neural (and sometimes also bodily) re-instantiation of relevant perceptual, motor, and other experience-induced processes and states associated with what the concept or phrasal combination of concepts is about [124, 126–130]. For example, reading action words like "kick" or "pick" leads to activation of the motor cortex involved in actually realizing the described movements [131, 132], and reading phrases such as "he saw an eagle in the sky" leads to a perceptual simulation of the described situation [133, 134]. In line with these observations in other domains, the corrugator and zygomaticus EMG effects induced by words or phrases such as "angry" or "I am smiling" are usually interpreted in terms of emotion simulation, rather than as reflecting emotion-based evaluation (see, e.g., [67, 82, 87]). According to the fALC model, emotion simulation—the use of one's own emotion systems to construct mental representations of emotion or components thereof—can in principle occur in at least three different components of language comprehension. First, when reading "Mark is angry," people may simulate anger as part of retrieving stored information about the word "angry" from the mental lexicon—to the extent that such retrieval-based simulation recruits facial muscle motor control systems (along similar lines as
in, e.g., the abovementioned Pulvermüller et al. [131]), the resulting neural activity could lead to a bit of frowning. Second, comprehenders may simulate anger as part of imagining, that is, constructing a vivid representation of an angry Mark in their model of the situation being talked about—to the extent that such situation model simulation recruits facial muscle motor control systems, this could again lead to a bit of frowning. Third, in cases where a communicative act contains partial or indirect cues to strong communicator stance, emotion simulation may also occur as part of modeling the latter. For example, when a text message from a familiar person suggests strong emotion via its "tone" (i.e., via choice of words or typography), readers may imagine the plausibly associated facial or prosodic expressions as part of their model of the communicator's current stance. Also, when hearing a familiar person smile over the phone, listeners may use their emotion systems to model other, for example, visual, aspects of the communicator's stance. Whether any of this actually occurs in any particular situation is an empirical issue, of course (see [67], and other embodied language processing research for relevant evidence). Critically, however, and as illustrated by the fact that people can enjoy somebody else's negative emotion (as in Schadenfreude), simulating emotion as part of the retrieval of a lexical concept from long-term memory and/or as part of modeling the affective state of protagonists or communicators is conceptually very different from using one's emotion system to evaluate things. The fALC model captures this important distinction by postulating two different routes (S and E in Fig. 4) to facial motor systems activation, and, ultimately, EMG-recorded facial muscle activity.

4.4 Emotional Mimicry and Other Factors

Apart from making explicit multiple levels of language comprehension and the difference between evaluation and simulation, the fALC model makes two additional assumptions. One is that when comprehenders vividly imagine what, for example, an angry Mark looks like, or what the communicator’s emotional expression might be when he or she is not directly seen, it is theoretically possible that they are also involuntarily “contaminated” by the resulting mental image, in the same way in which actually seeing a person with a strong facial expression can cause observers to automatically mimic that expression. The idea of mimicking facial expressions that we have ourselves imagined may sound far-fetched, but may become somewhat more intuitive when we consider the impact of, say, “Trump smiled broadly,” “Leonardo DiCaprio looked really concerned,” or similar sentences involving friends or loved ones. Importantly, the idea follows naturally from two well-established facts. First, there is considerable evidence that people partly involuntarily mimic visible emotional expressions of other people (see [135, 136], for review; see also [42]), and such visual mimicry must involve representations in the visual perception system (e.g., areas V1 and V2). The second fact is that the same visual perception

Facial EMG and Affective Language Comprehension

719

system is thought to be recruited when people imagine, rather than perceive, a visual scene (e.g., [137, 138]). These two facts together open the door to an interesting hypothesis, which is that people may involuntarily mimic facial expressions that they themselves have constructed as part of making sense of language (see [87] for a mimicry-based interpretation of language-driven facial EMG effects). This self-cued facial mimicry hypothesis is not easy to test, and perhaps impossible to experimentally separate from the language-driven emotion simulation that is needed to bootstrap the process. Nevertheless, in the absence of evidence against the idea, we think it is important to include mimicry as a possible source of language-mediated facial EMG effects in our model (M in Fig. 4).5 The second additional assumption is that many non-emotional other factors can potentially act as systematic confounds or sources of noise, such as the increased muscle tension induced by physical effort [48], facial muscle activity in the lower half of the face during covert articulation [46, 47], the selective relaxation or activation of specific facial muscles to improve auditory or visual perception, or the effort-induced steady increase in corrugator activity during an experimental session (see [42] for review). The boundary between emotional and non-emotional EMG effects will not always be clear, as in the case of increased corrugator activity associated with an error [139] —is that part of an emotional response, or is it a conceptually separate index of orienting and effort? However, many clearly non-emotional types of facial movement exist, and marking this in the model (O in Fig. 4) is a reminder of potential confounds and sources of noise. 4.5

4.5 Using the Model

The core idea of the fALC model is that language-induced facial EMG responses can arise for a wide variety of interestingly different reasons. As such, the model can provide a new perspective on the studies reviewed. For example, it points to the possibility that facial EMG effects induced by negation or person (cf. "I am not smiling," "you are smiling") need not just involve the contextual inhibition or tuning of lexical-conceptual emotion simulation (as supposed in [82, 85]), but can also involve simulation as part of situation model construction, and, in sufficiently contextualized cases, perhaps even a subtle evaluative response to the situation modeled. Situation model evaluation is clearly also a relevant possibility to consider for facial EMG effects involving whether an in- or outgroup member "got soaked by a taxi" [86], whether an in- or outgroup member "... is smiling" ([88], study 3; [87]), whether a good or bad Mark "... is angry" [89, 98, 99], or whether negative news concerns a bad-reputation company or a good-reputation one [106]. Furthermore, to the extent that the verbally described situation is rich enough to allow people to vividly imagine the emotional expression of a protagonist (e.g., because it involves the facial expression of famous politicians [87]), we should consider the possibility that the EMG effects obtained might reflect facial mimicry as well. Over and above situation-level representations, the model reminds us that some of the facial EMG effects reported in the literature, such as the emoji findings reported in [100] and the affective prosody findings reported in [76, 90, 92], can to an unknown extent reflect emotion that is triggered as a function of the communicator's inferred stance or social intention—which also raises further questions about their specific proximal cause (emerging via evaluation, simulation, and/or, perhaps, mimicry). The model makes explicit that the facial EMG effects reported for unexpected or incoherent materials [93, 94, 115, 116] might reflect an evaluative response to disfluent processing. And the model can, for example, help us be more specific about what particular secondary tasks bring about (e.g., a "sentence likeability" rating task probably leads people to rate particular situations, not sentences; cf. [84]).

The fALC model can help us think about past results, but also about future experiments. Consider the results of the two experiments with morally loaded stories [98, 99] depicted in Fig. 3a and replicated in another paradigm [89]. The most interesting finding is that whereas "Mark is frustrated" increased corrugator activity relative to "Mark is happy" when Mark had displayed morally good behavior before, it did not increase corrugator activity when Mark had displayed morally bad behavior. By itself, this already shows that simple emotion simulation at the conceptual or situation-model level cannot account for the findings. The fALC model, however, suggests a number of alternative possible accounts. One is that the apparent non-responsiveness of readers to the emotions of a bad character reflects the simultaneous use of their emotion systems to simulate and evaluate the events described, with the two forces counteracting each other for bad characters. For example, reading about a "frustrated" bad character might increase frowning because of simulation, while simultaneously decreasing frowning because of some fairness-based evaluation (cf. Schadenfreude). A second possibility is that readers are somehow less inclined to simulate the state of a bad character, an idea that fits with the emerging realization that language-driven simulation is not an all-or-none concept but depends on all kinds of contextual factors [128–130, 132, 141, 142]. A third possibility is that readers are less likely to automatically mimic the simulated emotions of bad characters, an idea that is consistent with reports that facial mimicry can be diminished by disliking or disidentifying with the other person, by the desire to disaffiliate from somebody, and, relatedly, by classifying him or her as an outgroup member [135]. And fourth, it could be that readers do not evaluate—that is, care about—what happens to bad characters at all. Some of these hypotheses might be more plausible than others, and some are very hard to test. For now, however, the point is that the fALC model can help in systematically defining the options, and as such facilitate the search for what is actually happening in this particular situation.

5 Challenges and Opportunities

We have reviewed facial EMG research on affective language comprehension and proposed a model of how language can lead to changes in facial muscle activity. What are the challenges and opportunities ahead of us?

One opportunity is obvious: facial EMG can help track conscious and unconscious emotional responses to language as it unfolds. That is good news, because facial EMG extends the methods repertoire for tracking emotion in unique ways. For example, relative to EEG, MEG, or fMRI, it can tap into somebody's emotional state without requiring lengthy electrode attachment sessions (EEG), expensive and relatively inaccessible equipment (MEG, fMRI), or the complex analyses that come with high-dimensional data (all three). In addition, whereas skin conductance provides information about undirected arousal only, with fairly low temporal resolution, facial EMG provides directional (valence) as well as intensity information, with a higher temporal resolution. And because EMG can detect facial muscle activity even when this is too weak to lead to observable skin movement [53], the measure is more sensitive than automated facial image analysis (such as FaceReader), an important property when working with language stimuli in a laboratory setting [42]. Whether facial EMG should be the method of choice naturally depends on the research question: if the latter is about feeling only, for example, self-report is the way to go. However, in affective language comprehension research, facial EMG clearly occupies a unique and attractive niche (see the analysis sketch below for what such a two-muscle measurement typically involves).

We see several important challenges for facial EMG research in this domain. First, although the fALC model need not be correct in the exact distinctions made, Fig. 4 does immediately foreground a big challenge to facial EMG research in psycholinguistics and other fields: we are dealing with a very complex situation here. When exploring single written words presented without any additional context in the lab, this complexity is still relatively restricted.
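To make concrete what such a corrugator/zygomaticus measurement typically involves, the sketch below rectifies a raw EMG trace, smooths it, and expresses post-stimulus activity as percent change from a pre-stimulus baseline. This is only a minimal illustration, not the pipeline prescribed in this chapter: the channel arrays, the sampling rate, the window choices, and the omission of band-pass filtering and artifact screening are all simplifying assumptions.

```python
import numpy as np

def emg_response(raw, fs, stim_onset, baseline=(-0.5, 0.0), window=(0.0, 1.0)):
    """Rectify a raw EMG trace, smooth it with a 100-ms moving average, and
    express mean activity in a post-stimulus window as percent change from a
    pre-stimulus baseline. (A real pipeline would also band-pass filter the
    signal and screen for artifacts; both are omitted here for brevity.)"""
    rectified = np.abs(raw - raw.mean())            # remove DC offset, full-wave rectify
    k = int(0.1 * fs)                               # 100-ms smoothing kernel
    smoothed = np.convolve(rectified, np.ones(k) / k, mode="same")

    def mean_in(t0, t1):
        i0, i1 = int((stim_onset + t0) * fs), int((stim_onset + t1) * fs)
        return smoothed[i0:i1].mean()

    base = mean_in(*baseline)
    return 100.0 * (mean_in(*window) - base) / base

# Hypothetical single-trial traces (arbitrary units) sampled at 1000 Hz, with the
# critical word presented 2.0 s into a 5-s trial.
fs, onset = 1000, 2.0
corrugator = np.random.randn(5 * fs)    # placeholder for a recorded corrugator channel
zygomaticus = np.random.randn(5 * fs)   # placeholder for a recorded zygomaticus channel

# The direction of change carries the valence information: an increase over baseline
# in corrugator activity suggests negative affect, an increase in zygomaticus
# activity suggests positive affect.
print(emg_response(corrugator, fs, onset), emg_response(zygomaticus, fs, onset))
```

In practice, such trial-level values would be averaged per condition and participant and then submitted to the statistical model of choice.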


However, as soon as we enrich the materials or the context, many other options open up. For example, as soon as language is spoken, stance-elicited emotion can enter the picture, even with a single word. Also, as soon as sentences really describe something, situation model simulation and evaluation enter the picture, and as we move to bigger chunks of discourse, such as private or professional conversations, advertisements, government messages, or blogs (to name just a few), emotions elicited by inferred social intentions enter the picture too. Although the complexity made explicit in our model is perhaps a little disheartening, it also points to exciting new questions that have not yet been tackled. Research on those questions might confirm some of the predictions of the fALC model, but—just as attractive—might also eliminate some of the proposed routes from language to facially expressed emotion.

The second challenge involves using facial EMG research to assess specific emotions. Although emotion and emotional expression are part and parcel of our biological heritage, there has been much debate over how reliable the information provided by facial, vocal, and other bodily cues to emotion actually is (see, e.g., [38, 39, 118]). What that debate has made very clear is that, in contrast to what early basic-emotions research suggested (e.g., [143]), particular emotions do not map onto particular facial, vocal, or other bodily expressions in a simple one-to-one fashion, or in a way that is completely invariant across occasions. The various aspects of a particular emotion are probabilistically linked (see [39, 144]), which means that particular appraisals and motivational, physiological, cognitive, and behavioral responses typically but not invariably cohere with each other. For example, although anger tends to increase the probability of frowning to such an extent that frowning becomes an informative cue, not every instance of anger leads to a frown. Because our body is used for many things besides expressing emotions, facial, vocal, and other bodily cues are often ambiguous, an ambiguity that we can only resolve by taking the context into account too (e.g., what is the situation, who is involved, which cultural constraints are in place?). Now, emotional expressions are reliable and specific enough to do considerable work in our daily lives. We routinely pay attention to how people look, sound, and move to infer specific emotional states and the associated intentions—none of that would happen if facial, vocal, and other embodied emotional cues were fundamentally unreliable, or only signaled valence. The problem, however, is that much of the rich context that allows us to effectively disambiguate facial signals in real life is not present in the average laboratory experiment. One way to address this problem is to resort to high-dimensional signal measurement [38], so that the simultaneous recording of many different facial muscles, autonomic nervous system parameters, and other potentially relevant bodily cues can provide additional constraints. The other, and often more feasible, option is to design experiments in which the materials, tasks, and other details of the setting constrain interpretation to such an extent that the activity of even just one or two facial muscles becomes sufficiently unambiguous and informative.

The final challenge also relates to the multi-interpretability of facial expressions. It has been argued that emotional expressions conveyed through the face, the voice, and the rest of the body provide a more "honest" indication of emotion than what people say, because such embodied multi-channel expressions often arise involuntarily, are not easy to suppress, and, as rich and complex patterns, are relatively hard to generate deliberately [39, 40, 145]. At the same time, however, real-life experience shows that people can deliberately generate emotional expressions that we take at face value. Actors, for instance, do so routinely, and all parents of young children will recognize the utility of being able to make an emotional facial expression that is at odds with one's inner state. Moving beyond such anecdotal observations, careful research has shown, for instance, that some facial expressions are only produced when a particular audience is around (e.g., [146–148]), and that, more generally, we all use facial expressions as social tools [121]. Relatedly, there is discussion as to what "honest" really means (see, e.g., [119]). All this testifies to the complexity of using facial EMG when studying language comprehension. As long as nobody else is around, and the experiment is designed such that comprehenders do not feel that making certain facial expressions is a desirable thing to do, facial EMG may predominantly tap into "inner" emotional states, generated for one or more of the reasons laid out in the fALC model. However, all of this may change when an interlocutor or other audience enters the arena, because in such more social situations facial expressions can and often will be used strategically, under the expresser's full control and deliberately designed to have a particular impact on the current interlocutor or audience [121, 146]. This means that as soon as we abandon the classic solitary-comprehender paradigm to explore comprehension in a more social context, facial EMG will most certainly not just "tap into inner emotional states," but will also pick up on strategic facial displays that, although probably often motivated by some emotion, are not themselves direct reflections of that emotion (nor part of language comprehension proper).

These are all non-trivial challenges. However, rather than just avoiding all this complexity, it is much more interesting to address it. For example, by increasing the richness of our verbal stimuli, we can use facial EMG to explore the impact of stance signals and social intentions. By designing EMG studies that independently control emotion simulation, emotional evaluation, and perhaps mimicry, we can try to tease those factors apart. And by manipulating the social context, we can use facial EMG to explore the exact conditions under which language-induced emotional expression arises, as an involuntary response to the stimulus and its context, as a deliberate social tool, or both. Language-driven emotion is all around us, in our personal lives, and in the big world out there. Facial EMG provides a unique tool to help us understand how such emotion comes about, as well as how we express it to influence others.

Acknowledgments This work is partly supported by NWO Vici grant #277-89-001 to JvB. We thank Anita Eerland and Mirko Grimaldi for comments on an earlier version of this chapter, the members of the ILS Language & Communication research group for feedback on some of the ideas presented here, and Ton van Boxtel for Fig. 1, and for generously sharing his expertise when we embarked upon our first EMG project. References 1. Brysbaert M (2019) How many words do we read per minute? A review and meta-analysis of reading rate. J Mem Lang 109:104047 2. Levelt WJM (1989) Speaking: from intention to articulation. MIT Press, Cambridge, MA 3. Hagoort P (ed) (2019) Human language: from genes and brains to behavior. MIT Press, Cambridge, MA 4. Rueschemeyer S, Gaskell MG (eds) (2018) The Oxford handbook of psycholinguistics. Oxford University Press, Oxford 5. de Zubicaray GI, Schiller NO (eds) (2019) The Oxford handbook of neurolinguistics. Oxford University Press, Oxford 6. Bohn-Gettler CM (2019) Getting a grip: the PET framework for studying how reader emotions influence comprehension. Discourse Process:1–16 7. Hinojosa JA, Moreno EM, Ferre´ P (2019) Affective neurolinguistics: towards a framework for reconciling language and emotion. Lang Cogn Neurosci:1–27 8. Jensen TW (2014) Emotion in languaging: languaging as affective, adaptive and flexible behavior in social interaction. Front Psychol 5:720 9. Koelsch S, Jacobs AM, Menninghaus W, Liebal K, Klann-Delius G, von Scheve C, Gebauer G (2015) The quartet theory of human emotions: an integrative and neurofunctional model. Phys Life Rev 13:1–27 10. Majid A (2012) Current emotion research in the language sciences. Emot Rev 4(4): 432–443

11. Per€akyl€a A, Sorjonen ML (2012) Emotion in interaction. Oxford University Press, Oxford 12. van Berkum JJA (2018) Language comprehension, emotion, and sociality: Aren’t we missing something? In: Rueschemeyer SA, Gaskell G (eds) Oxford handbook of psycholinguistics. Oxford University Press, Oxford, pp 644–669 13. van Berkum JJA (2019) Language comprehension and emotion: where are the interfaces, and who cares? In: de Zubicaray G, Schiller NO (eds) Oxford handbook of neurolinguistics. Oxford University Press, Oxford, pp 736–766 14. van Berkum JJA (2020) Inclusive affective neurolinguistics. Lang Cogn Neurosci 35(7): 871–876 15. Scherer KR (2005) What are emotions? And how can they be measured? Soc Sci Inf 44(4): 695–729 16. Bradley MM, Lang PJ (1994) Measuring emotion: the self-assessment manikin and the semantic differential. J Behav Ther Exp Psychiatry 25(1):49–59 17. Larsen JT, Norris CJ, Cacioppo JT (2003) Effects of positive and negative affect on electromyographic activity over zygomaticus major and corrugator supercilii. Psychophysiology 40(5):776–785 18. Fontaine JR, Scherer KR, Soriano C (eds) (2013) Components of emotional meaning: A sourcebook. Oxford University Press, Oxford

Facial EMG and Affective Language Comprehension 19. Nummenmaa L, Glerean E, Hari R, Hietanen JK (2014) Bodily maps of emotions. Proc Natl Acad Sci 111(2):646–651 20. Adolphs R (2017) How should neuroscience study emotions? By distinguishing emotion states, concepts, and experiences. Soc Cogn Affect Neurosci 12(1):24–31 21. Carlson JM (2018) Reactive emotional processes in the absence of conscious awareness. In: Fox AS, Lapate RC, Shackman AJ, Davidson RJ (eds) The nature of emotion: fundamental questions. Oxford University Press, Oxford, pp 312–316 22. Damasio AR, Damasio H (2018) Emotions and feelings: William James then and now. In: Fox AS, Lapate RC, Shackman AJ, Davidson RJ (eds) The nature of emotion: fundamental questions. Oxford University Press, Oxford, pp 1–6 23. De Gelder B, Tamietto M (2018) What is the role of unconscious emotions and of conscious awareness in emotions? In: Fox AS, Lapate RC, Shackman AJ, Davidson RJ (eds) The nature of emotion: fundamental questions. Oxford University Press, Oxford, pp 316–322 24. Scherer KR, Moors A (2019) The emotion process: event appraisal and component differentiation. Annu Rev Psychol 70:719–745 25. Dimberg U, Thunberg M, Elmehed K (2000) Unconscious facial reactions to emotional facial expressions. Psychol Sci 11(1):86–89 26. Tamietto M, De Gelder B (2010) Neural bases of the non-conscious perception of emotional signals. Nat Rev Neurosci 11(10): 697 27. Frijda NH (2007) The laws of emotion. Erlbaum, Mahwah 28. Krieglmeyer R, Deutsch R, De Houwer J, De Raedt R (2010) Being moved: valence activates approach-avoidance behavior independently of evaluation and approach-avoidance intentions. Psychol Sci 21(4):607–613 29. Citron FM, Abugaber D, Herbert C (2016) Approach and withdrawal tendencies during written word processing: effects of task, emotional valence, and emotional arousal. Front Psychol 6:1935 30. van Berkum JJA, Holleman BC, Nieuwland M, Otten M, Murre JJM (2009) Right or wrong? The brain’s fast response to morally objectionable statements. Psychol Sci 20:1092–1099 31. Citron FM (2012) Neural correlates of written emotion word processing: a review of recent electrophysiological and hemodynamic


neuroimaging studies. Brain Lang 122(3): 211–226 32. Wood A, Martin J, Niedenthal P (2018) How and why emotions are embodied. In: Fox AS, Lapate RC, Shackman AJ, Davidson RJ (eds) The nature of emotion: fundamental questions. Oxford University Press, Oxford, pp 277–280 33. Damasio AR (2003) Looking for Spinoza: joy, sorrow, and the feeling brain. Houghton Mifflin Harcourt 34. Kreibig SD (2010) Autonomic nervous system activity in emotion: A review. Biol Psychol 84(3):394–421 35. Lucas I, Sa´nchez-Adam A, Vila J, Guerra P (2019) Positive emotional reactions to loved names. Psychophysiology:e13363 36. Boren JP, Veksler AE (2011) A decade of research exploring biology and communication: the nervous, endocrine, cardiovascular, and immune systems. Commun Res Trends 30(4):1–31 37. Francis AL, Oliver J (2018) Psychophysiological measurement of affective responses during speech perception. Hear Res 369:103– 119 38. Cowen A, Sauter D, Tracy JL, Keltner D (2019) Mapping the passions: toward a high-dimensional taxonomy of emotional experience and expression. Psychol Sci Public Interest 20(1):69–90 39. Scarantino A (2017) How to do things with emotional expressions: the theory of affective pragmatics. Psychol Inq 28(2–3):165–185 40. Scarantino A (2018) Emotional expressions as speech act Analogs. Philos Sci 85(5): 1038–1053 41. Altenmu¨ller E, Schmidt S, Zimmermann E (eds) (2013) The evolution of emotional communication: from sounds in nonhuman mammals to speech and music in man. Oxford University Press, Oxford 42. van Boxtel A (2023, this volume) Electromyographic (EMG) responses of facial muscles during language processing. In: Grimaldi M, Brattico E, Shtyrov Y (eds) Language electrified: principles, methods, and future perspectives of investigation. Springer, New York 43. Hess U (2018) The (more or less) communication of emotions serves social problem solving. In: Fox AS, Lapate RC, Shackman AJ, Davidson RJ (eds) The nature of emotion: fundamental questions. Oxford University Press, Oxford, pp 250–253 44. van Boxtel A (2010) Facial EMG as a tool for inferring affective states. In: Spink AJ,


Grieco F, Krips OE, Loijens LWS, Noldus LPJJ, Zimmerman PH (eds) Proceedings of measuring behavior 2010 45. Fridlund AJ, Cacioppo JT (1986) Guidelines for human electromyographic research. Psychophysiology 23(5):567–589 46. Cacioppo JT, Petty RE (1981) Electromyograms as measures of extent and affectivity of information processing. Am Psychol 36(5): 441–456 47. Cacioppo JT, Petty RE (1981) Electromyographic specificity during covert information processing. Psychophysiology 18(5):518–523 48. de Morree HM, Marcora SM (2012) Facial electromyography as a measure of effort during physical and mental tasks. In: Takada H (ed) Electromyography: new developments, procedures and applications. Nova Science Publishers, pp 103–122 49. Waterink W, van Boxtel A (1994) Facial and jaw-elevator EMG activity in relation to changes in performance level during a sustained information processing task. Biol Psychol 37:183–198 50. Harris KS (1970) Physiological measures of speech movements: EMG and fiber-optic studies. ASHA Rep 5:271–282 51. Wohlert AB, Hammen VL (2000) Lip muscle activity related to speech rate and loudness. J Speech Lang Hear Res 43(5):1229–1239 52. Schwartz GE, Fair PL, Salt P, Mandel MR, Klerman GL (1976) Facial muscle patterning to affective imagery in depressed and nondepressed subjects. Science 192(4238): 489–491 53. Rodriguez-Falces J, Navallas J, Malanda A (2012) EMG modeling. In: Naik GR (ed) Computational intelligence in electromyography analysis – a perspective on current applications and future challenges. IntechOpen 54. Cacioppo JT, Petty RE (1979) Attitudes and cognitive response: an electrophysiological approach. J Pers Soc Psychol 37(12): 2181–2199 55. Tassinary LG, Cacioppo JT (1992) Unobservable facial actions and emotion. Psychol Sci 3(1):28–33 56. Braun N, Goudbeek M, Krahmer E (2019) Language and emotion–a foosball study: the influence of affective state on language production in a competitive setting. PLoS One 14(5):e0217419 57. Andrade CRFD, Sassi FC, Juste F, Mendonc¸a LIZD (2008) Persistent developmental stuttering as a cortical-subcortical dysfunction:

evidence from muscle activation. Arq Neuropsiquiatr 66(3B):659–664 58. Walsh B, Smith A (2013) Oral electromyography activation patterns for speech are similar in preschoolers who do and do not stutter. J Speech Lang Hear Res 56(5):1441–1454 59. Sukalla F, Bilandzic H, Bolls PD, Busselle RW (2015) Embodiment of narrative engagement. J Media Psychol 28(4):175–186 60. Bolls PD, Lang A, Potter RF (2001) The effects of message valence and listener arousal on attention, memory, and facial muscular responses to radio advertisements. Commun Res 28(5):627–651 61. Cacioppo JT, Petty RE, Marshall-Goodell B (1984) Electromyographic specificity during simple physical and attitudinal tasks: location and topographical features of integrated EMG responses. Biol Psychol 18:85–121 62. Cannon PR, Schnall S, White M (2011) Transgressions and expressions: affective facial muscle activity predicts moral judgments. Soc Psychol Personal Sci 2(3):325–331 63. Elash CA, Tiffany ST, Vrana SR (1995) Manipulation of smoking urges and affect through a brief-imagery procedure: selfreport, psychophysiological, and startle probe responses. Exp Clin Psychopharmacol 3(2):156–162 64. Peasley-Miklus CE, Vrana SR (2000) Effect of worrisome and relaxing thinking on fearful emotional processing. Behav Res Ther 38(2): 129–144 65. Larsen JT, Norris CJ, McGraw AP, Hawkley LC, Cacioppo JT (2009) The evaluative space grid: A single-item measure of positivity and negativity. Cognit Emot 23(3):453–480 66. Kunkel AE (2018) The processing of moral transgressions: investigating the role of affective evaluations (Dissertation) 67. Foroni F, Semin GR (2009) Language that puts you in touch with your bodily feelings: the multimodal responsiveness of affective expressions. Psychol Sci 20(8):974–980 68. Borgeat F, Elie R, Chaloult L, Chabot R (1984) Psychophysiological responses to masked auditory stimuli. Can J Psychiatry 30(1):22–27 69. Wexler BE, Warrenburg S, Schwartz GE, Janer LD (1992) EEG and EMG responses to emotion-evoking stimuli processed without conscious awareness. Neuropsychologia 30(12):1065–1079 70. Arndt J, Allen JJB, Greenberg J (2001) Traces of terror: subliminal death primes and facial electromyographic indices of affect. Motiv Emot 23(3):253–277

Facial EMG and Affective Language Comprehension 71. Niedenthal PM, Winkielman P, Mondillon L, Vermeulen N (2009) Embodiment of emotion concepts. J Pers Soc Psychol 96(6): 1120–1136 72. Baumeister JC, Foroni F, Conrad M, Rumiati RI, Winkielman P (2017) Embodiment and emotional memory in first vs. second language. Front Psychol 8:394 73. Ku¨necke J, Sommer W, Schacht A, Palazova M (2015) Embodied simulation of emotional valence: facial muscle responses to abstract and concrete words. Psychophysiology 52(12):1590–1598 74. Neumann R, Hess M, Schulz SM, Alpers GW (2005) Automatic behavioural responses to valence: evidence that facial action is facilitated by evaluative processing. Cognit Emot 19(4):499–513 75. Zhu Y, Suzuki N (2018) The face and brain in the emotional loop: event-related potential correlates of the first facial response to emotional words. N Am J Psychol 20(1):91–110 76. Hietanen JK, Surakka V, Linnankoski I (1998) Facial electromyographic responses to vocal affect expressions. Psychophysiology 35(5):530–536 77. Philip L, Martin JC, Clavel C (2018) Suppression of facial mimicry of negative facial expressions in an incongruent context. J Psychophysiol 32(4):160–171 78. Ferdenzi C, Joussain P, Digard B, Luneau L, Djordjevic J, Bensafi M (2017) Individual differences in verbal and non-verbal affective responses to smells: influence of odor label across cultures. Chem Senses 42(1):37–46 79. Herbert C, Kissler J (2010) Motivational priming and processing interrupt: startle reflex modulation during shallow and deep processing of emotional words. Int J Psychophysiol 76(2):64–71 80. Herbert C, Kissler J, Jungho¨fer M, Peyk P, Rockstroh B (2006) Processing of emotional adjectives: evidence from startle EMG and ERPs. Psychophysiology 43(2):197–206 81. Bayer M, Sommer W, Schacht A (2010) Reading emotional words within sentences: the impact of arousal and valence on eventrelated potentials. Int J Psychophysiol 78(3): 299–307 82. Foroni F, Semin GR (2013) Comprehension of action negation involves inhibitory simulation. Front Hum Neurosci 7:1–7 83. Foroni F (2015) Do we embody second language? Evidence for ‘partial’ simulation during processing of a second language. Brain Cogn 99:8–16


84. Fino E, Menegatti M, Avenanti A, Rubini M (2016) Enjoying vs. smiling: facial muscular activation in response to emotional language. Biol Psychol 118:126–135 85. Weis PP, Herbert C (2017) Bodily reactions to emotional words referring to own versus other people’s emotions. Front Psychol 8: 1277 86. Cikara M, Fiske ST (2012) Stereotypes and schadenfreude: affective and physiological markers of pleasure at outgroup misfortunes. Soc Psychol Personal Sci 3(1):63–71 87. Fino E, Menegatti M, Avenanti A, Rubini M (2019) Unfolding political attitudes through the face: facial expressions when reading emotion language of left-and right-wing political leaders. Sci Rep 9(1):1–10 88. Fino E (2014) The language of others mirrored in the face: the role of political affiliation in automatic facial effects of language (Doctoral disser tation). Bologna. http:// amsdottorato.unibo.it/6535/1/fino_edita_ tesi.pdf 89. ’t Hart B, Struiksma M, Van Boxtel A, Van Berkum JJA (2021) Reading about us and them: Moral and minimal group effects on language-induced emotion. Front Commun 6:1 90. Arias P, Belin P, Aucouturier JJ (2018) Auditory smiles trigger unconscious facial imitation. Curr Biol 28(16):R782–R783 91. Livingstone SR, Thompson WF, Russo FA (2009) Facial expressions and emotional singing: A study of perception and production with motion capture and electromyography. Music Percept Interdiscip J 26(5):475–488 92. Magne´e MJCM, Stekelenburg JJ, Kemner C, de Gelder B (2007) Similar facial electromyographic responses to faces, voices, and body expressions. Neuroreport 18(4):369–372 93. Levy N, Harmon-Jones C, Harmon-Jones E (2018) Dissonance and discomfort: does a simple cognitive inconsistency evoke a negative affective state? Motiv Sci 4(2):95–108 94. Topolinski S, Strack F (2015) Corrugator activity confirms immediate negative affect in surprise. Front Psychol 6:134 95. Topolinski S, Likowski KU, Weyers P, Strack F (2009) The face of fluency: semantic coherence automatically elicits a specific pattern of facial muscle reactions. Cognit Emot 23(2): 260–271 96. Bartholow BD, Fabiani M, Gratton G, Bettencourt BA (2001) A psychophysiological examination of cognitive processing of and affective responses to social expectancy violations. Psychol Sci 12(3):197–204


97. Krumhuber EG, Tsankova E, Kappas A (2016) Examining subjective and physiological responses to norm violation using textbased vignettes. Int J Psychol 53(1):23–30 98. ’t Hart B, Struiksma ME, Van Boxtel A, Van Berkum JJA (2018) Emotion in stories: facial EMG evidence for both mental simulation and moral evaluation. Front Psychol 9:613 99. ’t Hart B, Struiksma ME, Van Boxtel A, Van Berkum JJA (2019) Tracking affective language comprehension: simulating and evaluating character affect in morally loaded narratives. Front Psychol 10:318 100. Thompson D, Mackenzie IG, Leuthold H, Filik R (2016) Emotional responses to irony and emoticons in written language: evidence from EDA and facial EMG. Psychophysiology 53(7):1054–1062 101. Fiacconi CM, Owen AM (2015) Using psychophysiological measures to examine the temporal profile of verbal humor elicitation. PLoS One 10(9):1–16 102. Gavaruzzi T, Sarlo M, Giandomenico F, Rumiati R, Polato F, De Lazzari F, Lotto L (2018) Assessing emotions conveyed and elicited by patient narratives and their impact on intention to participate in colorectal cancer screening: A psychophysiological investigation. PLoS One 13(6):e0199882 103. Leshner G, Bolls P, Gardner E, Moore J, Kreuter M (2018) Breast cancer survivor testimonies: effects of narrative and emotional valence on affect and cognition. Cogent Soc Sci 4(1):1426281 104. Tuisku OA, Ilves MK, Lylykangas JK, Surakka VV, Ainasoja M, Ryto¨vuori SE, Ruohonen MJ (2018) Emotional responses of clients to veterinarian communication style during a vaccination visit in companion animal practice. J Am Vet Med Assoc 252(9):1120–1132 105. K€atsyri J, Ravaja N, Salminen M (2012) Aesthetic images modulate emotional responses to reading news messages on a small screen: A psychophysiological investigation. Int J Hum Comput Stud 70(1):72–87 106. Ravaja N, Aula P, Falco A, Laaksonen S, Salminen M, Ainamo A (2015) Online news and corporate reputation. J Media Psychol 27(3):118–133 107. Ravaja N, Kallinen K, Saari T, KeltikangasJarvinen L (2004) Suboptimal exposure to facial expressions when viewing video messages from a small screen: effects on emotion, attention, and memory. J Exp Psychol Appl 10(2):120–131 108. Ravaja N, Saari T, Kallinen K, Laarni J (2006) The role of mood in the processing of media

messages from a small screen: effects on subjective and physiological responses. Media Psychol 8(3):239–265 109. Wise K, Kim HJ, Kim J (2009) The effect of searching versus surfing on cognitive and emotional responses to online news. J Media Psychol 21(2):49–59 110. Lee S, Potter RF (2018) The impact of emotional words on listeners’ emotional and cognitive responses in the context of advertisements. Commun Res:1–16 111. Fiacconi CM, Owen AM (2016) Using facial electromyography to detect preserved emotional processing in disorders of consciousness: A proof-of-principle study. Clin Neurophysiol 127(9):3000–3006 112. Morisseau T, Mermillod M, Eymond C, Van Der Henst JB, Noveck IA (2017) You can laugh at everything, but not with everyone. Interact Stud 18(1):116–141 113. van Leeuwen AR (2017) Right on time: synchronization, overlap, and affiliation in conversation. LOT Dissertation Series, Utrecht 114. Wassiliwizky E, Koelsch S, Wagner V, Jacobsen T, Menninghaus W (2017) The emotional power of poetry: neural circuitry, psychophysiology and compositional principles. Soc Cogn Affect Neurosci 12(8): 1229–1240 115. Larson K, Hazlett RL, Chaparro BS, Picard RW (2007) Measuring the aesthetics of reading. In: People and computers XX— engage. Springer, London, pp 41–56 116. Durso FT, Geldbach KM, Corballis P (2012) Detecting confusion using facial electromyography. Hum Factors 54(1):60–69 117. Luck SJ, Gaspelin N (2017) How to get statistically significant effects in any ERP experiment (and why you shouldn’t). Psychophysiology 54(1):146–157 118. Feldman Barrett L, Adolphs R, Marsella S, Martinez AM, Pollak SD (2019) Emotional expressions reconsidered: challenges to inferring emotion from human facial movements. Psychol Sci Public Interest 20(1):1–68 119. Niedenthal PM, Mermillod M, Maringer M, Hess U (2010) The Simulation of Smiles (SIMS) model: embodied simulation and the meaning of facial expression. Behav Brain Sci 33(6):417–480 120. Ekman P, Cordaro D (2011) What is meant by calling emotions basic. Emot Rev 3(4): 364–370 121. Fridlund AJ (1994) Human facial expression: an evolutionary view. Academic Press, San Diego

Facial EMG and Affective Language Comprehension 122. Jackendoff R (2007) A parallel architecture perspective on language processing. Brain Res 1146:2–22 123. Tomasello M (2008) Origins of human communication. MIT Press, Cambridge, MA 124. Zwaan RA (2016) Situation models, mental simulations, and abstract concepts in discourse comprehension. Psychon Bull Rev 23(4):1028–1034 125. Winkielman P, Olszanowski M, Gola M (2015) Faces in-between: evaluations reflect the interplay of facial features and taskdependent fluency. Emotion 15(2):232 126. Barsalou LW (2008) Grounded cognition. Annu Rev Psychol 59:617–645 127. Glenberg AM (2017) How reading comprehension is embodied and why that matters. Int Electron Elem Educ 4(1):5–18 128. Havas DA, Matheson J (2013) The functional role of the periphery in emotional language comprehension. Front Psychol 4:294 129. Winkielman P, Coulson S, Niedenthal P (2018) Dynamic grounding of emotion concepts. Philos Trans R Soc B Biol Sci 373(1752):20170127 130. Zwaan RA (2014) Embodiment and language comprehension: reframing the discussion. Trends Cogn Sci 18:229–234 131. Pulvermu¨ller F, Shtyrov Y, Ilmoniemi R (2005) Brain signatures of meaning access in action word recognition. J Cogn Neurosci 17(6):884–892 132. Willems RM, Casasanto D (2011) Flexibility in embodied language understanding. Front Psychol 2:116 133. Zwaan RA, Pecher D (2012) Revisiting mental simulation in language comprehension: six replication attempts. PLoS One 7:51382 134. Zwaan RA, Stanfield RA, Yaxley RH (2002) Language comprehenders mentally represent the shapes of objects. Psychol Sci 13:168–171 135. Hess U, Fischer A (2014) Emotional mimicry: why & when we mimic emotions. Soc Personal Psychol Compass 8:45–57 136. Hatfield E, Bensman L, Thornton PD, Rapson RL (2014) New perspectives on emotional contagion: a review of classic and


recent research on facial mimicry and contagion. Interpersona 8(2):159–179 137. Albers AM, Kok P, Toni I, Dijkerman HC, De Lange FP (2013) Shared representations for working memory and mental imagery in early visual cortex. Curr Biol 23(15):1427–1431 138. Dijkstra N, Bosch SE, van Gerven MA (2019) Shared neural mechanisms of visual perception and imagery. Trends Cogn Sci 23(5): 423–434 139. Elkins-Brown N, Saunders B, Inzlicht M (2016) Error-related electromyographic activity over the corrugator supercilii is associated with neural performance monitoring. Psychophysiology 53(2):159–170 140. Decety J, Cowell JM (2014) Friends or foes: is empathy necessary for moral behavior? Perspect Psychol Sci 9:525–537 141. Pecher D (2018) Curb your embodiment. Top Cogn Sci 10(3):501–517 142. Pecher D, Zeelenberg R (2018) Boundaries to grounding abstract concepts. Philos Trans R Soc B Biol Sci 373(1752):20170132 143. Ekman P, Friesen WV (1971) Constants across cultures in the face and emotion. J Pers Soc Psychol 17(2):124 144. Scarantino A (2014) The motivational theory of emotions. In: Moral psychology and human agency. Oxford University Press, Oxford, pp 156–185 145. Mortillaro M, Mehu M, Scherer KR (2013) The evolutionary origin of multimodal synchronization and emotional expression. In: Altenmu¨ller E, Schmidt S, Zimmermann E (eds) The evolution of emotional communication: from sounds in nonhuman mammals to speech and music in man. Oxford University Press, Oxford, pp 3–25 146. Crivelli C, Fridlund AJ (2018) Facial displays are tools for social influence. Trends Cogn Sci 22(5):388–399 147. Kraut RE, Johnston RE (1979) Social and emotional messages of smiling: an ethological approach. J Pers Soc Psychol 37(9):1539 148. Ruiz-Belda MA, Ferna´ndez-Dols JM, Carrera P, Barchard K (2003) Spontaneous facial expressions of happy bowlers and soccer fans. Cognit Emot 17(2):315–326

Chapter 23

Eye-Tracking Methods in Psycholinguistics

Mikhail Pokhoday, Beatriz Bermúdez-Margaretto, Anastasia Malyshevskaya, Petr Kotrelev, Yury Shtyrov, and Andriy Myachykov

Abstract

Rapid technological advancements have led to significant progress in research methods for the cognitive sciences. Today scientists can use various neurophysiological methods to study human behavior and its underlying neuroanatomical and cognitive mechanisms. Among such methods is eye-tracking—a technique allowing the recording and analysis of online oculomotor behavior. In this chapter, we first review the use of eye-tracking methodology in cognitive research—both as a stand-alone method and in combination with electroencephalography. We then discuss eye-tracking in terms of its application in language research, from studying sentence comprehension and sentence production to second language learning and bilingualism. Finally, we discuss co-registration of brain-ocular activity.

Key words: Eye-tracking, EEG, Eye movements, Co-registration, Brain, Language, Production, Comprehension

1 Getting Started with Eye-Tracking Methodology: An Overview

There are several basic aspects that need to be taken into consideration before implementing any kind of eye-tracking in psychological and neuroscientific research. First, one needs to consider which type of eye movement recording methodology to use: electrooculography (EOG), scleral contact lens/search coil/suction cap (see, e.g., [1] for a methodological description), photo-oculography (POG), video-oculography (VOG), or video-based combined pupil and corneal reflection eye-trackers [2]. EOG records the position of the eyeball using electrodes placed above and below, or to the left and to the right of, the eye. The eyeball is a dipole, with a positive charge at the cornea and a negative charge at the retina, creating a corneo-retinal standing potential. Thus, when the eye moves toward one electrode or the other, the electrodes register the change in the charge. EOG is usually used in medical diagnosis and less frequently in eye-tracking research, as it does not provide information about the eye position on the visual stimuli. The scleral contact lens and similar methods are examples of invasive eye-tracking, with eye movements recorded via a direct connection between the recording mechanism and the eyeball. Such methods are very accurate [1], but they are rarely used nowadays for obvious ethical and health reasons. In this chapter, we mostly focus on research using optical, non-invasive binocular and/or monocular eye-tracking methods, as they are the most widely used methods nowadays.

Modern non-invasive eye-trackers are further subdivided according to their form factor: head-mounted (e.g., EyeLink II, Tobii Pro Glasses) or static, generally mounted on a desktop, tower (SMI High-Speed, Fig. 1), or monitor (EyeLink Portable Duo, Tobii Eye Tracker 4C). The most advanced systems allow setups for remote recordings in fMRI scanners or MEG systems (e.g., an arm-mounted EyeLink 1000+). Other systems, like the HTC Vive with the Pupil Labs eye-tracking system, can record eye movements in virtual reality (VR). A basic eye-tracking setup consists of an eye tracker and a laptop with software. Some setups of high-end eye-tracking systems (e.g., EyeLink 1000+) require two computers to run experiments: a "Host" computer controlling the tracker and processing and filtering eye-movement data, and a "Display" computer for stimulus presentation and data storage.

Fig. 1 SMI High-Speed tower-mounted eye tracker

In general, modern eye trackers share the same basic principle of obtaining oculomotor measurements—the video-based pupil-to-corneal reflection measurement technology (see [2, 3] for a detailed description). Eye position is recorded via a high frame rate camera. Some trackers can record at up to 2000 Hz (2000 frames or samples per second), allowing the detection of microsaccades, ocular tremor, and drift, as well as smooth pursuit.

Fig. 2 Pupil-to-corneal reflection eye position recording principle. The red cross represents the center of the pupil; the white cross represents the first Purkinje reflection (P1), the IR light reflection from the cornea

Higher sampling rates provide better temporal resolution and data clarity. The camera is paired with an infrared (IR) light emitter. The emitted light is reflected from the cornea, creating a first Purkinje reflection (P1; see Fig. 2 for a depiction). Information about the location of P1 is paired with the position of the pupil to estimate the point of gaze [2]. The outcome data comprise samples and events, where a sample is a single recording of a time stamp and an eye position in x and y coordinates, and an event consists of several samples grouped into a specific eye movement, for example, a saccade or a fixation. Saccades are the "movements" of the eye, which usually last between 30 and 80 ms, whereas fixations are periods of relative eye stability, ranging from several dozen milliseconds up to several seconds [3]. Eye-movement data are usually aggregated across areas of interest (AOIs, or regions of interest, ROIs); in a typical sentence reading study, an AOI could be, for instance, a single word. In addition to eye movements, pupil size is also recorded, and it can be used as an additional measure [4], although pupillometry is not without its own limitations (for a recent review see [5]).
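To make the sample/event distinction concrete, the sketch below parses a stream of (time, x, y) samples into fixations using a simple dispersion-threshold procedure; everything not assigned to a fixation would be treated as saccade or noise. This is a minimal illustration rather than any vendor's event-detection algorithm, and the input format, the pixel dispersion threshold, and the minimum duration are assumptions that would need to be adapted to a specific tracker and setup.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Fixation:
    start: float   # s
    end: float     # s
    x: float       # mean horizontal position (px)
    y: float       # mean vertical position (px)

def detect_fixations(samples: List[Tuple[float, float, float]],
                     max_dispersion: float = 25.0,   # px; depends on setup geometry
                     min_duration: float = 0.060) -> List[Fixation]:
    """Dispersion-threshold fixation detection over (timestamp_s, x_px, y_px)
    samples: grow a window while its spatial dispersion stays small, and accept
    it as a fixation if it lasts long enough."""
    fixations, i, n = [], 0, len(samples)
    while i < n:
        j = i
        while j < n:
            xs = [s[1] for s in samples[i:j + 1]]
            ys = [s[2] for s in samples[i:j + 1]]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            j += 1
        duration = samples[j - 1][0] - samples[i][0]
        if duration >= min_duration and j - i > 1:
            xs = [s[1] for s in samples[i:j]]
            ys = [s[2] for s in samples[i:j]]
            fixations.append(Fixation(samples[i][0], samples[j - 1][0],
                                      sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j            # continue after the accepted fixation
        else:
            i += 1           # otherwise slide the window start forward
    return fixations

# Hypothetical 500-Hz recording: 200 ms of stable gaze followed by a gaze shift.
samples = [(t / 500, 512 + (0 if t < 100 else 300), 384) for t in range(150)]
print(detect_fixations(samples))   # two fixations, one per gaze position
```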

Some systems allow binocular recording only, that is, parallel recording of both eyes' movements (binocular mode in SMI RED-m), while others (e.g., EyeLink 1000+) allow for both binocular and monocular recording modes. It is important to decide in advance whether binocular or monocular recording is optimal or necessary. Monocular mode is used more often [6], typically recording the dominant eye (to determine eye dominance, use, e.g., the Miles test [7]). Other important features are the system's accuracy, precision, and latency. Accuracy is the average difference, in degrees of visual angle, between the actual fixation position and the fixation position recorded by the system. As the systems used nowadays are not invasive, this disparity should be expected and reported. During calibration and validation procedures, the system provides the researcher with an average error in degrees of visual angle; depending on the conditions and the system itself, this error can range from as low as 0.05° up to about 1°. If the average error exceeds 1° during tracker calibration, it is recommended to recalibrate the system (a minimal sketch of the underlying pixel-to-degree conversion is given at the end of this section). Precision, according to Holmqvist et al. [3], is an estimate of how consistently a tracker records the eye position. Finally, latency refers to how fast the tracker delivers information about the eye position to the data file. Trackers differ in their latency, and the lower it is, the better; it can be as low as 1 ms.

In comparison to other technologically complex methods used in language research (such as electrophysiological or hemodynamic magnetic-resonance measurements), eye-tracking is relatively easy to learn and implement. Most modern eye trackers are equipped with user-friendly software (e.g., SR Research Experiment Builder and Data Viewer), and their vendors as well as research communities offer plenty of helpful support. A few open-access experiment building applications (OpenSesame, PsychoPy) are compatible with some eye trackers. Furthermore, most eye trackers allow concurrent eye-tracking and reaction time data recording, as well as integration with other types of neurocognitive data acquisition, such as electroencephalography (EEG; discussed in detail below), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), and various other measures via a parallel port link (see [3] for a list of supported methods). Eye-tracking equipment is largely non-invasive and easy to use—calibration and setup take 5–10 min [8]. Most eye trackers can record eye movements in unconstrained head mode, which is especially useful when running studies with children or clinical populations. Overall, this brief summary explains why eye-tracking is currently one of the most widely used methods in cognitive research in general, and in psycholinguistic research in particular. In the next section, we look at several examples of how eye-tracking methodology has been used in psycholinguistic research.
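Returning to the accuracy figures above: because calibration error is expressed in degrees of visual angle while the tracker reports gaze in screen pixels, converting between the two is a routine step. The sketch below shows the standard geometric conversion; the screen width, resolution, and viewing distance are made-up example values.

```python
import math

def pixels_to_degrees(offset_px: float, px_per_cm: float, distance_cm: float) -> float:
    """Convert an on-screen offset in pixels to degrees of visual angle,
    given the screen resolution (pixels per cm) and the viewing distance."""
    offset_cm = offset_px / px_per_cm
    return math.degrees(2 * math.atan(offset_cm / (2 * distance_cm)))

# Hypothetical validation check: recorded gaze lands 18 px away from a validation
# target on a 1920-px-wide, 53-cm-wide screen viewed from 70 cm.
error_deg = pixels_to_degrees(18, px_per_cm=1920 / 53, distance_cm=70)
print(round(error_deg, 2), "deg")   # if the average error exceeds ~1 deg, recalibrate
```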

2 Eye-Tracking Implementation in Psycholinguistics

2.1 Eye-Tracking in Reading Research

Arguably, reading is the area where eye-tracking has been used the most. A vast body of research has accumulated to characterize fluent reading based on readers' eye-movement patterns, revealing different stages of word form and meaning access ([9, 10]; see also [11] for an extensive review). According to [12], the use of eye-tracking rests on two main principles, similar to any other reaction-time measure. First, the number and the duration of individual fixations on a given target reflect the degree of cognitive effort required for processing [13]; a stimulus that receives fewer and shorter fixations can thus be assumed to be easier to process. The second principle is that the currently fixated stimulus is the one being processed at a given moment—the eye-mind hypothesis. A summary of how eye-movement measures, such as fixation locations and durations, have been used in reading research is provided by [11]. First and foremost, eye-tracking methodology is appealing to reading research because of its high temporal and spatial accuracy.


As such, eye-tracking allows access to cognitive processes unfolding "online," in the sense that the process of interest can be traced in the millisecond-by-millisecond record of eye behavior (e.g., within one fixation on the target word) (for reviews, see [14, 15]). Our eyes move across the page approximately 3–4 times per second when we read, with fixation durations averaging around 200–250 ms and mean saccade amplitudes of eight characters, at least for a proficient adult reader. The latter property roughly corresponds to 2° of visual angle for normal text at a typical reading distance. Importantly, readers do not fixate every word they read: function words and shorter words are skipped about 70% of the time, while content words and longer words are almost always fixated [9].

The two main techniques for the study of eye movements in reading research are the moving window paradigm and the boundary paradigm. These techniques allow manipulating the text's properties contingent on where the reader is currently looking, providing important evidence concerning the type of information extracted at the point of fixation—both from the fixated word itself and from the words within the parafoveal region [16, 17]. The latter effect is usually referred to as a parafoveal-on-foveal effect [9]. A typical moving window experiment manipulates the amount of information available to the reader in a reading task. Possible manipulations include masking sentences or words above or below the current fixation, varying the number of unmasked symbols to the left or to the right of the fixation point, and masking peripheral or foveal vision. The boundary paradigm allows the researcher to define an invisible boundary in the sentence or word; when the reader's gaze crosses it, the following text, words, or letters change. Generally, existing research suggests that, while readers primarily take up information from the currently fixated word, with lexical processing of the fixated word controlling the parameters of the gaze, the information provided by parafoveally previewed words affects the current gaze parameters as well, indicating predictive processing in reading.

Various visual word recognition and sentence comprehension eye-movement studies have documented several factors affecting the reading process; a minimal sketch of how the corresponding word-level measures are derived from a fixation record is given below. For example, the duration of the first fixation on a word reflects the word's length and corpus frequency [18, 19] as well as its semantic (e.g., word predictability due to contextual constraints, see [20]), morphosyntactic (e.g., syntactic complexity and ambiguity, see [21]), and discourse-related features (e.g., anaphor resolution, see [22]). High-frequency words, for example, are fixated for shorter durations or even skipped, indicating parafoveal access and/or predictive coding. Notably, this difference disappears if a less frequent word is repeated three or more times in a single trial (see the section on bilingualism research using eye-tracking below).
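The word-level measures referred to in this and the following paragraphs (first-fixation duration, gaze duration, total reading time, skipping, and regressions into a word) can all be derived from the ordered fixation record once each fixation has been assigned to a word AOI. The sketch below is a minimal illustration under simplifying assumptions about the data format; real analyses would also handle blinks, return sweeps, and trial exclusion.

```python
from typing import Dict, List, Tuple

def reading_measures(fixations: List[Tuple[float, int]], n_words: int) -> Dict[int, dict]:
    """Compute standard word-level reading measures from an ordered list of
    (fixation_duration_ms, word_index) pairs, where word_index identifies the
    AOI (word) a fixation landed on, in reading order 0..n_words-1."""
    out = {w: {"first_fixation": 0.0, "gaze_duration": 0.0, "total_time": 0.0,
               "skipped": True, "regression_in": False} for w in range(n_words)}
    furthest = -1          # rightmost word fixated so far (tracks first-pass reading)
    for i, (dur, w) in enumerate(fixations):
        m = out[w]
        m["total_time"] += dur
        m["skipped"] = False
        if w < furthest:
            m["regression_in"] = True          # word re-entered from further right
        if m["first_fixation"] == 0.0 and w >= furthest:
            # first-pass reading: first fixation, plus gaze duration as the sum of
            # consecutive first-pass fixations on this word before the eyes leave it
            m["first_fixation"] = dur
            gaze, j = dur, i + 1
            while j < len(fixations) and fixations[j][1] == w:
                gaze += fixations[j][0]
                j += 1
            m["gaze_duration"] = gaze
        furthest = max(furthest, w)
    return out

# Hypothetical trial: fixation durations (ms) and the word index each landed on.
trial = [(210, 0), (180, 1), (250, 1), (190, 3), (220, 2), (230, 4)]
print(reading_measures(trial, n_words=5))
```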

Existing research also indicates that fixations, but not saccades, are sensitive to these and other linguistic factors, such as word frequency and familiarity, age of acquisition, polysemy, morphological complexity, contextual constraints, and plausibility [13]. As a result, fixation analyses feature more frequently in reading studies. Saccades, in turn, have been shown to be sensitive to word length, with readers showing a higher probability of generating a forward saccade and of skipping shorter rather than longer words [20, 23]. Most saccades progress in the direction of reading (e.g., rightward in English, leftward in Hebrew). At the same time, backward or regressive saccades are not uncommon, comprising about 20–25% of all saccades in children and 10–15% in adults. Regressive eye movements often reflect reanalysis of previously encountered input, indicating general reading difficulties, syntactic and semantic ambiguities, or problems with text integration [24–27].

In addition, eye movements also reflect the reader's general reading proficiency, and changes can be tracked longitudinally across the course of learning to read [28]. Changes in eye-movement patterns, in this case, reflect the developmental trajectory of reading fluency: from patterns characterized by many long fixations, short saccades, and a high proportion of regressive eye movements during the early stages of learning to read, toward shorter and fewer fixations and longer saccades as reading fluency increases. Similarly, poor readers and dyslexic readers at all ages exhibit longer fixations, shorter saccades, and more regressions relative to control samples [29, 30]. Therefore, eye-tracking has proved to be a useful tool for the evaluation of reading fluency—both during development and in readers with language-related deficits.

2.2 Eye-Tracking in Spoken Language Comprehension

Eye-tracking has also been useful for the study of language comprehension and production, mainly in the form of the Visual World Paradigm (see [31] for a systematic review). Starting from the seminal works of [32–34], the Visual World (VW) paradigm has been extensively used in psycholinguistics to address various questions related to how linguistic and visual processing are interfaced during sentence comprehension and production. In a typical sentence comprehension Visual World experiment, participants' eye movements are recorded as they listen to auditorily presented sentences while examining picture displays presented on a computer screen [32, 35]. A variant of the VW comprehension protocol may include a particular picture-related task, for example, locating and clicking on a particular object presented on the screen [33, 36, 37]. Results of the pioneering studies using the Visual World Paradigm showed that eye movements are closely time-locked to the sentence comprehension process, as listeners tend to fixate named referents during word access, even when there is no specific picture-related task. These and similar findings reflect the influence of visual, non-linguistic information on sentence comprehension in an interactive processing system whose goal is to facilitate comprehension (e.g., [38]).

Many of the existing VW studies used ambiguity resolution tasks to illuminate stages and features of the sentence comprehension system. For instance, [39] used clipart depictions of events presented to participants for 1000 ms prior to and during spoken sentence comprehension. The results suggested that these cliparts facilitated local structural ambiguity resolution in German subject-verb-object (SVO) sentences compared to object-verb-subject (OVS) sentences. Another study [40] measured the incremental integration of clipart events into the process of sentence comprehension, reporting similar incremental congruence effects: participants demonstrated longer fixation durations to the referents of the sentence constituents when the latter were incongruent with the event depicted in the preceding clipart. Other VW studies looked at phonological and orthographic processing during spoken sentence comprehension, by presenting pictures of phonologically/orthographically similar objects or by using printed words instead of, or in addition to, pictures [41, 42]. Weber and colleagues [41] examined spoken-word recognition in Spanish: Spanish native speakers were instructed to click on pictures as they listened to their names while their eye movements were monitored. When participants were instructed to click on the picture of a door ("puerta"), they experienced temporary interference from the picture of a pig ("puerco") due to the phonological similarity between the names of the target and the distractor. Similar interference was observed when printed names, or a combination of pictures with their names printed underneath, were presented to the participants. These and similar findings demonstrate that comprehenders' gaze patterns are modulated in response to the visual stimuli in the VW paradigm, making it a good measure not just of visual language processing, as in reading, but also of spoken language comprehension. A common way to quantify such effects is to compute the proportion of fixations to each interest area over time, as sketched below.
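The sketch below is a minimal illustration of such a Visual World time-course summary: for each time bin relative to the onset of the critical spoken word, it computes the proportion of trials in which gaze is on the target, the competitor, or another interest area. The data format, bin size, and time window are assumptions, and published analyses typically add by-participant aggregation and statistical modeling (e.g., growth curve analysis).

```python
import numpy as np

def fixation_proportions(trials, roles=("target", "competitor"),
                         t_start=-0.2, t_end=1.0, bin_size=0.05):
    """Proportion of trials with gaze on each interest area, per time bin relative
    to the onset of the spoken target word. `trials` is a list of trials; each
    trial is a list of (fix_start_s, fix_end_s, role) tuples with times relative
    to word onset and role one of the labels in `roles` (or e.g. "other")."""
    edges = np.arange(t_start, t_end, bin_size)
    props = {r: np.zeros(len(edges)) for r in roles}
    for trial in trials:
        for b, t in enumerate(edges):
            mid = t + bin_size / 2
            for start, end, role in trial:
                if start <= mid < end:          # the fixation active at this bin's midpoint
                    if role in props:
                        props[role][b] += 1
                    break
    return edges, {r: p / len(trials) for r, p in props.items()}

# Hypothetical data: two trials with fixations coded by interest area.
trials = [
    [(-0.3, 0.25, "other"), (0.25, 0.80, "competitor"), (0.80, 1.50, "target")],
    [(-0.1, 0.40, "competitor"), (0.40, 1.60, "target")],
]
edges, props = fixation_proportions(trials)
print(np.round(props["target"], 2))   # target looks rise as the word unfolds
```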

2.3 Eye-Tracking in Language Production

Although sentence comprehension studies dominate the Visual World Paradigm, the latter has also been used for the study of sentence production. In a sentence production version of the Visual World protocol, participants are instructed to describe pictures presented on a computer screen while their eye movements are monitored and recorded. As with sentence comprehension studies, a typical finding is that speakers' eye movements are tightly linked to the sentence generation process, with fixations on the referents about to be named slightly preceding the production of the corresponding words [43–47]. Such patterns reflect the progression from conceptualization to lemma retrieval to overt articulation in sentence production.

In a series of experiments, Meyer and colleagues [43] established a link between produced speech and eye movements. Specifically, they analyzed the order in which presented object pairs were named and measured whether object properties (such as difficulty in recognition) were reflected in the temporal properties of eye movements. The results suggested that participants fixated first on the objects they named first. This was due to the location of the objects (they were separated by a gap of 10–12°), which forced participants to fixate each object to recognize it. Another important piece of evidence demonstrates that eye movements precede overt speech articulation: Participants looked at the left object for about 500 ms, then directed their gaze to the right object, and only then named the first object. Image complexity and word frequency were also manipulated. By removing 50% of the contours, the researchers made participants’ responses significantly slower. Eye-tracking data demonstrated that speakers had to invest more effort in visual-conceptual processing of the objects, which was evident from an average increase in fixation durations of 15 ms. Regarding frequency effects, shorter naming latencies and, more importantly, significantly shorter (~35 ms) fixations were recorded for objects with high-frequency names. Another important derivative measurement used in eye-tracking language production studies is the eye-voice span (EVS). This measure dates back to the early works of Buswell [48] and Fairbanks [49] and is still a relevant measure of language production. EVS provides a sensitive tool for examining the effects of different types of linguistic constraints; in oral reading, for example, eye movements precede speech production. In a study by Gleitman and colleagues [50], participants described still depictions of transitive events (e.g., “The dog chases the man”) between animate protagonists after their attention had been manipulated by implicit attentional cues. The cue (a black square presented for 60–80 ms) appeared on the screen before the stimuli, in the place of one of the subsequently presented referents. The general finding is that implicit attention manipulation has a direct impact on the word order chosen by the speaker. What is interesting for this review is the eye-movement data analysis. Gleitman et al. [50] first collected the data and later analyzed them using eye-contingent utterance analysis, fixation analysis, and EVS analysis. Early fixation analysis indicated that attention manipulation by means of implicit cues influences early eye movements: Speakers looked at the cued character first more often than at the non-cued referent. The aim of the eye-contingent utterance analysis was to understand whether it was indeed the attention manipulation that affected word order. The researchers compared word order choices in sentences where the cue had shifted the gaze with those where it had not. The pattern suggested that initial capture of attention toward one referent over the other influences the order of mention.

The EVS analysis revealed that, in concordance with previous research (e.g., [46]), people tend to fixate the object, or in this case the referent, they are about to name. Specifically, analyses of the time course of eye position showed that fixating a character within the first 200 ms after stimulus onset reliably predicted the speaker’s tendency to mention this character first. Myachykov et al. [51] used EVS as a measurement of structural pre-planning in two languages with differing degrees of word-order flexibility, Russian and English. Russian- and English-speaking participants described transitive event pictures while their eye movements were monitored. Analysis of the EVS for each sentence constituent showed that it took Russian participants longer to plan their sentences, as reflected in longer sentence onset latencies and eye-voice spans for the sentence-initial constituent, reflecting a greater amount of syntactic competition from available alternatives than in English and suggesting that syntactic flexibility is costly regardless of the language in use. Together, these and similar studies show how derivative eye-tracking measurements, such as EVS, can be a fruitful source of information about language production.
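As an illustration of how such a derivative measure can be obtained, the sketch below computes a simple eye-voice span per referent by pairing the first fixation on each referent with the articulation onset of the corresponding word. The input format is a hypothetical assumption; actual pipelines depend on the eye-tracker and on the speech-annotation tool used.

```python
import pandas as pd

def eye_voice_span(fixations, speech_onsets):
    """fixations: columns 'trial', 'referent', 'onset' (ms from trial start).
    speech_onsets: columns 'trial', 'referent', 'word_onset' (ms from trial start)."""
    first_fix = (fixations.sort_values("onset")
                 .groupby(["trial", "referent"], as_index=False)
                 .first())
    merged = first_fix.merge(speech_onsets, on=["trial", "referent"])
    merged["evs"] = merged["word_onset"] - merged["onset"]  # positive: eye leads voice
    return merged[["trial", "referent", "evs"]]
```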

2.4 Eye-Tracking in Language Learning and Bilingualism Research

Eye-tracking methodology has also been used in studies of novel vocabulary acquisition, both in the native (L1) and in a second (L2) language (see [52] for a recent review). Novel word learning is a process that takes place throughout a person’s life, not only during development but also in adulthood. A typical example of the latter is learning new words and structures of a foreign language. However, learning words of one’s native language never stops either, with a considerable segment of novel vocabulary acquired in adulthood through reading, mainly in an incidental manner [53, 54]. Existing research analyzing readers’ eye movements over repeated encounters with novel written word forms in sentence contexts shows how new words are integrated into the L1 orthographic lexicon [55–61]. For example, Lowell and Morris [58] monitored the eye movements of adults while they were reading sentences containing novel words. First, growing exposure to novel words made the corresponding reading times faster, with longer words taking longer to encode than short words. Second, readers fixated novel words for longer and also made more regressive saccades toward them, indicating that readers used contextual information from the text to guess possible meanings of the novel words [55]. Existing eye-tracking data also show that L1 orthographic learning occurs at an impressive speed, with only a few exposures required to consolidate a new word’s meaning into the existing lexicon. For instance, Chinese participants in a study by Li et al. [57] showed a significant decrease in their word fixation durations and in the frequency of regressive saccades to the pseudo-characters they were trained on after only five exposures, with the eye-movement changes reflecting this learning process starting already at the second encounter with the novel stimuli.

Furthermore, there is a growing interest in psycholinguistic research in the neural mechanisms underlying the acquisition of orthographic processing and reading fluency skills in a second language. The proportion of bilinguals in the population continues to grow, with more and more people learning how to speak and read in two or even more languages. This often implies learning a new orthographic script. Classical eye-tracking measures, like fixations and saccades, provide a window onto this process and help us understand the cognitive mechanisms underlying L2 acquisition, particularly the interplay between L1 and L2 language codes. Several studies analyzed eye-movement patterns as indicators of L2 reading skill acquisition, showing a progressive decrease in the number and length of fixations as well as in saccadic movements and regressions as a function of reading proficiency in L2 [62–69]. Similarly to the L1 research described above, these changes are very rapid, suggesting that the speed with which novel words are incorporated into existing lexicons is similar for L1 and L2. For instance, a recent study by Elgort et al. [64] reported changes in the pattern of eye movements in a group of native speakers of Dutch while they were repeatedly exposed to novel words in their L2 (English) embedded in sentences; after a very short exposure (eight repetitions only), low-frequency English words matched high-frequency L2 words (used as control stimuli) in fixation durations, gaze durations, and regressive saccades. Finally, eye-tracking has been applied to studying parallel language activation in bilingual speakers, both in visual (e.g., [70, 71]) and spoken modalities (e.g., [67, 72, 73]). These studies have systematically found that bilingual performance is affected by the presentation of interlingual competitors (stimuli in the inactive language that share orthographic or phonological properties with those in the currently active language), as reflected in different eye-movement measures (regressions or fixations to irrelevant competitors, saccades, total reading time), thus demonstrating parallel activation of both languages in the bilingual mind.
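The exposure-based learning effects reviewed above are typically quantified from an interest-area report, aggregating reading measures by the number of prior encounters with each novel word. A minimal sketch follows; the column names are hypothetical assumptions rather than the export format of a specific eye-tracking package.

```python
import pandas as pd

def learning_curve(ia_report):
    """ia_report: one row per encounter with a novel word, with columns
    'exposure' (1st, 2nd, ... encounter), 'gaze_dur' (first-pass gaze
    duration in ms), and 'regression_in' (1 if the word was re-fixated
    from later in the text, else 0)."""
    return (ia_report
            .groupby("exposure")
            .agg(mean_gaze=("gaze_dur", "mean"),
                 regression_rate=("regression_in", "mean"),
                 n=("gaze_dur", "size"))
            .reset_index())
```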

3 Beyond Eye Movements: Co-Registration of Brain-Ocular Activity

Despite the many strengths of eye-tracking as a stand-alone research method, eye movements alone do not provide a direct window onto the underlying neural mechanisms. Electroencephalography (EEG), perhaps the most widely used technique for measuring neuronal activity in the human brain, is therefore employed in many studies to investigate the neurocognitive mechanisms of language. EEG has proven to be especially suitable for the study of linguistic processing by providing an extremely accurate registration of the order and the chronometry of the brain processes involved in language use (see Chapters 5 and 6 in this volume).

In addition, the method is relatively user-friendly and inexpensive compared to other popular neuroimaging methods in language research, such as functional magnetic resonance imaging (fMRI) or magnetoencephalography (MEG). The existing EEG literature offers ample evidence of the time course and interactivity of distinct language processes during both comprehension and production of speech (e.g., [74–78]), reflected in modulations of event-related potentials (ERPs) registered as early as 50 ms following word onset [79, 80], indicating the fast and automatic fashion in which language processing is carried out. The use of EEG has been particularly useful in reading research, where several ERPs have been identified and extensively studied in relation to different stages of visual word recognition. Thus, brain responses within the first 200 ms post stimulus onset, such as the N1/P1 complex and the P200 component, have been attributed to the extraction of visual features and word-form access during orthographic processing [74, 81–83]. Later responses, such as the well-known N400 or P600 effects, are considered to index lexicosemantic access and structural integration as well as reanalysis processes (see [84, 85] for reviews of the respective components). Combining eye-tracking with EEG may thus provide a unique tool that allows co-registration of eye movements and electrophysiological activity and offers a single, chronometrically accurate method to capture the neural mechanisms of linguistic processes. In what follows, we discuss the potential benefits of combining eye-tracking with EEG as well as the associated methodological challenges.

3.1 Brain Potentials Locked to Oculomotor Behavior

The co-registration of eye movements and electroencephalographic activity allows synchronized analysis of oculomotor behavior, reflecting the uptake of visual information, and of the brain responses associated with the analysis of this information, all with similarly high temporal resolution. As a result of such synchronization, fixation-related potentials (FRPs, also called eye-fixation-related potentials or EFRPs) and saccade-related potentials (SRPs, also called lambda waves) can be identified. These are averaged brain signals that are time-locked to an oculomotor event (a fixation or a saccade onset, respectively) instead of to the onset of a specific stimulus event (a word, an image), as in the case of traditional ERPs. Thus, differently from ERPs, FRPs and SRPs offer more ecological validity, since they make it possible to relate ERP signatures to a specific cognitive process indexed in the oculomotor data [86] and provide useful information about the interplay between information uptake (eye movements) and information use (ERPs). Nonetheless, given that very little information is acquired during saccades, and that fixations provide a perfect marker for the onset of word processing, FRPs are likely to be a more suitable measure of neural dynamics during linguistic processing.

As a result, FRPs have been used in psycholinguistic research more often [86, 87], although some studies have focused on SRPs (e.g., [87, 88]). Besides the estimation of FRPs and SRPs, a very recent approach to co-registration of electrophysiological and oculomotor activity is to time-lock neuronal oscillations (see Chapter 7 in this volume), rather than ERPs, to eye movements, thus obtaining fixation- or saccade-related oscillations (FROs and SROs). Oscillatory brain dynamics, related to rhythmic changes in cortical excitability at different temporal and spatial scales, are considered to be informative about the patterns of communication between different brain regions during a broad range of cognitive processes [89–92]. Oscillations in the beta and gamma frequency bands are particularly crucial for our understanding of language processes [93, 94].
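In practice, once the eye-tracking and EEG recordings are synchronized, an FRP can be obtained by treating each fixation onset as an event and averaging the EEG around it. The following is a minimal sketch using MNE-Python; it assumes that fixation onsets are already available as EEG sample indices, which in a real pipeline requires a shared trigger or synchronization signal.

```python
import numpy as np
import mne

def compute_frp(raw, fixation_samples, tmin=-0.2, tmax=0.6):
    """raw: mne.io.Raw; fixation_samples: array of fixation-onset sample
    indices in the EEG recording."""
    fixation_samples = np.asarray(fixation_samples, dtype=int)
    events = np.column_stack([
        fixation_samples,                           # event sample
        np.zeros(len(fixation_samples), dtype=int),
        np.ones(len(fixation_samples), dtype=int),  # event id 1 = "fixation"
    ])
    epochs = mne.Epochs(raw, events, event_id={"fixation": 1},
                        tmin=tmin, tmax=tmax,
                        baseline=(tmin, 0.0), preload=True)
    return epochs.average()  # the fixation-related potential (FRP)
```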

3.2 Word-by-Word Versus Free-Viewing Paradigms

In general, two main paradigms have been used to obtain FRPs and SRPs during the co-registration of eye-tracking and electrophysiological brain responses, namely single-word paradigms and naturalistic, free-viewing paradigms. Single-word paradigms have traditionally been preferred for the extraction of FRPs and SRPs. This has been the standard procedure in ERP studies in psycholinguistics (see [78, 84] for reviews), in which words are serially presented to participants in the center of the screen and ERPs are time-locked to word onset, thus avoiding ocular artifacts (i.e., saccades and blinks) that interfere with the EEG signal and can lead to large numbers of excluded trials or even participants [95, 96]. The use of the single-word approach during co-registration of eye movements and EEG activity has provided evidence regarding the relation between oculomotor and brain signals during language processing, particularly with regard to parafoveal-on-foveal effects, that is, the influence of lexical or semantic information from parafoveally presented words on the ERPs obtained during reading of the fixated word [86–88, 97]. Despite its popularity, however, the single-word paradigm is not the optimal approach for the study of linguistic processing, and particularly of its dynamics; a word-by-word presentation does not resemble the parallel processing carried out during natural reading, where the reader makes regressive movements to previously presented words, extracts information from not-yet-fixated words in the parafoveal visual field, and makes shorter or longer saccades through the words of a sentence. All this information regarding the sequential motor activity carried out in natural reading contexts is disregarded in single-word paradigms. For this reason, free-viewing paradigms, in which participants are presented with whole sentences or text paragraphs during the co-registration of ocular and brain dynamics, are the most recommendable approach to study reading fluency and its underlying neural mechanisms.

Indeed, this approach neither breaks the natural flow of information presented during reading nor interferes with the speed and general pattern of eye movements. Moreover, natural, free-viewing paradigms do not introduce artificial ocular behavior patterns derived from task demands, such as maintaining fixation on target words or avoiding blinks.

3.3 Methodological Challenges of Co-Registration

However, although free-viewing paradigms are undoubtedly a better option to address the continuous neurophysiological modulations underlying natural reading (among other cognitive processes), there are important methodological challenges that had kept researchers from using this approach until the last few years. Such technical limitations are mainly related to the pre-processing stage of FRPs and SRPs; once potential technical problems related to the synchronization of the ocular and EEG signals are solved, the recording and estimation of FRPs and SRPs are similar to those carried out for ERPs, the latter involving the averaging of a sufficient number of epochs or trials in order to obtain good signal quality (due to the low signal-to-noise ratio). The most severe limitations of non-constrained, free-viewing paradigms, however, are the ocular movements preceding and following the fixation period (i.e., saccadic spike potentials, corneo-retinal dipole changes, and their interrelations), which can interfere with the FRPs (as well as with any other brain potential), and the overlap between subsequent FRPs [98, 99]. Since the interval between fixations can actually be shorter than the latency of an event-related potential, more than one fixation can occur within an epoch, resulting in the overlap of the underlying neural responses. Both problems are avoided in single-word paradigms, with the corresponding restrictions imposed by this approach. Nonetheless, different algorithms and procedures have recently been developed to solve these issues by correcting and decomposing the signal [100–102], allowing for effective co-registration and extraction of interpretable FRP and SRP effects in naturalistic contexts. Indeed, some recent methodological studies offer a detailed description of the use of these tools for the identification and correction of ocular artifacts, as well as for dealing with the overlap issue during the preprocessing of EEG signals obtained under a non-constrained approach (see, e.g., [14, 99, 103–105]). It is important to mention the work by Plöchl and colleagues [104], in which the authors compare the performance of two of the most widely used methods for artifact correction, linear regression and independent component analysis (ICA), likely the most popular procedure for the identification and correction of ocular artifacts nowadays, in single-word paradigms and also in free-viewing paradigms.
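To make the overlap problem concrete, the sketch below illustrates the core idea behind regression-based deconvolution, as implemented in far more elaborate form by the tools cited above: each event contributes one predictor per post-event latency, and a least-squares fit disentangles responses that overlap in time. This is a didactic single-channel illustration under simplifying assumptions, not a replacement for those toolboxes.

```python
import numpy as np

def deconvolve_frp(eeg, onsets, n_lags):
    """eeg: 1-D array, one EEG channel; onsets: fixation-onset sample indices;
    n_lags: number of post-onset samples of response to estimate."""
    n = len(eeg)
    X = np.zeros((n, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            if onset + lag < n:
                X[onset + lag, lag] += 1.0  # time-expanded "stick" predictors
    # least-squares estimate of the latency-wise response (deconvolved FRP)
    beta, *_ = np.linalg.lstsq(X, eeg, rcond=None)
    return beta
```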

In this study of guided eye movements with parallel 64-channel EEG recording, the authors showed that there are several independent sources of eye-movement artifacts, which regression-based correction in turn over- or under-corrects. To avoid this problem, Plöchl et al. [104] propose using their ICA procedure as a tool for optimized detection and correction of eye-movement-related artifact components. Although these methodological issues may seem daunting and their solutions challenging, it is clearly worthwhile to opt for the combination of both techniques, given the benefit that each method brings to the stand-alone implementation of the other: the possibility to capture the neural mechanisms underlying ocular behavior and to fix the exact onset time of these dynamics, as well as the possibility to study cognitive processing in the most naturalistic context possible. Different studies have already proven the feasibility of EEG and eye-tracking co-registration, effectively solving the artifact correction and overlap problems by implementing these tools (see next section). Importantly, the particular research question and the analyses planned for the study must be carefully considered before applying an artifact correction procedure, since, depending on the particular process under study, the implementation of these procedures might even be unnecessary. For instance, different studies in the novel word learning literature have found extremely rapid modulations of the EEG signal, within the first 200 ms (and, in some cases, even earlier), resembling fast lexicosemantic access to newly trained words [106–108]; similar early effects have also been found in visual word recognition [80, 109, 110]. Thus, FRP effects could be estimated within the fixation duration (e.g., 200 ms) with no need to correct for ocular artifacts beforehand (although in such a case baseline correction should be carried out using the first milliseconds of the fixation period, which are free of ocular artifacts). As a final note, experimenters must consider that free-viewing paradigms yield a lower signal-to-noise ratio than single-word paradigms [111]. This is not surprising, considering the higher variability in the obtained data, which depends on the individual oculomotor patterns of participants, who can differ in reading speed, words skipped, regressions made, and so on. Therefore, a sufficient number of trials and participants must be planned in free-viewing studies in order to estimate the neural correlates underlying the effect of interest, as well as to investigate correlations between brain responses and oculomotor behavior, with sufficient statistical power. Overall, the implementation of free-viewing paradigms during co-registration of eye movements and EEG data is not only feasible but highly recommendable. Recent studies in psycholinguistic research have already shown the benefits of this combined approach, providing valuable information regarding the dynamics of language processing during reading [14, 98, 103, 112–115].
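For the artifact side of the problem, a generic ICA-based correction of ocular activity can be sketched as follows with MNE-Python. This is an illustration in the spirit of the approach discussed above, not a reimplementation of the procedure of [104], and the parameter values are assumptions to be tuned to the data at hand.

```python
import mne
from mne.preprocessing import ICA

def remove_ocular_artifacts(raw, eog_ch="EOG"):
    """raw: mne.io.Raw with an EOG channel named eog_ch."""
    # ICA decompositions are more stable on high-pass-filtered data
    filt = raw.copy().filter(l_freq=1.0, h_freq=None)
    ica = ICA(n_components=20, random_state=97)
    ica.fit(filt)
    # identify components whose time courses correlate with the EOG channel
    eog_inds, _scores = ica.find_bads_eog(raw, ch_name=eog_ch)
    ica.exclude = eog_inds
    return ica.apply(raw.copy())  # EEG reconstructed without those components
```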

In a comprehensive methodological study [14], FRPs were estimated in conditions of text reading (meaningful sentences) and pseudo-reading (sentences made of pseudowords). The comparison of the two signals revealed differences in early components (N1, P1) typically found in traditional ERP reading research, thus demonstrating the applicability of the co-registration and correction methods described by the authors to the study of the neurophysiological correlates of online, natural reading. Co-registration of EEG and eye movements has been particularly informative with regard to predictability effects during natural reading. For instance, in Kretzschmar et al. [112], word predictability effects were found to modulate the semantically related N400 component during natural sentence reading, with higher N400 amplitudes for semantically unrelated words presented in the parafovea. Interestingly, in this and other studies (e.g., [103]) the onset of the N400 predictability effect was found earlier than traditionally reported in single-word paradigms (in particular, during the last fixation of the critical word), indicating that the flow of visual word recognition might be different during natural reading. Furthermore, parafoveal-on-foveal effects have recently been registered in earlier ERP components, such as the N100, likely reflecting facilitation in the processing of lower-level visual word features [98, 114]. Importantly, single-word paradigms are not able to capture such benefits of parafoveal preview, which are inherent to natural reading, since no information can be extracted from the parafovea. Other studies using co-registration have demonstrated the importance of reading freely through the sentence or text for effective comprehension [111], in comparison to serial visual presentation of words. This was observed in the association between regressive saccades to previous words containing syntactic violations and the modulation of the P600, a brain signal known to index sentence integration processes, with good comprehension rates as a result (see also [116] for a similar relation between regressive ocular activity and P600 activity). Some recent studies have used co-registration of neuronal oscillations and eye movements to provide important insights regarding online semantic and syntactic processing [111, 117, 118]. For instance, Vignali et al. [117] identified the oscillatory brain dynamics involved in online sentence comprehension, with lower beta band desynchronization (13–18 Hz) during semantic error detection and increased gamma (31–55 Hz) and theta (4–7 Hz) power during the parsing of syntactically correct sentences, but not when the order of words was randomized, thereby causing syntactic violations.
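Analyses of fixation-related oscillations such as those just described start from fixation-locked epochs and a time-frequency decomposition. The sketch below uses MNE-Python's Morlet-wavelet routine; the frequency range follows the bands mentioned in the text, while the remaining settings are illustrative assumptions.

```python
import numpy as np
import mne

def fixation_related_power(epochs):
    """epochs: mne.Epochs time-locked to fixation onsets (see the FRP sketch above)."""
    freqs = np.arange(4.0, 56.0, 1.0)  # theta (4-7 Hz) up to gamma (31-55 Hz)
    power = mne.time_frequency.tfr_morlet(epochs, freqs=freqs,
                                          n_cycles=freqs / 2.0,
                                          return_itc=False)
    # express power as change relative to a pre-fixation baseline
    power.apply_baseline(baseline=(-0.2, 0.0), mode="logratio")
    return power  # inspect, e.g., beta (13-18 Hz) and gamma (31-55 Hz) bands
```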

Moreover, co-registration of eye movements and EEG signals in free-viewing paradigms has been successfully applied in other fields, such as attention [119], memory [120, 121], or emotion (see [99, 122–124] for recent reviews of co-registration of eye-tracking and neurophysiological measures with free-viewing protocols), proving the potential benefit of this methodological combination for the study of human cognition. In general, co-registration of eye movements and EEG (note that the same logic largely applies to magnetoencephalography, MEG), particularly during free-viewing paradigms, provides fundamentally new information about the neurophysiological underpinnings of natural reading, going beyond traditional word-by-word ERP research. Although initially challenging, the combination of both techniques has opened a promising avenue of research in psycholinguistics, providing enormously valuable information about the behavioral and neural correlates of different cognitive processes, with high ecological validity and very low implementation costs.

4 Conclusions and Future Directions

This chapter provided a brief overview of psycholinguistic research using eye-tracking methodology, both as a stand-alone technique and in combination with EEG. Eye-tracking is a relatively cheap tool that allows the acquisition of ecologically valid data reflecting millisecond-by-millisecond linguistic processing. Recent methodological advances allowing co-registration of oculomotor and electrophysiological brain activity under naturalistic, self-paced reading or viewing approaches make it possible to capitalize on the strengths of both methodologies. The implementation of eye-tracking methodology is particularly suitable for the study of reading mechanisms, as well as of any other task involving visual processing. Reading is a complex cognitive activity, characterized by an active and varied pattern of ocular movements (i.e., direct saccadic movements, regressions) that allows a continuous top-down and bottom-up flow of information in order to ensure comprehension. EEG methodology has been used massively in reading research; however, it is questionable whether the paradigms used are the most suitable to account for the neural mechanisms of reading or, more likely, only for those underlying visual word recognition, since processing is constrained to single words (i.e., words presented in isolation, word-by-word presentation in sentences, or two-word presentation). The use of the eye-tracking technique during the recording of EEG signals therefore represents the most suitable option to account for the neural mechanisms of fluent reading. Indeed, co-registration enables the use of oculomotor information to extract the brain responses that exactly underlie the cognitive process under study. This combination of methods has provided highly valuable findings in psycholinguistics, which in turn have had important implications for ERP research.

Overall, the current research trend in psycholinguistics, and in the cognitive sciences in general, is the development of combinatorial approaches with fine-grained methods, from which our field can benefit the most. Although challenging at the methodological level (due to the interference of ocular artifacts and the overlap of cognitive processes), the combination of eye-tracking and EEG methods has already proven to be a very suitable tool to investigate the neural mechanisms of free viewing and reading. A promising new avenue of research is the application of co-registration of eye and brain signals to brain-computer interfaces (BCIs; see, e.g., [125–127]). In this sense, oculomotor activity can be used for the detection of those EEG signals that can be utilized to control external devices, thus providing important clinical applications for disabled populations, helping them to restore communication, as well as other practical applications (e.g., remote system control).

Acknowledgment

The reported study was funded by RFBR, project number 19-313-51023.

References

1. Yarbus AL (1965) Role of eye movements in the visual process. Nauka, Oxford, UK 2. Duchowski AT (2017) Diversity and types of eye tracking applications. In: Eye tracking methodology. Springer, Cham, pp 247–248 3. Holmqvist K, Nyström M, Andersson R et al (2011) Eye tracking: a comprehensive guide to methods and measures. OUP, Oxford 4. Laeng B, Sirois S, Gredebäck G (2012) Pupillometry: a window to the preconscious? Perspect Psychol Sci 7(1):18–27 5. Mathôt S (2018) Pupillometry: psychology, physiology, and function. J Cogn 1(1) 6. Raney GE, Campbell SJ, Bovee JC (2014) Using eye movements to evaluate the cognitive processes involved in text comprehension. JoVE (J Vis Exp) 83:e50780 7. Miles WR (1930) Ocular dominance in human adults. J Gen Psychol 3(3):412–430 8. Nyström M, Andersson R, Holmqvist K et al (2013) The influence of calibration method and eye physiology on eyetracking data quality. Behav Res Methods 45(1):272–288 9. Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3):372

10. Rayner K, Sereno SC, Raney GE (1996) Eye movement control in reading: a comparison of two types of models. J Exp Psychol Hum Percept Perform 22(5):1188 11. Clifton C Jr, Ferreira F, Henderson JM et al (2016) Eye movements in reading and information processing: Keith Rayner’s 40 year legacy. J Mem Lang 86:1–19 12. Pickering MJ, Frisson S, McElree B et al (2004) Eye movements and semantic composition. In: The on-line study of sentence comprehension: eyetracking, ERP, and beyond. Psychology Press, pp 33–50 13. Staub A, Rayner K (2007) Eye movements and on-line comprehension processes. In: The Oxford handbook of psycholinguistics, vol 327. Oxford University Press, Oxford, p 342 14. Henderson JM, Luke SG, Schmidt J et al (2013) Co-registration of eye movements and event-related potentials in connectedtext paragraph reading. Front Syst Neurosci 7:28 15. Rayner K, Pollatsek A, Reisberg D (2013) Basic processes in reading. In: The Oxford handbook of cognitive psychology. Oxford University Press, New York, pp 442–461

16. Kennedy A (2000) Parafoveal processing in word recognition. Q J Exp Psychol A 53(2): 429–455 17. Starr M, Inhoff A (2004) Attention allocation to the right and left of a fixated word: use of orthographic information from multiple words during reading. Eur J Cogn Psychol 16(1–2):203–225 18. Rayner K, Duffy SA (1986) Lexical complexity and fixation times in reading: effects of word frequency, verb complexity, and lexical ambiguity. Mem Cogn 14(3):191–201 19. Juhasz BJ, White SJ, Liversedge SP et al (2008) Eye movements and the use of parafoveal word length information in reading. J Exp Psychol Hum Percept Perform 34(6): 1560 20. Rayner K, Slattery TJ, Drieghe D et al (2011) Eye movements and word skipping during reading: effects of word length and predictability. J Exp Psychol Hum Percept Perform 37(2):514 21. Clifton C Jr, Staub A, Clifton C (2011) Syntactic influences on eye movements during reading. Eye 3(2) 22. Ehrlich K, Rayner K (1983) Pronoun assignment and semantic integration during reading: eye movements and immediacy of processing. J Verbal Learn Verbal Behav 22(1):75–87 23. Brysbaert M, Drieghe D, Vitu F (2005) Cognitive processes in eye guidance. Oxford University Press, Oxford 24. Rayner K (1986) Eye movements and the perceptual span in beginning and skilled readers. J Exp Child Psychol 41(2):211–236 25. McConkie GW, Zola D, Grimes J et al (1991) Children’s eye movements during reading. Vis Vis Dyslexia 13:251–262 26. Blythe HI, Liversedge SP, Joseph HS et al (2006) The binocular coordination of eye movements during reading in children and adults. Vis Res 46(22):3898–3908 27. Feng G, Miller K, Shu H et al (2009) Orthography and the development of reading processes: an eye-movement study of Chinese and English. Child Dev 80(3):720–735 28. Huestegge L, Radach R, Corbic D et al (2009) Oculomotor and linguistic determinants of reading development: a longitudinal study. Vis Res 49(24):2948–2959 29. Ashby J, Rayner K, Clifton C (2005) Eye movements of highly skilled and average readers: differential effects of frequency and predictability. Q J Exp Psychol A 58(6): 1065–1086

30. Chace KH, Rayner K, Well AD (2005) Eye movements and phonological parafoveal preview: effects of reading skill. Can J Exp Psychol/Revue canadienne de psychologie expe´rimentale 59(3):209 31. Huettig F, Rommers J, Meyer AS (2011) Using the visual world paradigm to study language processing: a review and critical evaluation. Acta Psychol 137(2):151–171 32. Cooper RM (1974) The control of eye fixation by the meaning of spoken language: a new methodology for the real-time investigation of speech perception, memory, and language processing. Cogn Psychol 6:813–839 33. Eberhard KM, Spivey-Knowlton MJ, Sedivy JC et al (1995) Eye movements as a window into real-time spoken language comprehension in natural contexts. J Psycholinguist Res 24(6):409–436 34. Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM et al (1996) Using eye movements to study spoken language comprehension: evidence for visually mediated incremental interpretation 35. Altmann GT, Kamide Y (1999) Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition 73(3): 247–264 36. Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM et al (1995) Integration of visual and linguistic information in spoken language comprehension. Science 268(5217): 1632–1634 37. Allopenna PD, Magnuson JS, Tanenhaus MK (1998) Tracking the time course of spoken word recognition using eye movements: evidence for continuous mapping models. J Mem Lang 38(4):419–439 38. Tanenhaus MK, Trueswell JC (2006) Eye movements and spoken language comprehension. In: Handbook of psycholinguistics. Academic Press, pp 863–900 39. Knoeferle P, Crocker MW, Scheepers C, Pickering MJ (2005) The influence of the immediate visual context on incremental thematic role-assignment: evidence from eye-movements in depicted events. Cognition 95(1):95–127 40. Knoeferle P, Crocker MW (2005) Incremental effects of mismatch during picturesentence integration: evidence from eye-tracking. In: Proceedings of the 26th annual conference of the Cognitive Science Society, pp 1166–1171 41. Huettig F, McQueen JM (2007) The tug of war between phonological, semantic and

shape information in language-mediated visual search. J Mem Lang 57(4):460–482 42. Weber A, Melinger A, Lara Tapia L (2007) The mapping of phonetic information to lexical presentations in Spanish: evidence from eye movements. In: 16th international congress of phonetic sciences (ICPhS 2007). Pirrot, pp 1941–1944 43. Meyer AS, Sleiderink AM, Levelt WJ (1998) Viewing and naming objects: eye movements during noun phrase production. Cognition 66(2):B25–B33 44. Bock K, Irwin DE, Davidson DJ et al (2003) Minding the clock. J Mem Lang 48(4):653–685 45. Griffin ZM (2001) Gaze durations during speech reflect word selection and phonological encoding. Cognition 82(1):B1–B14 46. Griffin ZM, Bock K (2000) What the eyes say about speaking. Psychol Sci 11(4):274–279 47. Griffin ZM, Weinstein-Tull J (2003) Conceptual structure modulates structural priming in the production of complex sentences. J Mem Lang 49(4):537–555 48. Buswell GT (1920) An experimental study of the eye-voice span in reading (No. 17). University of Chicago 49. Fairbanks G (1937) The relation between eye-movements and voice in the oral reading of good and poor silent readers. Psychol Monogr 48(3):78 50. Gleitman LR, January D, Nappa R et al (2007) On the give and take between event apprehension and utterance formulation. J Mem Lang 57(4):544–569 51. Myachykov A, Scheepers C, Garrod S et al (2013) Syntactic flexibility and competition in sentence production: the case of English and Russian. Q J Exp Psychol 66(8):1601–1619 52. Conklin K, Pellicer-Sánchez A (2016) Using eye-tracking in applied linguistics and second language research. Second Lang Res 32(3):453–467 53. Bolger DJ, Balass M, Landen E et al (2008) Context variation and definitions in learning the meanings of words: an instance-based learning approach. Discourse Process 45(2):122–159 54. Reichle ED, Perfetti CA (2003) Morphology in word identification: a word-experience model that accounts for morpheme frequency effects. Sci Stud Read 7(3):219–237 55. Chaffin R, Morris RK, Seely RE (2001) Learning new word meanings from context: a study of eye movements. J Exp Psychol Learn Mem Cogn 27(1):225

56. Joseph HS, Wonnacott E, Forbes P et al (2014) Becoming a written word: eye movements reveal order of acquisition effects following incidental exposure to new words during silent reading. Cognition 133(1): 238–248 57. Li L, Marinus E, Castles A, Yu L et al (2019) Eye-tracking the effect of semantic decoding on orthographic learning in Chinese 58. Lowell R, Morris RK (2014) Word length effects on novel words: evidence from eye movements. Atten Percept Psychophys 76(1):179–189 59. Godfroid A, Boers F, Housen A (2013) An eye for words: gauging the role of attention in incidental L2 vocabulary acquisition by means of eye-tracking. Stud Second Lang Acquis 35(3):483–517 60. Godfroid A, Ahn J, Choi I et al (2018) Incidental vocabulary learning in a natural reading context: an eye-tracking study. Biling Lang Congn 21(3):563–584 61. Wochna KL, Juhasz BJ (2013) Context length and reading novel words: an eye-movement investigation. Br J Psychol 104(3):347–363 62. Balling LW (2013) Does good writing mean good reading?: an eye-tracking investigation of the effect of writing advice on reading. Fachsprache Int J Spec Commun 35(1–2): 2–23 63. Cop U, Keuleers E, Drieghe D, Duyck W (2015) Frequency effects in monolingual and bilingual natural reading. Psychon Bull Rev 22(5):1216–1234 64. Elgort I, Brysbaert M, Stevens M, Van Assche E (2018) Contextual word learning during reading in a second language: an eye-movement study. Stud Second Lang Acquis 40(2):341–366 65. Koval NG (2019) Testing the deficient processing account of the spacing effect in second language vocabulary learning: evidence from eye tracking. Appl Psycholinguist 40(5): 1103–1139 66. Marian V, Spivey M (2003) Competing activation in bilingual language processing: within-and between-language competition. Biling Lang Congn 6(2):97–115 67. Marian V, Spivey M, Hirsch J (2003) Shared and separate systems in bilingual language processing: converging evidence from eyetracking and brain imaging. Brain Lang 86(1):70–82 68. Mohamed AA (2018) Exposure frequency in L2 reading: an eye-movement perspective of incidental vocabulary learning. Stud Second Lang Acquis 40(2):269–293

69. Pellicer-Sa´nchez A (2016) Incidental L2 vocabulary acquisition from and while reading: an eye-tracking study. Stud Second Lang Acquis 38(1):97–130 70. Altarriba J, Kroll JF, Sholl A et al (1996) The influence of lexical and conceptual constraints on reading mixed-language sentences: evidence from eye fixations and naming times. Mem Cogn 24(4):477–492 71. Libben MR, Titone DA (2009) Bilingual lexical access in context: evidence from eye movements during reading. J Exp Psychol Learn Mem Cogn 35(2):381 72. Chambers CG, Cooke H (2009) Lexical competition during second-language listening: sentence context, but not proficiency, constrains interference from the native lexicon. J Exp Psychol Learn Mem Cogn 35(4):1029 73. Ju M, Luce PA (2004) Falling on sensitive ears: constraints on bilingual lexical activation. Psychol Sci 15(5):314–318 74. Bentin S, Mouchetant-Rostaing Y, Giard MH et al (1999) ERP manifestations of processing printed words at different psycholinguistic levels: time course and scalp distribution. J Cogn Neurosci 11(3):235–260 75. Coulson S (2007) Electrifying results: ERP data and cognitive linguistics. Methods Cogn Linguist 18:400 76. Ganushchak L, Christoffels I, Schiller NO (2011) The use of electroencephalography in language production research: a review. Front Psychol 2:208 77. Indefrey P, Levelt WJ (2000) The neural correlates of language production. In: The new cognitive neurosciences, 2nd edn. MIT press, pp 845–865 78. Kutas M, Van Petten CK, Kluender R (2006) Psycholinguistics electrified II (1994–2005). In: Handbook of psycholinguistics. Academic Press, pp 659–724 79. MacGregor LJ, Pulvermu¨ller F, Van Casteren M et al (2012) Ultra-rapid access to words in the brain. Nat Commun 3:711 80. Shtyrov Y, Lenzen M (2017) First-pass neocortical processing of spoken language takes only 30 msec: electrophysiological evidence. Cogn Neurosci 8(1):24–38 81. Assadollahi R, Pulvermu¨ller F (2001) Neuromagnetic evidence for early access to cognitive representations. Neuroreport 12:207–213 82. Carreiras M, Vergara M, Barber H (2005) Early event-related potential effects of syllabic processing during visual word recognition. J Cogn Neurosci 17(11):1803–1817 83. Proverbio AM, Vecchi L, Zani A (2004) From orthography to phonetics: ERP measures of

grapheme-to-phoneme conversion mechanisms in reading. J Cogn Neurosci 16(2):301–317 84. Kutas M, Federmeier KD (2011) Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol 62:621–647 85. Friederici AD, Weissenborn J (2007) Mapping sentence form onto meaning: the syntax–semantic interface. Brain Res 1146:50–58 86. Hutzler F, Braun M, Võ MLH et al (2007) Welcome to the real world: validating fixation-related brain potentials for ecologically valid settings. Brain Res 1172:124–129. https://doi.org/10.1016/j.brainres.2007.07.025 87. Baccino T, Manunta Y (2005) Eye-fixation-related potentials: insight into parafoveal processing. J Psychophysiol 19(3):204–215 88. Simola J, Holmqvist K, Lindgren M (2009) Right visual field advantage in parafoveal processing: evidence from eye-fixation-related potentials. Brain Lang 111(2):101–113 89. Siegel M, Donner TH, Engel AK (2012) Spectral fingerprints of large-scale neuronal interactions. Nat Rev Neurosci 13(2):121 90. Singer W (2011) Dynamic formation of functional networks by synchronization. Neuron 69(2):191–193 91. von Stein A, Chiang C, König P (2000) Top-down processing mediated by interareal synchronization. Proc Natl Acad Sci 97(26):14748–14753 92. Bressler SL, Richter CG (2015) Interareal oscillatory synchronization in top-down neocortical processing. Curr Opin Neurobiol 31:62–66 93. Bastiaansen M, Hagoort P (2006) Oscillatory neuronal dynamics during language comprehension. Prog Brain Res 159:179–196 94. Lewis AG, Wang L, Bastiaansen M (2015) Fast oscillatory dynamics during language comprehension: unification versus maintenance and prediction? Brain Lang 148:51–63 95. Picton TW, van Roon P, Armilio ML, Berg P, Ille N, Scherg M (2000) The correction of ocular artifacts: a topographic perspective. Clin Neurophysiol 111(1):53–65 96. Berg P, Scherg M (1991) Dipole modelling of eye activity and its application to the removal of eye artefacts from the EEG and MEG. Clin Phys Physiol Meas 12(A):49 97. López-Pérez PJ, Dampuré J, Hernández-Cabrera JA, Barber HA (2016) Semantic parafoveal-on-foveal effects and preview

benefits in reading: evidence from fixation related potentials. Brain Lang 162:29–34 98. Dimigen O, Kliegl R, Sommer W (2012) Trans-saccadic parafoveal preview benefits in fluent reading: a study with fixation-related brain potentials. NeuroImage 62:381–393. https://doi.org/10.1016/j.neuroimage.2012.04.006 99. Nikolaev AR, Meghanathan RN, van Leeuwen C (2016) Combining EEG and eye movement recording in free viewing: pitfalls and possibilities. Brain Cogn 107:55–83 100. Croft RJ, Barry RJ (2000) Removal of ocular artifact from the EEG: a review. Neurophysiologie Clinique/Clin Neurophysiol 30(1):5–19 101. Delorme A, Sejnowski T, Makeig S (2007) Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. NeuroImage 34(4):1443–1449 102. Ille N, Berg P, Scherg M (2002) Artifact correction of the ongoing EEG using spatial filters based on artifact and brain signal topographies. J Clin Neurophysiol 19(2):113–124 103. Dimigen O, Sommer W, Hohlfeld A et al (2011) Coregistration of eye movements and EEG in natural reading: analyses and review. J Exp Psychol–Gen 140:552–572. https://doi.org/10.1037/a0023885 104. Plöchl M, Ossandón JP, König P (2012) Combining EEG and eye tracking: identification, characterization, and correction of eye movement artifacts in electroencephalographic data. Front Hum Neurosci 6:278 105. Ries AJ, Slayback D, Touryan J (2018) The fixation-related lambda response: effects of saccade magnitude, spatial frequency, and ocular artifact removal. Int J Psychophysiol 134:1–8 106. Shtyrov Y, Nikulin VV, Pulvermüller F (2010) Rapid cortical plasticity underlying novel word learning. J Neurosci 30(50):16864–16867 107. Shtyrov Y (2011) Fast mapping of novel word forms traced neurophysiologically. Front Psychol 2:340 108. Partanen EJ, Leminen A, Cook C, Shtyrov Y (2018) Formation of neocortical memory circuits for unattended written word forms: neuromagnetic evidence. Sci Rep 8(1):1–10 109. Hauk O, Pulvermüller F (2004) Neurophysiological distinction of action words in the fronto-central cortex. Hum Brain Mapp 21(3):191–201

110. Pulvermu¨ller F, Shtyrov Y, Hauk O (2009) Understanding in an instant: neurophysiological evidence for mechanistic language circuits in the brain. Brain Lang 110(2):81–94 111. Metzner P, von der Malsburg T, Vasishth S et al (2015) Brain responses to world knowledge violations: a comparison of stimulus-and fixation-triggered event-related potentials and neural oscillations. J Cogn Neurosci 27(5):1017–1028 112. Kretzschmar F, Bornkessel-Schlesewsky I, Schlesewsky M (2009) Parafoveal versus foveal N400s dissociate spreading activation from contextual fit. Neuroreport 20:1613– 1618. https://doi.org/10.1097/WNR. 0b013e328332c4f4 113. Kretzschmar F, Schlesewsky M, Staub A (2015) Dissociating word frequency and predictability effects in reading: evidence from coregistration of eye movements and EEG. J Exp Psychol Learn Mem Cogn 41(6):1648 114. Li N, Niefind F, Wang S, Sommer W, Dimigen O (2015) Parafoveal processing in reading Chinese sentences: evidence from event-related brain potentials. Psychophysiology 52(10):1361–1374 115. Takeda Y, Sugai M, Yagi A (2001) Eye fixation related potentials in a proof-reading task. Int J Psychophysiol 40(3):181–186 116. Dimigen O, Sommer W, Kliegl R (2007) Long reading regressions are accompanied by a P600-like brain potential. J Eye Mov Res 1:129 117. Vignali L, Himmelstoss NA, Hawelka S et al (2016) Oscillatory brain dynamics during sentence reading: a fixation-related spectral perturbation analysis. Front Hum Neurosci 10:191 118. Kornrumpf B, Dimigen O, Sommer W (2017) Lateralization of posterior alpha EEG reflects the distribution of spatial attention during saccadic reading. Psychophysiology 54(6):809–823 119. Fischer T, Graupner ST, Velichkovsky BM, Pannasch S (2013) Attentional dynamics during free picture viewing: evidence from oculomotor behavior and electrocortical activity. Front Syst Neurosci 7:17 120. Nikolaev AR, Nakatani C, Plomp G et al (2011) Eye fixation-related potentials in free viewing identify encoding failures in change detection. NeuroImage 56(3):1598–1607 121. Nikolaev AR, Jurica P, Nakatani C et al (2013) Visual encoding and fixation target selection in free viewing: presaccadic brain potentials. Front Syst Neurosci 7:26

122. Simola J, Le Fevre K, Torniainen J et al (2015) Affective processing in natural scene viewing: valence and arousal interactions in eye-fixation-related potentials. NeuroImage 106:21–33 123. Simola J, Torniainen J, Moisala M et al (2013) Eye movement related brain responses to emotional scenes during free viewing. Front Syst Neurosci 7:41 124. Himmelstoss NA, Schuster S, Hutzler F, Moran R, Hawelka S (2019) Co-registration of eye movements and neuroimaging for studying contextual predictions in natural reading. Lang Cogn Neurosci:1–18

125. Stawicki P, Gembler F, Rezeika A, Volosyak I (2017) A novel hybrid mental spelling application based on eye tracking and SSVEPbased BCI. Brain Sci 7(4):35 126. Gembler F, Stawicki P, Saboor A, Volosyak I (2019) Dynamic time window mechanism for time synchronous VEP-based BCIs—performance evaluation with a dictionary-supported BCI speller employing SSVEP and c-VEP. PLoS One 14(6):e0218177 127. Zhao M, Gao H, Wang W, Qu J (2020) Research on human-computer interaction intention recognition based on EEG and eye movement. IEEE Access 8:145824–145832

Chapter 24

Neurophysiology of Language Pathologies

Laura Verga, Michael Schwartze, and Sonja A. Kotz

Abstract

Language- and speech-related disorders are among the most frequent consequences of developmental and acquired pathologies. While classical approaches to the study of these disorders typically employed the lesion method to unveil one-to-one correspondence between locations, the extent of the brain damage, and corresponding symptoms, recent advances advocate the use of online methods of investigation. For example, the use of electrophysiology or magnetoencephalography, especially when combined with anatomical measures, allows for in vivo tracking of real-time language and speech events, and thus represents a particularly promising venue for future research targeting rehabilitative interventions. In this chapter, we provide a comprehensive overview of language and speech pathologies arising from cortical and/or subcortical damage, and their corresponding neurophysiological and pathological symptoms. Building upon the reviewed evidence and literature, we aim at providing a description of how the neurophysiology of the language network changes as a result of brain damage. We will conclude by summarizing the evidence presented in this chapter, while suggesting directions for future research.

Key words EEG, ERPs, TMS, Aphasia, Parkinson’s disease, Specific Language Impairment

1 Introduction

Any brain injury or developmental disorder resulting in an impairment of language and/or of its vocal expression, speech, comes at a high price for a person’s quality of life. Given the individual burden and societal cost of these pathologies, research has extensively focused on identifying pathology-specific neurophysiological markers, on exploring neural mechanisms supporting recovery of function, and on developing interventions to restore functionality to premorbid levels. Classic research on language- and speech-related pathologies primarily targeted the metabolic, hemodynamic, and structural brain levels in an attempt to link symptomatology to the location of its organic cause. While this approach contributed to significant advances in neuropsychology and neurolinguistics, it underestimates the fact that speech, like many other human activities, unfolds in time.

Hence, significant insight may come from the analysis of speech and language with electrophysiological measures, such as electroencephalography (EEG; e.g., [1–3]) and magnetoencephalography (MEG; e.g., [4, 5]), which are capable of tracking spontaneous activity (e.g., during resting state) and stimulus-driven activity (evoked responses) with high temporal resolution. Yet further insight might come from techniques capable of interfering with ongoing brain activity, such as Transcranial Magnetic Stimulation (TMS) and transcranial Direct-Current Stimulation (tDCS; e.g., [6–8]). More and more frequently, however, multiple techniques allowing both high temporal and high spatial resolution (for example, electrophysiological and structural recordings) are combined to reach an even more comprehensive and integrated account of the symptomatology and neurobiology of language and speech disorders arising as a consequence of both developmental and acquired brain damage [9, 10]. The development of such an integrative approach in the study of speech and language pathologies, however, does not end with adopting integrated methods, but should also consider the composite nature of the study object: Language is a complex and multilayered system, whose components are spread across several hubs, ancillary nodes, and connections in a distributed cortico-subcortical network [11, 12]. Lesions occurring in either hubs or connections of this system may result not only in local alterations of the underlying neurophysiology, but also in a global updating of the entire system [13]. For example, both an increase in delta amplitude (a slow EEG band usually associated with deep sleep and rest) and a decrease in the beta band (a marker of active brain processing) have been described as markers of brain damage across a variety of pathologies and symptomatologies, including aphasia [14]. While this evidence is suggestive of plastic neurophysiological alterations that are independent of the pathological mechanism, specific markers (e.g., a decrease in beta range activity in response to linguistic stimuli, typical of post-stroke aphasia; see Subheading 2.1) are bound to depend upon several factors. This chapter aims at targeting these factors to provide a comprehensive account of speech- and language-related pathologies. We will discuss the consequences of damage with diverse etiologies (e.g., focal or diffuse, developmental, affecting perception or production); further, we will explore the specific and often neglected contribution of subcortical structures to the language network. Building upon the reviewed evidence and literature, we will sketch a description of how the neurophysiology of the language network changes as a result of brain damage. We will conclude by summarizing the evidence presented in this chapter, while suggesting directions for future research.
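Band-specific changes of the kind just mentioned (e.g., increased delta, decreased beta) are typically quantified from resting-state recordings as spectral power within fixed frequency bands. A minimal sketch follows; the band limits and the use of a simple delta/beta ratio are illustrative assumptions rather than an established index from the cited studies.

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, sfreq, band):
    """signal: 1-D EEG trace (one channel); sfreq: sampling rate in Hz;
    band: (low, high) frequency limits in Hz."""
    freqs, psd = welch(signal, fs=sfreq, nperseg=int(4 * sfreq))
    mask = (freqs >= band[0]) & (freqs < band[1])
    return np.trapz(psd[mask], freqs[mask])  # absolute power in the band

def delta_beta_ratio(signal, sfreq):
    delta = band_power(signal, sfreq, (1.0, 4.0))
    beta = band_power(signal, sfreq, (13.0, 30.0))
    return delta / beta  # larger values reflect the slowing described above
```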

2 Forms of Language Disorders

Human language and its vocal expression, speech, rely on a complex system of several cortical hubs and their underlying white-matter connections [15–18]. Yet another important part of the circuit is represented by the basal ganglia, a group of subcortical nuclei which form extensive networks with near and distant cerebral regions (e.g., the pre-supplementary motor area (preSMA) and the cerebellum [19–21]). Lesions at both cortical and subcortical levels may result in speech and language pathologies; however, several additional factors may have an impact on the resulting symptomatology and recovery trajectory. One such factor is whether the lesion is focal (i.e., localized) or diffuse (i.e., widespread): Diffuse damage typically entails more severe deficits, although important exceptions need to be discussed. The age of the patient is an equally important aspect to consider: Not only may lesions or deficits occurring during childhood have significantly different outcomes from their adult equivalents, but this highly plastic period of life may also give rise to specific impairments whose onset is not typical in adulthood. Finally, language pathologies may be classified depending on whether they affect primarily the system's input (i.e., perception) or its output (i.e., production).

2.1 Pathologies Due to Focal Damage

Focal brain damage is caused by spatially confined injury in a specific cortical or subcortical region, either in the gray or white matter of the brain. While neuronal loss and cerebrovascular damage represent primary consequences, ischemia or cascades of cytotoxic effects may further complicate the clinical picture [22]. The cause of focal brain damage may be either traumatic (e.g., blow to the head) or organic (e.g., tumor and ischemic or hemorrhagic stroke). In either case, the symptomatology mostly depends on where the damage occurred and is typically highly circumscribed to specific functions (but see below for important exceptions when the damage occurs in the white matter). The two most famous examples of localized damage affecting linguistic functions trace back almost two centuries to the studies of Paul Broca (1865) [23] and Carl Wernicke (1874) [24]: By correlating behavioral symptoms with post-mortem pathological findings, Broca identified a brain region involved in speech production in the third convolution of the inferior frontal gyrus (IFG), while Wernicke discovered the location of the brain’s hub for speech comprehension in the posterior superior temporal gyrus. Nowadays, researchers can map behavioral symptoms to brain lesions in vivo, due to high-quality structural magnetic resonance imaging. This approach, called lesion-symptom mapping (LSM, [25–27]), can be further combined with electrophysiological methods to provide a comprehensive account of plastic reorganization following focal damage.


Localization-based investigations (both post-mortem and in vivo) revealed that speech- and language-related deficits are most commonly encountered as a consequence of left-hemispheric ischemic or hemorrhagic stroke in the territory of the middle cerebral artery, with about half of the patients suffering from aphasia following a stroke in this region [28]. Symptomatology normally depends on the lesion size, location, or both. For example, speech perception deficits typically arise from highly localized damage in Heschl's gyrus and their severity correlates with the lesion extent, while broader language-related deficits (such as naming impairments) may originate from damage in diverse regions and are mostly dependent upon lesion size [29]. An important exception relates to the existence of a white-matter bottleneck lying deep within the frontal lobe: Even very small lesions in this area, representing the convergence point between several major connecting fiber tracts within the language network (i.e., uncinate fasciculus, inferior fronto-occipital fasciculus, anterior thalamic radiations), result in semantic deficits across modalities [26]. Focal lesions in left-hemispheric language regions cause significant changes in normal brain physiology: Starting from early post-stroke stages, increased slow activity in the delta and theta range has been reliably observed with EEG and MEG [7, 14, 30–33]; similarly, activity in the beta range—indexing cortical arousal and input processing—is typically decreased in chronic aphasic patients performing linguistic tasks [4, 34]. However, how does the system return to its normal functions? In general, functional restoration correlates with an increase in alpha-band phase synchronization [35]. More specifically, two main plasticity mechanisms have been described to account for functional recovery after brain damage. They rely on a takeover of the lost function either by (i) homologous regions in the right hemisphere or (ii) perilesional areas in the left hemisphere. Concerning the former, white matter integrity seems to be responsible, at least in some cases, for the compensatory role assumed by right-hemispheric homologs. In a recent study, Piai and colleagues [10] used a multi-modal approach to investigate the correlation between physiological compensation after left-hemispheric stroke and the integrity of white matter tracts connecting the temporal poles bilaterally. Over the course of a picture naming task, the authors observed a decrease in alpha-beta power which was left-lateralized in healthy participants but right-lateralized in patients with left-hemispheric stroke and intact posterior callosal fibers. Similarly, Rosso and colleagues [36] investigated picture naming in a sample of aphasic patients with damage in the left IFG. Combining tDCS and functional connectivity, the authors observed improved picture naming after cathodal stimulation of the right homolog of Broca's area only in patients with an intact arcuate fasciculus. Taken together, this evidence suggests that the structural integrity of language hubs and connections may be critical for the recruitment of spared functional regions [37]. However, whether the takeover by right homolog areas is adaptive or maladaptive is still a matter of dispute. For example, in stroke patients, synchronization between spontaneous alpha oscillations originating from the IFG and other brain regions correlates positively with verbal fluency measures when measured in the left hemisphere, but negatively when measured in the right IFG [7]. It is possible that the right-hemispheric takeover may be necessary, or at least helpful, in initial stages, while the left-hemispheric perilesional regions need to recover; ultimately, however, better outcomes are associated with a restoration of normal functional activity to the left side [38]. A recent MEG study [39] demonstrated an increase in left-hemispheric magnetic mismatch negativity (mMMN) responses to auditorily presented words following speech therapy, indicating functional recovery. Similarly, Campana and colleagues [6] used a combined LSM and tDCS approach to investigate picture naming improvement in aphasic patients. The authors used tDCS to stimulate areas surrounding lesions in the left hemisphere and observed that—while this treatment benefitted all patients—those with greater integrity of cortical language hubs (IFG, insula, operculum, inferior parietal cortex; see Subheading 4) had the best outcomes. Preservation of subcortical regions, including white matter tracts connecting anterior and posterior areas (superior and inferior longitudinal fasciculus) and the basal ganglia, is equally important in predicting recovery; in particular, it has been suggested that the left basal ganglia might inhibit dysfunctional cortical activity, thus facilitating functional reorganization in the spared cortex [6, 40].

Besides stroke, language and speech impairments may arise as a consequence of localized traumatic brain injury (TBI), often resulting in subdural and epidural hematomas, intraparenchymal hematomas, and contusions [41]. The injury itself derives from the impact of an external force either penetrating the skull (open-head injuries) or causing a blow to the head without breaking the skull (closed-head injuries).1 These lesions typically result in mild and transient language deficits whose severity depends upon the location and force of the impact on the skull: While cognitive functioning usually returns to baseline 1–3 months after mild injuries, long-term sequelae may persist and become chronic after 2 years post-injury in more severe cases [42, 43]. Language impairments in TBI are often classified as secondary to a primary impairment of executive functions or as affecting aspects of language such as pragmatics [44]. Interestingly, however, other studies challenge this claim by suggesting that subtle deficits (lexical-semantic processing [45, 46]; syntactic processing [47]) may simply go unnoticed in behavioral tasks, but can emerge with more sophisticated electrophysiological measurements due to their enhanced sensitivity. For example, even with comparable behavioral performance, TBI patients show signs of abnormal language processing, as evidenced by a lack of a P600 component—a positive event-related potential (ERP) component typically peaking around 600 ms after stimulus presentation and associated with the processing of syntactic and grammatical incongruities [48]—in response to the detection of syntactic abnormalities [49] (see also [47]).

1 It is important to note that in most cases, traumatic brain injuries entail a combination of a focal lesion (the contact point with the skull or with an external object) and a diffuse lesion (often due to the shearing and tearing of white matter tracts, for example in the case of car accidents). A distinction between the two components can be made only on an individual basis. See also [22, 41].
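To make the kind of alpha-beta power analysis discussed above more tangible, the following is a minimal sketch of a hemispheric lateralization measure of task-related power decreases, in the spirit of, but not reproducing, the cited picture-naming studies. The epoch file, channel selections, frequency range, and time window are all illustrative assumptions.

# Sketch: hemispheric lateralization of alpha-beta (8-25 Hz) power change
# during picture naming. Assumes an MNE Epochs object saved to disk and
# hypothetical left/right temporal sensor selections; adapt to your montage.
import numpy as np
import mne

epochs = mne.read_epochs("naming-epo.fif")            # hypothetical file name
left_chs = ["T7", "TP7", "P7"]                        # example left temporal sensors
right_chs = ["T8", "TP8", "P8"]                       # right-hemispheric homologs

freqs = np.arange(8, 26, 1)                           # alpha-beta range (8-25 Hz)
power = mne.time_frequency.tfr_morlet(
    epochs, freqs=freqs, n_cycles=freqs / 2.0,
    use_fft=True, return_itc=False, decim=2)
power.apply_baseline(baseline=(-0.5, -0.1), mode="percent")   # relative change

def roi_power(tfr, picks, tmin=0.3, tmax=0.8):
    """Mean relative power change in an ROI and post-stimulus window."""
    return tfr.copy().pick(picks).crop(tmin, tmax).data.mean()

left = roi_power(power, left_chs)
right = roi_power(power, right_chs)
# Negative values index power decreases (desynchronization); a positive
# lateralization index means the decrease is stronger over the left hemisphere.
lat_index = (right - left) / (abs(left) + abs(right))
print(f"left={left:.3f}, right={right:.3f}, lateralization={lat_index:.2f}")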

2.2 Pathologies Due to Diffuse Damage

Diffuse brain damage is caused by widely distributed damage to axons, diffuse vascular injury, hypoxic-ischemic injury, brain swelling (or edema [41]), and neurodegenerative diseases (e.g., primary progressive aphasia (PPA) [50, 51]). Diffuse axonal injury (DAI [52]) is especially frequent after TBI related to vehicle accidents, falls, or sports activity (e.g., [53]). These types of collision entail a rapid acceleration-deceleration movement of the head resulting in a typical pattern of damage, characterized by (i) a compression of the brain at the site of the impact (called coup), (ii) a second bruise on the opposite side (contrecoup), and (iii) DAI due to the shearing and tearing of axons either sagittally or laterally, with the latter direction being associated with more severe deficits [22, 41, 52]. Diffusion tensor imaging (DTI [54–57]) may uncover this latter type of white matter damage, which is otherwise typically elusive in routine instrumental analyses, partially because of extreme individual variability [54, 58]. A major consequence of DAI is that the balance between cortical and subcortical regions may be disrupted, thus causing disconnection syndromes characterized by relatively frequent2 and highly persistent language deficits [45]. However, the subtlety of the deficits (among others: verbal associations, sentence construction, synonym generation, comprehension of ambiguous sentences, temporal structure comprehension, naming) often escapes standard neuropsychological assessments [44, 59–62]. As a consequence, even in this case, language deficits (when recognized) are frequently considered secondary to a more general impairment in executive functions. Nonetheless, recent studies targeting language abilities in TBI patients identified semantic abilities as a main area of concern (e.g., [2]). For example, several studies noted a predominant semantic deficit following diffuse damage by targeting the N400 component, a negative ERP component typically peaking around 400 ms after stimulus presentation and indexing word retrieval or access to semantic memory [63]. More specifically, these studies found that the N400 response to semantic priming was either delayed, reduced, or even absent in TBI patients as compared to healthy controls, even when the experiment was conducted several years post-injury [1, 3, 22].

Neurodegenerative diseases represent another source of diffuse brain pathology causing language and speech deficit symptoms [64]. Neurodegeneration may either primarily affect cortical regions, as evident in progressive semantic dementia (SD [26, 64]) and PPA [50, 51], or disrupt the normal functioning of subcortical structures such as the basal ganglia in Parkinson's disease (PD; e.g., [65]). Electrophysiological findings across neurodegenerative pathologies typically reveal signs of widespread neurophysiological deficits, including loss of alpha rhythm and generalized EEG slowing with an excess of theta rhythm (e.g., [66]). Similarly to DAI, neurodegeneration is also frequently accompanied by semantics-related deficits [67]; for example, the N400 effect already mentioned for diffuse TBI is also a marker of semantic deficits in Alzheimer's disease (AD) and is associated with an increased risk of conversion from mild cognitive impairment to AD [68]. On the other hand, semantic deficits observed in diffuse pathologies differ from those reported in focal damage: In dementias and diffuse brain pathologies in general, semantic deficits are precipitated by the loss of stored semantic representations caused by neuronal death, while in focal damage information may still be available (i.e., neurons are not lost) but difficult to access [64]. As partially common underlying mechanisms in the altered neurophysiology and symptomatology of diffuse pathologies may exist, specific markers are difficult to identify: A paradigmatic example is semantic dementia, a variant of frontotemporal dementia (FTD), which is characterized by damage to the anterior temporal lobe and extensive semantic deficits, often at the single word level [67, 69–73]. While older studies and clinical guidelines reported typically normal resting state cortical activity in SD (e.g., [66, 74]; see also [75, 76]), recent evidence, employing more fine-grained methods of analysis, challenges this conclusion. Particularly promising in this regard is the analysis of the microstate topography of resting state EEG: In this approach, a multichannel electrode array is scanned to identify topographies of electric potentials that remain stable for ca. 100 ms before transitioning into a different state [77]. These quasi-stable topographies (or "microstates") are assumed to reflect the activity of different neural populations coding for distinct cognitive processes. By employing this method, Grieder and colleagues [74] revealed significant differences between SD and both AD patients and healthy controls in at least two microstate classes. More recent studies further confirmed this first counterevidence to the long-standing claim of a non-pathological resting state EEG in SD. For example, different electrophysiological signatures across neurodegenerative forms were observed in a connectivity-based MEG study, in which distinct connectivity profiles—defined by the distribution and frequency of oscillatory activity—were found to distinguish between AD and several variants of FTD [76]. While the neurodegenerative forms described so far predominantly relate to cortical gray matter, other pathologies are characterized by subcortical damage: Parkinson's disease is a paradigmatic example of a neurodegenerative process primarily affecting the basal ganglia and resulting in both semantic and syntactic deficits ([65, 78]; see below).

2 Subtle deficits are especially frequent following mild TBI, as—in severe injuries—they may be masked by more global deficits (e.g., coma, motor deficits [22]). A full description of the different forms and severity grades of TBI is, however, outside the scope of this chapter (but see, e.g., [59]).
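The microstate approach described above can be sketched in a few lines: cluster the scalp topographies observed at peaks of the global field power (GFP) into a small set of template maps, then label each sample with its best-matching map. The file name, the choice of four classes, and the preprocessing steps below are illustrative assumptions, not the pipeline of the cited studies (which used dedicated microstate software).

# Sketch of resting-state EEG microstate segmentation: GFP-peak clustering
# followed by back-fitting of template maps to every sample.
import numpy as np
import mne
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

raw = mne.io.read_raw_fif("rest-raw.fif", preload=True)    # hypothetical file
raw.pick("eeg").filter(1.0, 40.0).set_eeg_reference("average")
data = raw.get_data()                                       # (n_channels, n_times)

gfp = data.std(axis=0)                                      # global field power
peaks, _ = find_peaks(gfp)                                  # moments of maximal topographic strength
maps = data[:, peaks].T                                     # topographies at GFP peaks
maps /= np.linalg.norm(maps, axis=1, keepdims=True)         # unit-norm maps

# Cluster peak topographies into a small number of microstate classes (here 4).
templates = KMeans(n_clusters=4, n_init=10, random_state=0).fit(maps).cluster_centers_
templates /= np.linalg.norm(templates, axis=1, keepdims=True)

# Back-fit: assign every sample to the template with the highest absolute
# spatial correlation (polarity is conventionally ignored).
norm_data = data / np.linalg.norm(data, axis=0, keepdims=True)
labels = np.abs(templates @ norm_data).argmax(axis=0)

# Mean microstate duration per class, in ms (a common microstate statistic).
sfreq = raw.info["sfreq"]
for k in range(templates.shape[0]):
    runs = np.diff(np.flatnonzero(np.diff(np.r_[0, labels == k, 0])))[::2]
    if runs.size:
        print(f"class {k}: mean duration {runs.mean() / sfreq * 1000:.0f} ms")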

2.3 Disorders of Perception Versus Production

Language-related pathologies may be distinguished as receptive or productive, depending on whether they primarily affect language input (i.e., comprehension) or output (i.e., production). The most paradigmatic example of the former class is Wernicke's aphasia, a prototype of the so-called fluent aphasic syndromes, characterized by fluent speech in the presence of impairments in speech comprehension and word repetition. Productive aphasias—such as Broca's aphasia—are often non-fluent, in the sense that the speech output is severely reduced and characterized by syntactic deficits, agrammatism, anomia, and dysprosody. To this day, aphasia classification rests on the Wernicke-Lichtheim model, an early lesion model linking brain damage location with function. According to this model, damage to the language "motor center"—Broca's area—would cause typical symptoms of a non-fluent aphasic syndrome, while damage to the "sensory center"—Wernicke's area—would result in fluent aphasia, and a lesion interrupting the connection between these hubs would cause a conduction aphasia [24, 79–81].3 This "classical" model has the advantage of being simple, intuitive, and applicable in clinical practice; however, many researchers consider it obsolete and inadequate in light of current advances in neuro-anatomical knowledge [85–88]. Particularly problematic are the exclusive focus on cortical hubs and the exclusion of subcortical nuclei that have been consistently implicated in both language perception and production [19, 89] and whose lesions often result in aphasic symptoms [90, 91]. For example, several functional and metabolic (positron emission tomography—PET) studies in healthy adults report activity in the left striatal complex during syntactic [92, 93] as well as semantic processing (e.g., [94–96]). This evidence has been further refined by studies on clinical populations characterized by striatal degeneration (e.g., PD or Huntington's disease [97, 98]) showing that parts of the basal ganglia system may be actively engaged during language reception at different processing levels [20, 89, 99]. For example, PD patients typically encounter difficulties in sentence comprehension, although it is unclear whether this relates to a pure syntactic deficit, a slowing in processing speed (e.g., [100]), or a more generalized timing deficit [21, 65, 101]. In addition to receptive deficits, Parkinson's disease patients often suffer from severe motor impairments, extending to productive speech disorders—including phonation, articulation, and prosody [102]. At the neurophysiological level, these deficits are reflected in altered brain activity on several accounts. For example, a recent study by Sörös and colleagues in medicated PD patients (i.e., under dopaminergic treatment) uncovered an increase in oscillatory brain activity in the β-band during the preparation for visually cued overt speech, in stark contrast with the typical decrease observed for speech preparation in healthy individuals and for limb movement in PD patients [103]. While the exact causes of this phenomenon are still unclear, it has been hypothesized that excessive β-band oscillations may arise as a consequence of chronic dopamine depletion affecting the subthalamic nucleus (STN [104]; see Subheading 3).

3 Here, the model is simplified to focus on perceptive and productive deficits; originally, it includes a third "concept" center and its connections to the motor and sensory hubs [82–84].

2.4 Developmental Language Disorders

The acquisition of language is one of the most significant and consequently most anticipated landmarks in a child's development. Despite the complex nature of the underlying processes, the acquisition of language skills is typically achieved astonishingly quickly and effortlessly. Nevertheless, every level of the interplay of genetic, physiological, and psychological factors in these processes is susceptible to aberrations that manifest in a wide range of developmental language disorders. Median prevalence estimates for speech and language delay provide figures close to 6% for the general population [105]. However, such estimates are complicated by heterogeneous yet overlapping phenotypes and comorbidity with global developmental conditions such as autism spectrum disorder, learning disability, or hearing impairments [106, 107]. For most developmental language disorders, no clear cause can be identified, although mutations in the FOXP2 gene that cause heritable developmental verbal dyspraxia establish a strong case for genetic factors [108, 109]. The most common language disorder in children affects receptive and/or expressive abilities despite normal non-verbal intellectual abilities. Different terms and sub-classifications have been used to describe this phenomenon, most prominently Specific Language Impairment (SLI) or, less commonly but more recently, Developmental Language Disorder (DLD [110, 111]). SLI is characterized by a delayed onset and protracted development of language skills between 3 and 5 years of age, with an overall prevalence of about 7.4% [112, 113]. Electrophysiological research in this context takes advantage of the excellent temporal resolution of EEG to delineate functional characteristics of SLI along the time course of sensory and language processing. Previous studies have focused on early (peak amplitude latencies shorter than about 300 ms) and later ERP components of the EEG. Differences between typically developing children and children with SLI in the timing and peak amplitudes of early components such as the N100 and mismatch negativity suggest atypical sensory information processing, whereas atypical ERP morphology and hemispheric lateralization in later components also suggest atypical attention-dependent information processing in SLI, despite a considerable amount of heterogeneity in the respective results [114]. Recent research provides further evidence along these lines, while it also highlights some of the inconsistencies resulting from the ERP approach. For example, preschoolers with SLI showed a delayed time course and more diffuse scalp topography of the N400 effect at the sentence level, but not of earlier sensory responses such as the N1/P2 complex [115]. Although such findings may be taken as supportive of the notion of a language-specific impairment, the results of studies employing both verbal and non-verbal visual "oddball" paradigms focusing on P3/P3b ERP components suggest higher processing costs in SLI children across cognitive domains [116].

Stuttering is another developmental language disorder with a high prevalence. Like SLI, stuttering is a phenomenon with still mostly unclear causes, despite evidence for strong genetic factors [117]. Stuttering is characterized by repetitions, prolongations, and blocking of speech sound articulation. Life-span prevalence rates vary but are probably close to 0.75%, with considerably higher rates in children under 6 years of age [117]. EEG studies of children who stutter confirm verbal and non-verbal processing dysfunctions [118]. Results from non-verbal auditory oddball paradigms comparing pre-school children who stutter with non-stuttering controls did not yield group differences in early P1/N1 ERP components but confirmed a significant P3 only in controls, suggesting less robust allocation of attention and working memory updating [119]. Children who stutter also showed a reduced P3 as opposed to an excessive N2 in a visual Go/No-go task [120]. Considering the overall high rate of natural recovery [117], it is important to note that subtle differences in ERP markers of semantic processing (N400, late positive component) may be predictors of stuttering persistence [121]. Such findings confirm a unique role and sensitivity of verbal and non-verbal EEG/ERP paradigms, which should complement a holistic neuropsychological approach to developmental language disorders [111].
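The ERP component measures recurring throughout this section (N400, P3, P600) are typically quantified as mean amplitudes in a fixed time window over a set of sensors. The sketch below shows one hedged way this could be done for an N400 window with MNE-Python; the file names, channels, and window are illustrative assumptions, and a real analysis would contrast conditions (e.g., related versus unrelated primes) and run statistics across participants rather than compare two single files.

# Minimal sketch: mean ERP amplitude in an N400 window (300-500 ms) at
# centro-parietal sensors, for two hypothetical epoch files.
import mne

def n400_mean_amplitude(fname, picks=("Cz", "CPz", "Pz"), tmin=0.3, tmax=0.5):
    """Mean amplitude (in microvolts) of the average ERP in the N400 window."""
    epochs = mne.read_epochs(fname, preload=True)
    evoked = epochs.average().pick(list(picks)).crop(tmin, tmax)
    return evoked.data.mean() * 1e6   # MNE stores EEG data in volts

control = n400_mean_amplitude("control-epo.fif")   # hypothetical file names
patient = n400_mean_amplitude("patient-epo.fif")
print(f"control N400: {control:.2f} µV, patient N400: {patient:.2f} µV")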

3 Role of Subcortical Structures in Language Processing

Language is a complex, yet highly automated system entailing several subcomponents. While the role of cortical structures in language processing has been well investigated using lesion and/or neuroimaging approaches (M/EEG, functional magnetic resonance imaging (fMRI), and PET), the role of subcortical structures is less explored, potentially due to contradictory results. Subcortical aphasia often does not lead to persistent symptoms and language deficits [90, 91, 122], or primarily coincides with speech production deficits that are more motoric in nature. For example, palilalia, the involuntary repetition of syllables, words, and word combinations, often produced with increasing speed, may occur during speech production in PD patients. However, this phenomenon results from a motor planning deficit rather than a speech production deficit per se. Similarly, Wallesch and Blanken [123] suggest that speech automatisms in PD are linked to a pre-articulatory deficit based on the reduced capacity to inhibit irrelevant target expressions. While production deficits (primarily prosodic) have been reported in PD patients, it has been argued that linguistic processes such as phonology, lexical semantics, and syntax in language perception are not affected. Rather, these processes may appear deficient, but the impairment may be secondary to attentional and/or working memory deficits. Often these deficits mimic frontal cortical phenomena such as verbal working memory or verbal fluency deficits [124].

In an early information-processing model, Wallesch and colleagues [125, 126] proposed that a cortico-striato-pallido-thalamo-cortical loop regulates response preparation and response selection. According to this model, multiple lexical alternatives (i.e., response alternatives) are produced and released in the posterior perisylvian cortex, then carried to the anterior perisylvian cortex and the striatum in parallel modules. Thus, the striatum may monitor various types of lexical alternatives (situational, emotional, motivational, and semantic) and play an inherent role in the selection of a contextually appropriate lexical candidate. Structurally, the model can be criticized because basal ganglia lesions often include white matter damage and are thus rarely confined to the basal ganglia themselves. Besides the striatum, the thalamus has been discussed as another subcortical structure that may be engaged in language processing. Lesions of specific thalamic nuclei can impair speech production and word finding and cause paraphasia. For example, speech production impairments in PD have been attributed to a degenerative process affecting the STN as a consequence of chronic dopaminergic depletion [104, 127]. This hypothesis aligns with and updates a classical model of subcortical speech production first proposed by Crosson [128] and supported by thalamic lesion data (e.g., [129–131]). In this model, the thalamus, along with the basal ganglia, engages in the selection of produced speech segments in a striato-thalamo-cortical network modulated by the frontal cortex [132]. Wallesch [124] described three anatomical models that assign a potential role to the thalamus during language processing (see also [132]). First, the ventral thalamic nuclei VA and VL are part of the cortico-striato-pallido-thalamo-cortical loop that regulates speech production. Second, the pulvinar, the largest thalamic nucleus, projects mainly to the posterior temporal language cortex. Third, lesions of non-specific thalamic nuclei can disrupt the connection between the ascending reticular activating system (cerebellum) and the cortex, resulting in attentional, motivational, and consciousness deficits that may overshadow language deficits. According to these proposed models, the striatum and the thalamus are plausible structures regulating language processing. Language deficits may therefore be an epiphenomenon of attentional and/or working memory deficits. Finally, it remains to be unambiguously established whether language production and comprehension deficits attributed to basal ganglia lesions may be caused by lesions in other regions that are either adjacent to or connected with these nuclei. For example, aphasia resulting from a left-hemispheric basal ganglia lesion may also derive from pathway lesions that in turn cause cortical deficits within the same hemisphere [132]. Weiller and colleagues [133] pointed out that large striatal lesions could also include cortical insula lesions that affect the blood supply system of the middle cerebral artery, resulting in aphasia.

Reflecting the potentially multifunctional role of the basal ganglia in language processing, there has been a recent revival of interest in their linguistic and non-linguistic functions. As described above, there have been early reports on prosodic production deficits primarily after putaminal lesions. A note of caution needs to be raised as to whether such prosodic deficits are motoric in nature or actually reflect a deficit in realizing basic acoustic properties of prosody such as fundamental frequency, duration, and intensity. This, of course, also applies to the perception of prosody. PD has been proposed as a model to understand how the basal ganglia contribute to the processing of linguistic or non-linguistic prosodic tone. In the past decades, a number of laboratories have published neuroimaging and lesion evidence describing a highly distributed network involving both cortical and subcortical structures during the perception of emotional tone (e.g., [134–137]). However, not all of the imaging studies reported activation of the basal ganglia [136], and the contribution of the basal ganglia in decoding prosodic cues has often been reported as secondary to cortical deficits or to impairments in decoding the finer temporal suprasegmental structure of auditory input [138]. Still, some neuropsychological studies have reported discrimination and recognition deficits of emotional prosody after focal basal ganglia lesions [139–141]. In a series of studies, Pell and Leonard [141–143] systematically investigated the perception of emotional prosody utilizing discrimination, identification, and emotional feature rating tasks in PD patients and age-matched controls. In comparison to controls, PD patients showed an overall reduction in the perception of emotional prosodic cues. The authors took these results as evidence that the basal ganglia play a regulatory role in "predicting the value of cue sequences within a temporal sensory event" (see also [138] for an elaboration of this view). In conclusion, non-linguistic and linguistic prosodic processing seems to be modulated by the basal ganglia. However, in comparison to grammatical and lexical-semantic processing, the present evidence points to a non-domain-specific function of the basal ganglia in these processes, a role involving the temporal encoding of linguistic or non-linguistic cues in an auditory sequence.

4 Cortico-Subcortical Networks Involved in Language Pathologies

Starting from the pivotal studies of Broca and Wernicke, lesion methods have been the gold standard to investigate brain regions involved in language perception and production. Due to the development of non-invasive methods (e.g., fMRI, DTI, M/EEG, TMS), this perspective has been progressively broadened to embrace a model of the language brain as a connectome [18, 88]. More specifically, the language system shows properties of a "large-scale distributed neural network" entailing critical hubs, which are necessary for a given function, and supporting nodes, interconnected with each other [11, 12]. Many excellent articles are available describing the language network and its components in detail in relation to several aspects of language (e.g., [144–148]; see [149] for an evolutionary perspective). In short, in the adult brain the language network is thought to encompass a left-lateralized set of regions surrounding the Sylvian fissure, mostly located in the frontal, parietal, and temporal lobes, and their connections ([150]; see [151] for a developmental angle). It is assumed that these regions are organized along two main systems, a ventral pathway and a dorsal pathway [145, 147]. The ventral pathway deals with the mapping of speech sounds to meaning. Anatomically, it comprises connections between the IFG and the temporal poles—via the uncinate fasciculus—and with the occipital lobe—via the inferior fronto-occipital fasciculus. The dorsal pathway, on the other hand, is involved in speech production, segmentation, and syntactic processing; anatomically, it connects Wernicke's territory (temporo-parietal cortex) with (i) the inferior parietal cortex and ultimately the premotor cortex, via the superior longitudinal fasciculus, and (ii) Broca's area, via the arcuate fasciculus [144, 145]. Further, the role of the basal ganglia (e.g., subthalamic nucleus, caudate), thalamus, and cerebellum cannot be overlooked [19, 152], particularly regarding speech: Acting in concert with motor cortical regions (pre/SMA, pre/motor cortex), these subcortical areas facilitate both speech perception and production [20].

How and under which conditions does this complex and multilayered language network reorganize as a consequence of brain damage [148]? Given the interconnection between main hubs and ancillary cortico-subcortical regions, even small local damage may cause neurophysiological changes in distant regions due to diaschisis [13], large-scale effects due to lesioned connections [26], or even global updates in normal neurophysiological patterns (e.g., by increasing the amount of slow delta waves [7, 14, 30, 31]). How exactly these plastic changes come about is still a matter of debate, and largely depends on whether hubs, ancillary nodes, or connections are primarily damaged. As different pathologies are likely to target different nodes or connections in the language network, they can shed light on plastic changes occurring in distinct parts of the system. For example, semantic deficits rarely arise as a consequence of cerebrovascular accidents [12], but they are frequently observed in neurodegenerative processes, such as those causing PPA. By studying this pathology, it was discovered that resting state EEG/MEG activity is characterized by an increase in slow theta and delta frequencies [8, 153]; however, when neurodegeneration spreads from motor regions—the supplementary motor cortex—it gives rise to speech deficits (i.e., apraxia of speech) in the absence of pathological signs in the EEG signal [153]. Thus, plastic reorganization depends, among other things, on the region affected first in the disease progression.
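Connectivity-based descriptions of the language network, such as those mentioned above, rest on frequency-resolved coupling measures between regions or sensors. As a minimal, hedged illustration, the sketch below computes magnitude-squared coherence in the alpha band between a left frontal and a left temporal EEG sensor; the channel names and file are illustrative assumptions, and full source-space, all-to-all analyses (e.g., with dedicated connectivity toolboxes) follow the same basic logic.

# Minimal sketch: alpha-band coherence between two EEG sensors.
import mne
from scipy.signal import coherence

raw = mne.io.read_raw_fif("rest-raw.fif", preload=True)   # hypothetical file
sfreq = raw.info["sfreq"]
frontal = raw.get_data(picks=["F7"])[0]                   # left frontal sensor (illustrative)
temporal = raw.get_data(picks=["T7"])[0]                  # left temporal sensor (illustrative)

freqs, coh = coherence(frontal, temporal, fs=sfreq, nperseg=int(2 * sfreq))
alpha = (freqs >= 8) & (freqs <= 12)
print(f"mean alpha coherence F7-T7: {coh[alpha].mean():.2f}")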

5 Novel Therapeutic Approaches

Language pathologies and disorders, whatever their cause, come at a great individual and societal cost. For these reasons, much research is dedicated to treatments that help improve patients' quality of life. This effort is particularly important for highly disabling pathologies with a high prevalence, such as post-stroke aphasia. While behavioral speech therapy remains the norm in the treatment of subacute and chronic aphasia, non-invasive brain stimulation techniques, such as TMS and tDCS, have been increasingly applied as a complementary approach with promising results [148, 154–156]. TMS and tDCS are hypothesized to improve language abilities by facilitating the recovery of left-hemispheric perilesional regions, either by inhibiting right-hemispheric homologs (e.g., [157]) or by directly stimulating perilesional areas (e.g., [6]). However, the long-term efficacy of these effects—especially in the case of tDCS—remains to be further investigated [148]. More recently, the positive results of non-invasive stimulation in post-stroke aphasia inspired the application of such methods in neurodegenerative diseases, such as PPA [158] and PD [159, 160]. While non-invasive stimulation in PD has primarily targeted motor symptoms (e.g., [161]), language impairments have been the focus of TMS and tDCS applications in PPA. The few studies conducted so far in PPA cautiously report improvement in semantic processing (e.g., [162, 163]) correlating with a decrease in aberrant functional connectivity between language network regions [164]. While further research is warranted, these initial encouraging steps may suggest future avenues and directions to improve quality of life even in patients with neurodegenerative disease.

Concerning developmental language disorders, therapeutic interventions apply a wide range of methods, although approaches strictly focusing on grammatical aspects of language have become less dominant. A potential unifying framework for the description of pathologies and the development of alternative treatments is the Atypical Rhythm Risk Hypothesis, which suggests that individually different and atypical rhythm processing abilities factor into the profile of developmental speech and language disorders, including stuttering [165]. Atypical rhythm processing abilities may in turn reflect patterns of inefficient or inconsistent neural oscillatory activity during speech processing. Interventions specifically targeted at training these abilities and/or at modulating the underlying neural mechanisms, for example, by means of neurostimulation, may therefore provide a promising starting point for the development of assessment and intervention strategies that incorporate electrophysiological components.

6 Summary and Future Directions

In this chapter, we summarized evidence on the impact of brain damage on the neurophysiology of the language network. Damage to both cortical and subcortical regions may result in language- and speech-related pathologies, although several factors contribute to the severity and outcome. For example, damage locally affecting the gray matter or basal ganglia typically results in more specific deficits than lesions affecting a long-range white matter bundle. In the latter case, the injury, albeit localized, might interrupt the flow of information between distant regions, thus leading to a massive neurophysiological reorganization of the entire system. Yet another factor to be considered is the extent to which the system itself is plastic: Brain damage can be compensated for particularly well in children within a specific window of brain plasticity (usually between 1 and 5 years of age) but may lead to significant chronic deficits afterward [166, 167]. We then focused on the difference between perceptive and productive deficits; in doing so, we attempted to overcome classical cortical models of language organization by emphasizing the role of subcortical structures in linguistic pathologies. Throughout this chapter, we stressed the idea that, whenever possible, a multimodal approach is preferable: The use of combined EEG and VBM/DTI in concert with LSM has proven effective in characterizing not only the lesion, but also the time course of recovery (e.g., see [10]). More importantly, the use of multimodal techniques sheds light on a view of the brain that is gaining new momentum: Most cognitive systems—including language—are not to be considered as encapsulated collections of distinct hubs; rather, they should be viewed as complex connectomes, in which the connections between regions are as important as the hubs and nodes themselves. How plasticity is implemented in such a complex dynamical system is a matter that will require additional research. Only by understanding the mechanisms underlying functional recovery will it be possible to develop new therapies and interventions. Besides the aforementioned non-invasive stimulation techniques, other promising candidates in this direction may be the use of stem cells to restore damaged tracts or the development of state-of-the-art prosthetics and robotic systems taking advantage of electrophysiological advancements in brain-computer interfaces or TMS-based interventions [168].

7 Conclusion

This chapter focused on how normal neurophysiology is altered by brain damage. Compared to other methods, such as fMRI, electrophysiological methods offer several advantages which are particularly relevant for clinical populations: Not only is electroencephalography typically cheaper than magnetic resonance imaging, but it is also portable and allows tracking of fine-grained temporal dynamics both within a task and over the course of recovery. Hence, identifying pathology-specific markers of brain damage and recovery may allow for a quicker establishment of prospective treatments at the patient's bedside, thus maximizing the chances of therapeutic success.

References

1. Knuepffer C et al (2012) Reduced N400 semantic priming effects in adult survivors of paediatric and adolescent traumatic brain injury. Brain Lang 123:52–63. https://doi.org/10.1016/j.bandl.2012.06.009
2. Fratantoni JM et al (2017) Electrophysiological correlates of word retrieval in traumatic brain injury. J Neurotrauma 34:1017–1021. https://doi.org/10.1089/neu.2016.4651
3. Münte TF, Heinze H-J (1994) Brain potentials reveal deficits of language processing after closed head injury. Arch Neurol 51:482–493. https://doi.org/10.1001/archneur.1994.00540170058017

Neurophysiology of Language Pathologies 4. Kielar A, Deschamps T, Jokel R, Meltzer JA (2016) Functional reorganization of language networks for semantics and syntax in chronic stroke: evidence from MEG. Hum Brain Mapp 37:2869–2893. https://doi.org/10. 1002/hbm.23212 5. Kielar A et al (2018) Abnormal languagerelated oscillatory responses in primary progressive aphasia. Neuroimage Clin 18:560– 574. https://doi.org/10.1016/j.nicl.2018. 02.028 6. Campana S, Caltagirone C, Marangolo P (2015) Combining voxel-based lesion-symptom mapping (VLSM) with A-tDCS language treatment: predicting outcome of recovery in nonfluent chronic aphasia. Brain Stimul 8: 769–776. https://doi.org/10.1016/j.brs. 2015.01.413 7. Dubovik S et al (2012) The behavioral significance of coherent resting-state oscillations after stroke. NeuroImage 61:249–257. https://doi.org/10.1016/j.neuroimage. 2012.03.024 8. Kielar A et al (2019) Slowing is slowing: delayed neural responses to words are linked to abnormally slow resting state activity in primary progressive aphasia. Neuropsychologia 129:331–347. https://doi.org/10.1016/ j.neuropsychologia.2019.04.007 9. Reid LB et al (2015) Interpreting intervention induced neuroplasticity with fMRI: the case for multimodal imaging strategies. Neural Plast 2016:e2643491. https://doi.org/ 10.1155/2016/2643491 10. Piai V et al (2017) Neuroplasticity of language in left-hemisphere stroke: evidence linking subsecond electrophysiology and structural connections. Hum Brain Mapp 38:3151– 3162. https://doi.org/10.1002/hbm.23581 11. Mesulam M-M (1990) Large-scale neurocognitive networks and distributed processing for attention, language, and memory. Ann Neurol 28:597–613. https://doi.org/10. 1002/ana.410280502 12. Mesulam M-M et al (2014) Primary progressive aphasia and the evolving neurology of the language network. Nat Rev Neurol 10:554– 569. https://doi.org/10.1038/nrneurol. 2014.159 13. Carrera E, Tononi G (2014) Diaschisis: past, present, future. Brain 137:2408–2422. https://doi.org/10.1093/brain/awu101 14. Spironelli C, Angrilli A (2009) EEG delta band as a marker of brain damage in aphasic patients after recovery of language. Neuropsychologia 47:988–994. https://doi.org/10. 1016/j.neuropsychologia.2008.10.019


15. Catani M, Mesulam M (2008) What is a disconnection syndrome? Cortex 44:911–913. https://doi.org/10.1016/j.cortex.2008. 05.001 16. Dick AS, Bernal B, Tremblay P (2014) The language connectome: new pathways, new concepts. Neuroscientist 20:453–467. h t t p s : // d o i . o r g / 1 0 . 1 1 7 7 / 1073858413513502 17. Friederici AD (2011) The brain basis of language processing: from structure to function. Physiol Rev 91:1357–1392. https://doi.org/ 10.1152/physrev.00006.2011 18. Friederici AD, Gierhan SME (2013) The language network. Curr Opin Neurobiol 23: 250–254. https://doi.org/10.1016/j.conb. 2012.10.002 19. Eisinger RS et al (2018) Non-motor characterization of the basal ganglia: evidence from human and non-human primate electrophysiology. Front Neurosci 12:385. https://doi. org/10.3389/fnins.2018.00385 20. Kotz SA, Schwartze M (2010) Cortical speech processing unplugged: a timely subcortico-cortical framework. Trends Cogn Sci 14:392–399. https://doi.org/10.1016/j. tics.2010.06.005 21. Kotz SA, Schwartze M, Schmidt-Kassow M (2009) Non-motor basal ganglia functions: a review and proposal for a model of sensory predictability in auditory language perception. Cortex 45:982–990. https://doi.org/ 10.1016/j.cortex.2009.02.010 22. Gaetz M (2004) The neurophysiology of brain injury. Clin Neurophysiol 115:4–18. https://doi.org/10.1016/s1388-2457(03) 00258-x 23. Broca P (1865) Sur le sie`ge de la faculte´ du langage articule´. Bull Mem Soc Anthropol Paris 6:377–393 24. Wernicke C (1974) Der aphasische Symptomencomplex: eine psychologische Studie auf anatomischer Basis. Springer, Berlin, Heidelberg 25. Bates E et al (2003) Voxel-based lesion–symptom mapping. Nat Neurosci 6:448–450. https://doi.org/10.1038/nn1050 26. Mirman D et al (2015) Neural organization of spoken language revealed by lesion–symptom mapping. Nat Commun 6:6762. https://doi. org/10.1038/ncomms7762 27. Mirman D, Thye M (2018) Uncovering the neuroanatomy of core language systems using lesion-symptom mapping. Curr Dir Psychol Sci 27:455–461. https://doi.org/10.1177/ 0963721418787486


28. Boehme AK et al (2016) Effect of aphasia on acute stroke outcomes. Neurology 87:2348– 2354. https://doi.org/10.1212/WNL. 0000000000003297 29. Thye M, Mirman D (2018) Relative contributions of lesion location and lesion size to predictions of varied language deficits in poststroke aphasia. Neuroimage Clin 20:1129– 1138. https://doi.org/10.1016/j.nicl.2018. 10.017 30. Meinzer M et al (2004) Intensive language training enhances brain plasticity in chronic aphasia. BMC Biol 2:20. https://doi.org/ 10.1186/1741-7007-2-20 31. Nicolo P et al (2015) Coherent neural oscillations predict future motor and language improvement after stroke. Brain 138:3048– 3060. https://doi.org/10.1093/brain/ awv200 32. Butz M et al (2004) Perilesional pathological oscillatory activity in the magnetoencephalogram of patients with cortical brain lesions. Neurosci Lett 355:93–96. https://doi.org/ 10.1016/j.neulet.2003.10.065 33. Vieth JB, Kober H, Grummich P (1996) Sources of spontaneous slow waves associated with brain lesions, localized by using the MEG. Brain Topogr 8:215–221. https:// doi.org/10.1007/BF01184772 34. Spironelli C, Manfredi M, Angrilli A (2013) Beta EEG band: a measure of functional brain damage and language reorganization in aphasic patients after recovery. Cortex 49:2650– 2660. https://doi.org/10.1016/j.cortex. 2013.05.003 35. Westlake KP et al (2012) Resting state alphaband functional connectivity and recovery after stroke. Exp Neurol 237:160–169. https://doi.org/10.1016/j.expneurol.2012. 06.020 36. Rosso C et al (2014) Broca’s area damage is necessary but not sufficient to induce aftereffects of cathodal tDCS on the unaffected hemisphere in post-stroke aphasia. Brain Stimul 7:627–635. https://doi.org/10.1016/j. brs.2014.06.004 37. Nagata K et al (1982) Topographic electroencephalographic study of cerebral infarction using computed mapping of the EEG. J Cereb Blood Flow Metab 2:79–88. https:// doi.org/10.1038/jcbfm.1982.9 38. Crosson B et al (2007) Functional MRI of language in aphasia: a review of the literature and the methodological challenges. Neuropsychol Rev 17:157–177. https://doi.org/10. 1007/s11065-007-9024-z

39. Mohr B et al (2016) Hemispheric contributions to language reorganisation: an MEG study of neuroplasticity in chronic post stroke aphasia. Neuropsychologia 93:413–424. h t t p s : // d o i . o r g / 1 0 . 1 0 1 6 / j . neuropsychologia.2016.04.006 40. Parkinson RB et al (2009) Lesion characteristics related to treatment improvement in object and action naming for patients with chronic aphasia. Brain Lang 110:61–70. https://doi.org/10.1016/j.bandl.2009. 05.005 41. Andriessen TM, Jacobs B, Vos PE (2010) Clinical characteristics and pathophysiological mechanisms of focal and diffuse traumatic brain injury. J Cell Mol Med 14:2381–2392. https://doi.org/10.1111/j.1582-4934. 2010.01164.x 42. Schretlen DJ, Shapiro AM (2003) A quantitative review of the effects of traumatic brain injury on cognitive functioning. Int Rev Psychiatry 15:341–349. https://doi.org/10. 1080/09540260310001606728 43. Shenton ME et al (2012) A review of magnetic resonance imaging and diffusion tensor imaging findings in mild traumatic brain injury. Brain Imaging Behav 6:137–192. https://doi.org/10.1007/s11682-0129156-5 44. Mcdonald S (1992) Communication disorders following closed head injury: new approaches to assessment and rehabilitation. Brain Inj 6:283–292. https://doi.org/10. 3109/02699059209029670 45. Barwood CH, Murdoch BE (2013) Unravelling the influence of mild traumatic brain injury (MTBI) on cognitive-linguistic processing: a comparative group analysis. Brain Inj 27:671–676. https://doi.org/10.3109/ 02699052.2013.775500 46. Hinchliffe FJ, Murdoch BE, Chenery HJ (1998) Towards a conceptualization of language and cognitive impairment in closedhead injury: use of clinical measures. Brain Inj 12:109–132. https://doi.org/10.1080/ 026990598122746 47. Butler-Hinz S, Caplan D, Waters G (1990) Characteristics of syntactic comprehension deficits following closed head injury versus left cerebrovascular accident. J Speech Lang Hear Res 33:269–280. https://doi.org/10. 1044/jshr.3302.269 48. Coulson S, King JW, Kutas M (1998) Expect the unexpected: event-related brain response to morphosyntactic violations. Lang Cogn Process 13:21–58. https://doi.org/10. 1080/016909698386582

Neurophysiology of Language Pathologies 49. Key-DeLyria Sarah E et al (2016) Sentence processing in traumatic brain injury: evidence from the P600. J Speech Lang Hear Res 59: 759–771. https://doi.org/10.1044/2016_ JSLHR-L-15-0104 50. Gorno-Tempini ML et al (2011) Classification of primary progressive aphasia and its variants. Neurology 76:1006–1014. https:// d o i . o r g / 1 0 . 1 2 1 2 / W N L . 0b013e31821103e6 51. Mesulam M-M (2001) Primary progressive aphasia. Ann Neurol 49:425–432. https:// doi.org/10.1002/ana.91 52. Adams JH et al (1989) Diffuse axonal injury in head injury: definition, diagnosis and grading. Histopathology 15:49–59. https://doi. org/10.1111/j.1365-2559.1989.tb03040.x 53. Ledwidge P (2018) The impact of sportsrelated concussions on the language system: a case for event-related brain potentials. Ann Behav Neurosci 1:36–46 54. Eierud C et al (2014) Neuroimaging after mild traumatic brain injury: review and meta-analysis. Neuroimage Clin 4:283–294. https://doi.org/10.1016/j.nicl.2013. 12.009 55. Inglese M et al (2005) Diffuse axonal injury in mild traumatic brain injury: a diffusion tensor imaging study. J Neurosurg 103:298–303. https://doi.org/10.3171/jns.2005.103.2. 0298 56. Irimia A et al (2012) Neuroimaging of structural pathology and connectomics in traumatic brain injury: toward personalized outcome prediction. Neuroimage Clin 1:1– 17. https://doi.org/10.1016/j.nicl.2012. 08.002 57. Ptak T et al (2003) Cerebral fractional anisotropy score in trauma patients: a new indicator of white matter injury after trauma. Am J Roentgenol 181:1401–1407. https://doi. org/10.2214/ajr.181.5.1811401 58. Ware JB et al (2017) Inter-subject variability of axonal injury in diffuse traumatic brain injury. J Neurotrauma 34:2243–2253. https://doi.org/10.1089/neu.2016.4817 59. Marini A, Zettin M, Galetto V (2014) Cognitive correlates of narrative impairment in moderate traumatic brain injury. Neuropsychologia 64:282–288. https://doi.org/10. 1016/j.neuropsychologia.2014.09.042 60. Davis GA, Coelho CA (2004) Referential cohesion and logical coherence of narration after closed head injury. Brain Lang 89:508– 523. https://doi.org/10.1016/j.bandl. 2004.01.003


61. Ilie G, Cusimano MD, Li W (2017) Prosodic processing post traumatic brain injury – a systematic review. Syst Rev 6:1. https://doi.org/ 10.1186/s13643-016-0385-3 62. Wong MN, Murdoch B, Whelan B-M (2010) Language disorders subsequent to mild traumatic brain injury (MTBI): evidence from four cases. Aphasiology 24:1155–1169. h t t p s : // d o i . o r g / 1 0 . 1 0 8 0 / 02687030903168212 63. Lau EF, Phillips C, Poeppel D (2008) A cortical network for semantics: (de)constructing the N400. Nat Rev Neurosci 9:920–933. https://doi.org/10.1038/nrn2532 64. Mirman D, Britt AE (2014) What we talk about when we talk about access deficits. Philos Trans R Soc Lond B Biol Sci 369: 20120388. https://doi.org/10.1098/rstb. 2012.0388 65. Kotz SA, Gunter TC (2015) Can rhythmic auditory cuing remediate language-related deficits in Parkinson’s disease? Ann N Y Acad Sci 1337:62–68. https://doi.org/10.1111/ nyas.12657 66. Chan D et al (2004) EEG abnormalities in frontotemporal lobar degeneration. Neurology 62:1628–1630. https://doi.org/10. 1212/01.WNL.0000123103.89419.B7 67. Wilson SM (2017) Lesion-symptom mapping in the study of spoken language understanding. Lang Cogn Neurosci 32:891–899. https://doi.org/10.1080/23273798.2016. 1248984 68. Olichney JM et al (2008) Patients with MCI and N400 or P600 abnormalities are at very high risk for conversion to dementia. Neurology 70:1763–1770. https://doi.org/10. 1212/01.wnl.0000281689.28759.ab 69. Neary D, Snowden J, Mann D (2005) Frontotemporal dementia. Lancet Neurol 4:771– 780. https://doi.org/10.1016/S1474-4422 (05)70223-4 70. Cope TE et al (2020) Anterior temporal lobe is necessary for efficient lateralised processing of spoken word identity. Cortex 126:107– 118. https://doi.org/10.1016/j.cortex. 2019.12.025 71. Hodges JR et al (1992) Semantic dementia. Progressive fluent aphasia with temporal lobe atrophy. Brain J Neurol 115(Pt 6): 1783–1806. https://doi.org/10.1093/ brain/115.6.1783 72. Ralph MAL et al (2017) The neural and computational bases of semantic cognition. Nat Rev Neurosci 18:42–55. https://doi. org/10.1038/nrn.2016.150


73. Ralph MAL et al (1998) Naming in semantic dementia—what matters? Neuropsychologia 36:775–784. https://doi.org/10.1016/ S0028-3932(97)00169-3 74. Grieder M et al (2016) Discovering EEG resting state alterations of semantic dementia. Clin Neurophysiol 127:2175–2181. https:// doi.org/10.1016/j.clinph.2016.01.025 75. Neary D et al (1998) Frontotemporal lobar degeneration: a consensus on clinical diagnostic criteria. Neurology 51:1546–1554. https://doi.org/10.1212/WNL.51.6.1546 76. Sami S et al (2018) Neurophysiological signatures of Alzheimer’s disease and frontotemporal lobar degeneration: pathology versus phenotype. Brain 141:2500–2510. https:// doi.org/10.1093/brain/awy180 77. Khanna A, Pascual-Leone A, Michel CM, Farzan F (2015) Microstates in resting-stateEEG: current status and future directions. Neurosci Biobehav Rev 49:105–113. https://doi.org/10.1016/j.neubiorev.2014. 12.010 78. Kotz SA et al (2003) Syntactic language processing: ERP lesion data on the role of the basal ganglia. J Int Neuropsychol Soc 9: 1053–1060. https://doi.org/10.1017/ S1355617703970093 79. Lichteim L (1885) On aphasia. Brain 7:433– 484. https://doi.org/10.1093/brain/7. 4.433 80. Tesak J, Code C (2008) Milestones in the history of aphasia: theories and protagonists. Psychology Press, London 81. Yourganov G, Smith KG, Fridriksson J, Rorden C (2015) Predicting aphasia type from brain damage measured with structural MRI. Cortex 73:203–215. https://doi.org/10. 1016/j.cortex.2015.09.005 82. Geschwind N (1972) Language and the brain. Sci Am 226:76–83. https://doi.org/10. 1038/scientificamerican0472-76 83. Geschwind N (1974) Conduction aphasia. In: Geschwind N (ed) Selected papers on language and the brain. Springer, Dordrecht, pp 509–529 84. Geschwind N (1970) The organization of language and the brain. Science 170:940–944 85. Anderson JM et al (1999) Conduction aphasia and the arcuate fasciculus: a reexamination of the Wernicke–Geschwind model. Brain Lang 70:1–12. https://doi.org/10.1006/ BRLN.1999.2135 86. Binder JR (2017) Current controversies on Wernicke’s area and its role in language. Curr Neurol Neurosci Rep 17:1–10.


Chapter 25

Electrophysiological Correlates of Second-Language Acquisition: From Words to Sentences

Sendy Caffarra and Manuel Carreiras

Abstract

The chapter examines the impact of experience-dependent factors (cross-linguistic similarities between the first and second languages, age of acquisition, proficiency, and quality of language exposure) on second-language phonological, semantic, and syntactic processing. Event-related potential (ERP) studies on second-language analysis are examined and summarized for each level of sentence analysis and each experiential factor. The overview provided here points to a largely qualitative distinction between experience-dependent effects observed in phonology/syntax and those reported in semantics, with experience having a stronger impact on the first two domains. The chapter also highlights novel research directions to be pursued and invites reflection on the methodological choices made in the bilingual research literature.

Key words Bilingualism, ERP, Syntax, Semantics, Phonology, L1–L2 similarity, AoA, Proficiency, Immersion

1 Introduction

In today's multicultural society, it is common to learn a second language (L2), and an increasing proportion of the world population is now bi-(or multi-)lingual. This multilingual community is far from homogeneous, exhibiting a wide range of variability in L2 attainment. This variability is observed across different levels of L2 sentence analysis (phonology, semantics, syntax), as well as across different types of language measures (behavioral and neural). Several research studies have tried to account for this variability by examining the impact of experiential factors, such as cross-linguistic similarities between the first and second languages, L2 age of acquisition (AoA), L2 proficiency, and the quantity/quality of L2 exposure (classroom-based vs. immersion). The present chapter provides a synthetic overview of the impact of these factors on the L2 phonological, semantic, and syntactic levels of analysis. We will focus on electrophysiological correlates of language analysis (event-related potentials, ERPs) in order to provide a fine-grained picture of these variables at these three levels of L2 processing. This work summarizes and further complements previous reviews on more specific topics [1, 2], expanding the focus to distinct domains of L2 language analysis.



Table 1 Summary of L2 factors and the associated theoretical perspectives

L1–L2 similarity: L1 knowledge is the basis for L2 analysis: when L1 and L2 linguistic features are similar, L1 knowledge can be transferred to the L2 (positive transfer); when L2 rules are absent or different from L1 rules, cross-linguistic transfer is not possible or may lead to mistakes (negative transfer [10, 11]; but see [8, 9] for an alternative view on L2 phonology)

AoA: L2 final attainment depends on the age when L2 acquisition begins: when L2 learning starts before puberty, syntactic performance is more likely to be within the native range, compared to late L2 acquisition [85, 89, 90]. This is the result of interactions between maturational and experiential factors [91]

Proficiency: Distinct levels of L2 proficiency correspond to qualitatively different approaches to L2 analysis [78, 92, 93]: at low-proficiency levels, people mainly rely on the declarative system and frequency-driven mechanisms; at high-proficiency levels, language analysis is based more on rule-driven combinatorial processes (procedural system)

L2 exposure type: The quality of L2 input people receive influences how an L2 is acquired and computed [94]: compared to classroom-based instruction, extended naturalistic exposure may provide a wider variety of language sources and functional interactions, which can boost L2 learning and automatic comprehension abilities [95]

Table readapted from [1]

1.1 Experiential Factors

The four factors examined here (L1–L2 similarity, AoA, proficiency, L2 exposure type) have been associated with distinct theoretical frameworks. The role of each L2 factor is still controversial, and different hypotheses have been proposed to describe their specific impacts on L2 processing. Table 1 offers a synthetic description of the theoretical perspectives associated with each factor (for a more detailed overview, see [1]). Please note that although all of the theoretical models in Table 1 would admit more than one influential factor on L2 processing, the table associates each factor with those models that most strongly highlight its role.

1.2 The ERP Technique in L2 Studies

ERPs have been widely employed to investigate language processing since they provide online measures of brain activity (cf. Chaps. 5 and 6). Specifically, ERPs represent a non-invasive measure of electrophysiological brain activity time-locked to the onset of an external event (e.g., a word appearing on a screen). This brain activity is measured at the scalp and mainly reflects the sum of synchronized postsynaptic potentials across large groups of cortical pyramidal cells. This technique is particularly useful for studying rapid brain responses to a linguistic stimulus; its high temporal resolution allows us to examine the specific brain responses elicited by a single phoneme, syllable, word, or morphosyntactic structure.

ERP language studies often adopt violation paradigms, where target words containing violations of the specific feature under study are compared with a control condition, similar to the target except for the violated feature. This paradigm is based on the assumption that, if all other linguistic variables are held constant, brain reactions to violations, compared to brain reactions to control stimuli, will reflect processes related to the use of the feature in question (as well as further processes involved in dealing with a violation). The ERP studies on L2 described below typically reported between-condition differences related to the latency, amplitude, or topography of a specific ERP correlate. Table 2 provides a short description of the ERP components examined in this chapter. Note that this table is not meant to be an exhaustive description of all ERP correlates of language perception, but provides only those details essential for understanding the following sections (for further details, see Chaps. 5 and 6).
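To make the logic of such between-condition comparisons concrete, the sketch below shows how a violation-versus-control ERP contrast of the kind just described is typically computed with MNE-Python (cf. Chap. 4). It is a minimal illustration under assumed settings: the file name, stim channel, event codes, artifact threshold, channel selection, and time window are hypothetical placeholders rather than values taken from any study reviewed in this chapter.

```python
import mne

# Load a preprocessed (filtered, re-referenced) EEG recording. The file name and
# the event codes below are hypothetical placeholders used for illustration only.
raw = mne.io.read_raw_fif("l2_sentences_preprocessed_raw.fif", preload=True)
events = mne.find_events(raw, stim_channel="STI 014")
event_id = {"control": 1, "violation": 2}  # onsets of the critical words

# Time-lock epochs to the critical word, apply a pre-stimulus baseline, and drop
# trials that exceed a simple peak-to-peak artifact threshold.
epochs = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=1.0,
                    baseline=(None, 0), reject=dict(eeg=150e-6), preload=True)

# Average within each condition to obtain the ERPs, then build the difference
# wave (violation minus control), whose amplitude, latency, and topography are
# the dependent measures discussed throughout this chapter.
evoked_control = epochs["control"].average()
evoked_violation = epochs["violation"].average()
difference = mne.combine_evoked([evoked_violation, evoked_control],
                                weights=[1, -1])

# Example quantification: mean amplitude of the difference wave in an N400-like
# window (300-500 ms) at centro-parietal channels (channel names are assumed).
n400_window = difference.copy().pick(["Cz", "CPz", "Pz"]).crop(0.3, 0.5)
print("Mean N400-window difference (microvolts):", n400_window.data.mean() * 1e6)
```

In practice, such single-subject measures are then submitted to group-level statistics; the fragment only illustrates the within-subject logic of the violation paradigm.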

2 The Impact of Experiential Factors on L2 Processing

The following sections provide a selective overview of ERP findings related to the effects of the abovementioned factors on L2 comprehension processing. Three levels of analysis will be considered: phonology, semantics, and syntax. Note that while the L2 phonology section focuses on ERP findings for single-word presentations (and oddball paradigms), the following sections (L2 semantics and L2 syntax) mainly describe studies of sentence comprehension (and violation paradigms). We only consider ERP evidence for unimodal bilinguals and report behavioral evidence when no ERP findings are available. We hope that this chapter provides a useful overview of experience-dependent effects on L2 processing and points to directions for future research.

3 L2 Phonology

L1–L2 Similarity

Behavioral and electrophysiological studies have shown that L1 phonology has an impact on L2 sound perception and production ([3, 4]; see also [5, 6]). The perceived relationship between L1 and L2 sounds predicts how well learners will approximate and process L2 phonology. This is particularly true for late L2 learners [3, 7]. Not all theoretical perspectives agree on the specific direction of the effect of L1–L2 similarity in phonology.


Table 2 Principal language-related ERP correlates described in this chapter

MMN (maximal amplitude around 200 ms; frontocentral distribution). Type of violation: an odd stimulus in a repetitive sequence of auditory stimuli (oddball paradigm). Functional characterization: pre-attentive neural detection of a change in a constant property of the auditory environment [96].

ELAN (maximal amplitude around 150 ms; left-anterior distribution). Type of violation: outright violations of obligatory phrase structure (violation paradigm). Functional characterization: automatic early syntactic parsing processes, in which word categories are used to build up an initial phrase structure [97].

LAN (maximal amplitude around 400 ms; left-anterior distribution). Type of violation: morphosyntactic violations within a sentential context, such as grammatical agreement violations, but also tense-marking and case-marking violations (violation paradigm). Functional characterization: difficulties in integrating morphosyntactic information within a sentence structure with the final goal of thematic role assignment [97], or mismatch detection during linking processes of agreement computation [98, 99]; these processes are characterized by a high degree of automaticity [100].

N400 (maximal amplitude around 400 ms; centroposterior distribution). Type of violation: lexico-semantic anomalies where the target word is difficult to integrate/predict on the basis of the context, and syntactic violations that require deeper semantic integration or global wrap-up processes at the end of the sentence (violation paradigm). Functional characterization: difficulty processing the lexico-semantic information associated with a word [101], which can depend on both the functional organization of semantic memory [102] and local contextual constraints [103].

P600 (maximal amplitude around 600 ms; posterior distribution). Type of violation: several violations of syntactic and morphosyntactic features, such as phrase structure violations and agreement violations, but also thematic-rule structure violations, temporary ambiguities, semantic anomalies, and long-distance dependencies (violation paradigm). Functional characterization: processes of syntactic reanalysis and repair following the detection of ungrammaticality or ambiguity [97, 104]; costs for monitoring, checking, and reprocessing input [105]; late processes of integration which are not syntax specific [106].

Table readapted from [1]

While Best's and Flege's models [8, 9] claim that the greater the perceived distance between L1 and L2 phonemic elements, the better the discrimination (learnability), the contrastive analysis hypothesis [10, 11] predicts a positive transfer only when the two sounds are similar (see Table 1). EEG studies have used oddball paradigms to measure automatic phonological discrimination (indexed by MMN effects) for native (i.e., consonants and vowels shared by the L1 and L2) and non-native (i.e., L2 consonants or vowels that are not present in, or phonetically differ from, the L1) phonological contrasts. Results have shown that in late bilinguals MMN effects for native contrasts are greater than those for non-native contrasts [7, 12, 13], and MMN effects for native contrasts can sometimes be comparable to those observed in native listeners ([13, 14]; cf. [12]). Non-native contrasts usually show small or even absent MMN effects [7, 8, 13], pointing to the fact that non-native contrasts are difficult to learn and may be assimilated into L1 phonological categories [15], which seems to be in line with the contrastive analysis hypothesis. These findings suggest that the L1 can facilitate (in the case of native contrasts) or hamper (in the case of non-native contrasts) the acquisition and perception of L2 sounds. In addition, these ERP studies show that acquiring automatic discrimination responses for non-native contrasts is particularly challenging and might depend on further experiential factors, such as AoA and the amount of L1–L2 language usage [16].

AoA

To our knowledge, there is no clear evidence (i.e., MMN effects) supporting a relation between AoA and electrophysiological correlates of phonological discrimination. Hisagi, Garrido-Nag, Datta, and Shafer [17] tested early and late Spanish–English bilinguals and reported smaller MMN effects for both bilingual groups as compared to monolingual natives. Importantly, no electrophysiological differences were observed depending on bilinguals' AoA, although some AoA effects were reported at the behavioral level. Similarly, two ERP studies that separately focused on different groups of early and late bilinguals showed MMN responses that were similar in nature between the two AoA groups [18, 19]. These ERP findings contrast with available behavioral evidence consistently showing that L2 sound perception and production typically decline as AoA increases ([8, 20–26]; cf. [27]; for an overview see [25]). Further ERP studies are needed in order to clarify this apparent discrepancy between behavioral and electrophysiological findings. Overall, we can say that earlier L2 experience results in more accurate behavioral performance in speech perception; however, qualitatively similar neural mechanisms for L2 sound discrimination seem to be involved at different AoAs. Given the available findings, we cannot exclude the possibility that AoA might exert a finer modulatory effect on the degree of automaticity of brain-evoked responses [28].
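As a concrete counterpart to the oddball findings reviewed in this section, the sketch below shows one common way to quantify an MMN effect: the mean amplitude of the deviant-minus-standard difference wave at frontocentral channels in an early time window, computed separately for native and non-native contrasts. The event labels, channel names, and the 100–250 ms window are assumptions chosen for illustration (they complement the epoching sketch shown earlier in this chapter), not parameters taken from the cited studies.

```python
import mne

def mmn_mean_amplitude(epochs, deviant, standard,
                       picks=("Fz", "FCz", "Cz"), tmin=0.10, tmax=0.25):
    """Deviant-minus-standard mean amplitude (in microvolts) at frontocentral
    channels in an early time window.

    `epochs` is an mne.Epochs object whose event_id contains the two condition
    labels; the labels, channels, and window are illustrative assumptions.
    """
    difference = mne.combine_evoked(
        [epochs[deviant].average(), epochs[standard].average()],
        weights=[1, -1],
    )
    difference.pick(list(picks)).crop(tmin, tmax)
    return difference.data.mean() * 1e6

# Hypothetical usage, with one Epochs object per listener containing a standard
# plus separate deviants for a native and a non-native phonological contrast:
# mmn_native = mmn_mean_amplitude(epochs, "deviant_native", "standard")
# mmn_nonnative = mmn_mean_amplitude(epochs, "deviant_nonnative", "standard")
# A smaller (less negative) value for the non-native contrast would mirror the
# attenuated non-native MMN effects described above.
```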


Proficiency

Available ERP findings on the role of L2 proficiency in phonology are quite heterogeneous and include different types of bilingual profiles (e.g., fluent or non-fluent bilinguals) and different definitions of L2 proficiency (e.g., amount of formal L2 instruction or accuracy of L2 pronunciation). When non-fluent bilinguals who had received mainly classroom-based L2 education were tested, no effect of L2 proficiency (measured as duration of formal L2 instruction) was observed on electrophysiological correlates of phonological discrimination (indexed by MMN effects [12]; see also [29]). However, when L2 training specifically focused on improving phonological skills was considered, experience-induced neural changes were reported, with a progressive emergence of MMN modulations for novel phonological contrasts as L2 proficiency increased ([30–32]; cf. [15]). Finally, White et al. [14, 33] tested the impact of proficiency on a more expert bilingual profile: English–French late bilinguals who had been exposed to naturalistic L2 input and who differed in L2 proficiency (defined as the quality of L2 pronunciation). In this case, when syllables with native and non-native contrasts were presented, bilingual MMN effects were similar to those of native speakers, with no modulations due to L2 proficiency. L2 proficiency effects were reported only in lexical tasks where English words and pseudowords were presented and could be distinguished only on the basis of (native or non-native) phonological contrasts. Greater and more native-like discrimination effects (measured by N400 lexicality effects) were observed as L2 proficiency improved, and this effect was more evident for non-native contrasts.

Overall, it is difficult to summarize the effects of L2 proficiency on phonology given the wide range of participant profiles examined and proficiency measures used. As a tentative summary, there is some evidence supporting effects of L2 phonology-specific training on automatic phonological discrimination responses (indexed by the MMN). Evidence for L2 proficiency effects has also been reported at later stages of lexical analysis (indexed by N400 effects) in fluent bilinguals who have been exposed to naturalistic L2 input. Note that the heterogeneity of the experimental designs employed also suggests that other factors (such as the type of L2 exposure) might contribute to these apparent discrepancies in ERP results.

L2 Exposure Type

The studies described in the proficiency section above suggest that the quality of L2 training (classroom-based vs. immersion) may play an important role in automatic responses based on phonological discrimination analysis (indexed by MMN effects). Specifically, the length of classroom-based L2 training does not seem to have an effect on the MMN responses of late L2 learners, and formal L2 training does not guarantee improvements in L2 phonological discrimination [7, 12, 29]. These findings were further replicated in additional research work [28, 34, 35]. However, the amount of naturalistic L2 input (delivered either through immersion or through training that targets phonology) seems to affect MMN responses, which become progressively more similar to those observed in natives ([13, 30–32]; cf. [15]). Some evidence supporting the importance of a natural language environment for detecting MMN changes has also been reported in kindergarten children with a high degree of exposure to L2 phonology ([36–38], but see [39, 40], where differences with MMN responses from natives could still be appreciated). In order to account for these results, some authors have noted that classroom-based programs are sometimes taught by non-native speakers and feature restricted conversational settings, limiting the amount of native L2 input. In contrast, L2 immersion is more likely to offer a wide range of social situations where a massive amount of naturalistic L2 input is available and L2 learners can achieve relevant communicative goals within authentic conversational contexts [33]. Overall, these findings suggest that the quantity and quality of L2 input have an effect on early electrophysiological responses that are pivotal for the acquisition of new native-like phonetic representations.

In summary, ERP studies on phonology have quite consistently shown that the L1 plays a constraining role in L2 phonological analysis and predicts L2 learners' ability to automatically discriminate new sounds. AoA seems to have an impact on L2 perception accuracy, although this has yet to be confirmed by electrophysiological findings. Similarly, the role of L2 proficiency in ERP phonological responses requires further clarification; the available findings certainly highlight the need for a more consistent definition of L2 proficiency and cleaner experimental designs that better isolate the effects of different experience-dependent factors. Finally, long-term naturalistic L2 immersion seems to be more effective than classroom-based training in terms of consolidating early automatic responses related to L2 sound discrimination.

4 L2 Semantics

L1–L2 Similarity

To our knowledge, no ERP study on L1–L2 semantic similarity is available in the literature. However, several behavioral studies have been conducted on this topic, and the available experimental evidence is summarized below. We know that L1 lexico-semantic knowledge influences the first stages of L2 acquisition and seems to have an impact on the way bilinguals process, recognize, and translate L2 words [41–44]. Given the range of definitions of L1–L2 semantic similarity, we distinguish here between two types of scenarios encountered in the literature. The first scenario makes use of semantically unambiguous words, where there is a clear one-to-one semantic mapping between L1 and L2 words. Behavioral evidence has shown that the higher the L1–L2 semantic similarity between word pairs, the greater the interactive effects observed, with greater facilitation effects in priming tasks and greater interference effects in translation recognition tasks [45–47]. In addition, it has been shown that semantic priming effects between L1 and L2 words can be as strong as semantic priming effects between L1 words, suggesting a strong interaction between semantic representations within and across languages [48]. These interactive effects due to L1–L2 semantic similarity have been interpreted as reflecting the degree of activation of interconnected representations at the semantic level ([42, 47, 48]; cf. [43]). The second scenario involves L1–L2 semantic mapping that is indirect and ambiguous (one L1 word can be translated by multiple L2 words). Studies have shown that ambiguous L2 translation equivalents are harder to learn, produce, and recognize than unambiguous words [49]. This difficulty is modulated by the overall semantic proximity of the L1 word and all possible L2 translations, with facilitation effects in L2 word recognition as semantic similarity increases [50].

AoA

Weber-Fox and Neville [51, 52] conducted ERP studies in the USA on Chinese immigrants. They tested the effect of L2 language experience by presenting semantic violations to Chinese–English bilinguals who had acquired their L2 at different ages (1–3, 4–6, 7–10, 11–13, or over 15 years of age). An N400 effect was observed in all groups, with no differences in amplitude. Latency differences were observed for late L2 learners, with a delayed N400 violation effect for those Chinese immigrants who learned English after 11 years of age (see also [53]). These findings have been taken to suggest that the brain systems mediating semantic analysis are quite robust in populations with different early language experiences. Semantic analysis seems to recruit similar cortical networks across different AoAs; however, the starting point of L2 experience modulates the speed of semantic processing and L2 fluency [51, 52]. Nonetheless, it is worth noting that additional factors may have influenced the results reported by Weber-Fox and Neville [51, 52]. For instance, the participants who acquired their L2 after the age of 11 differed from early L2 speakers not only in terms of AoA but also in terms of their L2 proficiency (level of performance on standardized linguistic tests).

Proficiency

Bowden et al. [54] tested English late learners of Spanish who had received different amounts of Spanish training (1 or 3 years) and consequently had different levels of proficiency (low or advanced). Spanish semantic violations elicited similar N400 effects in both bilingual groups (although only highly proficient bilinguals showed additional late positive effects). No differences in the amplitude of the N400 effect were reported between bilingual proficiency groups or between bilinguals and native speakers. This has been confirmed by other cross-sectional and longitudinal ERP studies ([55–57], but see [58] for amplitude changes with finer-grained semantic manipulations in plausible texts). The only N400 differences reported relate to topographical distribution [54, 55] or latency [55], with broader and/or delayed (and longer-lasting) N400 effects for low-proficiency bilinguals compared to high-proficiency bilinguals and native speakers. The lack of amplitude changes in N400 violation effects suggests that L2 speakers at different levels of L2 proficiency may rely on similar neurocognitive mechanisms for lexico-semantic processing. In other words, an implausible semantic context seems to have similar effects across non-native speakers with different levels of proficiency (and across native and non-native speakers). Although cortical processing of semantic information seems robust already at low levels of L2 competence, changes in N400 latency and topography suggest that L2 proficiency might affect the speed of semantic processing and the specificity of neural recruitment during semantic analysis (i.e., faster processing and more focal neural activity as L2 proficiency increases).

L2 Exposure Type

To our knowledge, no ERP study has directly compared L2 speakers with different types of L2 exposure, and it is still unclear whether ERP correlates of semantic analysis would be affected by this manipulation. However, it is worth noting that L2 training duration (whether through immersion or classroom-based instruction) was a potential confound in ERP studies that focused on the effects of L2 proficiency on semantics [54–57]: different levels of proficiency corresponded to different amounts of immersion in the L2-speaking country [54, 55] or of classroom-based training ([56, 57], longer for those participants who showed higher L2 proficiency). Based on this observation, we cannot exclude that the reported L2 proficiency effects on N400 latency and topography might be (at least partially) due to the type of L2 exposure. Further studies are needed to clarify this point.

Overall, L1–L2 similarity seems to have behavioral effects on L2 word recognition and translation. However, to date there is no ERP evidence, and the time course of cross-linguistic effects on semantic analysis remains unclear. In addition, AoA delays seem to be associated with slower onsets of the electrophysiological correlates of semantic analysis (N400). L2 proficiency seems to have a similar effect on latencies, with earlier onset of semantic processes and more focal recruitment of cortical resources as L2 competence increases. Note that these proficiency effects can also be ascribed to AoA and/or L2 exposure effects. Importantly, the majority of the ERP evidence examined here shows no modulation of the N400 amplitude in bilinguals (for comparisons between native and non-native speakers, see also [59–63]; cf. [58, 64]). This suggests that the brain systems mediating semantic analysis are quite robust to the effects of the experience-dependent factors examined here.
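Because the semantic effects reviewed in this section emerge mainly as latency (or topography) differences rather than amplitude differences, a common quantification step is to extract the latency of the N400 effect per participant from the violation-minus-control difference wave. The fragment below is a minimal sketch of such a measurement in MNE-Python; the channel selection and the 300–600 ms search window are illustrative assumptions, not parameters from the studies cited above.

```python
import mne

def n400_peak_latency(difference_wave: mne.Evoked,
                      picks=("Cz", "CPz", "Pz"), tmin=0.30, tmax=0.60):
    """Return (channel, latency in s, amplitude in V) of the most negative peak
    of a violation-minus-control difference wave within an N400 search window.

    Channel names and the time window are assumptions for illustration.
    """
    restricted = difference_wave.copy().pick(list(picks))
    return restricted.get_peak(tmin=tmin, tmax=tmax, mode="neg",
                               return_amplitude=True)

# Hypothetical usage, with `difference` built as in the earlier epoching sketch:
# channel, latency, amplitude = n400_peak_latency(difference)
# Comparing such per-participant latencies between groups (e.g., early vs. late
# AoA, or low vs. high proficiency) is one way to test for the delayed N400
# effects described above.
```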

5 L2 Syntax

L1–L2 Similarity

Several ERP studies have investigated L1–L2 syntactic similarity using violation paradigms. Tokowicz and MacWhinney [65] observed an L1-like P600 response in native English learners of L2 Spanish when L2 agreement violations were similar to violations in the L1 or involved features absent in the L1. However, no P600 was found when the syntactic constructions involved grammatical features that were differently instantiated in the two languages. Additional findings further showed that the P600 effect can be reduced (or even absent) when grammatical features are differently expressed in the two languages [66–68] and also when they are unique to the L2 ([55, 63, 69–72] Exp. 2). Given these findings, some authors have suggested that L2 speakers are less likely to exhibit a P600 in response to violations of L2 grammatical features that cannot be transferred from the L1 [55, 63, 69]. Violations of these L2 grammatical structures would not be sufficiently salient to trigger the neurocognitive processes that are reflected by the P600 (cf. [33]), or even the earlier automatic responses of violation detection (i.e., the LAN [72]).

AoA

In an ERP study with different AoA groups of Chinese immigrants to the USA, Weber-Fox and Neville [51] observed that the English (L2) ERP correlates of phrase structure violations differed from those of English (monolingual) natives, especially when the L2 had been acquired after age 11. In monolingual natives, word category violations initially elicited greater left negativities followed by a P600 effect; L2 speakers with a late AoA (over 11 years of age) showed anterior negativities without any clear left lateralization, while no P600 effect was observed when L2 acquisition began after 16 years of age. Based on this, it was suggested that as AoA increases, neural correlates of early syntactic processes show reduced left hemispheric specialization, and ERP effects underlying late processes of revision are delayed or do not occur at all [51, 73, 74]. The authors suggested that this shift is due to biological and/or experiential changes that happen after puberty. However, we should point out two caveats regarding these findings. First, the AoA manipulation overlapped with L2 proficiency differences, which might, at least partially, account for the results. Second, when AoA was treated as a continuous (rather than a categorical) variable, there was a negative correlation with the P600 effect but, crucially, no discontinuity starting at puberty [75]. Hence, although the relation between AoA and the P600 seems to be fairly consistent ([51, 73, 74], but see [70]), the nature of this effect and its interconnections with other experiential factors need further clarification.

Proficiency

Rossi et al. [76] conducted several cross-sectional ERP experiments on L2 speakers of German and Italian, presenting sentences with morphosyntactic and phrase structure violations. The general pattern was that, compared to natives, low-proficiency L2 speakers did not show a LAN effect for morphosyntactic violations and showed a delayed P600 for both types of violations (see also [54, 55, 73]; cf. [77]). At higher levels of proficiency, the authors observed a response similar to that for the L1, although amplitude differences remained. They concluded that, at high L2 proficiency levels, an L1-like brain response can be observed, reflecting early automatic parsing processes followed by late processes of re-analysis and repair. Another important study on the effects of L2 proficiency was a longitudinal study conducted by Osterhout et al. [56], in which English learners of French were tested at different stages of L2 acquisition (hence, at different levels of L2 proficiency). After 1 month of L2 training, subject–verb agreement violations elicited a greater N400 effect; after 4 months of training, the same grammatical errors elicited a greater P600 effect. Based on these findings, the authors proposed that there is a qualitative change in electrophysiological correlates as L2 learning unfolds over time. The N400 effects suggest that beginning L2 learners treat syntactic anomalies as infrequent word combinations [56, 76]; the P600 effects suggest that more advanced L2 learners identify the structural nature of the syntactic anomaly and try to repair and re-analyze the grammatical violation. Although this theory is highly popular in the literature on L2 syntactic analysis, it should be noted that more recent ERP studies do not fully confirm it. Specifically, the finding that increased L2 proficiency reduced the N400 effect has not been replicated ([79–82]; cf. [83]), while the progressive emergence of P600 effects as a function of L2 training has been confirmed ([80–82]; see also [1] for a review).

L2 Exposure Type

Brito [84] qualitatively examined the role of immersion duration in a small group of native English learners of French. The experimental sample was divided into three small subsets with no (only classroom instruction), medium (up to 6 months), or long (more than 6 months) immersion in a French-speaking country. All participants showed high levels of French proficiency and were presented with French sentences that could contain morphosyntactic violations (subject–verb person agreement violations). A visual exploration of the ERP results seems to suggest that the longer the immersion, the more the P600 violation effects increase and resemble those observed in native speakers (similar results were found for L2 gender agreement violations in English–Spanish bilinguals [79]). These results were interpreted as suggesting that native-like electrophysiological responses associated with revision are more likely to appear and become consolidated after long periods of immersion. In addition, it is worth noting that the previously described early negative as well as P600 ERP effects attributed to L2 proficiency might be partially accounted for by immersion duration [54, 55, 76]. Given that the available ERP evidence is still scarce and heterogeneous, we cannot yet draw any definitive conclusions on the role of immersion.

In summary, with regard to L1–L2 similarity, some authors have claimed that when L2 syntactic features cannot be transferred from the L1, these features are not salient enough to trigger late controlled processes of integration, re-analysis, and monitoring (reflected by the P600). In addition, as AoA increases, ERP effects underlying late processes of checking and repair seem to be delayed or reduced. Conversely, as L2 proficiency increases, L2 speakers are more likely to show late processes of re-analysis and repair (reflected by the P600), which might index reliance on rule-driven combinatorial processes during sentence comprehension. Finally, long periods of immersion seem to be associated with native-like L2 neural correlates (such as P600 effects), but more experimental evidence is needed to further replicate these findings.
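Group contrasts like those reviewed in this section (e.g., P600 effects in low- vs. high-proficiency learners) are often evaluated with cluster-based permutation statistics computed over the per-participant difference waves, which control for multiple comparisons across time points. The sketch below illustrates the idea with MNE-Python; the simulated arrays, group sizes, sampling details, and channel choice are placeholders, not data from the studies discussed here.

```python
import numpy as np
from mne.stats import permutation_cluster_test

# Hypothetical per-participant difference waves (violation minus control) at one
# posterior channel, shape (n_participants, n_time_points); placeholder data.
rng = np.random.default_rng(0)
n_times = 300
low_proficiency = rng.normal(0.0, 1.0, size=(20, n_times))
high_proficiency = rng.normal(0.3, 1.0, size=(18, n_times))

# F-based cluster permutation test across the two independent groups; clusters
# are formed over contiguous time points (tail=1 is the one-sided tail
# appropriate for an F statistic).
F_obs, clusters, cluster_p_values, _ = permutation_cluster_test(
    [low_proficiency, high_proficiency],
    n_permutations=1000, tail=1, seed=0,
)
for cluster, p_val in zip(clusters, cluster_p_values):
    if p_val < 0.05:
        print(f"Group difference in a temporal cluster, p = {p_val:.3f}")
```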

6 General Conclusion

Overall, this short overview of ERP findings related to the impact of various factors on L2 phonology, semantics, and syntax reveals qualitatively distinct experience-dependent effects depending on the level of analysis being examined. It seems that both the presence and the strength of early automatic responses involved in phonological discrimination are strongly affected by a number of experience-dependent factors, such as L1 phonological knowledge and the quality of L2 exposure. The effects of experience (AoA, L2 proficiency, L2 immersion) on semantics seem to mainly influence the latency (or the topography) of electrophysiological correlates, rather than their amplitude. Finally, with respect to syntax, all the examined experiential factors appeared to impact the presence and strength of late neural responses associated with monitoring and revision processes (indexed by the P600).


This overview of the bilingual ERP literature points to some interesting implications. Specifically, while in L2 phonology and syntax the impact of experience appears as amplitude differences in certain evoked brain responses (the MMN and P600), in L2 semantics it appears as a latency (or topographic) difference. Consequently, experience seems to affect the strength of the neural responses involved in L2 phonological and syntactic processing, but not of those involved in semantic processing. Neural mechanisms of L2 semantic analysis seem less vulnerable than those of L2 phonology and syntax to the experience-dependent factors examined here. It seems that L2 phonology and syntax are the most difficult linguistic skills to attain and also those most sensitive to linguistic experience (see also [85] for similar accounts).

7 Future Perspectives and Challenges

Our overview also provides an opportunity to highlight those research domains that would most benefit from further investigation. Specifically, we have pointed out that, due to the lack of or small amount of ERP evidence available so far, additional ERP studies should examine the role of:

• AoA in L2 phonology, so that it will be possible to clarify the time course of the behavioral effects already observed
• L1–L2 similarity in L2 semantic analysis, so that we can better specify the temporal sequence of the effects observed at the behavioral level
• The quality of L2 exposure in syntactic analysis, so that we can verify and further complement the small amount of evidence available to date

Regarding L2 phonology, our brief overview of the literature also suggests that intensive exposure to L2 native sources (e.g., as happens during immersion) is important for achieving efficient L2 sound discrimination, which points to the need for a revision of L2 teaching programs to increase the amount of naturalistic L2 exposure. In addition, there is a general need for a more consistent definition of L2 proficiency. This construct has been operationalized in a large number of ways across studies (by measures of global proficiency, test batteries, as well as more specific measures related to a particular linguistic competence, e.g., the quality of L2 pronunciation). Also, it is worth noting that L2 proficiency strongly co-varies with AoA, and the effects of the two factors are barely distinguishable. However, ERP studies have mainly focused on the influence of one factor at a time, selecting samples of participants with a wide variety of characteristics. This has resulted in a complex and mixed picture. We should consider whether our methods adequately represent and measure the complex convergence of factors.


Stronger attention to the interactions between different experiential factors (rather than their main effects) might help reconcile controversial results and provide a more unified picture of ERP effects (see [1]). Finally, we invite the reader to observe that L1 violation ERP effects have often been considered an appropriate reference in the L2 sentence processing literature. This common experimental assumption has been adopted in order to examine how electrophysiological correlates of L2 sentence analysis differ from those observed in native (usually monolingual) speakers. Although this approach allows researchers to better understand the temporal dynamics of L2 processing, it should be noted that it includes a fundamental and, for some, controversial assumption: when people reach L2 final attainment, their linguistic processing should present characteristics similar to those of native speakers, which are considered to be the ideal standard for success in L2 learning ([86–88]; cf. [78]).

In conclusion, the present chapter provided an overview of the impact of different experiential factors on L2 phonology, semantics, and syntax. It points to a largely qualitative distinction between experience-dependent effects observed in phonology/syntax and those reported in semantics, with experience having a stronger impact on the first two domains. The chapter also invites us to think about the methodological choices made by L2 researchers and encourages us to pursue novel research directions that highlight the interactive effects between multiple experiential factors.

References

1. Caffarra S, Molinaro N, Davidson D et al (2015) Second language syntactic processing revealed through event-related potentials: an empirical review. Neurosci Biobehav Rev 51C:31–47
2. Kotz SA (2009) A critical review of ERP and fMRI evidence on L2 syntactic processing. Brain Lang 109(2–3):68–74
3. Flege JE (2003) Assessing constraints on second-language segmental production and perception. In: Meyer A, Schiller N (eds) Phonetics and phonology in language comprehension and production. Mouton de Gruyter, Berlin, pp 319–357
4. Iverson P, Evans BG (2007) Learning English vowels with different first-language vowel systems: perception of formant targets, formant movement, and duration. J Acoust Soc Am 122:2842–2854
5. Dehaene-Lambertz G, Dupoux E, Gout A (2000) Electrophysiological correlates of phonological processing: a cross-linguistic study. J Cogn Neurosci 12:635–647
6. Winkler I, Lehtokoski A, Alku P et al (1999) Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations. Cogn Brain Res 7:357–369
7. Grimaldi M, Sisinni B, Gili Fivela B et al (2014) Assimilation of L2 vowels to L1 phonemes governs L2 learning in adulthood: a behavioral and ERP study. Front Hum Neurosci 8:279
8. Best CT, Strange W (1992) Effects of phonological and phonetic factors on cross-language perception of approximants. J Phon 20:305–333
9. Flege JE (1995) Second language speech learning: theory, findings and problems. In: Strange W (ed) Speech perception and linguistic experience: issues in cross-language research. York Press, Baltimore, pp 233–277

10. Gass S, Selinker L (1992) Language transfer in language learning, Revised edn. John Benjamins, Amsterdam
11. MacWhinney B (2005) A unified model of language acquisition. In: Kroll JF, de Groot AMB (eds) Handbook of bilingualism: psycholinguistic approaches. Oxford University Press, New York, pp 49–67
12. Peltola MS, Kujala T, Tuomainen J et al (2003) Native and foreign vowel discrimination as indexed by the mismatch negativity (MMN) response. Neurosci Lett 352:25–28
13. Winkler I, Kujala T, Tiitinen H et al (1999) Brain responses reveal the learning of foreign language phonemes. Psychophysiology 36:638–642
14. White EJ, Titone D, Genesee F et al (2017) Phonological processing in late second language learners: the effects of proficiency and task. Biling Lang Cogn 20:162–183
15. Dobel C, Lagemann L, Zwitserlood P (2009) Non-native phonemes in adult word learning: evidence from the N400m. Philos Trans R Soc Lond B Biol Sci 364:3697–3709
16. Piske T, MacKay IRA, Flege JE (2001) Factors affecting degree of foreign accent in an L2: a review. J Phon 29:191–215
17. Hisagi M, Garrido-Nag K, Datta H et al (2015) ERP indices of vowel processing in Spanish–English bilinguals. Biling Lang Cogn 18:271–289
18. Díaz B, Baus C, Escera C et al (2008) Brain potentials to native phoneme discrimination reveal the origin of individual differences in learning the sounds of a second language. Proc Natl Acad Sci U S A 105:16083–16088
19. Díaz B, Mitterer H, Broersma M et al (2015) Variability in L2 phonemic learning originates from speech-specific capabilities: an MMN study on late bilinguals. Biling Lang Cogn 19(5):955–970
20. Bohn O, Flege J (1992) The production of new and similar vowels by adult German learners of English. Stud Second Lang Acquis 14:131–158
21. Flege J (1991) Perception and production: the relevance of phonetic input to L2 phonological learning. In: Huebner T, Ferguson C (eds) Crosscurrents in second language acquisition and linguistic theory. John Benjamins, Amsterdam, pp 249–289
22. Flege J, MacKay I (2004) Perceiving vowels in a second language. Stud Second Lang Acquis 24:1–34
23. Moyer A (1999) Ultimate attainment in L2 phonology. The critical factors of age, motivation and instruction. Stud Second Lang Acquis 21:81–108
24. Moyer A (2004) Age, accent and experience in second language acquisition. An integrated approach to critical period inquiry. Multilingual Matters, Clevedon
25. Moyer A (2007) Empirical considerations on the age factor in L2 phonology. Issues Appl Linguist 15:109–127
26. Pallier C, Bosch L, Sebastián N (1997) A limit on behavioral plasticity in vowel acquisition. Cognition 64:B9–B17
27. Ioup G, Boustagi E, El Tigi M et al (1994) Re-examining the critical period hypothesis: a case study of successful adult SLA in a naturalistic environment. Stud Second Lang Acquis 16:73–98
28. Bomba MD, Choly D, Pang EW (2011) Phoneme discrimination and mismatch negativity in English and Japanese speakers. Neuroreport 22(10):479–483
29. Shen G, Froud K (2019) Electrophysiological correlates of categorical perception of lexical tones by English learners of Mandarin Chinese: an ERP study. Biling Lang Cogn 22:253–265
30. Tremblay K, Kraus N, Carrell TD et al (1997) Central auditory system plasticity: generalization to novel stimuli following listening training. J Acoust Soc Am 102(6):3762–3773
31. Ylinen S, Uther M, Latvala A et al (2009) Training the brain to weight speech cues differently: a study of Finnish second-language users of English. J Cogn Neurosci 22:1319–1332
32. Zhang Y, Kuhl PK, Imada T et al (2009) Neural signatures of phonetic learning in adulthood: a magnetoencephalography study. NeuroImage 46:226–240
33. White EJ, Genesee F, Steinhauer K (2012) Brain responses before and after intensive second language learning: proficiency based changes and first language background effects in adult learners. PLoS One 7(12):e52318
34. Hisagi M, Shafer VL, Miyagawa S et al (2016) Second-language learning effects on automaticity of speech processing of Japanese phonetic contrasts: an MEG study. Brain Res 1652:111–118
35. Jost LB, Eberhard-Moscicka AK, Pleisch G et al (2015) Native and non-native speech sound processing and the neural mismatch responses: a longitudinal study on classroom-based foreign language learning. Neuropsychologia 72:94–104

792

Sendy Caffarra and Manuel Carreiras

36. Cheour M, Shestakova A, Alku P et al (2002) Mismatch negativity (MMN) shows that 3–6years-old children can learn to discriminate nonnative speech sounds within two months. Neurosci Lett 325:187–190 37. Peltola MS, Kuntola M, Tamminen H et al (2005) Early exposure to a nonnative language alters preattentive vowel discrimination. Neurosci Lett 388:121–125 38. Shestakova A, Huotilainen M, Ceponien R et al (2003) Event-related potentials associated with second language learning in children. Clin Neurophysiol 114(8):1507–1512 39. Peltola MS, Tuomainen O, Koskinen M et al (2007) The effect of language immersion education on the preattentive perception of native and nonnative vowel contrasts. J Psycholinguist Res 36:15–23 40. Rinker T, Alku P, Brosch S et al (2010) Discrimination of native and nonnative vowel contrasts in bilingual Turkish–German and monolingual German children: insight from the Mismatch Negativity ERP component. Brain Lang 113:90–95 41. Degani T, Prior A, Tokowicz N (2011) Bidirectional transfer: the effect of sharing a translation. J Cogn Psychol 23:18–28 42. Duyck W, Brysbaert M (2008) Semantic access in number word translation. The role of crosslingual lexical similarity. Exp Psychol 55:102–112 43. Kroll JF, Stewart E (1994) Category interference in translation and picture naming: evidence for asymmetric connections between bilingual memory representations. J Mem Lang 33:149–174 ˜ abeitia JA, Carreiras 44. Dimitropoulou M, Dun M (2011) Masked translation priming effects with low proficient bilinguals. Mem Cogn 39: 260–275 45. Ferre´ P, Sánchez-Casas R, Guasch M (2006) Can a horse be a donkey? Semantic and form interference effects in translation recognition in early and late proficient and non-proficient Spanish-Catalan bilinguals. Lang Learn 56: 571–608 46. Guasch M, Sánchez-Casas R, Ferre´ P et al (2008) Translation performance of beginning, intermediate and proficient SpanishCatalan bilinguals. Effects of form and semantic relations. Ment Lex 3:208–308 47. Moldovan CD, Sanchez-Casas R, Demestre J et al (2012) Interference effects as a function of semantic similarity in the translation recognition task in bilinguals of Catalan and Spanish. Psicolo´gica 33:77–110

48. Perea M, Duñabeitia JA, Carreiras M (2008) Masked associative/semantic priming effects across languages with highly proficient bilinguals. J Mem Lang 58:916–930 49. Degani T, Tokowicz N (2010) Ambiguous words are harder to learn. Biling Lang Cogn 13:299–314 50. Bracken J, Degani T, Eddington C et al (2017) Translation semantic variability: how semantic relatedness affects learning of translation-ambiguous words. Biling Lang Cogn 20(4):783–794 51. Weber-Fox CM, Neville HJ (1996) Maturational constraints on functional specializations for language processing: ERP and behavioral evidence in bilingual speakers. J Cogn Neurosci 8(3):231–256 52. Weber-Fox CM, Neville HJ (2001) Sensitive periods differentiate processing of open- and closed-class words. J Speech Lang Hear Res 44(6):1338–1353 53. Xue J, Liu T, Marmolejo-Ramos F et al (2017) Age of acquisition effects on word processing for Chinese native learners' English: ERP evidence for the arbitrary mapping hypothesis. Front Psychol 8:818 54. Bowden HW, Steinhauer K, Sanz C et al (2013) Native-like brain processing of syntax can be attained by university foreign language learners. Neuropsychologia 51(13):2492–2511 55. Ojima S, Nakata H, Kakigi R (2005) An ERP study of second language learning after childhood: effects of proficiency. J Cogn Neurosci 17(8):1212–1228 56. Osterhout L, McLaughlin J, Pitkänen I et al (2006) Novice learners, longitudinal designs, and event-related potentials: a means for exploring the neurocognition of second-language processing. Lang Learn 56(1):199–230 57. Soskey S, Holcomb PJ, Midgley KJ (2016) Language effects in second-language learners: a longitudinal electrophysiological study of Spanish classroom learning. Brain Res 1646:44–52 58. Yang CL, Perfetti CA, Tan LH et al (2018) ERP indicators of L2 proficiency in word-to-text integration processes. Neuropsychologia 117:287–301 59. Isel F (2007) Syntactic and referential processes in second-language learners: event-related brain potential evidence. Neuroreport 18(18):1885–1889 60. Moreno S, Bialystok E, Wodniecka Z et al (2010) Conflict resolution in sentence

processing by bilinguals. J Neurolinguistics 23(6):564–579 61. Proverbio AM, Čok B, Zani A (2002) Electrophysiological measures of language processing in bilinguals. J Cogn Neurosci 14(7):994–1017 62. Weber K, Lavric A (2008) Syntactic anomaly elicits a lexico-semantic (N400) ERP effect in the second language but not the first. Psychophysiology 45:920–925 63. Zawiszewski A, Gutiérrez E, Fernández B et al (2011) Language distance and non-native syntactic processing: evidence from event-related potentials. Biling Lang Cogn 14(3):400–411 64. Hahne A (2001) What's different in second-language processing? Evidence from event-related brain potentials. J Psycholinguist Res 30(3):251–266 65. Tokowicz N, MacWhinney B (2005) Implicit and explicit measures of sensitivity to violations in second language grammar. Stud Second Lang Acquis 27:173–204 66. Chang X, Wang P (2016) Influence of second language proficiency and syntactic structure similarities on the sensitivity and processing of English passive sentence in late Chinese-English bilinguists: an ERP study. J Psycholinguist Res 45:85–101 67. Foucart A, Frenck-Mestre C (2011) Grammatical gender processing in L2: electrophysiological evidence of the effect of L1–L2 syntactic similarity. Biling Lang Cogn 14:379–399 68. Sabourin L, Stowe LA (2008) Second language processing: when are first and second languages processed similarly? Second Lang Res 24(3):397–430 69. Chen L, Shu H, Liu Y et al (2007) ERP signatures of subject–verb agreement in L2 learning. Biling Lang Cogn 10:161–174 70. Díaz B, Erdocia K, de Menezes RF et al (2016) Electrophysiological correlates of second-language syntactic processes are related to native and second language distance regardless of age of acquisition. Front Psychol 7:133 71. Dowens MG, Vergara M, Barber HA et al (2010) Morphosyntactic processing in late second-language learners. J Cogn Neurosci 22(8):1870–1887 72. Foucart A, Frenck-Mestre C (2012) Can late L2 learners acquire new grammatical features? Evidence from ERPs and eye-tracking. J Mem Lang 66:226–248 73. Nichols ES, Joanisse MF (2017) Individual differences predict ERP signatures of second


language learning of novel grammatical rules. Biling Lang Cogn 22:78–92 74. Pakulak E, Neville HJ (2011) Maturational constraints on the recruitment of early processes for syntactic processing. J Cogn Neurosci 23(10):2752–2765 75. Meulman N, Wieling M, Sprenger SA et al (2015) Age effects in L2 grammar processing as revealed by ERPs and how (not) to study them. PLoS One 10(12):e0143328 76. Rossi S, Gugler MF, Friederici AD et al (2006) The impact of proficiency on syntactic second-language processing of German and Italian: evidence from event-related potentials. J Cogn Neurosci 18(12):2030–2048 77. German ES, Herschensohn J, Frenck-Mestre C (2015) Pronoun processing in anglophone late L2 learners of French: behavioral and ERP evidence. J Neurolinguistics 34:15–40 78. Steinhauer K, White EJ, Drury JE (2009) Temporal dynamics of late second language acquisition: evidence from event-related brain potentials. Second Lang Res 25(1):13–41 79. Alemán Bañón J, Fiorentino R, Gabriele A (2018) Using event-related potentials to track morphosyntactic development in second language learners: the processing of number and gender agreement in Spanish. PLoS One 13:e0200791 80. Deng T, Chen B (2019) Input training matters in L2 syntactic representation entrenchment: evidence from a follow-up ERP study. J Psycholinguist Res 48:729–745 81. Deng T, Zhou H, Bi HY et al (2015) Input-based structure-specific proficiency predicts the neural mechanism of adult L2 syntactic processing. Brain Res 1610:42–50 82. Davidson DJ, Indefrey P (2009) An event-related potential study on changes of violation and error responses during morphosyntactic learning. J Cogn Neurosci 21:433–446 83. Mickan A, Lemhöfer K (2020) Tracking syntactic conflict between languages over the course of L2 acquisition: a cross-sectional event-related potential study. J Cogn Neurosci 14:1–25 84. Brito AC (2017) Effects of language immersion versus classroom exposure on advanced French learners: an ERP study. Pursuit – J Undergrad Res Univ Tennessee 8:33–45 85. Lenneberg EH (1967) Biological foundations of language. John Wiley, New York 86. Cook VJ (1992) Evidence for multicompetence. Lang Learn 42(4):557–591


87. Grosjean F (1989) Neurolinguists, beware! The bilingual is not two monolinguals in one person. Brain Lang 36:3–15 88. Meisel J (2004) The bilingual child. In: Bhatia T, Ritchie W (eds) The handbook of bilingualism. Blackwell Publishing Ltd, Oxford, pp 91–113 89. Johnson JS, Newport EL (1989) Critical period effects in second language learning: the influence of maturational state on the acquisition of English as a second language. Cogn Psychol 21:60–99 90. Newport EL, Bavelier D, Neville HJ (2001) Critical thinking about critical periods: perspectives on a critical period for language acquisition. In: Dupoux E, Mehler J (eds) Language, brain, and cognitive development: essays in honor of Jacques Mehler. The MIT Press, Cambridge, pp 481–502 91. Knudsen EI (2004) Sensitive periods in the development of the brain and behavior. J Cogn Neurosci 16(8):1412–1425 92. Cummins J (1979) Cognitive/academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Work Pap Biling 19:121–129 93. Ullman MT (2001) The neural basis of lexicon and grammar in first and second language: the declarative/procedural model. Biling Lang Cogn 4:105–122 94. Muñoz C (2006) Age and the rate of foreign language learning. Multilingual Matters, Clevedon 95. Mackey A, Goo JM (2007) Interaction research in SLA: a meta-analysis and research synthesis. In: Mackey A (ed) Input, interaction and corrective feedback in L2 learning. Oxford University Press, New York, pp 379–452 96. Näätänen R, Paavilainen P, Rinne T et al (2007) The mismatch negativity (MMN) in basic research of central auditory processing: a

review. Clin Neurophysiol 118(12):2544–2590 97. Friederici AD (2002) Towards a neural basis of auditory sentence processing. Trends Cogn Sci 6:78–84 98. Barber H, Carreiras M (2005) Grammatical gender and number agreement in Spanish: an ERP comparison. J Cogn Neurosci 17(1):137–153 99. Bornkessel I, Schlesewsky M (2006) The extended argument dependency model: a neurocognitive approach to sentence comprehension across languages. Psychol Rev 113:787–821 100. Gunter TC, Friederici AD, Schriefers H (2000) Syntactic gender and semantic expectancy: ERPs reveal early autonomy and late interaction. J Cogn Neurosci 12:556–568 101. Kutas M, Van Petten C, Kluender R (2006) Psycholinguistics electrified II: 1994–2005. In: Traxler MJ, Gernsbacher MA (eds) Handbook of psycholinguistics, 2nd edn. Elsevier, New York, pp 659–724 102. Kutas M, Federmeier KD (2000) Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn Sci 4:463–470 103. DeLong KA, Urbach TP, Kutas M (2005) Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nat Neurosci 8(8):1117–1121 104. Carreiras M, Salillas E, Barber H (2004) Event-related potentials elicited during parsing of ambiguous relative clauses in Spanish. Cogn Brain Res 20(1):98–105 105. Van de Meerendonk N, Kolk HHJ, Chwilla DJ et al (2009) Monitoring in language perception. Lang Linguist Compass 3(5):1211–1224 106. Brouwer H, Fitz H, Hoeks J (2012) Getting real about semantic illusions: rethinking the functional role of the P600 in language comprehension. Brain Res 1446:127–143

INDEX A Abstractness ................................ 649–651, 658, 663–665 Action potential............................................ 5, 29, 30, 32, 33, 164, 166, 171, 242, 296, 312, 325, 432, 433 Advanced methods........................................................ 148 Affective prosody................................653, 654, 673–677, 681, 703, 715, 720 Age of acquisition (AoA) .................................... 777, 778, 781, 783–789 Analog......................................... 167, 391, 405, 431–436 Analysis by Synthesis model ............................... 395, 404, 422, 425–427, 430, 431, 436 Analysis of variance (ANOVA) ........................... 125, 127, 129, 133–141, 147, 149–152, 155, 682 Aphasia........................................... 17, 22, 340, 357–360, 573, 626, 754, 756, 760, 763, 764, 766, 767 Articulatory gestures............................................... 7, 330, 392, 395, 396, 424, 426, 430, 436 Attention ................................................8, 10, 22, 25, 38, 39, 51, 53, 56, 58, 61, 88, 141, 169, 178, 182–184, 198, 206, 222, 245, 246, 250, 254, 255, 264, 266, 270, 287, 323, 324, 369, 373, 378, 380, 382, 392, 397, 407, 420, 475, 479, 508, 511, 514, 515, 519, 520, 528, 540, 542, 544, 545, 565, 568, 583, 584, 602, 603, 620, 630, 636, 639, 642, 650, 653, 655, 660, 662, 665, 666, 672, 674, 676, 678–680, 682, 705, 722, 738, 745, 762, 764, 790 Auditory areas ........................................ 7, 12, 15–17, 21, 206, 254, 395, 402, 406, 407, 422–427, 429–431, 435–437 Auditory brainstem response (ABR).200, 202–204, 221, 222, 228, 661 Auditory evoked potentials (AEPs) .................... 399, 661 Axon....................................... 3, 5, 30, 32, 164, 432, 433

B Basal ganglia .................................................. vii, 9, 10, 14, 19–21, 32, 369, 427–429, 755, 757, 759, 760, 763–767 Beamformers ....................................................... 100–107, 109, 110, 112, 114–117, 219–220 Bilingualism ................................ 487, 488, 735, 739–740 Brain

areas ..................................................vii, ix, 24, 33, 35, 171, 213, 249, 257, 258, 266, 311, 319, 322, 328, 329, 340, 342, 627, 628, 665, 666 rhythms .........................................178, 181, 239–240, 245, 253, 257, 265, 660 structures .......................................................ix, 34–35, 168, 171, 265, 289, 322, 522 Brain-ocular activity ............................................. 740–746 Broca’s area........................................................12, 15, 17, 20–22, 323, 328, 345, 348, 349, 351, 352, 358, 427, 431, 659, 756, 760, 765 Brodmann, K.............................................. 10, 12, 13, 15, 16, 23, 404, 421, 427–429, 435

C Central nervous system................................................. 6, 8 Cerebral cortex...................................10–12, 24, 29, 181, 197, 311, 313, 369, 426 Child language ..................................................... 488, 492 Chunking........................................... 251, 258, 260, 261, 263, 269–271, 460 Cognitive neuroscience............................................. ix, 45, 198–213, 293, 302 Cognitive research......................................................... 747 Complex systems............................................................. 40 Compounding ..................................................... 448, 453, 461, 462, 471, 480, 483, 491 Consciousness ................................................... vii, 19, 20, 24, 33, 139, 185, 267, 390, 689, 764 Context ..................................................... 24, 33, 45, 139, 197, 254, 328, 340, 409, 450, 507, 528, 552, 584, 615, 655, 675, 701, 739, 762, 780 Corpus callosum........................................................13, 32 Cortical tracking................................................... 259–263 Corticobulbar excitability ...................323–328, 330, 331 Coupling....................................171, 173, 178, 181, 184, 204, 241, 242, 253, 257, 258, 260, 262–264, 266, 349, 417, 421, 555, 573, 692

D Data acquisition ..........54, 169, 228, 288–292, 296, 734 Data analysis .................................................xi, 44, 52, 53, 61, 65–121, 125, 167, 224, 228, 287, 300–303, 485 Dendrite........................... x, 3, 5, 10, 174, 196, 243, 431

Mirko Grimaldi et al. (eds.), Language Electrified: Principles, Methods, and Future Perspectives of Investigation, Neuromethods, vol. 202, https://doi.org/10.1007/978-1-0716-3263-5, © Springer Science+Business Media, LLC, part of Springer Nature 2023


Depolarization......................................... 5, 185, 312, 315 Derivation....................................................... xi, 448, 449, 452–455, 458–461, 464–466, 469–471, 474, 483–484, 488, 490, 491, 519, 585, 590 Dialectal variation ......................................................... 408 Diencephalon .................................................7–8, 10, 369 Digital ............................54, 62, 167, 318, 431–436, 703 Distinctive features.............................................. 392–395, 402, 404–407, 412, 415, 426–431, 434–437 Dual Route model .......................................................... 22

E Electrocorticography (ECoG)................... 259, 287–292, 294, 296, 297, 301–303, 406, 418, 419, 552, 560, 563, 570, 571 Electroencephalography (EEG) ............................. 30, 44, 65, 124, 163, 196, 243, 286, 311, 342, 369, 397, 449, 506, 543, 552, 583, 614, 660, 671, 688, 734, 754, 781 Electromyography (EMG) ................................... 55, 185, 224, 313, 370, 688 Electrophysiology....................................... xi, xii, 63, 143, 156, 163, 186, 292, 293, 296, 298, 300, 505–522, 527, 544, 613–643 Emotion.................................vii, 8, 10, 13, 20, 289, 373, 378, 379, 505, 584, 594, 614, 619, 630, 635, 642, 643, 674, 676, 680, 688–692, 708, 710, 712, 713, 719–724, 745 Emotional prosody..................................... 673, 675–677, 679, 680, 682, 701, 703, 765 Encephalon........................................................................ 6 Entrainment ............................................... 241, 244–247, 251–257, 259, 261–266, 269, 270, 418, 420, 421, 432, 555, 556, 560, 566, 567, 570 Epistemological bridges................................... x, 391–392 Evaluation.......................................... 213, 218, 220, 225, 227–229, 330, 410, 636, 674–676, 678, 703, 713, 714, 716–720, 722, 723, 736 Event-related fields (ERFs) ....................................xi, 103, 176–177, 195–229, 256, 263, 397, 506 Event related potentials (ERPs) ............................. 35, 44, 74, 125, 170, 196, 256, 342, 397, 447, 506–508, 527, 552, 585, 614, 655, 671, 741, 758, 778 Evoked responses (ERs) ......................................... 35, 90, 91, 165, 176, 177, 195–200, 202, 203, 205–207, 210, 212, 213, 216, 219–229, 247, 257, 286, 287, 297, 299, 301, 315, 420, 553, 652, 672, 754, 781 Experiment ........................................................24, 34, 43, 123, 168, 211, 246, 286, 312, 340, 398, 456, 509, 529, 562, 593, 624, 648, 678, 692, 732, 759, 787 Experimental design.................................xi, 44, 125–128, 133, 143, 183, 468, 556, 557, 782, 783

Experimental methodology ................648, 649, 664, 665 Experimental pragmatics .............................................. 599 Eye-tracking ................................................... x, xi, 44, 54, 63, 137, 543, 680, 688, 689, 731–747

F Facial electromyography ...........xi, xii, 55, 224, 367–382, 687–724 Facial muscles ..................................................7, 322, 324, 367–382, 692–694, 700, 710, 712, 717, 721–723 Featurally Underspecified Lexicon (FUL).......... 412–414 FFT ....................................................................... 243, 244 Figurative language ......................... xi, xii, 586, 589–591, 598, 625, 628, 629, 636, 642 First and second language processing................. 480–488 Fissures.............................................. 10, 13, 17, 206, 765 Frequency band.......................................... 106, 170, 182, 184, 247–251, 257, 260, 263, 266–269, 294, 298, 375, 417, 521, 554, 562, 565, 566, 570, 586, 637, 676, 742 Frequency tagging ....................................... 39, 270, 271, 555, 556, 560, 566, 567 Frontal lobe ............................................ 7, 13–15, 17, 19, 21, 22, 296, 339, 345, 353, 354, 428, 463, 652, 756 Future perspectives .................................... 183–187, 528, 544–545, 681–682, 789–790

G Guidelines.......................................... 44, 47, 49, 63, 146, 221, 222, 291, 344, 372, 759 Gyri .......................................................10, 13, 14, 21, 36, 166, 175, 262, 346, 659

H High tone (H)............................................................... 659 Hindbrain .......................................................................... 7 Humor ........................................................ 379, 587, 588, 591–592, 600–603, 613, 614, 636, 642

I Idioms ............................................................ xii, 409, 587, 592, 598, 599, 601, 613, 614, 622–623, 628, 641–643 Immersion ......................... 777, 782, 783, 785, 787–789 Inflection ........................................................ xi, 448–451, 455–461, 466–492, 520 Information processing......................................... 22, 177, 178, 184, 197, 203, 205, 223, 228, 244, 250, 378–382, 636, 666, 762, 763 Intonation..................................................... 9, 13, 21, 45, 51, 356, 358, 408, 649–662, 664, 665, 669, 670, 672, 674, 676, 677, 708

Intracranial neurophysiology............................... 286, 288 Invasive electroencephalography (iEEG)....................165, 286–288, 291–293, 295, 296, 298–304 Irony .............................................................. xii, 342, 587, 588, 591–593, 599, 613, 614, 628–635, 641–643, 678, 708

J Joke .............................................. xii, 379, 591, 602, 613, 614, 635–643, 688, 707, 708

L Laboratory .............................................. x, 24, 44, 48, 49, 53–61, 123, 343–359, 596, 682, 712, 721, 722, 764 Language perception........................................... 18, 21, 24, 197, 200, 205–210, 223, 266–267, 763, 765, 779 processing ...........................................x, xi, 13, 23, 43, 45, 137, 183, 199, 210, 211, 213, 219, 227, 228, 267, 328, 359, 367–382, 450, 473, 488, 506–508, 536, 573, 584, 585, 596, 602, 603, 622, 623, 626, 629, 652, 658, 659, 681, 687, 689, 692, 705, 710, 718, 741, 742, 744, 758, 762–765 Late anterior negativity (LAN) .................................... 475 Left-hemisphere damage (LHD) ................................. 654 Lexical access ........................................22, 207, 212, 447, 450, 464, 465, 467, 470, 508, 529, 530 Linguistic primitives.....................................391–395, 435 Linguistic processes................................ x, 211, 212, 253, 320, 379, 381–382, 740–742, 746, 763, 790 Linguistic tone ............................................ 649, 664, 665 L1-L2 similarity...................................778–786, 788, 789 Lobes ..................................................7, 8, 10, 13–19, 21, 22, 36, 179, 209, 211, 288, 290, 291, 296, 328, 345, 346, 353, 354, 356, 428, 452, 460, 463, 515, 634, 638, 652, 659, 756, 759, 765 Low tone (L) .......................................650, 658, 659, 666

M Magnetoencephalography (MEG) ......................... 25, 30, 46, 65, 125, 163, 196, 243, 285, 330, 397, 449, 506, 530, 552, 623, 655, 688, 732, 754 Meaning........................................... 17, 46, 73, 142, 197, 409, 461, 505, 527, 562, 584, 614, 670, 700, 734, 765 Measuring EMG ......................................... 374, 381, 701 Memory, Unification and Control model ...............23–24 Mental effort ........................................................ 380, 381 Metaphor .................................................... 585, 588–592, 598–602, 613–622, 641–643 Mimicry ............................. 370, 701, 714, 718–721, 723


Mirror neuron theory ................................................... 424 Mismatch field (MMF) .............................. 201, 208–210, 655, 661 Mismatch negativity (MMN) ......................................170, 177, 180, 200, 201, 205–212, 222, 407–416, 426, 450, 452, 459, 469, 471, 472, 510, 511, 520, 655, 656, 660, 661, 673, 674, 676, 679, 681, 762, 780–783 Mixed-effects models .......................................... 125, 127, 128, 130, 137, 139–153 Morphological units............................................. 517–521 Morphology...........................................................viii, 221, 322, 325–327, 341, 378, 447–492, 506, 519, 521, 522, 558, 762 Motor cortex ......................................................... 7, 9, 10, 13–16, 19, 21, 38, 255, 259, 266, 312, 313, 315, 320, 322, 323, 329, 330, 342, 354, 359, 369, 421–429, 620, 621, 717, 765, 766 Motor Theory of Speech Perception ................. 395, 421, 422, 424, 425, 436

N N1 .................................................... 75, 83, 84, 152, 170, 177, 185, 200, 201, 204, 206–208, 210, 397–399, 402–408, 415, 429, 620, 741, 745, 762 N400............................................55, 131, 198, 450, 506, 527, 556, 585, 614, 675, 703, 741, 758, 780 Neural architecture of language ...............................21–24 Neural components............................................. 213, 215, 217–221, 226–228, 510 Neural oscillations ................................. xi, 178, 239–271, 287, 417–423, 515, 552, 568, 571–572 Neural synchrony .......................197, 245, 251, 254, 269 Neurocognitive models.............................................36–39 Neurocomputational............................................ 389–437 Neurolinguistics ..................................... xii, 21, 286, 320, 462, 596, 648, 649, 664, 665 Neuromodulation ...................................... 253, 313–314, 340, 342, 344, 357, 359, 522 Neurophysiological primitives ............................. 396–397 Neurophysiological states ............................315, 433–437 N1m...................................................................... 397–403 Nodes of Ranvier .............................................................. 5 Non-brain-damaged (NBD) ............................... 653, 654 Non-invasive brain stimulation .................. 182, 340, 766

O Occipital lobe ............................................ 13, 18, 36, 765 Optically-pumped magnetometer (OPM)..................174, 185, 186 Oscillations ................................................. 101, 178, 240, 287, 417, 554, 635, 674 Oscillatory rhythms.............................253, 418, 432–435


P

P600......................................................55, 132, 200, 451, 518, 528, 556, 585, 614, 672, 741, 758, 780 Parietal lobe .........................................13, 15, 18, 21, 428 Parkinson’s disease ......................... 9, 291, 428, 759–761 Perceptual sampling Periaqueductal grey matter............................................. 20 Phase alignment ......................................... 244, 250–252, 256, 258, 260, 261, 267 Phase-amplitude coupling (PAC)....................... 257–259, 264, 421, 555, 570, 573 Phase coherence ................................................... 255, 256 Phonemotopic map.............................................. 397, 405 Phonology ............................................. viii, 23, 212, 285, 391, 408, 415, 434, 436, 451, 484, 513, 539, 569, 648, 650, 655–656, 763, 777–780, 788–790 Postcentral gyrus ............................................................. 13 Pragmatics .................................................. xi, xii, 25, 492, 536, 538, 583–603, 619, 628, 629, 631, 634, 641, 642, 713, 714, 758 Precentral gyrus............................................................... 13 Proficiency .............................................. 9, 416, 484–486, 488, 506, 520, 631, 736, 740, 778, 782–785, 787–789 Psycholinguistics ................................................... 51, 124, 126, 139, 146, 425, 449, 450, 455, 489, 529, 583, 688, 692, 693, 709, 714, 721, 731–747 Psychophysiology ............................................... 44, 53–61

R Repolarization ................................................................... 5 Review........................................... xi, 153, 178, 180, 183, 184, 198, 209, 213, 228, 257, 262, 266, 267, 287, 314, 329, 357, 358, 402, 407, 416, 418, 449–451, 469, 471, 472, 477, 483, 489, 506, 513, 521, 528, 531, 543, 544, 552–557, 560, 564, 566, 568, 573, 586, 593, 594, 598, 599, 601, 602, 649, 650, 653, 661, 664, 688–690, 692–694, 701, 709–713, 718, 719, 733–736, 738, 741, 742, 745, 754, 787 Right-hemisphere damage (RHD) ..................... 653, 654

S Schwann cells................................................................. 4, 5 Second language acquisition ........................ xii, 415–416, 777–790 Second language learning .................................... 488, 517 Semantic processing ................................ xi, 23, 210, 212, 218, 223, 329, 381, 450, 527–545, 561, 568, 569, 572, 583, 587, 623, 624, 627, 640, 760, 762, 765, 767, 784, 789

Semantics ......................................... 9, 67, 151, 198, 268, 301, 328, 345, 381, 410, 449, 507, 529, 551, 583, 611, 666, 672, 701, 735, 756, 777 Sentence processing ........................................... xi, 23, 24, 348, 471–473, 479, 485, 489, 542, 551–573, 790 Simulation .................................................... 39, 256, 621, 714, 717–720, 722, 723 Social intonation ........................................................... 670 Somatosensory cortex.........................10, 13, 15, 16, 427 Source localization ............................................35, 66, 84, 178–181, 210, 216, 506 Source-modeling .......................................... 54, 174, 179, 196, 197, 206, 209, 211, 213–221, 223, 224, 226, 227, 229, 452 Specific language impairment (SLI)........... 207, 761, 762 Spectro-temporal states ....................................... 433–436 Speech and language perception....................................18, 21, 206, 267, 765 and language production....................................18, 21 Stereotactic EEG (sEEG) ............................285, 287–289 Stroke..................................................315, 342, 357–359, 642, 754–757, 766, 767 Subvocal speech............................................... xi, 381–382 Sulci..................................................................10, 14, 166, 175, 322 Superior temporal gyrus (STG) .....................16, 23, 211, 291, 294, 297, 301, 302, 345, 356, 396, 402–404, 406, 419, 422, 429, 431, 452, 463, 519, 658, 755 Sustained negativity ............................................. 475, 602 Synaptic boutons............................................................... 3 Syntactic processing ................................................ 17, 23, 448, 543, 551–567, 570–573, 653, 673, 745, 758, 786, 789 Syntax...................................................viii, x, xi, 9, 23, 50, 136, 268, 269, 285, 436, 437, 464, 491, 492, 539, 541, 542, 545, 566–570, 572, 573, 601, 603, 648, 666, 672, 673, 763, 777, 779, 786–790

T Talairach coordinates .................................. 399, 402, 404 Temporal lobe ................................................... 13, 15–18, 21, 22, 179, 209, 211, 288, 290, 291, 328, 346, 356, 452, 460, 515, 634, 638, 759, 765 Theoretical neuroscience ................................... 36, 38, 39 Time-course of word learning ........................ xi, 506–508 Time-frequency analysis (TFA) .......................... 170, 171, 293, 298, 554, 564, 614, 633, 641 Tone of voice ............................................... 677, 679, 716 Tonochrony principle .........................396, 652, 660, 662 Tonotopic ..................................................... 16, 396, 402, 404, 434, 435, 649, 651, 660, 665

Transcranial direct current stimulation (tDCS) ....................................... xi, 182, 339–360, 426, 614, 627, 633, 635, 641, 754, 756, 757, 766, 767 Transcranial magnetic stimulation (TMS) .............xi, 182, 249, 311–331, 340, 342, 392, 423–426, 429, 614, 620, 754, 765–768


W Wernicke’s area................................................12, 17, 320, 328, 346, 350–353, 355, 356, 358, 359, 652, 760 Word learning................................................. xi, 505–522, 739, 744