
The Study of Speech Processes

There has been a longstanding bias in the study of spoken language toward using writing to analyze speech. This approach is problematic in that it assumes language to be derived from an autonomous mental capacity to assemble words into sentences, while failing to acknowledge culture-specific ideas linked to writing. Words and sentences are writing constructs that hardly capture the sound-making actions involved in spoken language. This book brings to light research that has long revealed structures present in all languages but which do not match the writing-induced concepts of traditional linguistic analysis. It demonstrates that language processes are not physiologically autonomous, and that speech structures are structures of spoken language. It then illustrates how speech acts can be studied using instrumental records, and how multisensory experiences in semantic memory couple to these acts, offering a biologically grounded understanding of how spoken language conveys meaning and why it develops only in humans.

Victor J. Boucher is Senior Researcher and Professor of Speech Sciences at the Université de Montréal. His career work on the physiological processes of speech has led him to view human language as arising from constraints on motor-sensory systems and to a critical reappraisal of methods of language study.

The Study of Speech Processes
Addressing the Writing Bias in Language Science

Victor J. Boucher
Université de Montréal

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107185036
DOI: 10.1017/9781316882764

© Victor J. Boucher 2021

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2021

A catalogue record for this publication is available from the British Library.

ISBN 978-1-107-18503-6 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

List of Figures
List of Tables
List of Abbreviations
Preface

Introducing a Fundamental Problem of Language Science

Part I Questions of Ontology: Writing and the Speech–Language Divide

1 How We Are Introduced to the Study of Spoken Language
1.1 Language as an “Autonomous” System, or the Effects of Scriptism
1.2 Defining “Speech”
1.3 Was the Speech–Language Division Ever Physiologically Grounded?
1.3.1 Saussure’s Argument of a Separate Language Faculty in Broca’s Area
1.3.2 Arguments of the Arbitrariness of Signs and Abstract Phonology
1.3.3 On the Primacy of Linguistic Criteria: The Historical Disconnect from Instrumental Observations
1.3.4 Explaining Systems of Distinctive Features: Lindblom’s Demonstration (1986)

2 The Modality-Independence Argument and Storylines of the Origin of Symbolic Language
2.1 Cognitive Skills as Insufficient Factors in the Rise of Symbolic Communication
2.2 The Case against Modality-Independent Accounts of Symbolic Language
2.3 Modality-Dependent Accounts of the Rise of Symbolic Language
2.3.1 Mimesis, Procedural Learning, and the Case of Sign Languages
2.3.2 “Sound Symbolism”: Questions of the Efficiency of Iconic Signs
2.3.3 Articulated Vocalization and the Rise of Symbolic Signs: A Laboratory Demonstration
2.4 The Phylogeny and Ontogeny of an Amodal Symbol Function as a Pseudo-Puzzle

3 The Recent History of Attempts to Ground Orthographic Concepts of Language Theory
3.1 From Orthographic Representations to “Substantive Universals”
3.2 Shoehorning Orthographic Concepts: Issues in Grounding the LAD
3.2.1 Biases and Limitations of Analyzing Language Development through Writing
3.2.2 The Search for Marks of Words and Phrases, versus “Chunks”
3.3 Neuroscience Falls upon Nonexistent Substantive Universals: Why This Invalidation Is Different
3.4 Abandoning the Competence–Performance Divide
Postscript – On the Use of the IPA and Terms of Latin Grammar in the Present Work

Part II Questions of Epistemology: The Role of Instrumental Observations

4 Recognizing the Bias
4.1 On the Tradition of Overlooking Instrumental Observations: The Case of the Phoneme
4.1.1 From Instrumental Records of Co-articulation to Transcribed Spoonerisms
4.1.2 On the Origin of Alphabet Signs: The Hypothesis of a Preliterate Awareness of Phonemes
4.1.3 Testing Phoneme Awareness: Issues in Defining Reference Units
4.2 The Looking-Glass Effect: Viewing Phoneme Awareness by Reference to IPA Transcripts
4.2.1 “Phonological” Evidence of Phonemes Versus Motor Processes
4.2.2 On Arguments of the “Logical Necessity” of Phonemes and the Success of Alphabet Systems
4.2.3 Effects of Writing on Speakers’ Awareness of Words, Phrases, Sentences

5 (Re-)defining the Writing Bias, and the Essential Role of Instrumental Invalidation
5.1 On the Persistence of Scriptism in the Study of Spoken Language
5.2 The Need to Address Complaints of Cultural Centrism and Ethical Concerns

Part III The Structure of Speech Acts

6 Utterances as Communicative Acts
6.1 Describing Speech Acts and Their Meaning
6.2 The Parity Condition, Motor-Sensory Coupling, and the Issue of Utterance Structure
6.3 The Coding of Speech Acoustics in the Auditory Brain Stem and Effects of Motor-Sensory Coupling
6.4 Multimodal Sensory Integration: Introducing Neural Entrainment to Speech Structure
6.4.1 The Specificity of Neural Entrainment in the Speech Modality
6.4.2 Neural Entrainment to Structures of Motor Speech: Linking to Spiking Activity
6.4.3 On the Role of Subcortical Processes: Multisensory-to-Motor Integration and Chunking
6.5 Relating to Utterance Structure, or What the Brain Does Not Intrinsically Construct

7 Relating to Basic Units: Syllable-Like Cycles
7.1 Speech Production: On the Brain–Utterance Interface That Never Was
7.2 Basic Sequencing Units in Theories of Speech-Motor Control: Some Examples
7.2.1 The Equilibrium-Point (EP) Hypothesis
7.2.2 The Task Dynamics (TD) Model
7.2.3 Directions in Auditory Space Into Velocities of Articulators: The DIVA Model
7.3 Critical Evidence of Basic Sequencing Units and What Shapes Them
7.3.1 Intrinsic Muscle-Tissue Elasticity and Its Effect on Speech Motions
7.3.2 Other Intrinsic Effects of Muscle Tissues on Motion Sequencing within Syllable Cycles
7.3.3 Just How Many Units Are There in CV and VC, and Are These Represented in Memory?
7.3.4 Syllable Cycles within Chunks and Graded Motion Control without Phonemes

8 Relating Neural Oscillations to Syllable Cycles and Chunks
8.1 The Entrainment of Low-Frequency Oscillations and Speech Processing
8.1.1 On the Role of Theta- and Delta-Size Processing Windows
8.1.2 Reviewing Claims of a Non-sensory Entrainment of Delta to Content Units
8.2 Delta-Size Windows and the Sensory Chunking of Speech
8.2.1 Chunks and Their Signature Marks
8.2.2 Neural Entrainment in Speech Processing

9 Breath Units of Speech and Their Structural Effects
9.1 Utterances as Breath Units versus Sentences in Speaker–Listener Interaction
9.2 On Interpreting Measures of “Mean Length of Utterance” (MLU)
9.2.1 Utterance Complexity, Lexical Diversity, and MLU: Linking to Developing Motor Structures
9.2.2 Chunks in Breath Units of Speech and the Development of Vocabulary
9.2.3 On Explaining Developmental Milestones
9.3 The Structure of Spoken Language: An Interim Summary with a View on Addressing the Issue of Scriptism

Part IV The Processing of Speech Meaning

10 The Neural Coding of Semantics
10.1 Units of Writing, Structures of Utterances, and the Semantics of Speech
10.2 The Lexico-Semantic Approach: Context Information as “Nonessential”
10.2.1 Lexico-Semantics and Traditional Models of Language Processing
10.2.2 Embodied versus Disembodied Semantics
10.3 How Semantic Representations of Verbal Expressions Develop: On “Modes of Acquisition”
10.4 The Partitioning of Semantic Memory and Its Formatting in Spoken Languages
10.4.1 Words Are Not Biologically Grounded Units: Why Sensory Chunking Is Necessary
10.4.2 On Representations of Verbalized Forms in Memory: Activating Episodes of Speech Acts
10.5 The Nature of Semantic Representations: On the Neural Coding of Context Information in Action Blocks of Speech

11 Processes of Utterance Interpretation: For a Neuropragmatics
11.1 The Issue of the Selective Activation of Semantic Representations in Speech Contexts
11.1.1 Context-Based Semantics: Clinical Observations Using Unconventional Test Batteries
11.2 On Context-Based Speech Comprehension: Selective Activation of Semantic Representations On-Line
11.2.1 Thalamocortical Interactions and the Integrating Role of the Motor Thalamus
11.2.2 The Semantics of Utterances: The Analogy of Action Selection in Spatial Navigation
11.2.3 Subcortical Mechanisms of Buffering and Context-Based Semantic Processing

Epilogue
References
Index

Figures

1.1 Typical schematic sagittal representation of the “speech” apparatus
1.2 Observable acoustic structures of an utterance
2.1 Examples of the speech stimuli used by Boucher et al. (2018) in a sound–picture association task
2.2 Learners’ sound–picture associations across trials and feedback conditions (Boucher et al., 2018)
3.1 Examples of formal syntactic analyses performed on orthographic units
4.1 Tracings of radiographic recordings for [ku] and [ki]
4.2 Tracings of radiographic recordings for a spontaneous speech error
4.3 Tracings of radiographic recordings for a corrected form following a spontaneous speech error
4.4 Faber’s (1990) list of early Greek letters with their precursors in Old Aramaic script
4.5 Basic spectrographic values that are sufficient to synthesize /di/ and /du/, from Liberman et al. (1967)
4.6 Katakana signs representing how similar-sounding syllables can be variably interpreted depending on the choice of writing systems
4.7 Illustration of dynamic characteristics of speech production
6.1 A general scenario where a speaker offers food to a listener while producing utterances
6.2 The relationship between acoustic stimuli consisting of a synthesized “da” and ABRs (according to Skoe & Kraus, 2013)
6.3 Functional effects of the phase of neural oscillations (according to Schroeder et al., 2008)
7.1 Representations and transformations from input signal to lexical representation as viewed by Poeppel et al. (2007)
7.2 An illustration of the EP hypothesis of intrinsic force-length relationships in muscle control
7.3 Idealized unitary mass displacement in a critically damped and under-damped spring system at three times (Boucher, 2008)
7.4 Guenther’s DIVA control scheme (2016)
7.5 Rectified and smoothed EMG activity of opener and closer muscles of lip and jaw motions, along with midline articulator displacements, bilabial compression, and intra-oral pressure (Boucher, 2008)
7.6 Velocity and mid-sagittal displacement of lip and jaw opening as a function of force-related measures of closing motions (Boucher, 2008)
7.7 Fiber composition of intrinsic tongue muscles in four parts of the tongue (Stål et al., 2003)
7.8 Glottal motion and EMG activity of laryngeal muscles preceding the first vowel in “I say” (Hirose et al., 1980)
7.9 Overall correct serial recall of auditorily presented CV and VC sequences of nonsense syllables
7.10 Audio signal, intra-oral pressure, and kinematics of midline labial motions during the production of Bobby [‘bɑbi] and poppy [‘pɑpi]
8.1 VOTs for [ta] and [da] produced at varying rates of speech (Boucher, 2002)
8.2 Pitch (F0) and dB energy patterns of utterance stimuli along with corresponding ERPs of regions of interest (Gilbert et al., 2015)
8.3 Inter-trial phase coherence (ITPC) for three types of stimuli as a function of the frequency of oscillations (Boucher et al., 2019)
9.1 Age-related changes in speech breathing recorded via plethysmographic belts
9.2 Vital capacities in liters and MLU in morphemes and syllables (Boucher & Lalonde, 2015)
9.3 MLU in morphemes and syllables as a function of vital capacity (Boucher & Lalonde, 2015)
9.4 Overall percentages of nominal forms of 1, 2, and 3 syllables or more used by 50 speakers aged 5–27 years (Boucher & Lalonde, 2015)
10.1 An illustration of the embodied approach in which the semantics of the lexeme banana is seen to be grounded in perception and action systems (Kemmerer, 2015)
10.2 Effects of rehearsing lexical items in different production conditions on participants’ recall of having produced the items (Lafleur & Boucher, 2015)
10.3 Phase-amplitude coupling between theta-band oscillations and gamma oscillations (Canolty et al., 2006)
11.1 Analyses of LFPs in the subthalamic nucleus by Watson and Montgommery (2006)
E.1 An illustration of the relationship between neural responses, assumed “interim” units of language analysis, and speech stimuli

Tables

1.1 Part of Lindblom’s (1986) predictions of “vowel systems” using a maximal perceptual distance metric and two acoustic-to-auditory mapping approaches
11.1 Test batteries suggested by Murdoch and Whelan (2009) to observe the effects of pallidotomy, thalamotomy, and pathologies of the cerebellum on “high level linguistics”

Abbreviations

ABR  auditory brainstem response
ABSL  Al-Sayyid Bedouin Sign Language
ASL  American Sign Language
CNS  central nervous system
EEG, MEG  electroencephalography, magnetoencephalography
EGG  electroglottography
EMG  electromyography
EMMA  electromagnetic articulography
ERP  event-related potential
F0  fundamental frequency perceived as pitch
F1, F2  formants
FAF  frequency altered feedback
FFR  frequency following response
IPA  International Phonetic Alphabet
ITPC  inter-trial phase coherence
LAD  Language Acquisition Device
LCA  last common ancestor
LFP  local field potential
MEP  motor-evoked potential
MLU  mean length of utterance
MRI, fMRI  magnetic resonance imaging, functional magnetic resonance imaging
tDCS  transcranial direct current stimulation
TMS  transcranial magnetic stimulation
TP  transition probability between transcribed units
VOT  voice onset time
VC  vital capacity

Preface

It is customary in a preface to indicate the source of a book, why it was written and for whom, and to thank those who helped in forwarding the work through to publication. The present monograph is primarily directed at students and researchers in sectors relating to spoken language. It is also aimed at any interested reader wishing to understand the historical course and recent developments of language science extending to techniques of neuroscience. The book addresses a long-standing problem that may be apparent to anyone with minimal training in methods of linguistic analysis. Such training is part of introductory courses that are often a prerequisite in subprograms of psychology, language neuroscience, communication disorders, and language teaching, among other disciplines. The tradition has been that all who engage in the study of language are trained in analyzing transcribed speech and thus come to conceptualize spoken language by reference to theories erected on these analyses. This has definite consequences across sectors. By such training, many view spoken language as containing letter- and word-like units, organized in terms of given categories that are reminiscent of those used in codes of alphabet writing. But perhaps because of my field of interest (speech science), it has been persistently clear to me, as a student and researcher, that there are hardly any links between instrumental observations of speech and formal analyses of transcripts. This discrepancy was the source of a career-long interrogation on how it was that empirical observations did not serve to correct assumptions shared by analysts and investigators of spoken-language processing.

In the meantime, I became acquainted with a body of historical essays, including a publication by Linell (1982/2005) entitled The Written Language Bias in Linguistics. These works documented how spoken language came to be studied using text and essentially demonstrated that the formal analysis of language, as currently taught in universities, is conceptually based on orthographic code. The essays exposed an important bias with broad implications, although the implications were not spelled out except by reference to the sociology of literacy and language theory. There was a need, as I saw it, for a work that documented the course of the writing bias, the arguments used to claim the existence of orthographic-like units and categories in the brain, and the
consequences for experimental research. More pressingly, there was a need to address the bias by detailing how speech can be studied, not through the prism of one’s writing system, but through instrumental techniques that could identify motor-sensory elements and structures of speech processing. However, I fully recognized the risks of such an endeavor. I realized that exposing a bias across sectors of language science ran the risk of appearing confrontational on all fronts and that readers might not be aware of the accumulating evidence of a basic problem. To avoid such judgments, the monograph had to discuss experimental findings. It had to be made clear that, throughout the history of language science, investigators have explicitly acknowledged a basic discrepancy between theories erected on orthographic concepts and observed processes of spoken language. But it was also a concern that a book that documents a bias in research would constitute a wholly negative enterprise – unless it proposed a way of addressing the problem. The monograph had to show that there is a coherent set of findings that supports an approach to the study of spoken language that does not entail notions of letter- and word-like units as in text. The discussion of this evidence, however, presented yet another risk. In this case, there was the chance that some vital findings could be missed, or else that the evidence would refer to domains of research unfamiliar to some readers. On this problem, the challenge was to remain on topic while assuming a knowledge base by which readers could judge competing proposals. As a compromise, I provide, especially in the latter parts of the book that refer to neuroscience, multiple references to recent surveys and critical reviews from which readers can cull background information. In short, I fully recognized the risks of submitting a work dealing with the writing bias in language science. However, far outweighing these considerations was the prospect of a science that seeks to understand the biological underpinnings of spoken language based on culture-specific constructs of writing. In other words, weighing in the balance was the prospect of pursuing studies of speech processes in a way that made the scientific status of the field appear questionable at its base. The present work aims to address the writing bias through an approach that rests on observable structures of speech. This offers a view of how research may move forward in elaborating biologically plausible accounts of spoken language where speech observations are commensurate with neural processes. The evidence that is marshaled in support of the approach draws mostly from published work, though some pivotal findings are the product of my collaboration with colleagues whom I wish to thank. In particular, I am indebted to Boutheina Jemel (Université de Montréal), who designed the image of the book cover, and Annie C. Gilbert (McGill University). Both have had a major influence on my view of the role of neural oscillations in speech processing. Both have convinced me of the value of small laboratories where
experimentalists can transgress the boundaries of academic disciplines and share expertise. I also gratefully acknowledge the support of members of our team, especially Julien Plante-Hébert and Antonin Rossier-Bisaillon without whom there would have been no time to write. The essential parts of this monograph were developed in answering an invitation from Philippe Martin (Université Paris-Diderot) to deliver a series of conferences in his department, and I am truly grateful for his ongoing encouragement and discussions on prosodic structure. The format of the subject matter that follows benefited from the commentaries of students who attended my courses at the Université de Montréal. Hopefully, the monograph can serve to foster critical thinking in future students and researchers. I also thank Douglas Rideout for revising the text under the pressure of impending deadlines. Finally, in the context where there is considerable controversy in language theory, I wish to express my gratitude to my editor Helen Barton and Cambridge University Press for their open-mindedness and support.

Introducing a Fundamental Problem of Language Science

It is quite difficult to conceptualize how spoken language functions without at once referring to writing. Most readers of the present text began to represent speech with alphabetic signs at about four or five years of age. It is therefore understandable that people with years of training in alphabet writing would view speech as containing combinations of letter-size elements, words, phrases, and sentences, like the units on this page. But it should be recognized that these intuitions are not universal. People who learn writing systems such as Japanese kana or Chinese hànzi characters, for instance, conceptualize units and combinations quite differently, and do not represent letter units, or words as groupings of letters separated by spaces (e.g., Hoosain, 1992; Lin, Anderson, Ku et al., 2011; Packard, 2000). Nevertheless, specialists in various disciplines have come to use alphabetic signs, along with other orthographic units and categories, not only in analyzing different languages, but also in clinical tests and research on the neural underpinnings of speech processing. For an outsider, this might seem odd. Clearly, not everyone knows alphabet writing. How then could culture-specific concepts of letters and words, or categories like consonant, vowel, noun, verb (etc.), serve to analyze different spoken languages, let alone neurobiological processes common to all speakers? Indeed, such applications of orthographic concepts have not gone unchallenged. As documented in this monograph, there is a history of academic work in which authors repeatedly criticize the centrism of analyses that use orthographic units and categories, and many question the face validity of language theories based on such analyses. In considering these theories, one needs to weigh the fact that there is scant evidence – from instrumental observations of speech to brain imaging – validating the view that utterances are processed as sentences containing hierarchical sets of phrases, words, and letter-like units, as conceptualized in alphabet writing and conventional language analysis. Researchers who continue to seek these hierarchical sets in speech or brain responses also face a basic problem in that there are no working definitions of what constitutes a “word,” “phrase,” or “sentence,” except by reference to such marks as spaces in text (Dixon & Aikhenvald, 2002; Haspelmath, 2011). As for the belief that speakers create
sentences using rules that serve to combine given grammatical classes of words, there are conflicting views that are outlined in the present work. For instance, neuroscientists have submitted evidence invalidating the idea that the brain processes words in terms of orthographic-like categories such as noun and verb, which undermines decades of formal syntactic theory (see, e.g., Vigliocco, Vinson, Druks et al., 2011). Such results and the failure to ground writing-induced concepts of language analysis in speech bear disquieting implications for the field of study. Yet, despite a substantial body of critical commentary, few works present findings that address the problem of the “writing bias” in language science. For students in psychology, communication disorders, and language science, introductory texts offer little forewarning that core concepts of language study do not link to observable physical or physiological aspects of spoken language. In fact, one pains to find a textbook that mentions the problem. Some works allude to it, though almost as an aside. For example, in the final pages of his Introduction to Neuropsychology of Spoken Language and Its Disorders, Ingram (2007) offers the following terse critique of the history of processing models that assume “interim” units of representation such as letter-like phonemes and words: Early models of sentence processing (or production) tended to simply borrow the units of interim representation from linguistic theories (competence models). Subsequently, psycholinguists sought evidence from performance constraints for the psychological reality of these units. However, evidence at the neural processing level for an interim level of linguistic representation is scarce at best. (p. 377)

This may have prompted some readers to wonder why such criticism of decades of research appears in the latter pages of a text and not as an introductory warning. Other specialists are just as critical of the prospects that language study might one day link to neuroscience. For example, Poeppel and Embick (2005): In principle, the combined study of language and the brain could have effects in several directions. (1) One possibility is that the study of the brain will reveal aspects of the structure of linguistic knowledge. (2) The other possibility is that language can be used to investigate the nature of computation in the brain. In either case, there is a tacit background assumption: namely that the combined investigation promises to generate progress in one of these two domains. Given the actual current state of research, these two positions – rarely questioned or, for that matter, identified in studies of language and the brain – lack any obvious justification when examined carefully. (p. 1)

According to these authors and others (Embick & Poeppel, 2015; Grimaldi, 2012, 2017; Poeppel, 2012), the basic problem is one of ontological incommensurability, or the fact that “the fundamental elements of linguistic theory
cannot be reduced or matched up with the fundamental biological units identified by neuroscience” (Grimaldi, 2012, p. 3). Actually, this incommensurability extends beyond neuroscience and includes various domains of instrumental observation. Speech scientists have long recognized there are no divisible units in signals that match letter-like phonemes in language analysis (Liberman, Cooper, Shankweiler et al., 1967). In a recent meta-analysis of neurophysiological models of speech processing, Skipper, Devlin, and Lametti (2017) concluded that “after decades of research, we still do not know how we perceive speech sounds even though this behavior is fundamental to our ability to use language” (p. 78). However, one notes that the preceding formulation of the issue of ontological incommensurability does not call into question the validity of conventional language analysis, which is not unusual. Poeppel and Embick (2005) assert that, “If asked what to study to learn about the nature of language, surely one would not send a student to study neuroscience; rather, one might recommend a course in phonetics or phonology or morphology or syntax or semantics or psycholinguistics” (p. 2). The presumption that linguistic methods serve to analyze the “nature” of language exemplifies a misapprehension of the roots of the ontological problem in language study. It fails to recognize that the issue of incommensurability can arise precisely because units and categories of language analysis do not reflect natural elements, but cultural constructs that draw from a writing tradition. At the risk of stating the obvious, students who, for instance, learn to analyze the constituents of “sentences” by examining distributions of symbols in transcripts, or by performing substitutions and commutations of letters, words, parts of words, phrases (etc.), while using such notions as consonant, vowel, verb, preposition, auxiliary (etc.), are not working with natural units and categories. They are principally working with orthographic concepts of Latin grammar and overlooking entirely the signals and physiological processes involved in vocal communication. There is, in this sense, a misapprehension in claiming that current methods of language analysis serve to understand the nature of spoken language. As documented in this book, such claims overlook a body of work criticizing the orthographic bias in linguistic methods. In introductory works, though, readers are only marginally informed of the problems that arise when culture-specific concepts of writing are used in analyzing spoken language. As a further example, in a popular compendium of brain-imaging research on language, Kemmerer (2015b, pp. 273–274) cautions readers against culture-centric assumptions, noting that there are substantial differences in the way languages “carve up” meaning with words. Then, a vast body of neuroimaging research on the processing of lexical items is discussed, much of which refers to test protocols involving presentations of
isolated words. Regarding such methods, there are no cautionary remarks that language groups also carve up verbal expressions differently and that meaningful forms may not reflect units like words in European-style writing and dictionaries. This is not simply a technical matter. It is a decisive conceptual issue, one which also carries ethical implications. Although not widely publicized, the cultural specificity of the word concept has led to debates amongst language pathologists confronted by the problem of how to diagnose “word-finding” deficits for speakers of so-called wordless or polysynthetic languages (discussed later on in this monograph). In these cases, linguistic analysis does not serve to distinguish words from phrases or sentences, and the problem is not limited to little known languages like Inuktitut, Mohawk, Cayuga (and others). It can arguably extend to “isolating” languages representing some of the largest communities of speakers, such as Chinese, which does not conceptualize words as in alphabet writing (cf. e.g., Hockett, 1944, p. 255: “there are no words in Chinese”; Packard, 2000, pp. 16 and infra). Certainly, neuroimaging studies that use visual presentations of words like apple, dog, cup (etc.) provide valuable and even essential information on semantic processes and representations. And clearly isolated units corresponding to space-divided words in writing are used to name objects or actions (such as proper names and imperatives). But people do not speak to each other using isolated forms like apple, dog, cup, and it is inherently difficult to determine how many “words” there are in basic utterances. For instance, in I’m done, Tom’s gone. Take’m, You’re right (etc.), do ‘m, ‘s, ‘re, constitute words? Are the syllables that contain these units subject-verb phrases (and so forth)? Faced with such definitional issues, some analysts contend that, for the most part, speech may not involve words, or combinations of words as in a text, but formulas which are processed as such (Ambridge & Lieven, 2011; Beckner, Ellis, Blythe et al., 2009; Bybee & Beckner, 2009). Thus, beyond cautionary remarks on language relativity, there is the fundamental issue of whether concepts linked to writing can serve to guide research on the structures and processes by which spoken language conveys meaning. For scientists from other disciplines, the latter concern may seem so basic as to undermine language study as a science. After all, why has the failure to observe notional orthographic units and categories in sensory signals not led to a reevaluation of conventions of language analysis? Some do not see the problem as relating to a writing bias, but instead suggest the need to refine linguistic concepts using “features” and “morphemes” rather than letter-like phonemes and words (Embick & Poeppel, 2015; Poeppel, 2012). Many also point to a general problem of methodology: language specialists need to develop theories that take into account instrumental observations so as to orient what appears to be a stockpiling of eclectic data. For instance, Grimaldi (2012)
notes that there is a basic ontological problem in linking theories of language to observations of neural processes: Despite the impressive amount of neural evidence accumulated until now, the field of research results is fragmented and it is quite difficult to reach a unit of analysis and consensus on the object of study. This frustrating state of the art results in a detrimental reductionism consisting in the practice of associating linguistic computation hypothesized at a theoretical level with neurobiological computation. However, these two entities are at the moment ontologically incommensurable. The problem lies in the fact that a theory of language consistent with a range of neurophysiological and neuroimaging techniques of investigation and verifiable through neural data is still lacking. (p. 304)

Grimaldi mentions one exception. In his view, the language theorist Jackendoff has developed a formal model of “sentence” processing that attempts to connect linguistic analyses to neurophysiological observations. The proposal, the Parallel Architecture Model of Sentence Processing (Jackendoff, 2007b, 2009, 2017), was principally intended as a response to criticisms that formal theories of sentence generation present static arborescent structures that do not take into account that speech unfolds over time (see Ferreira, 2005). In other words, the model attempted to reconcile an analysis of sentences using static signs on a page with temporal aspects of speech. This is a rather obvious discrepancy between theory and observation, which might lead one to ask why language theories do not generally take into account the time dimension of speech and how this shapes the processing of language. On such issues, Jackendoff’s justification for the model is instructive in defining the ontological incommensurability problem that extends well beyond the issue of static formalisms. In particular, in proposing his model, Jackendoff admitted that he was breaking with a long-standing “mentalist” tradition in language study. By this, he meant that, following Saussure’s (1916/1966) division of langue and parole, language analysts have generally adopted a distinction “between the study of language competence – a speaker’s f-knowledge (‘functional knowledge’) of language – and performance, the actual processes (viewed computationally or neurally) taking place in the mind/brain that put this f-knowledge to use in speaking and understanding sentences” (2009, p. 27). The mentalist perspective adheres to the Saussurean premise that language is a product of the mind that can be studied separately from the speech medium by analyzing transcripts. Although Jackendoff (2009) suggested that the competence– performance division may have originally been intended as a methodological principle (p. 27), this is certainly not the case historically (as clarified in the present monograph). The mentalist tradition, as Jackendoff notes, has been highly influential and was a principal vector of cognitive psychology and cognitive neuroscience. Indeed, G. Miller (2003) recounts that in the 50s,
when behaviorist theories were being overturned by a group of authors for which he was a spokesman, he hesitated to use the term mentalist to describe the views of the group and instead referred to a “cognitivist” approach (p. 142). For Jackendoff, the competence–performance division in the mentalist tradition had some unfortunate consequences in that some theorists “tended to harden the distinction into a firewall: competence theories came to be considered immune to evidence from performance” (2009, p. 28). But this is not the only break with the mentalist tradition that Jackendoff requests for his proposal. To respond to the lack of agreement between formal language theories and neuroscience, he suggests a reassessment of a modular “syntacticocentric” view which has dominated language theory for over half a century and which, in his opinion, was “a scientific mistake” (p. 35). In the end, the Parallel Architecture Model attempts to answer the problem that speech unfolds over time. Even so, the model operates on conventional units and categories where sentences are seen as hierarchical assemblies of letter symbols, words, phrases (etc.), all of which have no general physical or physiological attributes in speech. Again, this incommensurability is not seen as particularly troublesome for the theorist who accepts that units, such as words, “as is well known, are not present in any direct way in the signal” (Jackendoff, 2007a, p. 378), or that letter-like phonemes are not in the “physical world” (Jackendoff, 2017, p. 186). But if such units are not in signals, in physical manifestations of spoken language, then how does neurophysiology operate to extract and process phonemes and word combinations? How do scientists test putative grammars, or how would a child acquire a combinatorial scheme if units are not present in sensory signals? There is in such views a “firewall” of sorts preventing any invalidation of the writing-induced concepts of language analysis and theory. Thus, while some authors acknowledge the historical failure to connect the study of language competence to performance (which includes physiology and signals), the presumption that spoken language can be studied using transcripts and orthographic concepts remains. This has led to enduring pseudo-puzzles that extend to debates on the origins of spoken language. For instance, how would formal grammars based on linguistic analyses emerge in human biology if the units on which the grammars function were not in the physical medium of communication? On such pseudo-puzzles, the failure to ground conventional units of language description in sensory signals has contributed to speculations of their innateness and saltations in the evolution of the mind/brain that present logical problems (as outlined by Christiansen & Chater, 2008). On the other hand, rather than addressing the problem of using writing concepts in the study of spoken language, some ask if the concepts are not “unavoidable descriptive conveniences” (Bybee & McClelland, 2005; Jackendoff, 2007a, p. 352). Certainly, examining transcripts and orthographic
units can provide useful information in several areas of inquiry, especially in sectors relating to reading. Nonetheless, static writing signs do not represent speech acts and signals. No matter how fine the transcripts are, they do not capture such things as muscle contractions, sound properties, breath flow, or any other physical or physiological aspect of verbal communication. Thus, writing signs offer no means by which to explain the structure of spoken language. As a consequence, researchers who refer to linguistic analyses and theory will not find a definition of the nature of units such as letter-like phonemes, words, phrases, or sentences. These forms used to describe spoken language are generally taken as given in sectors of language study, but never explained. More importantly, viewing spoken language through writing overlooks the multisensory context of speech that essentially defines the function and meaning of utterances. This latter problem is commonly acknowledged: one cannot interpret the meaning of utterances, let alone understand how they convey meaning out of context. Yet current models of language are based on the analysis of script that completely removes the object of study from the motor-sensory medium and communicative environment. The following monograph presents evidence, some of which has been available for some time, showing that one does not need to refer to writing concepts to study the processes of spoken language. In discussing this evidence, it is a contention that postulates of interim units and categories of language analysis do not exist beyond “descriptive conveniences” for observers who know alphabet writing, and are not “unavoidable” as some have argued. There are observable structures of speech that are universally present across languages. All spoken languages present syllable-like cycles, prosodic groupings, and units of speech breathing. It is also the aim of this book to show that findings linking these structures to neural processes present a major shift in the way semantic representations are conceptualized. Many readers who have formal training in language study are likely to object to such a viewpoint based on the notion that language is separate from speech. In the context where this belief is pervasive, any attempt to address the problem of the incommensurability between conventions of language analysis and observations of speech processes requires a critical look at the historical arguments that have served to maintain a speech–language division. This also extends to a division in methods of inquiry where linguistic analysis is often seen to have theoretical precedence over instrumental observations. These issues have essentially guided the organization of the subject matter of the present monograph into four parts. Part I, entitled “Questions of Ontology: Writing and the Speech–Language Divide,” documents the source of the belief that language is separate from speech. For many historians, this belief originates in the practice, instituted by philologists in the nineteenth century, of viewing spoken language via script.
Influential authors such as Saussure (1857–1913) formulated pivotal arguments for separating langue and parole that are still echoed in textbooks. In reviewing these arguments, an essential point of Part I is that early language theorists had no instruments by which to record and visualize speech. When instrumental methods became available, the notion that one could work out feature systems by examining letter signs on a page was already accepted in schools of phonology. But a turning point occurred when early instrumentalists reported that writing-induced concepts of linguistic methods did not reflect in speech. At that point, the argument was made that phonological criteria were essential in orienting observations. Many interpreted this to mean that empirical research had to be hypothesis-driven in terms of the assumed units and categories of linguistic theory. Part II, “Questions of Epistemology: The Role of Instrumental Observations,” examines the consequences of the idea of the primacy of linguistic-type analyses in guiding research. It is the case that, in using instrumental techniques, investigators need to have some idea of what exactly in speech serves a communicative function. However, such considerations offer no justification for the belief that research needs to be driven by orthographic concepts of linguistic descriptions. Yet this presumption prevails in sectors of language study. One key example discussed in Part II is the perennial debate on the existence of letter-like phonemes. Despite the acknowledged absence of these units in speech, it has been claimed that types of data, such as spoonerisms transcribed with letters, confirm the existence of phonemes. The assumed primacy of these types of indirect observations over instrumental evidence reflects an epistemological problem and a writing bias that has broad consequences. For instance, some investigators critically refer to studies of spoonerisms in arguing that phonemes are part of an innate competence underlying alphabet writing, a view that has been severely criticized. Part II reviews a body of evidence confronting the belief that language is separate from speech and can be studied using concepts of writing. Of course, not all authors share this viewpoint. Some, in fact, explicitly reject the speech– language division. This entails a major epistemological shift in that abandoning the division implies that structures that are readily identified in motor and sensory aspects of speech are the actual structures of spoken language. Part III, “The Structure of Speech Acts,” and Part IV, “The Processing of Speech Meaning,” address the problem of the ontological incommensurability between conventional linguistic analysis and observations of spoken language. Part III illustrates how one can study spoken language in context where, instead of focusing on script, one refers to the structural attributes of utterances. Viewed as speech acts, utterances are much like other bodily actions that are performed in a changing environment except that the actions modulate air pressure for communicative purposes. As already mentioned, the structure of
these actions is reflected in syllable-like cycles, temporal chunks, and breath units, all of which emerge from circumscribable processes of motor speech. A central thesis of the present book is that multisensory context information binds to utterance structure, more specifically, to chunks of articulated sounds via mechanisms of motor-sensory coupling and neural entrainment. On this basis, episodic and semantic memory of sensory experiences that accompany utterances can link to action chunks that some authors view as verbal formulas and semantic schemas. The relevance of this approach is discussed in the context of ongoing problems that arise in models of speech and semantic theory that attempt to provide an interface between neural processes and writing-induced concepts of language description. In sum, Parts I and II offer a review of historical developments underlying a writing bias that has created fundamental problems in research on spoken language, whereas Parts III and IV present an approach that serves to address these problems. However, the background information provided in the first parts should not detract from the main subject of the present monograph. This is not a book about writing, although it refers to critical works on the influence of what some historians call scriptism in language study. It is a book about language science and how one can view processes of spoken language without reference to culture-specific concepts of writing. The latter parts of the monograph develop this view in terms of a body of research findings. This research sheds light on how memory of multisensory experiences binds to structures of motor speech and how activations of semantic or episodic memory in relation to these structures underlie a context-based interpretation of utterances. However, given that many may believe that speech is functionally separate from language and semantics, it is useful to begin by examining how this belief, and conventional concepts of language analysis, arose from a tradition of viewing spoken language through script.

Part I

Questions of Ontology: Writing and the Speech–Language Divide

1 How We Are Introduced to the Study of Spoken Language

For many readers of the present work, the notion that language is separate from speech (or that language competence is separate from performance) will likely have been acquired in an introductory course to language study. This notion can have a major influence on how language is conceptualized, and presents a central tenet in the field. In examining introductory texts used by generations of students (e.g., the multiple editions of Akmajian, Demers, Farmer et al., 2010; Fromkin, Rodman, & Hyams, 2013, and others), one finds typical arguments for distinguishing speech from language. These arguments serve to specify not only the object of study, but also how to study it. However, all these arguments, it should be pointed out once more, involve written material. For instance, in sections pertaining to the “structure of language,” examples of sentences or words are usually presented in regular orthographic form or transcribed using letter symbols of the IPA (International Phonetic Alphabet). These can be used to illustrate that sound–meaning associations are arbitrary (symbolic), and that the way sounds are combined to form words, or the way words are assembled to form phrases and sentences, reflects knowledge of combinatorial rules. The implication of such examples is that this has nothing to do with the vocal processes of speech (or any modality of expression for that matter). The illustrations can captivate an audience, partly because they cater to received notions and intuitions. Of course, after decades of training in alphabet writing, the audience will “know” units like words and sentences, orthographic rules of letter and word combinations, as well as the categories on which the rules apply. On the other hand, one is likely to find some ambiguity on how linguistic descriptions relate to physical and motor-sensory aspects of speech communication. Although textbooks generally contain a section on the “sounds of language,” many do not contain a single illustration of acoustic waveforms and simply refer to IPA signs. In those textbooks that do contain oscillograms or spectrograms (e.g., Akmajian et al., 2010), there is no indication that it is impossible to divide signals in letter-like units without creating noise, or that “distinctive features” of sounds are not bundled together as IPA letters imply. The misrepresentation in this case is that, in describing spoken language with writing symbols, it is held that one is representing the sounds of a language when, in fact, 13
the symbols do not reflect signals, suggesting instead the existence of abstract elements. Invariably, the presentations lead to a view of spoken language as reflecting a mental ability that organizes symbolic elements of various orders (at the phonological, morpho-syntactic, and semantic levels of description). Such an ability or "competence," as the arguments run, is not a matter of the speech apparatus (usually presented by static sagittal representations of the oral and nasal cavities), but constitutes a separate capacity in the brain. Given that this capacity is seen to operate on strings of symbolic units, the tacit assumption is that symbols of alphabet script can be used to study it. Presenting the object of study this way and focusing on text material can foster the conviction that what defines spoken language is its combinatorial or syntactic character. Thus, to borrow a term from Jackendoff (2009), language study appears, in its introductory ontology and epistemology, to be "syntacticocentric." The examples that are provided can also create the belief that processes governing communication can be studied using familiar writing codes, without any immediate need to refer to principles of physics and physiology. Yet introductions such as these, which centrally involve examples based on transcripts, distort the role of spoken-language communication in rather obvious ways. Compared to the presentation of written material, one can consider how the use of audio or audiovisual examples of actual speaker–hearer interaction would lead to a very different concept of spoken language and how to study it. This would be especially clear in a case where one is confronted with interactions involving unfamiliar languages.

1.1 Language as an "Autonomous" System, or the Effects of Scriptism

Anyone attempting to analyze heard utterances in an unfamiliar language will quickly realize that this is not at all like analyzing one's own language. Upon hearing unfamiliar speech, a listener can perceive structured sounds along with situational- and speaker-related information. But one would not "know" where to divide words, phrases, or sentences, nor be able to comprehend the utterance. Indeed, intuitions of orthographic units and categories can inherently bias firsthand observations. One revealing example is discussed by Gil (2002) who, in his fieldwork, was confronted with unfamiliar Southeast Asian languages. Consider a situation where an analyst, who does not speak Riau Indonesian, hears an utterance in that language and then proceeds, as Gil does, to transcribe the sounds as "ayam makan." If one reduces what is heard to this script, then the actual patterns of sounds including syllable-like cycles, groupings, and tonal contours need to be inferred by a reader of an alphabet. Thus, depending on the experience of the reader, it may not be clear what the space and letter sequences actually represent in terms of sound patterns, or if the original patterns heard
by the analyst would be better represented as “ay amma kan,” “ayam mak an,” “a yam makan,” (etc.). Of course, in recording the speech event with alphabet symbols, most of the structural and all of the situational attributes of sounds relating to the speaker’s identity, affect, state (etc.) and the general utterance context are omitted. The idea that an observer can skip over these structural and situational details and analyze content in terms of syntactic-semantic units led Gil to the following graphic analysis of constituents, which, as such, offers little if any information as to how the utterance conveys meaning.

Even if one were to consult a dictionary of Indonesian to translate the constituents into English, one could end up with nonsensical versions. (In this case, Gil offers several possible interpretations having to do with chickens and eating or being eaten.) In reality, it is only by considering what is occurring in the context of speech that an observer can interpret the signals, their structure, and their expressive character, so as to understand the meaning of the sounds. Hearing other situated utterances would ultimately serve to discern groupings and infer their function – inasmuch as the meaning of utterances can be ascertained from a communicative setting, at least in initial analyses. But the idea that one can understand how a spoken language works by analyzing the syntax of alphabet symbols on a page is an illusion. Such impressions arise when interpreting transcripts that represent a familiar language, where the function, structure, and meaning of sounds have been acquired through extensive experience in speaking the language. Although this may seem obvious, it is a common oversight in textbooks and is central to understanding the persistence of the problem of the ontological incommensurability between language theory and instrumental observations, as described in the Introduction. The tradition of analyzing utterances by way of transcripts and orthographic concepts fosters a view of spoken language as an autonomous combinatory system that operates on the very concepts that are used in the analysis. In this looking-glass approach, spoken language appears as an inherently mental function that assembles given syntactic-semantic units, all of which seem to have little to do with the motor and sensory processes of communication. This idea of “language autonomy,” which tends to overlook the biasing effects of orthographic constructs, has led to perennial debates on the grounding of language theory (for polemics on the autonomy principle, see Derwing, 1980; Lyons, 1991; Newmeyer, 1986a, 1986b, 2010; Taylor, 2007, among others). To characterize the problem briefly, an autonomy principle holds
that a theory of language can be elaborated using linguistic-type analyses, without reference to other types of data. Newmeyer (2010) illustrates this principle using a chess-game analogy that, interestingly, is the same analogy used over a century ago by Saussure (1916/1966) to illustrate his assumption of a division between language and speech. As Newmeyer sees it: Chess is an autonomous system: There exists a finite number of discrete statements and rules. Given the layout of the board, the pieces and the moves, one can “generate” all of the possible games of chess. But functional considerations went into the design of the system, namely to make it a satisfying pastime. And external factors can change the system. Furthermore, in any game of chess, the moves are subject to the conscious will of the players, just as any act of speaking is subject to the conscious decision of the speaker. So chess is both autonomous and explained functionally. (p. 4)

That is to say, knowledge of the combinatorial rules of language is the central object, not the instances where a speaker forms particular utterances during functional acts of communication, or changes in the language code. The key term in this analogy is “Given . . . .” In analyzing a familiar language using transcripts where units and categories of an orthographic system are taken as given, or assumed to be known by all, one could formalize a set of rules that generate combinations. This has been the program of some linguists who have elaborated formal models of “sentence” generation said to constitute a “universal grammar.” But then influential models have often been based on analyses of English sentences, usually represented in regular orthographic script (an issue further discussed in Chapter 3). This brings forth obvious questions. In particular, it leads one to ask whether combinatorial models derived from these analyses of sentences can formalize actual operations on elements in the brain or if they reflect artifacts of orthographic conventions. As mentioned in the Introduction, there is at this time no physiological evidence supporting the type of hierarchical combinations of letter-like phonemes, words, phrases, and sentences, which are the hallmark of formal language analysis and models. Although the absence of evidence does not constitute invalidation, viewing studies such as Gil’s (2002) brings to light the difficulties of an approach where it is assumed that units and categories of Latin grammar are given and known by all. Gil could not presume such elements for Riau Indonesian. He opted for an ad hoc category “S” in the previously cited example because, contrary to familiar European languages, syntactic elements in Riau can be distributed just about anywhere in an utterance and have no case markings. In considering such instances, one can wonder what formal theories would look like if, instead of constructing a grammar based on analyses of English sentences, one took Indonesian as a familiar starting point. Gil’s study, entitled Escaping Eurocentrism: Fieldwork as a Process of Unlearning, suggested the need to question received notions in analyzing languages. In fact, numerous reports
attesting to a diversity of categories and units tend to undermine the claimed universality of some formal theories (see, e.g., Evans & Levinson, 2009). This does not mean that there are no universal processes, nor is the problem a matter of “language relativity.” There are, as discussed in the present work, elements and structures that arise in all spoken languages as a result of constraints on motor and sensory processes of vocal communication. However, it has to be asked if analyses of transcripts are at all suited to the study of these processes, or in guiding research on how spoken language conveys meaning. To illustrate this last point, consider a simple response like “hello” heard in the context of a phone call. Linguistic analysis hardly serves to understand how the sounds can be variably interpreted. Yet upon hearing the sounds, listeners can readily identify a familiar voice, a speaker’s dialect, affective state (etc.), along with coded indices, subtle features that are part of “knowing the language” and that indicate whether the speaker is receptive to the call, is happy to hear from an old friend, and so forth. There is no formal repertoire, no “grammar” or “dictionary” of such speech attributes. But the fact that they are used to interpret utterances in various ways implies a coding of these features in memory. The variety of sound features and associations that can be formed vastly outnumber IPA signs. Yet these attributes are not considered in linguistic analyses, principally because the analyses are limited to what can be represented in writing. Recognizing this aspect of linguistic methods is central. It helps to understand how the tradition of using transcripts has led to a division of the object of study where “language” is abstracted away from the sensory and motor processes of oral communication. Viewing utterances through a known writing system, the object appears to function in terms of conceptual units and rules that are difficult to dissociate from the analyst’s knowledge of a writing code. Several authors of historical essays have designated this effect, and the tendency to conceptualize spoken language through writing, “scriptism” (Harris, 1980, and see Coulmas, 1989, 1996, 2003; Harris, 1990, 2000; Linell, 2005; Love, 2014; Olson, 1993, 2017). As documented in these studies, scriptism not only refers to the particular biasing effect of alphabetic concepts, but refers more generally to the fallacy of claims that analyses of transcripts serve to understand spoken language. To the contrary, the aforementioned essays illustrate that, across cultures, “writing systems provide the concepts and categories for thinking about the structure of spoken language rather than the reverse” (Olson, 1993, p. 2). The problem is that structural attributes of utterances are not determined by orthographic representations of sentences. They are determined rather by motor-sensory processes of oral communication that, obviously, cannot be studied via script. As will be noted in Part II of this book, the response from some authors to the preceding criticism has been to argue that the concepts of alphabet writing are special in that they can capture the true units of a biological competence for language (see also Wolf & Love, 1997). In the end, the arguments imply a view
on the superiority of European systems, despite the avowed absence of evidence for alphabet-style units in signals and neural responses. However, these arguments largely deal with a side issue to scriptism. They do not address the basic, ontological difference between sentences on a page and utterances. Using any writing system to record spoken language as such disembodies and decontextualizes the object of study. It creates an object that is ontologically separate from observable aspects of spoken-language communication.

In summary, the textbook assumption that linguistics examines the sounds of spoken language cloaks the fact that language analysis, as currently taught and practiced, largely focuses on script. This tradition has led to a conceptualization of spoken language as an autonomous system operating on letters, words, and sentences separate from observable structures of utterances. As is known, the autonomy concept was formally introduced in language theory in the nineteenth century. Theorists like Saussure formulated influential arguments in claiming that language is separate from speech and reflects a faculty in cortical areas of the brain. Similar arguments, including Saussure's chess-game analogy, can be found in textbooks where the idea of a separate language faculty has been reworded as a competence–performance division. Historically, this division was central to the rise of linguistics as a discipline. But it is also the locus of a fundamental problem facing research. There is, as many recognize, a basic incompatibility between theories erected on linguistic analyses and observations of speech. That this problem persists speaks to the influence of foundational works. In reviewing the arguments used by Saussure, the following sections draw attention to the context in which they were formulated. Specifically, it should be acknowledged that language theorists in the nineteenth century could not record, let alone visualize, sound patterns of utterances, and the idea of cortical faculties was dominant at the time. However, the concept of localized faculties has long since been abandoned. In light of available knowledge about the brain and current recording techniques, language study in its focus on script and reference to separate faculties seems oddly out of step. Perhaps the clearest example of this nineteenth-century viewpoint is the way in which "speech" is separated from language in introductory texts, as illustrated below.

1.2 Defining "Speech"

In fields of study bearing on spoken language, including several applied sectors, speech is typically presented in terms of a vocal production of sounds and is traditionally modeled as a source-filter function (e.g., Fant, 1960). It needs to be recognized, however, that speech cannot be defined this way precisely because one can create countless vocal resonances, such as in a source-filter system, without producing speech. In other words, it is inherently difficult to separate speech from language. Thus, “speaking to someone” is commonly understood as
a communicative act involving a language code (although not all information in speech is coded). Some cultures have terms to designate both language and speech without implying separate processes (e.g., eine Sprache in German, or un parler in French). But this is not simply a question of terminology. In areas of language study, mechanisms of oral expression are traditionally presented as if they could be physiologically isolated from language functions, usually localized in cortical areas. Examples of this can be found in practically all textbooks where “speech” is illustrated using schematic midsagittal sections, such as in Figure 1.1(a). Such representations have been used by analysts and teachers since the nineteenth century, and can even be found in works of neuroanatomy (e.g., Scott, Wylezinska, Birch et al., 2014). The reality is that illustrations of anatomical structures like (a), which are meant to refer to articulatory and phonatory systems, do not as such produce speech. Clearly, processes of the central nervous system (CNS) are involved, which go beyond a source-filter concept. In this light, some might argue that brainstem structures and other subcortical processes may provide a basis for separating speech functions from language and, in fact, this division has been applied in clinical sectors where “speech and language” disorders have been traditionally viewed as involving subcortical and cortical lesions, respectively. Thus, one might use a drawing like Figure 1.1(b) to suggest a speech–language division where some subcortical structures are assigned to speech. But as to whether this anencephalic-like concept might more aptly illustrate a speech–language division, one can refer to actual cases of anencephaly, as in (c). Anencephalic infants do not develop a telencephalon (and many do not develop parts of the diencephalon). Yet some have survived beyond two years of age and thus beyond the stage where normal infants begin to babble (at about 6–8 months; Dickman, Fletke, & Redfern, 2016; Oller, 2000). In rare instances, the infants have a relatively intact brainstem along with neural pathways and structures supporting vision and hearing. In such cases, it has been reported that many behaviors traditionally ascribed to cortical activity are present and can originate in the brainstem (Shewmon, 1988). For instance, these infants can react to noxious food and produce avoidance responses including vocalizations and cries. They show interactions with moving objects involving eye movements accompanied by head turning (indicating a degree of motor-sensory coupling), and also produce facial expressions including smiles (for a recent review of studies of anencephaly, see Radford, Taylor, Hall et al., 2019). But they do not babble, or produce anything resembling speech (i.e., they do not develop eine Sprache, un parler). The point is that, contrary to a Saussurean ontology, speech is not a separable physiological function and cannot be defined as a passive source-filter system. And just as problematic is the idea that language is separate from speech.

Figure 1.1 Typical schematic sagittal representation of the "speech" apparatus. (a) Usual representation as can be found in introductory texts to language study; (b) with added brainstem and cerebellum for an anencephalic version of the sagittal schematic. (c) Photo of an anencephalic infant; see discussion in the text (from Bhatnagar, 2012, with permission).
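
As a point of reference for the source-filter idealization mentioned at the start of this section, the following minimal sketch (in Python) filters a crude glottal pulse train through two formant-like resonances. All parameter values and the two-resonator design are illustrative assumptions, not an implementation of Fant's model; the point it makes is the one argued above, namely that producing vowel-like resonances does not in itself amount to speech.

import numpy as np
from scipy.signal import lfilter

rate = 16000                     # sampling rate in Hz
f0 = 120                         # fundamental frequency of the source in Hz
t = np.arange(int(0.5 * rate))
source = (t % int(rate / f0) == 0).astype(float)   # impulse train as a crude glottal source

def formant_resonator(x, freq, bandwidth, rate):
    # Second-order resonator approximating a single formant.
    r = np.exp(-np.pi * bandwidth / rate)
    theta = 2.0 * np.pi * freq / rate
    return lfilter([1.0 - r], [1.0, -2.0 * r * np.cos(theta), r ** 2], x)

# Two resonances roughly in the range of an open vowel (values are illustrative).
vowel_like = formant_resonator(source, 700, 80, rate)
vowel_like = formant_resonator(vowel_like, 1200, 90, rate)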

1.3 Was the Speech–Language Division Ever Physiologically Grounded?

Some may contend, as did Jackendoff (2009, p. 29), that the separation of speech from language or the performance–competence division was simply intended as a methodological or heuristic principle. This is not the case. Early in the history of language study, Saussure formulated a set of tenets for language analysts based on the belief that the speech–language division was physiologically attested and that there was a separate “language faculty” in the brain. This viewpoint has had a lasting influence. As R. Harris (2002) noted: “The basis of linguistic theory has remained in all essentials unchanged since it was first laid down in Saussure’s
Geneva lectures of 1907–11" (p. 21). The speech–language division is a prevailing doctrine, and also reflects nineteenth-century views of a hierarchical brain (Parvizi, 2009). However, the original claim of a physiological division no longer holds, and arguments for the separate study of language have been criticized on fundamental grounds.

1.3.1 Saussure's Argument of a Separate Language Faculty in Broca's Area

In terms of neurophysiological evidence, at the time of his Cours de linguistique générale, Saussure believed that the capacity to create a system of signs was located in Broca’s area (Broca, 1861). In discussing this function, Saussure distinguished language from the motor-control processes of parole, though he did not attempt to localize the latter. In citing Saussure’s Cours, it is worth mentioning that the English translation by W. Baskin alters key terms in that “speech” is often used where Saussure uses langage, so the following excerpt restitutes the term “language” (in brackets) that is in the French version of the Cours. It is perhaps useful to note that compared to English, which has two terms to designate language and speech, French has four: parole, langue, langage, and un parler. In Saussure’s view, language (langage) is the capacity to constitute a system of coded signs that can be a spoken code or langue, a writing code, or other systems of signs including gestural “sign languages.” Saussure’s central claim is that the division between speech and language is not merely methodological, but physiological: We can say that what is natural to mankind is not oral speech but the faculty of constructing a language, i.e. a system of distinct signs corresponding to distinct ideas. [. . .] Broca discovered that the faculty of [language] is localized in the third left frontal convolution; his discovery has been used to substantiate the attribution of a natural quality to [language]. But we know that the same part of the brain is the center of everything that has to do with [language], including writing. The preceding statements, together with observations that have been made in different cases of aphasia resulting from lesion of the centers of localization, seem to indicate: (1) that the various disorders of oral [language] are bound up in a hundred ways with those of written [language]; and (2) that what is lost in all cases of aphasia or agraphia is less the faculty of producing a given sound or writing a given sign than the ability to evoke by means of an instrument, regardless of what it is, the signs of a regular system of [language]. The obvious implication is that beyond the functioning of the various organs there exists a more general faculty which governs signs and which would be the linguistic faculty proper. (Saussure, 1916/1966, pp. 10–11)

The idea of a separate language faculty is still present in the literature. Yet setting aside the historical debate on localizing aphasia (see, e.g., Ross, 2010), it has been recognized that the notion of a circumscribable language function in
Broca's area is incompatible with clinical evidence, some of which dates back to the 50s. One early demonstration was provided by Penfield and Roberts (1959). These authors surgically removed large parts of Broca's and neighboring areas in an individual who, following surgery, showed minor effects on spoken language that dissipated within a few weeks (although, on p. 163 of their report, the authors mention that the individual presented congenital problems that could have caused a "displacement of the function"). With the exception of a report by Marie (1906), these authors were amongst the first to suggest that processes of spoken language involve subcortical structures traditionally associated with "motor-control" pathologies: "Since all these removals of the convolutions that surround the speech areas do not produce aphasia, it seems reasonable to conclude that the functional integration of these areas must depend upon their connection with some common subcortical structure" (Penfield & Roberts, 1959, p. 212). Current understanding and clinical evidence largely support this conclusion. As Murdoch (2010a) notes: "Clinical evidence is now available to show that permanent loss of language does not occur without subcortical damage, even when Broca's and Wernicke's areas have been destroyed by lesions" (p. 79). Adding to these clinical observations, case reports of children who have undergone the excision of their left or right cortex, but who nonetheless develop spoken language, also undermine the thesis of a localized language faculty in Broca's area (Bishop, 1983; Curtiss & de Bode, 2003). Such findings are part of a body of work that challenges a cortico-centric conception of spoken language where language is essentially seen to involve cortico-cortical couplings with white-matter connections to subcortical structures subserving motor functions (see the critique of Skipper, 2015; Skipper et al., 2017). This hierarchical view of the brain has changed. There is now a consensus on the point that motor-related speech dysfunctions associated with structures like the basal ganglia and cerebellum accompany semantic-related deficits, and that aphasia as such can arise from lesions in the thalamus (Crosson & Haaland, 2003; Murdoch, 2001; Murdoch & Whelan, 2009). The findings have major implications with respect to a Cartesian concept of language as separate from motor-sensory functions. In extending this changing concept of spoken language, the present monograph refers to a body of evidence that illustrates how constraints on motor-sensory processes of speech can shape structures of spoken-language processing. The point is that, in light of current knowledge, the doctrine of a speech–language distinction is neither physiologically based nor methodologically useful. On the contrary, it has hindered any consideration of the inherent links between motor-sensory and semantic functions, as further documented in this work. For language theory and analysis, the previously mentioned set of findings points to fundamental issues that have yet to be fully weighed in sectors of language study. In the foundational principles of his Cours, Saussure assumed
that spoken language involves an autonomous system located in Broca’s area, and saw this system as physiologically separate from speech. Based on these assumptions, he proposed methods by which to analyze the abstract system using transcripts as he saw no need to refer to the structural aspects of motor-sensory processes of vocal expression. To the contrary, research shows that there is no cortically localized language function, and that “language” and motor-sensory processes intertwine. There are several implications for Saussure’s approach to the study of spoken language via transcripts, one of which should be clear: if there is no physiological basis for dividing language from speech, then observable structures of utterances are the structures of spoken language. In fact, the natural structures of spoken language were never “given” or represented by culture-specific writing signs and units. In continuing to idealize language via transcripts and concepts relating to letters, words, sentences (etc.), analysts fail to recognize that the principal historical justification for using script in the study of spoken language was the nineteenth-century view of an autonomous language faculty, which no longer holds.

1.3.2 Arguments of the Arbitrariness of Signs and Abstract Phonology

Saussure presented several influential arguments supporting a Cartesian view of language that still prevails. In particular, Saussure taught in his Cours that the capacity to create arbitrary (symbolic) sound–meaning associations and phonological systems demonstrates that a language function is separate from an oral modality of communication and is solely governed by what he called “rational” principles (which can be seen to refer to cognitive functions before G. Miller formally introduced the term). The linguistic sign is arbitrary; language, as defined, would therefore seem to be a system which, because it depends solely on a rational principle, is free and can be organized at will. (p. 78) [. . .] Just what phonational movements could accomplish if language did not exist is not clear; but they do not constitute language, and even after we have explained all the movements of the vocal apparatus necessary for the production of each auditory impression, we have in no way illuminated the problem of language. It is a system based on the mental opposition of auditory impressions, just as a tapestry is a work of art produced by the visual oppositions of threads of different colors; the important thing in analysis is the role of the oppositions, not the process through which the colors were obtained. (p. 33)

The quoted assertions are characteristic of the Cours. It is the case that signs in spoken language appear arbitrary and that distinctive features vary across languages. But the conclusion does not follow that signs and feature systems are
mental constructions that “depend solely on a rational principle” and are “free and can be organized at will.” Certainly, spoken languages can draw from a variety of sound features that can be produced. And a speaker may variably associate sounds and meaning. Yet neither features nor arbitrary signs arise independently of the possibilities afforded by a motor-sensory medium of expression. In omitting this point, Saussure’s arguments have two major consequences on theories that adopt the doctrine of a speech–language divide. First, viewing language as a mental construction foregoes the need to explain how arbitrary signs and distinctive features arise. These are simply taken as given, or the product of a given faculty. A second, and more important consequence of the doctrine is that it excludes, de facto, any account where constraints on the medium of expression are viewed as factors that can shape processes and signs of spoken language. This is clear in Saussure’s outright denial that mechanisms of production and perception have anything to do with language. Such a theoretical fiat, in effect, denies some rather obvious links between structural attributes of utterances and constraints on the vocal modality – which are readily observed inasmuch as one examines utterances as signals and not as sentences on a page. For instance, no “rational” principle or mental exigency forces the linear arrangement of sounds and verbal elements in utterances. This is not a product of thoughts or sensory experiences. When one sees or imagines someone jumping, one does not first think of a person then an action, or an action separate from the individual performing it, as when saying “Paul jumps.” This linear formatting of sounds and associated meanings is imposed by the medium, by the fact that the sounds in utterances unfold over time. The medium of articulated sounds also imposes structure in the forms that are linearly arranged, as illustrated in the acoustic waves of Figure 1.2. In viewing the patterns in this figure, one might ask what possible mental or “rational” principle requires the cyclical closing and opening motions of the vocal tract to create syllable-like pulses of air, harmonic patterns, alternating long and short cycles marking rhythms, or shifts in fundamental frequency (F0) that accompany breath-divided utterances. These observable structures and units can be explained by reference to mechanisms of the motor-sensory medium (as detailed in Part III of this book), and serve spoken-language communication. But they cannot be understood as products of the mind. As for the arbitrariness of signs and the “symbolic” aspect of spoken language, Saussure did not consider how the uniquely human capacity for symbolic language links to an equally unique ability to produce and control orally segmented vocalizations (a point developed in Chapter 2). This is understandable given the state of research in comparative anatomy at the turn of the twentieth century. But the issue here is how such observations continue to be overlooked under the assumption of a speech–language division. There are historical reasons for the persistence of this viewpoint, which has created
a firewall against invalidating evidence, as Jackendoff (2009) has remarked. One critical point of disconnect between theoretical concepts of language analysis and empirical observations occurred with the rise of phonology.

Figure 1.2 Observable acoustic structures of an utterance: Cyclical patterns in energy contours, temporal groupings, tonal contours, and both amplitude and pitch declination across breath units of speech.
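
As an indication of how such structures can be examined instrumentally rather than through transcripts, the following minimal sketch (in Python) extracts a smoothed amplitude envelope from a recording; peaks in the envelope index syllable-like energy pulses, and long troughs approximate the pauses that divide breath units. The file name and parameter values are illustrative assumptions, not settings used in the present work.

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, hilbert

# Load a mono recording and normalize it (file name is illustrative).
rate, signal = wavfile.read("utterance.wav")
signal = signal.astype(float) / np.max(np.abs(signal))

# Amplitude envelope: magnitude of the analytic signal, smoothed below ~10 Hz,
# the frequency region in which syllable-like energy pulses occur.
envelope = np.abs(hilbert(signal))
b, a = butter(2, 10.0 / (rate / 2), btype="low")
smooth_envelope = filtfilt(b, a, envelope)

# Peaks in the smoothed envelope index syllable-like pulses; stretches that remain
# below a small fraction of the peak amplitude approximate breath-group pauses.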

1.3.3 On the Primacy of Linguistic Criteria: The Historical Disconnect from Instrumental Observations

In considering language analysis, one might wonder why readily observed structures, as visualized in Figure 1.2 and which are universal in speech, are not generally viewed as "design features" of spoken language (Hockett, 1960). A principal historical reason for this oversight may be that the structural patterns are not easily transcribed. On this point, it is essential to note that, when methods of linguistic analysis were being devised, the basic instruments used today to record and view speech signals were not widely available. Saussure, for instance, was a philologist who worked on ancient texts and lectured on the reconstruction of proto-Indo-European. At the time, transcripts were the main data if not the only data on which to formulate an understanding of spoken language. Instruments that could visually record speech, such as the kymograph (Vierordt & Ludwig, 1855), were rarely used, and the first spectrograph only appeared decades later (Steinberg, 1934). But there was also some reticence as to the usefulness of instrumental observations, even amongst classical phoneticians. The reason was that both philologists and phoneticians were applying subjective classifications of speech sounds and many did not see the need for instruments (e.g., in 1913, Jespersen saw that "Les instruments qui permettent de mesurer les différences de position des organes et de représenter par des chiffres la tension des muscles ou l'amplitude des ondes [. . .] sont inutiles" [the instruments that make it possible to measure the differences in the position of the organs and to represent numerically the tension of the muscles or the amplitude of the waves are useless]; in Auroux, 2000, p. 511; and see Malmberg, 1972). A similar disinterest in instrumental methods prevailed amongst early phonologists of the
Prague School in the 1920s, which included such influential authors as Trubetzkoy (1890–1938), Jakobson (1896–1982), and adherents like Martinet (1908–1999), Benveniste (1902–1976), among others (see Fischer-Jørgensen, 1975). How, then, did analysts come to conceptualize systems of sound opposition as put forward by Saussure without sound-recording instruments? A reading of early works offers a view of how the analyses were performed. At the time of Saussure's Cours, philologists were engaged in comparing languages often using ad hoc alphabet-style symbols. With a view to reconstructing proto-Indo-European, analysts of ancient texts focused on changes in sound distinctions, which were described using impressionistic articulatory categories. Terms like "stop," "fricative," "front," "back," "palatal" (etc.) were broadly used by both philologists and phoneticians like Passy (1859–1940) and Sweet (1845–1912). But it is worth noting that these categories were subjectively established. Thus, when classical phoneticians categorized sounds according to articulatory place and mode, they did so by reference to what they felt they were doing when producing the sounds. Moreover, in an effort to standardize descriptions, Sweet (1877) suggested that different signs should be used only for "those broader distinctions of sound which actually correspond to distinctions of meaning in a given language" (p. 103), although this was also problematic. To illustrate the way speech was being described, consider how Sweet (1877) discussed the difficulty in segmenting sounds (in this case "glides"): Consonant-glides are more noticeable in French than in English, especially in stop-combinations, (strik[ʜ]t)='strict.' Final voice stops often end in a voice-glide, (bag[ʌ])='bague.' In passing from (ɴ) to the next vowel the glide is generally formed so slowly as to be heard as a separate element, so that (ohɴ[i]oq)='oignon' sounds like (ohɴjoq). Final (j) and (ɴ) end voicelessly, the glottis being opened at the moment of removing the tongue from the consonant position, so that (fijʜ) and (vIɴʜ) sound like (fij-jh) and (viʜjh). (p. 125)

In later attempts to standardize ad hoc signs, alphabet systems came to be used. However the choice of a letter system was circumstantial. In 1886, Passy headed a group of phoneticians and founded the Association phonétique internationale with the purpose of developing a transcription system for teachers of European languages (Auroux, 2000). In 1888, the association published the first version of the International Phonetic Alphabet, which came to be adopted by philologists and phonologists. The latter formally introduced a “commutation test” as a method for establishing the phonological system of a language. With this method, IPA letters are substituted in “minimal pairs” to identify “distinctive features” that differentiate literal meanings in paired lexemes. For instance, in English /bak/-/pak/, the commutation of b-p is seen to create a distinction in meaning in terms of a subjectively estimated “voice” feature. However, letters in such analyses were not merely a descriptive device. They took on a theoretical and psychological
status where features were thought to be actually bundled together at some mental level, in letter-like “phonemes,” if not in signals (see, e.g., early debates on whether phonemes were real elements in signals, “mental images” as Saussure called them, or “convenient fictions”: Twadell, 1935, contra Sapir, 1921, 1933). In considering these developments, the arbitrariness of the standard is not the problem. Many standards in science can be arbitrary to a degree. Rather, it is the choice of culture-specific signs that appears objectionable. There was no attempt to justify the use of alphabet symbols except by reference to their usefulness in teaching European languages (Auroux, 2000). The “International” system of the IPA set aside tone languages and a variety of “suprasegmental” marks that did not fit letters (a problem that was only partly addressed in later versions of the IPA). More fundamentally, there was no theoretical justification for restricting functional distinctions to minimal pairs of isolated words as represented in European-style dictionaries. By this standard, the use of letters in commutation tests overlooked the vast range of distinctions that could convey meaning differences in spoken language. This created problems in defining just what type of meaning phonologists were talking about. In fact, minimal pairs were restricted to distinctions in “literal” meaning (and see Coulmas, 1989). This was a determining point. At a time when the study of spoken languages was beginning to emerge as an academic discipline, the use of letter notations and phonological methods basically limited the object of study to what could be alphabetically transcribed. And then there was the problem of the validity of representing subjectively established features in successive letters, which prescribed that articulated sound features occur in bundles on a line as “one phoneme per letter” (Jones, 1929). Initially, there was the shared belief amongst philologists and phoneticians that their descriptions of speech sounds were “scientific” and captured actual articulatory and acoustic units. But following early instrumental observations, some authors expressed doubts. There were problems in deciding how to divide sounds into phonetic descriptions (as exemplified in the previous quote from Sweet). Instrumentalists like Rousselot (1897) and Scripture (1902) saw no basis in the continuous curves of their kymograph recordings for dividing speech beyond respiratory pauses. Both saw that, for the same letter representations, speakers of different languages and dialects (and even different age groups) produced variable sounds such that letter representations appeared illusory. On the other hand, these early works, which contained innumerable descriptions of waveforms, bore few explicit conclusions on speech structure or the validity of language analyses based on alphabetic script. A later critical report by Stetson (1928/1951) addressed classical phoneticians, Saussure, and Prague phonologists more directly. Stetson’s research demonstrated that an analysis of spoken language in terms of letter symbols on a line does not conform to the structural aspects of speech as seen in the “phase” of articulatory motions (i.e., the articulation of a vowel
V does not start after a consonant C in producing CV). His report also showed that one could not segment speech into units smaller than syllable-like "ballistic pulses" – that is, one cannot halt speech during a pressure-building constriction of the vocal tract or the vocal folds such that speech minimally involves a cycle of motions that build and release pressure. Stetson explicitly outlined the implications for language analysts by quoting from works in phonology: To isolate and classify the essential "sounds" (phonemes) of a language, i.e. to assemble the phonetic alphabet of a language, was an important project for Sweet and F. de Saussure, and a primary enterprise for Trubetzkoy. This was the impulse that lay behind the International Phonetic Alphabet, the first achievement of the phoneme doctrine. Sweet noted that "sounds" differentiated words; at the hands of Saussure, Trubetzkoy and associates, the smallest phonic change in a word that shifts the meaning was made to indicate a new phoneme. "Dans le 'projet de terminologie standardisée' soumis à la Réunion Phonologique Internationale de 1930, on trouve les définitions suivantes: 'Une opposition phonologique est une différence phonique susceptible de servir dans une langue donnée à la différenciation des significations intellectuelles; chaque terme d'une opposition phonologique quelconque est une unité phonologique; le phonème est une unité phonologique non susceptible d'être dissociée en unités phonologiques plus petites et plus simples.' (Travaux du Cercle Linguistique de Prague, IV, p. 311)." [In the 'project of standardized terminology' submitted to the International Phonological Meeting of 1930, one finds the following definitions: 'A phonological opposition is a phonic difference capable of serving, in a given language, to differentiate intellectual meanings; each term of any phonological opposition is a phonological unit; the phoneme is a phonological unit that cannot be dissociated into smaller and simpler phonological units.'] It is apparent that the method involves a resort to meanings and also to observations of articulations. Certainly, in one form or another, differentiating the significant articulations by differences of meaning will be important to any system. It is unfortunate that the scholars who have made the most of the method have not only ignored the syllable but have also insisted that the articulation occurs in the separate, concrete world of la parole, while the phoneme symbol occurs in the separate, ideal world of la langue; and so the phonemicists have made a virtue of their ignorance of experimental methods of observation. (p. 136)

More importantly, Stetson cautioned language theorists on the writing bias underlying a conceptualization of speech as sequences of letter-like phonemes, and indicated why these units may be impossible in terms of temporal constraints on speech processing. The series of characters which we read and write as representing an articulate language have given us a mistaken notion of the units which we utter and which we hear. The series of speech units cannot correspond to the series of “sounds” or “phonemes” set down on paper. Even “slow, careful utterance,” let alone the rapid utterance of everyday, is much too fast for that. The maximum rate at which articulations can be uttered is 10–12 per sec.; and the maximum rate at which auditory signals can be identified is 14–16 per sec. A [transcribed] syllable of speech often indicates 2–7 phonemes; “do” has two, “tree” three, “quilt” five, “squelched” seven; the slow, careful rate of utterance is often 4–5 syllables per sec. Thus phonemes are often indicated 15–25 per sec. Obviously the phonemes are not uttered (or heard) one after the other; there must be
extensive overlapping, as physiological tracings prove. The consonants do not prove to be separable units, they must have breath pressure behind them; the pressure is supplied by the pulses of the syllables in which they function. A consonant cannot be pronounced alone; it is always a characterized factor in some syllable. The supposed consonant “elements”, which naive teachers assume that they are uttering, prove to be syllables with pulses from the chest, but often with the vowel shape unvocalized. (p. 137)
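
To make the arithmetic behind Stetson's caution explicit, the rough calculation below (in Python) uses the rates cited in the passage above; the per-syllable phoneme count is an illustrative middle value, since transcribed syllables range from two to seven phonemes.

# Rates cited by Stetson (approximate; they vary across speakers and languages).
syllables_per_second = 5        # "slow, careful" utterance
phonemes_per_syllable = 3       # e.g., "tree" is transcribed with three phonemes
phoneme_rate = syllables_per_second * phonemes_per_syllable    # 15 per second
ms_per_phoneme = 1000 / phoneme_rate                           # about 67 ms each
# Compare with the roughly 100 ms-per-item limit on identifying the order of
# separate sounds (discussed below in relation to Warren, 2008): serial,
# phoneme-by-phoneme processing of utterances would exceed this limit.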

The idea of syllable-like pulses or cycles is developed later on in this monograph, but it is useful to note that Stetson’s wording of “syllables with pulses from the chest” suggested that the pulses link to respiratory mechanisms. Studies using electromyography (EMG) have shown, to the contrary, that expiratory processes are largely passive during normal speech (Draper, Ladefoged, & Whitteridge, 1959; Ladefoged, 1962). According to Ohala (1990, p. 30), Stetson likely misinterpreted observations of pressure variations in the lungs as indicating an active factor when, in fact, the variations reflect a resistance to expiratory flow created by cyclical articulatory motions upstream (see also Boucher & Lamontagne, 2001). On the other hand, several studies support Stetson’s main claim: speech cannot be segmented in units smaller than syllable-like pulses seen as coherent cycles of pressure-building and releasing motions where the latter can be voiced or unvoiced. Research in the 60s confirmed that C and V motions are not timed as successive phonemes, but “co-articulated” in coherent CV-like units (MacNeilage & DeClerk, 1967, and see Chapter 4). Furthermore, studies combining EMG and photoglottography confirm cyclical motions and pulses as basic units of production. Thus, sounds that can be produced in isolation, like vowels and “syllabic” consonants, minimally involve a pressure-rising constriction of the vocal folds prior to a voiced release (Hirose, Sawashima, & Yoshioka, 1980, 1983). Stop consonants like [p, t, k, b, d, g], however, cannot be produced in isolation. Articulation of these sounds involves unitary cycles where a pressure-building oral closure entails a releasing motion, and this is observed in CV, post-vocalic contexts as in VC, CVC, and complex clusters where the oral release can be voiceless, or voiced as brief schwas (e.g., Ridouane, 2008). The latter findings, incidentally, suggest that IPA notations and orthographic notions of VC and CVC as single syllables misrepresent the fact that these sequences are produced as two syllable-like cycles of motion each involving separate bursts of EMG activity (Boucher, 2008, and see the demonstrations in Part III). Stetson’s view of unitary cyclical pulses can account for several phenomena such as why certain consonant motions can create syllable beats, or the particular coherence of CV patterns (MacNeilage & DeClerk, 1967), discussed in subsequent chapters. As for the remark on the impossibility of phoneme-by-phoneme processing in production and perception, Stetson’s claim of a maximal rate of about 100 ms/syllable conforms to an observed upper limit on the production of separate sounds (Kent, 1994; Knuijt, Kalf, Van Engelen et al., 2017). In terms of acoustic perception, an extensive review by Warren (2008) indicates that order perception for unstructured
speech sounds such as sequences of vowels or digits separated by silences is possible for rates no lower than 100 ms per item. Sequences in which items are artificially compressed to about phoneme-size elements can create auditory illusions and confusions in perceived order. If such observations are confirmed for structured speech, they would further substantiate Stetson’s original critique of phonemes as processing units. But findings such as those reported by Stetson were not a serious concern for language theorists of the day. In overlooking instrumental findings, a favored argument of early phonologists that is often repeated in introductory texts, was that instrumentalists could describe speech sounds to no end. Consequently, some saw instrumental observations as having to be guided by considerations of the functional aspects of sounds serving to communicate meaning distinctions. This argument of the “primacy of linguistic criteria” was present in Saussure’s writings, in early phonetics (see Sweet’s notion of “broad” transcriptions), and was essential to the constitution of phonology as a separate subfield of language study. As Trubetzkoy (1939/1971) stated: “It is the task of phonology to study which differences in sound are related to differences in meaning in a given language, in which way the discriminative elements [. . .] are related to each other, and the rules according to which they may be combined into words and sentences” (p. 10). For some phonologists this meant that speech observations had to be hypothesis driven, they had to follow assumptions of linguistic analyses and the criteria of distinctiveness because “nothing in the physical event . . . tells us what is worth measuring and what is not” (Anderson, 1985, p. 41, quoted by Lindblom, 2000; see also Anderson, 1981). It is the case that, to communicate using spoken language, sounds need to be distinctive. But as noted, nothing in the principle of distinctiveness provides an epistemological justification for restricting “features” to sounds that can be transcribed, or to indices that create differences in literal meaning, such as in analyses of minimal pairs of “words” as listed in dictionaries. Nor does this principle serve to justify the view that distinctive features occur in letter-like segments, or any other extent of speech for that matter. The issue of sound segmentation is an empirical question bearing on how features are produced and perceived along a time axis. It is an issue that was not resolved by the choice of alphabet signs to represent speech. On the contrary, in the history of language study, the decision to analyze speech as sequences of letters said to represent phonemes finds no “phonological” or experimental motivation. Yet in choosing to represent sound features with letters, the notion of letter-like packets of features gained a theoretical status, even though this choice was entirely circumstantial. In fact, when the field of language study began to consolidate in the latter part of the nineteenth century, various transcription systems were available. Some systems, like Bell’s Visible Speech, which became popular in some circles, did not involve letters at all (Abercrombie,
1967, pp. 118–119). Syllabaries were also available. But the only justification found in the literature for adopting alphabet signs is that they were useful for teaching European languages to learners who already knew an alphabet (see the accounts of early IPA meetings: Auroux, 2000).

1.3.4 Explaining Systems of Distinctive Features: Lindblom's Demonstration (1986)

Saussure argued that one cannot comprehend a foreign language because one lacks the mental code for interpreting sounds. Such an effect, as he saw it, has nothing to do with the mechanisms for sound production and perception. For Saussure and early phonologists, systems of distinctive features and arbitrary (symbolic) sound–meaning associations reflect a mental capacity that can imprint "form" onto the "substance" of sounds. But when it comes to explaining how feature systems and symbolic signs emerge, a reference to a language capacity or faculty is rather vacuous. As previously mentioned, the doctrine that language reflects an autonomous function separate from speech arose historically when writing was practically the only means by which to record and analyze speech. The doctrine also took hold in a period when language was associated with a separate function located in Broca's area. But claiming that phonological systems emerge from a cortical area of the brain leads to the odd question of how the brain on its own can possibly create sound features. On the other hand, analyzing speech via acoustic or physiological records instead of script can reveal constraints on motor and sensory processes that contribute to the formation of a feature system. Within this view, not all researchers adhere to the doctrine of a speech–language (or performance–competence) division and perhaps the first demonstration that processes of speech communication can shape feature systems was a report by Lindblom (1986; see also Lindblom, 1999). Lindblom's study focused on the acoustics of vowels in different languages, as described by Trubetzkoy (1939), and attempted to relate the varying complexity of vowel systems to properties of sensory processes. Rather than analyzing transcripts, Lindblom examined the auditory mappings of "vowel" features using psychoacoustic scales and a model that served to derive feature systems of differing complexity. The guiding hypothesis was that sounds used in spoken language tend to evolve so as to make speech communication efficient and intelligible. This would be facilitated by the development of "maximal perceptual differences," a principle that is partly attested in cardinal vowels (all languages exploit the sounds [i, a, u], or close neighbors, which are maximally distanced in terms of both articulatory and psychoacoustic parameters). To evaluate how this principle could shape vowel systems, Lindblom first performed acoustic-to-perceptual conversions using two approaches. One
approach (F) used asymmetric filters that modeled the response of the human auditory system. The other approach (L) used a conversion based on psychoacoustic scales. An algorithm, which calculated the maximal Euclidean distance, was then applied to a two-dimensional metric of formants (or F1–F2 space) referenced to [i] so as to derive sets of vowels of increasing number. Finally, the predicted vowel sets were compared to existing systems described in linguistic analyses. Table 1.1 provides examples of the results. Although Lindblom viewed his data as preliminary, Table 1.1 shows similarities between predicted and described vowel sets of existing languages with a number of nearly exact matches (the discrepancies may in fact bear on how one transcribes sounds in an F1–F2 space). It should be noted that Lindblom's study focused on "possible vowels" as defined by methods such as minimal pairs.

Table 1.1 Part of Lindblom's (1986) predictions of "vowel systems" using a maximal perceptual distance metric and two acoustic-to-auditory mapping approaches: (F) using a model of auditory filters; (L) using psychoacoustic scales.
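
To convey the logic of this type of derivation, the following minimal sketch (in Python) selects vowel subsets that maximize perceptual dispersion in an F1–F2 space converted to an auditory (Bark) scale. The candidate vowels, their formant values, the Bark formula, and the dispersion criterion are illustrative textbook approximations, not Lindblom's model or data.

import itertools
import math

def hz_to_bark(f):
    # Approximate Hz-to-Bark conversion (Traunmuller, 1990).
    return 26.81 * f / (1960.0 + f) - 0.53

# Hypothetical candidate vowels with rough steady-state formant values (F1, F2 in Hz).
CANDIDATES = {
    "i": (280, 2250), "e": (400, 2100), "ɛ": (550, 1850), "a": (750, 1300),
    "ɔ": (550, 850), "o": (430, 750), "u": (300, 650), "ɨ": (320, 1500),
}

def distance(v1, v2):
    # Euclidean distance between two vowels on the converted (Bark) scales.
    (f1a, f2a), (f1b, f2b) = CANDIDATES[v1], CANDIDATES[v2]
    return math.hypot(hz_to_bark(f1a) - hz_to_bark(f1b), hz_to_bark(f2a) - hz_to_bark(f2b))

def predict_system(n):
    # Choose the n-vowel subset with the least "crowding" (sum of inverse-squared
    # distances), one common formulation of maximal perceptual dispersion.
    def crowding(subset):
        return sum(1.0 / distance(a, b) ** 2 for a, b in itertools.combinations(subset, 2))
    return min(itertools.combinations(CANDIDATES, n), key=crowding)

for n in range(3, 8):
    print(n, predict_system(n))

With three vowels, such a criterion tends to select maximally separated qualities like [i, a, u], in line with the cardinal-vowel observation mentioned above; larger inventories tend to fill in intermediate regions of the space.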

However, as already mentioned, there is no epistemological principle that justifies the assumption that functional distinctions in spoken language are limited to letters of the IPA, or literal meaning of minimal pairs of words as represented in dictionaries. Numerous acoustic indices can cue different interpretations of utterances, and these indices far exceed the features proposed in conventional language analysis (Port, 2007; Port & Leary, 2005). Indeed, “possible indices” can cover the range of perceptual and motor-control capacities. But the point here is that Lindblom’s data reveal that processes of modality can shape the feature systems used in spoken languages. Ontologically, this refutes the Saussurean doctrine according to which feature systems are a product of a mental function that imprints form on sounds. By overlooking the structural effects of sensory processes of the speech modality, this mentalist concept limits the development of an understanding of the emergence of distinctive sounds in communication, and this extends to a broad range of mechanisms, some of which can account for the rise of symbolic language in humans. On the latter topic, for Saussure, as for many researchers, the fact that sound– meaning associations are arbitrary (symbolic) is seen to imply that spoken language reflects a mental capacity that can generate special signs regardless of the modality of expression. Thus, a popular argument originally worded by Saussure, but often used to illustrate a competence–performance division, is that “language” can be externalized in modalities other than speech, such as gestural sign languages (Chomsky, 1965, 2000, 2006; Hauser, 2016; Hauser, Chomsky, & Fitch, 2002; Pinker, 1994b, and others). Such influential arguments reaffirm the notion of an autonomous, amodal language function that is separate from speech. This Cartesian viewpoint, arising from a tradition of conceptualizing spoken language through script, underlies several hypotheses on the origin of symbolic language as a modality-independent capacity. Yet these hypotheses often overlook that the symbolic aspect of vocal communication, as a distinctly human trait, links to the equally unique ability in humans to control orally modulated vocalizations. Recognizing this link not only helps to clarify the pseudo-puzzle of a putative language competence that is detached from motor-sensory processes of performance. It is also an essential step in addressing the issue of the ontological incommensurability between observations and language theory. Inasmuch as researchers assume that spoken language involves a mental capacity that is distinct from motor-sensory processes of expression, investigations will focus on the specificity of cognitive operations and view processes of modality as largely irrelevant. For this reason, the next chapter offers an extensive review of theories on the origin of symbolic language. This is followed by a laboratory demonstration of the role that vocal processes play in the rise of symbolic signs in human communication.

2 The Modality-Independence Argument and Storylines of the Origin of Symbolic Language

The following discussion draws from Boucher, Gilbert, and Rossier-Bisaillon (2018), who reviewed several accounts of the rise of symbolic language in humans (with some additions in reference to the case of sign languages). There is a vast literature on the origin of spoken language, much of which offers diverging viewpoints with few areas of consensus. For instance, there is no agreement in this literature on how to define “language” (Christiansen & Kirby, 2003). On the other hand, it is widely accepted that a fundamental feature of language is its symbolic function and that, aside from humans, no other species has developed vast systems of signs such as those that appear in spoken language. Indeed, for some, humans are the symbolic species (Deacon, 1997). However, interpreting such claims rests on how one defines symbols, and the processes by which they have evolved. The following aims to clarify these processes within evolutionary theories while offering a demonstration of how the ability to articulate sounds is an essential factor in the rise of symbolic language. For some readers, this particular ability may seem to be an obvious factor in the development of symbolic signs. But general definitions of symbols often overlook processes of expression and how they contribute to the formation of signs. In fact, many evolutionists, especially those who refer to nineteenth-century theorists like Saussure (1916/1966) and Peirce (1998), define “symbols” principally as arbitrary associations between signals and concepts of objects or events (i.e., “referents”). Many also recognize that symbolic associations can operate from memory when designated referents are not in the immediate context of communication (a feature that Hockett, 1960, called “displacement”). These criteria are useful in distinguishing symbols from other types of signs that operate as icons or indices. The latter involve a non-arbitrary resemblance or actual physical connections to referents, whereas nothing in the attributes of symbols provides a clue as to their interpretation (Deacon, 2012, p. 14; Stjernfelt, 2012). However, Saussure’s and Peirce’s definitions, which focus only on an arbitrary association, can lead to symbols being conceptualized as mental constructs, unrelated to modalities of expression. It will be recalled that Saussure saw language as reflecting a separate mental capacity
that could generate symbols in any modality such as speech or gestural signs (see Saussure, 1916/1966, pp. 10–11). This idea has had a lasting influence, especially on linguistic theory, where language is often seen to reflect a mental capacity that has little to do with modalities of expression (Chomsky, 1965; Hauser et al., 2002). But if this is the case, why does symbolic language develop primarily in a vocal medium? In focusing on this question, the following discussion draws attention to a body of work in primatology that has failed to uncover any distinct mental ability that could account for symbolic language arising in humans. On the other hand, as outlined in the following, humans are the only primates that possess cortical control over vocal signals and oral structures serving to modulate vocalizations, which tends to undermine the doctrine that symbolic language arose from an amodal cognitive capacity. A review of this doctrine, which underlies several theories on the origin of spoken language, serves as a background to a demonstration of an opposing modality-dependent principle where symbolic language is seen to intertwine with the ability to articulate sounds. Such a demonstration reflects the approach of a group of studies in which evolutionary scenarios are submitted to critical laboratory experiments and computer simulations (as in Gasser, 2004; Monaghan & Christiansen, 2006; Monaghan, Christiansen, & Fitneva, 2011; Monaghan, Mattock, & Walker, 2012; Monaghan, Shillcock, Christiansen et al., 2014; Oudeyer & Kaplan, 2006; Williams, Beer, & Gasser, 2008).

2.1 Cognitive Skills as Insufficient Factors in the Rise of Symbolic Communication

In reviewing hypotheses for the origin of spoken language, it is important to acknowledge that several cognitive abilities and neural processes that were thought to underlie symbolic communication in humans have since been observed in other primates. In particular, it has been established that, with training, apes can learn vast sets of visual symbols and can combine these productively (e.g., Lyn & Savage-Rumbaugh, 2000; Savage-Rumbaugh, Murphy, Sevcik et al., 1993; Savage-Rumbaugh & Rumbaugh, 1978; Savage-Rumbaugh, Shanker, & Taylor, 2001). Follow-up studies have documented that chimpanzees and bonobos, raised in symbol-rich environments, can develop a vocabulary and utterance complexity similar to those of three-year-old children (Lieberman, 1984, ch. 10; see also Gardner & Gardner, 1969; Gardner & Gardner, 1985; Pedersen, 2012). There are also reported cases where chimpanzees have acquired American Sign Language (ASL) solely through communicating with other ASL-trained chimpanzees (Gardner & Gardner, 1985). Moreover, brain-imaging research indicates that, in apes, monkeys, and humans, associative memory in symbol learning
involves similar neurological structures (e.g., Eichenbaum & Cohen, 2001; Squire & Zola, 1996; Wirth, Yanike, Frank et al., 2003). Other symbol-related abilities have been found to extend to nonhuman primates despite continuing claims to the contrary. Of note, there is the capacity to create hierarchical or embedded combinations of signs, a process known as “recursion,” which was claimed to reflect a uniquely human syntactic capacity (Fitch, Hauser, & Chomsky, 2005; Hauser et al., 2002; cf. Hauser, 2016). Some have also maintained that a related ability to combine symbols based on conceptual relationships, a property that some proponents of generative syntax call “Merge,” is distinctly human (Bolhuis, Tattersall, Chomsky et al., 2012; Chomsky, 1972; cf. Lieberman, 2016). However, Perruchet and Rey (2005) have demonstrated that chimpanzees can learn to generate embedded sequences of given symbols (and see Perruchet, Peereman, & Tyler, 2006; Perruchet, Tyler, Galland et al., 2004). Additionally, research has shown that monkeys can distinguish acoustic cues in speech (as discussed by Belin, 2006), and manifest a “statistical learning” of speech sounds (Hauser, Newport, & Aslin, 2001). Several reports have further shown that nonhuman primates can process combinations of symbols based on conceptual relationships (contra, e.g., Arbib, 2015; Clark, 2012; Fitch et al., 2005; Hauser, Yang, Berwick et al., 2014; Jackendoff, 2002). Thus, seminal work by Savage-Rumbaugh et al. (Savage-Rumbaugh, McDonald, Sevcik et al., 1986; Savage-Rumbaugh & Rumbaugh, 1978) has revealed that training chimpanzees on paired symbols designating items and actions (“drink” and “liquids” versus “give” and “solid foods”) facilitates the learning of combinations of novel signs. In other words, the individuals more easily acquired pairs when action symbols correctly matched signs for types of foods, which implies a processing of signs in terms of their conceptual relationships (Deacon, 1997, p. 86). More recently, Livingstone, Pettine, Srihasam et al. (2014) trained rhesus monkeys on symbols representing distinct numbers of drops of a liquid (implying a coding of magnitude). On tests involving combinations of these learned symbols, the individuals not only showed a capacity to process the relative values of signs within a context, but also transferred these subjective valuations to new symbols, suggesting a capacity to process combined signs in terms of novel relationships. Livingstone et al. have also outlined that the value coding of signs involves similar neural processes in humans and monkeys that implicate dopamine neurons as well as interactions between the midbrain, the orbitofrontal cortex, and the nucleus accumbens (Livingstone, Srihasam, & Morocz, 2010; Livingstone et al., 2014; for a critical review of other findings of this type, see Núñez, 2017). Finally, it was also believed that only humans possessed the ability to imitate, while apes emulate behaviors (e.g., Fitch, 2010; Heyes, 1993; Tomasello, 1990, 1996). Emulation has been characterized as entailing a learning of the effects of
actions, rather than a copying of bodily motions. A limited capacity to imitate was thought to hinder the cultural transmission of communicative signs and tool use. Even so, some studies have shown that nonhuman primates can mimic the actions of their conspecifics and also learn to produce their calls and symbols (Gardner & Gardner, 1969; Gardner & Gardner, 1985; Pedersen, 2012; Sutton, Larson, Taylor et al., 1973). Recent research has made it clear that apes can selectively apply a range of social learning processes. This includes deferred imitation as well as the ability to follow eye gaze and direct attention by eye gaze and pointing (for a review, see Whiten, McGuigan, Marshall-Pescini et al., 2009). Other related claims that only humans have “shared intentionality” and an advanced “theory of mind” (e.g., Pinker & Jackendoff, 2005) have been questioned in studies of apes reared by humans (Bulloch, Boysen, & Furlong, 2008; Leavens, Racine, & Hopkins, 2009; see also Call & Tomasello, 2008; contra Penn & Povinelli, 2007). In short, research over the past decades has revealed that, contrary to previously held assumptions, nonhuman primates possess mental abilities that serve to learn and process symbols. But the fact remains that monkeys and apes in the wild do not develop repertoires of symbolic signs like those used in spoken language (e.g., Pika, Liebal, Call et al., 2005). To illustrate the kind of signs that arise in nonhuman primates, one can consider the often-cited case of vervet monkeys that use distinct signals to communicate the presence of different predators (Cheney & Seyfarth, 1990; Price, Wadewitz, Cheney et al., 2015; Seyfarth, Cheney, & Marler, 1980; for similar referent-specific signals in apes, see Crockford & Boesch, 2003). It has been argued that these signs are symbols based on their seeming arbitrariness (Price et al., 2015), though some critics reject this interpretation (Deacon, 2012; Hauser et al., 2002). In fact, vocal signs in apes and monkeys appear to be largely indexical in that they reflect reactions to referents in the signaling context (see also Cäsar, Zuberbühler, Young et al., 2013; Crockford & Boesch, 2003; Hobaiter & Byrne, 2012). Thus, research tends to confirm that, while monkeys, apes, and humans share the cognitive abilities required to learn and use signs, only humans can develop vast systems of symbols, and they do so primarily in a vocal medium. One implication of these results is that, even though cognitive abilities can be essential prerequisites in acquiring and manipulating symbols, some other capacity is needed to account for the emergence of these types of signs in vocal communication.

2.2 The Case against Modality-Independent Accounts of Symbolic Language

Compared to cognitive skills, the capacity to articulate vocal patterns stands as an obvious human-specific trait. Yet, in the literature on the origin of language,
many researchers have been guided by the belief that language emerged in a gestural medium. Several findings have motivated this view, which implies a conceptualization of “language” as an amodal function. One pivotal finding relates to the lack of voluntary control of vocalization in nonhuman primates. In particular, studies by Jürgens (1976, 1992) showed that the brains of monkeys and apes lack monosynaptic fibers that link the motor cortex to the laryngeal-muscle motoneurons in the nucleus ambiguus. Although some question Jürgens’ original results (Lieberman, personal communication), subsequent comparative observations confirm this direct cortical connectivity to the nucleus ambiguus in humans (Iwatsubo, Kuzuhara, Kanemitsu et al., 1990; Kumar, Croxson, & Simonyan, 2016; Simonyan, 2014; Simonyan & Horwitz, 2011). Such findings concur with the poor control over reactive vocalizations in these species (Jürgens, 2002, pp. 242–245; Simonyan, 2014, and see especially Kumar et al., 2016). The nervous systems of monkeys and apes do, however, present direct monosynaptic projections to the motoneurons associated with the control of finger muscles, and to jaw- and lip-muscle motoneurons in the trigeminal and facial nuclei (Jürgens, 2002; Lemon, 2008; Sasaki, Isa, Pettersson et al., 2004). Compared to humans, though, there are fewer direct connections to tongue-muscle motoneurons in the hypoglossal nucleus (Jürgens, 2002; Kuypers, 1958), which aligns with the paucity of oral segmentations or syllable-like patterns in the calls of nonhuman primates (Lieberman, 1968, 2006a; Lieberman, Laitman, Reidenberg et al., 1992). Taken together, these observations may have led many to believe that, since apes and monkeys produce vocal signals as inflexible reactions, symbolic language instead evolved from controllable hand gestures (e.g., Arbib, 2005, 2012, 2015; Arbib, Liebal, & Pika, 2008; Corballis, 2002, 2009; Gentilucci & Corballis, 2006; Hewes, 1996; Pollick & de Waal, 2007; Tomasello & Zuberbühler, 2002). This held belief, however, conflicts with the general observation that gestural signs produced by nonhuman primates do not function symbolically, and are mostly iconic or indexical. Thus, theories of the gestural origin of language are not supported by observations of extant species. Instead, the theories refer to indirect evidence interpreted in terms of storylines that basically suggest that the Last Common Ancestor (LCA) of the homo and pan (chimpanzee and bonobo) genera had, at some point, developed symbolic gestures from which spoken language evolved. This view, popularized by authors like Corballis and Arbib as outlined in the following, has been criticized on fundamental grounds. One objection bears on the theoretical significance given to “sign languages” (further discussed in Section 2.3). Proponents of gestural theories frequently refer to sign languages as illustrating the possibility of a gestural stage in language evolution (e.g., Arbib, 2012; Corballis, 2002, ch. 6). As such, this interpretation adheres to a Saussurean view of language as deriving from an

amodal faculty (and see Chomsky, 1965, 2000, 2006; Hauser, 2016; Hauser et al., 2002; Pinker, 1994b). For example, gestural signs are seen to support the idea that “the language faculty is not tied to specific sensory modalities” (Chomsky, 2000, p. 121), and that “discoveries about sign languages [. . .] provide substantial evidence that externalization is modality independent” (Chomsky, 2006, p. 22). However, such claims repeatedly disregard the fact that there is no known case where a community of normal hearers develops sign languages as a primary means of communication (Emmorey, 2005). Said differently, gestural signs as a primary system generally appear where people share a pathology affecting the hearing modality, which hardly supports the notion of an amodal language capacity. On the contrary, it suggests that, given the normal human ability to control both vocalization and gestures, symbolic communication links to a vocal-auditory modality with visual gestures having an accessory imagistic role (McNeill, 2005). It follows that an account of the rise of spoken language requires an explanation of how and why symbolic signs link to the vocal medium. But this is not the perspective of a gestural account of language. Instead, the assumption of the gestural origin of language leads to postulate an evolutionary shift from gestures to vocal signs, which presents a conundrum for evolutionists. As a brief explanation of the inherent problems of this latter hypothesis, one can refer to the theories of Corballis (1992, 2002, 2003, 2009, 2010; Gentilucci & Corballis, 2006), and Arbib (2005, 2011, 2012; Arbib et al., 2008). A critical claim of these proposals is that left-lateralized control of hand gestures in area F5 of a monkey’s cortex evolved into left-sided dominance for language, which the authors locate in Broca’s area (on this classic locationist view, see Section 1.3.1). Corballis and Arbib also refer to research showing that mirror neurons in F5 discharge when a monkey observes hand motions in others (e.g., Ferrari, Rozzi, & Fogassi, 2005; Rizzolatti & Arbib, 1998). Both authors see in these responses a mechanism of action understanding, and they conjecture that mirror neurons played a role in the shift from a gestural to a vocal modality of communication. As to how this shift occurred, it is speculated that, when the LCA developed bipedalism, there was a freeing of the hands allowing for the development of expressive manual signs. At first, this led to putative iconic pantomimes that became conventionalized and symbolic (at a “protosign” stage), before “protospeech” developed. However, it is difficult to find in this narrative any working mechanism that converts gestures to vocal signs. For example, Arbib recently explained that pantomimes created an “open-ended semantics” that “provides the adaptive pressure for increased control over the vocal apparatus” (Arbib, 2015, pp. 612–613; see also 2012). By this account, the semantics of protosigns “establishes the machinery that can then begin to develop protospeech, perhaps initially through the utility of creating novel sounds to match degrees of freedom of manual gestures (rising pitch could represent an upward

movement of the hand)” (2015, p. 613). Thus, the core explanation in this view is that semantics drove the evolution of vocalization, and the pairing of physiological parameters of hand control with those of articulatory, laryngeal, and respiratory systems of (proto-) speech. In this account, the example of iconic signs (of pitch increase and hand rising) hardly helps to understand how vast systems of symbolic signs emerged. Critics have also questioned whether any realistic model can be devised to “translate” hand motions into sequences of vocalized sounds (MacNeilage, 2010, pp. 287–288; Hewes, 1996). In weighing the aforementioned scenario, one should note that it is based on the claim that left-sided control of hand motions in a monkey’s cortex overlaps mirror neurons and Broca’s area in humans. It has been reported, however, that activity in mirror neurons during the perception and production of hand motions is not left-lateralized (Aziz-Zadeh, Koski, Zaidel et al., 2006). But more generally, one might question the a priori validity of theories where semantics is seen as driving the evolution of physiological mechanisms of vocalization. Such views are not limited to gestural accounts. They extend to a variety of theories that focus on cognitive skills while overlooking modality-specific constraints on motor processes of expression and how they shape signals and signs. In this orientation, it is as if symbol systems arise from some amodal mental function. For example, Deacon (1997, 2012) suggests an account whereby signs created by apes (and children) first appear to have an indexical function. Then, when the signs are logically combined, they become more symbolic (Deacon, 1997, ch. 5). But again, apes in their natural environments do not develop symbolic communication despite a mental capacity to combine given signs (as per the experiments of Savage-Rumbaugh et al., among others). So the question remains: how did symbolic signs emerge for humans, and why is this linked primarily to the vocal medium? On these questions, Deacon (1997) basically offers a circular explanation: “[l]anguage must be viewed as its own prime mover. It is the author of a co-evolved complex of adaptations arrayed around a single core semiotic innovation” (p. 44). Aside from core semantics or semiotic functions, some theories also hold that the evolution of symbolic communication was driven by socio-cognitive functions. For example, the theory of “interactional instinct” suggests that, in language acquisition as in language evolution, children signal their intention to do something, and “[t]he intent becomes a symbol that the child expresses in an emotionally based interaction” (Lee & Schumann, 2005, p. 7). In related proposals, constellations of mental skills including “shared intentionality,” “perspective-taking,” “comprehension,” along with “thought processes” (Ackermann, Hage, & Ziegler, 2014) and “purpose” (Deacon, 2011), are evoked as driving factors. These accounts collectively imply what some have called “mentalistic
teleological” principles (Allen, 1996/2009), which do not accord with accepted features of evolution. In particular, one basic feature holds that evolution reflects biological change in relation to physical aspects of the environment (see Christiansen & Chater, 2008). Accepting this, it is difficult to fathom how the evolution of the anatomical structures of vocal communication would be driven by “semantics,” “thoughts,” “intentions” (etc.), without some basis in the sensory effects of physical signals and physiological processes of signal production. Indeed, if one defines symbols as entailing associations between concepts or memory of experiences and signal elements, then the notion that symbolic language arose from amodal mental factors presents a contradiction in terms in that, in the absence of a modality of communication, there are no signals with which concepts can associate (but cf. the pronouncement that language has little to do with communication: Chomsky, 2002, pp. 76–77; 2012, p. 11).

2.3 Modality-Dependent Accounts of the Rise of Symbolic Language

2.3.1 Mimesis, Procedural Learning, and the Case of Sign Languages

Contrary to the just-stated viewpoint, several authors share the view that language first evolved in the vocal modality and that this medium imposes particular constraints on the formation of signs (e.g., Lieberman, 1984, 2000, 2003, 2007, 2016; MacNeilage, 1998, 2010; MacNeilage & Davis, 2000; Studdert-Kennedy, 1998). This basically refutes claims that gestural sign languages and spoken languages have similar structures reflecting an amodal mental capacity (e.g., Bellugi, Poizner, & Klima, 1989). Thus, MacNeilage (2010, ch. 13) explains that vocal and gestural modalities have very different structuring effects on signs. In particular, gestural signs are holistic and can involve simultaneous hand and body motions whereas, in spoken language, sounds are strictly sequential, and are constrained by articulatory and respiratory–phonatory systems. This viewpoint is rarely echoed in studies of emerging sign language, although some explicitly refer to the differences between spoken and gestural signs. Most revealing is the work by Sandler (2012, 2013, 2017) on Al-Sayyid Bedouin Sign Language (ABSL), which has led to expressed reservations about both gestural accounts of language origin and the linguistic methods of analysis that dominate research on sign languages. Traditionally, the study of sign languages has relied on linguistic analyses and assumptions that focus on abstract phonological features, “words,” and syntactic categories, but that entirely neglect modality-specific differences between gestures and articulated sounds (MacNeilage, 2010). Sandler et al. noted several difficulties in attempting to shoehorn gestures into features and units of phonological analysis and paid attention to the development of
gestures as such (Sandler, 2012, 2013, 2017; Sandler, Aronoff, Padden et al., 2014). Their study of ABSL revealed a correspondence between developing gestural signs and the complexity of the expressed concepts, which tends to depart from the doctrine that spoken and gestural systems can develop from a common modality-independent function. As Sandler (2012) remarked, in commenting on Arbib’s gesture theory of language origin: [A] different motor system controls language in each modality [gestural and vocal], and the relation between that system and the grammar is different as well. Considering the fundamental differences in motor systems, I am mindful of the reasoning of experts in the relation between motor control and cognition (Donald 1991; MacNeilage 2008) who insist on the importance of the evolution of the supporting motor system in the evolution of language to the extent that “mental representation cannot be fully understood without consideration of activities available to the body for building such representations [including the] dynamic characteristics of the production mechanism” (Davis et al., 2002). (p. 194)

Although Sandler did not pursue the question of how motor processes shape gestural and spoken signs differently, the author recognized the implications for modality-independent accounts of language popularized in linguistic theory: [M]any believe that sign languages rapidly develop into a system that is very similar to that of spoken languages. Research charting the development of Nicaraguan Sign Language (e.g., Senghas 1995; Kegl et al., 1999), which arose in a school beginning in the late 1970s, has convinced some linguists, such as Steven Pinker, that the language was “created in one leap when the younger children were exposed to the pidgin signing of the older children . . .” (Pinker 1994: 36). This is not surprising if, as Chomsky believes, “. . . language evolved, and is designed, primarily as an instrument of thought, with externalization a secondary process” (Chomsky 2006: 22). [. . .] Our work on ABSL, of which the present study is a part, suggests that those of us who contributed to this general picture may have overstated our case. (Sandler, 2012, pp. 35–36)

Indeed there are essential differences between gestural and spoken signs that are systematically omitted in linguistic descriptions. Acoustic signs produced by articulatory motions that modulate air pressure hardly have the same structure as visual signs produced by motions of the hands. Constraints on the motor systems are vastly different. However, in analyzing both gestural and spoken systems through linguistic concepts of letter-like phonemes, words, phrases (etc.) and associated grammatical categories, the differences in the structural and motor-sensory attributes of the systems are lost and it will seem, from linguistic descriptions, that both reflect some abstract symbolic function operating on similar concepts. In reality, the signs are not the same and do not develop in the same way. In particular, compared to the pervasiveness of symbolic signs in spoken language, gestural sign languages are highly iconic and develop differently from spoken language (Taub, 2001; Pietrandrea, 2002). Recent research has revisited the role of iconicity in traditional accounts of

vocabulary development in deaf children. This research has established that, contrary to earlier reports and the views of linguistic theorists, the first signs acquired by deaf children are iconic, in contrast to the inherently symbolic verbal signs that arise with the babble of hearing children (for a recent critical review, see Ortega, 2017; on the role of babble: see McGillion, Herbert, Pine et al., 2017, and Section 2.3.3). In sum, the motor-sensory aspects of gestures and speech contribute to the development of contrasting types of signs. But beyond these fundamental differences, storylines suggesting that vocal symbols originate from manual signs face the intractable problem of devising a scheme that would convert the physiological parameters of hand motions to parameters of sound-producing motions involving oral, laryngeal, and respiratory systems. While such problems point to an inherent link between symbolic language and vocal processes, not all modality-specific accounts deal with the rise of symbolic signs. As an example, Lieberman’s proposal focuses on how spoken language (viewed as a syntactic or combinatorial system) links to the evolution of speech processes in conjunction with the basal ganglia and its function as a “sequencing engine” (e.g., Lieberman, 2003, 2006b, 2016). This proposal does not specifically address the issue of how vocal symbols emerged. MacNeilage (2010), on the other hand, views symbols as the most fundamental factor of language evolution (p. 137). In his frame/content theory, the vocal modality provided a prototypal frame, the syllable, which is seen to originate in cyclical motions such as those of mastication (1998). MacNeilage argues that, for arbitrary symbols to emerge, “hominids needed to evolve a capacity to invent expressive conventions” (2008, p. 99), though this did not arise from a “higherorder word-making capacity” (cf. Hauser et al., 2002). Instead, the capacity to form “words” rests on mimesis and procedural learning, which refines actions and a memory of actions. Donald (1997) described these processes as a capacity to “rehearse the action, observe its consequences, remember these, then alter the form of the original act, varying one or more parameters, dictated by the memory of the consequences of the previous action, or by an idealized image of the outcome” (p. 142). MacNeilage (2010), quoting Donald, suggests that, while human infants manifest procedural learning at the babbling stage, great apes do not so that “it would be no exaggeration to say that this capacity is uniquely human, and forms the background for the whole of human culture including language” (Donald, 1997, p. 142). Yet studies on how chimpanzees manufacture tools challenge this claim. For instance, in a recent longitudinal study, Vale et al. observed that chimpanzees can successfully create tools to retrieve rewards (Vale, Davis, Lambeth et al., 2017; Vale, Flynn, Pender et al., 2016). They can also retain, for years, knowledge about how to manufacture a tool and can transfer this knowledge to new tasks, indicating an ability to acquire skills. Again, procedural learning,

like other cognitive abilities, is required in symbol learning. But it is not a sufficient factor in accounting for the human-specific development of symbolic language in a vocal medium. Nonetheless, the preceding proposals share the view that symbolic communication does not emerge from mental functions alone, or gestures, and essentially links to a capacity to produce patterns like syllables and babble.

2.3.2 “Sound Symbolism”: Questions of the Efficiency of Iconic Signs

Another theory which is also built on mimesis suggests that symbolic language may originate from iconic vocal signs (or sound symbols), based on the assumption that “iconicity seems easier” than making arbitrary associations (Imai & Kita, 2014; Kita, 1997; Massaro & Perlman, 2017; Monaghan et al., 2012; Perlman, Dale, & Lupyan, 2015; Sereno, 2014; Thompson, Vinson, Woll et al., 2012). This idea that symbols arose from a mimicking of objects and events partly draws from experiments by Sapir in 1929 and Köhler in 1947 on sound–shape pairings, where listeners judge perceived consonants and vowels as relating, for instance, to “angular” and “rounded” forms; or Ohala’s (1994) “frequency code” where pitch is related to features such as “size” and “brightness” (see, e.g., Maurer, Pathman, & Mondloch, 2006). The frequency code also extends across species that use vocal pitch in signaling submission or aggressive intentions (Morton, 1994). The central assumption is that such iconic cues are vital to the ontogenesis and phylogenesis of language because they inherently facilitate sound–meaning mapping and the displacement of signs (Monaghan et al., 2014). Experiments showing sound–form associations in infants and adult learners are often cited as supporting this view. For instance, a study by Walker, Bremner, Mason et al. (2010) shows that prebabbling four-month-old infants are able to associate pitch with features such as height and brightness (see also Monaghan et al., 2012). However, the experiments do not serve to demonstrate the facilitating effect of iconic sounds on language development, or even the necessity of an iconic stage in developing arbitrary signs of oral language. On these issues, attempts to relate iconic signs to spoken language face a logical problem. Iconic gestures or sounds offer highly restricted sets of signs compared to arbitrary symbols. Any restriction on the number of signs will inherently limit the diversity of form-referent associations and thus the efficiency of signs in communicating fine distinctions in meaning, leading some to see that “while sound symbolism might be useful, it could impede word learning” (Imai & Kita, 2014, p. 3; and also Monaghan & Christiansen, 2006; Monaghan et al., 2012). This is evident in signs serving to name referents where sound symbols are quite limited. For example, it is difficult to conceive how one could mimic open-set forms such as proper nouns. On the idea that

vocal iconic signs may nonetheless be easier to learn than arbitrary signs, computational models and experiments involving adult learners lead to the opposite conclusion (Gasser, 2004; Monaghan et al., 2011; Monaghan et al., 2012; Monaghan et al., 2014). For instance, in a series of experiments, Monaghan et al. (2012, 2014) compared the learning of arbitrary and iconic (systematic) sign–meaning pairs, taking into account given co-occurring “contextual” elements. In both neural network simulations and behavioral experiments, arbitrary form–meaning pairs were learned with fewer errors and more rapidly than iconic pairs. Still, such tests do not address the issue of how arbitrary signs arise in the vocal medium and why this is specific to humans.

2.3.3 Articulated Vocalization and the Rise of Symbolic Signs: A Laboratory Demonstration

The aforementioned simulations and tests focus on how infants and adults learn sound–shape associations with respect to the perception of given signs as provided by an experimenter. Such protocols do not address the question of how signs emerge, and infants in the first months of life may not produce symbolic signs. However, it is well established that maturational changes in the vocal apparatus coincide with the rise of such signs. On this development, comparisons of vocal processes in human infants and apes are revealing of the mechanisms that underlie the rise of vocal symbols. In particular, pre-babbling infants, like nonhuman primates, are obligate nasal breathers and produce sounds with nasal resonances (Crelin, 1987; Negus, 1949; Thom, Hoit, Hixon et al., 2006). Acoustically, nasalization dampens upper harmonics and formants, reducing the distinctiveness of sounds. Moreover, continuous nasal airflow during vocalization implies that articulatory motions may not create salient features of oral segmentation (Lieberman, 1968, 1984; Lieberman, Crelin, & Klatt, 1972; Thom et al., 2006). For instance, producing motions like stops [p, t, k, b, d, g], fricatives [f, s, ʃ, v, z, ʒ], or any other articulatory motion that segments the sound stream, requires modulations of oral pressure that are difficult to achieve in a system where air flows through the nose. For this reason, early productions in infants largely appear as continuous nasalized cries and vocalizations which, as in the vocal sounds of nonhuman primates, are divided by breath interruptions and glottal closures (Crelin, 1987; Lynch, Oller, Steffens et al., 1995). At three months, though, humans manifest a control of pitch contours, which can reflect a refinement of monosynaptic connections between the motor cortex and motoneurons of laryngeal muscles (Ackermann et al., 2014). Some suggest that, at this stage, vocalizations become less reactive and iconic, and can involve a symbolic coding of pitch (Oller, 2000; Oller & Griebel, 2014). Subsequent supraglottal changes, which are also human specific, then occur.

Of particular interest is the progressive decoupling of the nasopharynx, indexed by the distancing of the epiglottis from the velum. In human newborns, as in many mammals, the epiglottis overlaps the velum, creating a sealed passage that allows nasal breathing while ingesting food and extracting breast milk (Crelin, 1987; Kent, 1984; Negus, 1949). It should be noted that the decoupling of the nasopharynx involves soft tissue – which accounts for the difficulty in dating this change based on fossil records. Many works of comparative anatomy involving computed tomography (CT) and magnetic resonance imaging (MRI) discuss this decoupling by referring to a “descent of the larynx,” which is indexed by a lowering of the hyoid bone attached to the larynx and related measures of the length of the vocal tract. Since this lowering is observed across species, some have concluded that laryngeal descent has little to do with the rise of spoken language in humans (e.g., Fitch & Reby, 2001; Fitch, 2010, ch. 8). However, laryngeal descent is only accompanied by a permanent decoupling of the epiglottis and the velum in humans, starting at about 6–8 months (Kent, 1984; Lieberman, 2007; Nishimura, Mikami, Suzuki et al., 2006). Following this decoupling, articulatory motions can create a variety of salient oral features in vocal signals, which constitutes a pivotal human-specific development. Although some contend that other species can produce a range of vowel-like resonances (Fitch, 2000, 2005; Fitch & Reby, 2001), only humans segment these resonances using various articulatory motions. This important change is accompanied by a general increase in rhythmic behavior (Iverson, Hall, Nickel et al., 2007) leading to babbling (Locke, 1995; Oller, 2000). Combining this morphological development with a capacity to modulate pitch contours creates the unique ability to manipulate both tonal and articulatory patterns. As noted, compared to other primates, humans develop direct cortical projections to motoneurons of the laryngeal muscles and also present a greater ratio of projecting fibers to tongue-muscle motoneurons, which together support a cortical control of orally modulated vocalization patterns. The symbolic potential of these patterns can best be understood by considering the rise of reduplicative syllables in children’s early babble. At the stage of canonical babble, repetitive articulatory patterns appear such as [dada], [mamama], [nana] (etc.), which become associated with caregivers, food, and other contextual referents. These early signs show that the arbitrariness underlying symbolic language can be inherent in the types of sounds that arise with orally segmented speech (a point also noted by Corballis, 2002), and that contextual information is sufficient to establish functional sound–meaning associations. In fact, the developmental literature does not suggest that iconic sounds or gestural mimics precede the rise of symbolic signs in canonical and variegated babble (Locke, 1995; Oller, 2000; and see McGillion et al., 2017). On the other hand, it is generally acknowledged that the rise of vocal symbols
accompanies a shift from pitch-varying to orally modulated vocalizations, although some symbol coding for pitch can precede this shift (Oller, 2000; Oller & Lynch, 1992). Overall, the human capacity to produce orally segmented sounds, which relates to circumscribable changes in motor processes of vocalization, constitutes a necessary though not a sufficient factor in the rise of symbolic signs. In other words, cognitive and perceptual abilities are certainly required to form sound–meaning associations. But these abilities are present in nonhuman primates and so do not, in and of themselves, account for the human-specific development of symbolic language. On the other hand, the changes that underlie a shift from pitch-varying to orally modulated sounds reflect a necessary factor for the rise of efficient vocal symbols in that, beyond pitch patterns, oral modulations provide a more diverse set of salient articulatory features by which to create distinctions in meaning. Of course, a direct demonstration of the critical role of this shift in the development of vocal symbols is not possible (i.e., one cannot manipulate the human ability to articulate sounds). But one can artificially reduce the acoustic features associated with articulatory motions by filtering signals in order to observe the effects of these features on the formation of symbolic associations. This guided the design of a “demonstration of principle” by Boucher et al. (2018) using a picture–sound association task. In the task, listeners heard unfamiliar speech sounds consisting of 24 two-syllable lexemes spoken in Mandarin. These contexts had different patterns of rising, falling, and flat tones. Each lexeme was presented with an array of four standard pictures (Snodgrass & Vanderwart, 1980) where one picture was a correct match, and repeated presentations served to observe the rise of associations across trials. To evaluate the effect of a shift from intoned to orally articulated patterns, the original lexemes and filtered versions of the same lexemes were presented. The filtering left only the pitch pattern of the first harmonic so that the lexemes sounded like hummed vowels with variable tones (an example of the stimuli is given in Figure 2.1). To support the sound–picture associations, two types of feedback were provided, representing two basic responses that can be obtained in a verbal learning context. In the first type (feedback A), the participants made an association and were given feedback on whether or not the association was correct (“yes/no”). In the second type (feedback B), the participants additionally received feedback on what the sounds designated (the correct picture was displayed). The first feedback condition emphasized a process of inference, while the second favored rote learning. For both conditions, the prediction was that symbolic sound–meaning associations would arise at faster rates for sounds with oral patterns compared to sounds with only intoned patterns, principally because of the greater efficiency of oral features in making fine distinctions in meaning.

Figure 2.1 Examples of the speech stimuli used by Boucher et al. (2018) in a sound–picture association task where the stimuli are filtered (left) and unfiltered (right) versions of a two-syllable lexeme in Mandarin. The filtering removes the articulatory information of upper harmonics and leaves only the first harmonic (F0), which is heard as vocalized tone without specific articulatory features. The filtered contexts were normalized for amplitude so that filtered and unfiltered items had similar intensities.
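As a minimal sketch of the kind of filtering described in Figure 2.1, the following applies a low-pass filter that removes the upper harmonics while keeping the F0 region, then matches the amplitude of the filtered item to the original. The file names, the 300 Hz cutoff, and the filter design are illustrative assumptions; the actual stimuli in Boucher et al. (2018) may have been prepared with different tools and settings.

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

# Hypothetical input file; any mono recording of a two-syllable lexeme will do.
rate, signal = wavfile.read("lexeme.wav")
signal = signal.astype(np.float64)

# Low-pass Butterworth filter with a cutoff just above a typical F0 range,
# applied forward and backward so that no phase shift is introduced.
CUTOFF_HZ = 300
sos = butter(8, CUTOFF_HZ, btype="low", fs=rate, output="sos")
filtered = sosfiltfilt(sos, signal)

# Normalize so filtered and unfiltered items have similar peak intensities.
filtered *= np.max(np.abs(signal)) / np.max(np.abs(filtered))
wavfile.write("lexeme_f0_only.wav", rate, filtered.astype(np.int16))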

Figure 2.2 Learners’ (n = 40) sound–picture associations across trials and feedback conditions when speech sounds are presented with only their pitch contours or with their accompanying oral features (Boucher et al., 2018).

Presentation order of the two sets of filtered and unfiltered items was counter-balanced such that half of the participants heard filtered items first and the other half heard unfiltered items first, and all items were randomized within the sets. Figure 2.2 summarizes the effect of a change from intoned to orally modulated patterns on the rise of verbal symbols. Upon hearing sets of items containing features of oral articulation, the rate and accuracy of sound–meaning associations shift positively. Thus, listeners formed correct symbolic associations more rapidly (on fewer trials) when they heard orally modulated
sounds than when they heard vocalized tones. This effect was present across feedback conditions, and the results did not vary significantly for the differing tone patterns (for details, see Boucher et al., 2018). The results support a seemingly self-evident principle: beyond the cognitive abilities of learners, a shift from items with tonal features (as in the early vocalization of children) to items that include features of oral articulation (as in babble) contributes to the rapid formation of symbolic associations. It should be mentioned that, in numerous languages like Mandarin, as in the vocalizations of children, tonal patterns support the coding of arbitrary sound–meaning associations. Thus, it is not only the articulatory features of sounds that contribute to the formation of symbols. But adding these articulatory features, as in the preceding experiment, contributes to a greater efficiency of signs in making and learning fine distinctions in meaning. The effect illustrates the basic idea that one may not account for the emergence of symbolic signs in language without some reference to the types of signals that a modality of expression affords. As basic as this may seem, it is not a generally recognized principle in language study, where the Saussurean doctrine of a speech–language division tends to overlook the fact that signal properties and motor-sensory processes of speech contribute to the formation of such aspects as symbolic signs. There are straightforward implications with respect to the aforementioned storylines of language origin in that many of these accounts are developed by referring to a concept of “language,” implicitly or explicitly defined, as an amodal mental function. The preceding demonstration of principle (from Boucher et al., 2018), although hardly serving to invalidate evolutionary scenarios, suggests the need to reassess this concept given that specific vocal processes support symbolic communication, something that is uniquely human.

2.4 The Phylogeny and Ontogeny of an Amodal Symbol Function as a Pseudo-Puzzle

It can be seen in the preceding review that, for most authors, humans stand out amongst primates in their ability to develop symbols. Humans are the “symbolic species” (Deacon, 1997). But it should be noted that the very definitions of “symbol” that guide much of the work on language origin make reference to nineteenth-century theorists like Saussure and Peirce who did not consider how constraints on a modality of expression contribute to shaping signals and signs. Following this tradition, many accounts of language origin are oriented by the belief, popularized in linguistic theory, that symbolic language draws from a mental competence that has little to do with processes of modality. Adopting this Saussurean perspective, the rise of symbolic language does seem “mysterious” (Hauser et al., 2014), “puzzling” (Aitchison, 2000;

Bouchard, 2015), and can lead to the question “why only us” (Berwick & Chomsky, 2015) or to speculations about evolutionary saltations (e.g., Hauser et al., 2002). But on the idea that symbols arise from a mental capacity or what Saussure called a language faculty separate from speech processes, research spanning decades has shown that monkeys and apes possess a basic competence to learn, process, and combine symbolic signs. Yet they do not develop the types of productive symbols found in spoken language. As it appears, the emergence of symbolic language cannot be explained in terms of cognitive capacities alone. Other modality-related factors involving the ability to produce and control orally modulated sounds are needed to account for symbolic language in humans and why it arises primarily in the vocal medium. On these factors, several works of comparative physiology have identified human-specific changes that can underlie the rise of symbolic signs, and these are seen in ontogenesis in terms of a shift from pitch-varying patterns to orally articulated babble. As Corballis (2002, pp. 187–188) indicated, articulatory modulations of sounds generate numerous features in signals, creating inherently arbitrary signs. These signs can be rapidly associated with co-occurring referents in the course of developing language. No stage of iconic gestures or “sound symbolism” appears to precede this development (Locke, 1995; Oller, 2000). But nor does such a stage seem necessary. In fact, studies have revealed that infants, even pre-babblers, readily associate heard sound sequences with referents, and use these symbols to categorize objects or parts of objects (see Ferry, Hespos, & Waxman, 2010; Fulkerson & Waxman, 2007). And it is the same for twelve-month-old children (MacKenzie, Graham, & Curtin, 2011). Interestingly, these reports also show that infants are less successful in forming symbolic associations when presented with sounds like intoned [mmm], similar to the tonal stimuli seen in Figure 2.1, or sounds that cannot be articulated, such as backward speech (MacKenzie et al., 2011; Marno, Farroni, Dos Santos et al., 2015; Marno, Guellai, Vidal et al., 2016). This ability to acquire symbols is not distinctly human, as indicated earlier, but communication by way of orally articulated signs is. Of course, there are numerous restrictions to drawing parallels between the ontogeny and phylogeny of vocal processes, and one can only speculate on what would have evolved if humans had not developed processes for articulating sounds. Some will argue that humans obviously possess the cognitive ability to form sign–meaning associations in other modalities of expression, such as sign languages. Viewed this way, the capacity to articulate sounds would seem unrelated to some separate symbolic function that evolved in the brain. However, such interpretations overlook that it has long been established that monkeys and apes also have the cognitive ability to learn and use sign–meaning associations, including those of sign language. But what these primates do not develop on their own is symbolic signs, while such signs appear in

human development with the rise of babble. So on speculations of what might have evolved, the available evidence from primatology minimally suggests that symbolic language bears a link to the particular types of signals and signs that humans are able to produce. As for hypotheses on what might have developed had humans been limited to pitch-varying calls, it may be surmised from the preceding test results on sound–meaning association (Figure 2.2) that pitch-controlled signals without features of oral articulation would restrict the rapid rise of efficient symbols. On the phylogenesis of these oral features, certainly a pivotal factor is the decoupling of the nasopharynx that contributed to freeing the oral tract, thus allowing for an articulatory modulation of sounds (e.g., Crelin, 1987; Lieberman, 1984, 2006b; Negus, 1949). Some see these changes as a consequence of bipedalism (originating 5–7 million years ago), arguing that bipedalism led to a decoupling of respiration and locomotion that supported an independent control of phonation (Provine, 2005; see also Maclarnon & Hewitt, 1999, 2004). However, as expressed previously, bone markers in the fossil records cannot serve to index a decoupling in the soft tissues of the nasopharynx, so dating this change as following or coevolving with bipedalism appears problematic.

3 The Recent History of Attempts to Ground Orthographic Concepts of Language Theory

The review in Chapter 2 illustrates the lasting influence of a notional speech–language division formally introduced by authors like Saussure. The arguments used in Saussure’s Cours to bolster the idea that symbolic signs arise from a modality-independent faculty are still being repeated. As in the Cours, many disregard the fact that symbolic language in humans generally arises in a vocal medium and bears links to an ability to articulate and control fine distinctions in sounds. As for repetitions of Saussure’s argument that gestural sign languages are also symbolic, recent research questions such interpretations. The research shows that sign languages initially develop from iconic signs (e.g., Ortega, 2017, and references therein), whereas spoken language, even in early stages of infant babble, develops as a symbolic system (Locke, 1995; McGillion et al., 2017; Oller, 2000). It is also acknowledged that the rise of symbolic sound–meaning associations that accompanies babble relates to identifiable changes in motor-sensory structures of speech in the context of rising rhythmic behaviors (Iverson et al., 2007; Locke, 1995; Oller, 2000). This account of symbolic signs, however, conflicts with the Saussurean doctrine of symbolic signs as deriving from a localized function in the brain, separate from processes of speech. As noted, there is no physiological basis for such a division or Saussure’s idea of a language faculty in Broca’s area. Critics also contend that the notion of an autonomous language function relates to a tradition of conceptualizing spoken language through writing. Nonetheless, the Cartesian viewpoint of Saussure has dominated language study, and with few changes. In the latter half of the twentieth century, the speech–language division was reinterpreted as a distinction between competence and performance. “Language competence,” like Saussure’s concept of langage, was still associated with a faculty, though not localized in any particular region of the brain. It is not the purpose here to debate the details of these historical developments. Instead, the following focuses on how, in the 1960s, the doctrine of an amodal language competence, in combination with a computational approach, contributed to preventing an invalidation of orthographic concepts in formal theories of language.

3.1 From Orthographic Representations to “Substantive Universals”

Following the stand taken by early phonologists on the theoretical status of phonemes, language specialists throughout the twentieth century largely proceeded to elaborate theories via analyses of transcripts with fluctuating references to psychological evidence, and little or no reference to observations of signals or motor-sensory processes. This was the case for authors influenced by Saussure, including Bloomfield (1933), Z. Harris (1951), and Hockett (1958). In this period, a number of linguists focused on distributional analyses of IPA signs and, inspired by information theory (e.g., Shannon & Weaver, 1949), formulated associative chain models based on the probabilities of successive elements in utterances, as in Markov chains. Such probabilistic models were deemed to be compatible with the behaviorist paradigm of successive stimulus–response conditioning, which was the view promoted by Skinner (1957). As documented by several authors, an influential presentation by Lashley (1948/1951) exposed chain models as untenable with respect to the organization of sequential motions (for differing accounts of this development, see Gardner, 1985; Murray, 1994; Newmeyer, 1986a). It is worth mentioning that, in his critique, Lashley discussed spoken language in terms of the neurophysiology of motor sequencing and not simply in terms of symbol strings. Thus, in rejecting chain reflexes, Lashley noted the need to postulate hierarchical principles that organize sequential behavior. For instance, he discussed rhythm as an organizing principle and how periodic behaviors, even in insects, can be reorganized via interactions with physical aspects of the environment (see his references to Buddenbrock and Bethe; and, on rhythm as a hierarchical principle, see the attempt to develop this principle by Martin, 1972). As is known, some of Lashley’s central criticisms were referenced in Chomsky’s (1959) review of Skinner (1957), which served to introduce an alternative to chain models in terms of formal generative rules (Chomsky, 1957). In Chomsky’s writings, Lashley’s view of rhythm as a physiological organizing principle was omitted and replaced by derivations of sentences comprising rewrite rules and algebraic sets. In the context of the 50s and 60s, the appeal of this formal approach was extensive (as recounted by Gardner, 1985; Miller, 2003). With the development of digital computers and their use in modeling human intelligence (Turing, 1950), Chomsky’s grammar as a set of formal rules drew attention to the possibility of formulating algorithms that could derive all the “sentences” of a language. The proposal also linked to rising computational approaches in cognitive psychology (e.g., Miller, Galanter, & Pribram, 1960). However, with the rejection of chain models and the “bit” as psychological units of information, modelers in certain fields of artificial intelligence needed to specify the representations on which formal computations operate. This was a requirement for formal grammars as well.
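To make concrete the kind of associative chain model at issue in Lashley’s critique, the following sketch estimates first-order transition probabilities from a toy corpus and scores a new word sequence as a product of successive probabilities. It is a schematic illustration of a Markov-chain account of utterances, not a reconstruction of any particular proposal from that period.

from collections import Counter, defaultdict

corpus = [
    "the man hit the ball".split(),
    "the boy saw the man".split(),
]

# Count transitions between successive elements (a first-order Markov chain).
transitions = defaultdict(Counter)
for sentence in corpus:
    for current, following in zip(sentence, sentence[1:]):
        transitions[current][following] += 1

def transition_prob(current, following):
    # P(following | current), estimated from the bigram counts above.
    total = sum(transitions[current].values())
    return transitions[current][following] / total if total else 0.0

# The probability of a sequence is the product of its successive transitions.
sequence = "the boy saw the ball".split()
probability = 1.0
for current, following in zip(sequence, sequence[1:]):
    probability *= transition_prob(current, following)
print(probability)  # 0.0625 for this toy corpus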


Figure 3.1 Examples of formal syntactic analyses performed on orthographic units. (a) Chomsky (1957), (b) Chomsky (1995).

On these representations, Chomsky did not pursue distributional analyses of utterance corpora, as practiced by structuralist linguists. Instead, formal computations that served to derive sentences were presented as operating on labeled constituents (N, VP, PRO, T, etc.) reflecting sets of units most often presented in regular orthographic script. To take a few simple examples, Chomsky (1957, p. 27) illustrated in Figure 3.1(a) the formal rules serving to rewrite the sentence “The man hit the ball” into phrases, and phrases into categories of words separated by spaces. Other types of derivations bearing similar references to orthographic forms and categories continued to be used, as in the Minimalist Program where one finds representations such as Figure 3.1(b) (from Chomsky, 1995, p. 165). There are peculiar assumptions in such formalisms that went unquestioned in the 1957 version and subsequent revisions of generative grammar. Specifically, it was assumed that the representations on which computations operate are divisible units as in a text (letter-like phonemes, words, phrases), which bear the particular category information of Latin grammar (noun, verb, auxiliary, consonant, vowel, etc.). That these elements appeared conceptually linked to orthographic code was of little concern, and this may be partly explained by the original focus of the theory. In the approach proposed by Chomsky in 1957, formal computations could equally serve to generate sentences in text or utterances in spoken language. This is stated in the following paragraph that also specified the goal of language analysis and formal grammars: All natural languages in their spoken or written form are languages in this sense, since each natural language has a finite number of phonemes (or letters in its alphabet) and each sentence is representable as a finite sequence of these phonemes (or letters), though there are infinitely many sentences. Similarly, the set of “sentences” of some formalized system of mathematics can be considered a language. The fundamental aim in the linguistic analysis of a language L is to separate the grammatical sequences which are the sentences of L from the ungrammatical sequences which are not sentences of L and to study the structure of the grammatical sequences. The grammar of L will thus be
a device that generates all of the grammatical sequences of L and none of the ungrammatical ones. (Chomsky, 1957, p. 13)

In this quote, the “device” that operates on units of text is not meant to capture computations in the brain but provides a heuristic formalization that serves to investigate grammaticality. In subsequent works, Chomsky (1961) emphasized this point: It would be absurd to require of the grammars . . . that their output be the kinds of sets of strings, or sets of structural descriptions, that can be handled by strictly finite automata, just as it would be absurd to require (whether for the purposes of mathematical or psychological researches) that the rules of arithmetic be formulated so as to reflect precisely the ability of a human to perform calculations correctly in his head. (p. 8, emphasis added)
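
As a purely illustrative aside, the heuristic notion of a generative “device” can be sketched in a few lines of code. The toy rewrite rules below are a simplified, hypothetical stand-in for the derivation in Figure 3.1(a), not the grammar given in the cited texts; the point is only that such a device operates on orthographic units (words separated by spaces) and on category labels drawn from Latin grammar.

```python
# Illustrative sketch only: a toy generative "device" in the heuristic sense
# discussed above, rewriting a start symbol into an orthographic word string.
# The rule set is a simplified, hypothetical stand-in for Figure 3.1(a).
import random

RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["T", "N"]],
    "VP": [["V", "NP"]],
    "T":  [["the"]],
    "N":  [["man"], ["ball"]],
    "V":  [["hit"]],
}

def rewrite(symbol):
    """Recursively apply rewrite rules; symbols without rules are terminal words."""
    if symbol not in RULES:
        return [symbol]
    expansion = random.choice(RULES[symbol])
    return [word for part in expansion for word in rewrite(part)]

print(" ".join(rewrite("S")))  # e.g., "the man hit the ball"
```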

But as many recognize, this perspective shifted. By 1965, the formal grammars were deemed to have a mental reality, and linguistic methods of analysis were claimed to be useful in guiding neurophysiological investigations: The problem for the linguist, as well as for the child learning the language, is to determine from the data of performance the underlying system of rules that has been mastered by the speaker-hearer and that he puts to use in actual performance. Hence, in the technical sense, linguistic theory is mentalistic, since it is concerned with discovering a mental reality underlying actual behavior. [. . .] Mentalistic linguistics is simply theoretical linguistics that uses performance as data (along with other data, for example, the data provided by introspection) for the determination of competence, the latter being taken as the primary object of its investigation. The mentalist, in this traditional sense, need make no assumptions about the possible physiological basis for the mental reality that he studies. In particular, he need not deny that there is such a basis. One would guess, rather, that it is the mentalistic studies that will ultimately be of greatest value for the investigation of neurophysiological mechanisms, since they alone are concerned with determining abstractly the properties that such mechanisms must exhibit and the functions they must perform. (Chomsky, 1965, p. 3, emphasis added)

The idea that there is an innate language acquisition device (LAD) was proposed at this point (pp. 25 ff.). By 1976, through an association with the biologist Lenneberg (1967), the claim was that “we may regard the language capacity virtually as we would a physical organ of the body” (p. 2). Thus, the original view of the generative device changed from that of a heuristic formalism serving to investigate the grammaticality of sentences to an innate mental program: What many linguists call “universal grammar” may be regarded as a theory of innate mechanisms, an underlying biological matrix that provides a framework within which the growth of language proceeds. There is no reason for the linguist to refrain from imputing existence to this initial apparatus of mind as well. Proposed principles of
universal grammar may be regarded as an abstract partial specification of the genetic program that enables the child to interpret certain events as linguistic experience and to construct a system of rules and principles on the basis of this experience. (Chomsky, 1976, pp. 2–3)

This shift led to enduring polemics amongst proponents of generative syntax (e.g., Botha, 1979a, 1979b; Katz, 1980, 1996; Katz & Postal, 1991; Postal, 2003). Those who adopted the original view criticized the inconsistency of imputing a biological reality to formalisms. To be clear, the debate did not center on the use of algebraic sets and rewrite rules. In all sciences, formalisms serve to understand natural phenomena. But these formalisms are not the phenomena, as reflected in the title “The formal nature of language” (Chomsky, 1967). More recently, Postal (2009) criticized the ontological incoherence of viewing a formal grammar as an “organ” that generates an infinite number of sentences (Chomsky, 1975b, p. 10). As Postal (2009) remarked: “Consider a liver and its production of bile, a heart and its production of pulses of blood; all physical and obviously finite. And so it must be with any cerebral physical production” (p. 109). For Katz and Postal, the problem entailed a conflation of type and token (Katz, 1996; Katz & Postal, 1991; Postal, 2009). In generative grammar, sentence, NP, V, (etc.) are types that do not exist physically, much as variables in a mathematical expression (and for a related stand within phonology, see Reiss, 2017). For these authors, the study of competence entailed the elaboration of formal computations, using intuitions and comparisons of sentences and non-sentences, so as to provide a theory of grammaticality (as in Chomsky 1957). This approach is more akin to formal logic than to experimental science (e.g., Botha, 1979a, 1979b). Needless to say, for experimentalists who pursued research on the psychological and biological reality of generative grammars, the objections of Postal and Katz carry profound implications. One implication is that, if models of sentence generation are purely formal, they are not amenable to experimental invalidation. In considering “substance-free” models that draw from analyses of sentences using categories and units of a writing code, it is difficult to define just what the formalisms capture beyond the grammaticality of orthographic conventions. On the other hand, if universal grammar is interpreted as reflecting an indeterminate “language organ” then, as the aforecited critics note, its output would be constrained by physiology. But one does not find in the tenets of generative grammar any indication of how physiology constrains entities like sentences, phrases, words (etc.). As Postal (2009) noted, in the decades since the proposal of a biological universal grammar, Chomsky “has not specified a single physical property of any linguistic object” (p. 113). Still, this was not a major concern for experimentalists in the 70s and 80s. Many psycholinguists during this period focused on
demonstrating the psychological reality of generative grammar. This led to a lasting emphasis in this sector on syntax and the processing of “sentences.” As it turned out, however, a body of work did invalidate the type of derivations postulated in generative transformational grammar, creating some disillusionment with the theory (see Fodor, Bever, & Garrett, 1974). This trend continued in psycho- and neurolinguistic investigations of generative theory including that of the Minimalist Program (Chomsky, 1995), as described by Ferreira (2005). In interpreting this research, it is useful to mention that Ferreira expressed reservations about the frequent use of stimuli consisting of visually presented words and sentences, but did not discuss how this could reflect the orthographic concepts of syntactic theory: A major weakness in the field of psycholinguistics is that it has focused too heavily on written language, when the spoken medium is much more commonly used and obviously is ontogenetically primary. The reason for this reliance on reading has been convenience: It is easier to present stimuli to participants on a computer monitor than to try to record speech files and play them out, and more importantly, until recently, no sensitive online measures for recording moment-by-moment processing were available for auditory language. (Ferreira, 2005, p. 373)

Beyond these methodological issues, Ferreira remarked that, over the years, experimental research has had little impact on generative models (despite some attempts to adjust the theories; e.g., Bresnan, 1982; Jackendoff, 2007b, 2017). In developing these models, generative grammarians largely pursued a type of argumentation based on analyses of written sentences where grammaticality is intuitively established (Wasow & Arnold, 2005). In this enterprise, actual utterances were not the object of study. In attempting to account for this disconnect with respect to spoken language, it was noted that Jackendoff (2009, 2017) surmised that, in response to the collapse of a derivational theory in the 80s, linguists hardened the competence–performance division so as to render competence models immune to observations (2009, p. 28). But this does not account for the state of the field of study, nor the “scientific mistake” of syntactico-centrism (p. 35). As outlined previously, the disconnect was already instituted well before generative syntax. Theorists like Saussure and structuralist linguists had established the practice of analyzing spoken language via text. Methods such as commutation tests on letters and distributional descriptions using IPA corpora were largely seen to suffice in elaborating feature systems and models. By the time generative syntax became the focus of language study, analyses were performed on sentences where theorists openly admitted that the units and categories used in these analyses were not in utterances. Indeed, reports during this period had shown that the ability to divide units of linguistic analysis like phonemes and words linked to knowledge of alphabet writing
(Berthoud-Papandropoulou, 1978; Ehri, 1975; Fox & Routh, 1975; Kleiman, Winogard, & Humphrey, 1979; Morais, Cary, Alegria et al., 1979; and for more recent evidence, see Section 3.2.2 and Part II). Still, such findings had little to no impact on theories of syntax. Rather, the problem of linking to observable aspects of spoken language was relegated to models of “language processing” where the goal was to develop an interface serving to convert linguistic concepts to and from utterances (but cf. Part III on persisting problems in devising such an interface). But not all authors adopted the program of generative grammarians. Many questioned the validity of innate grammars and the competence–performance division. This was the case of a group of authors who proposed a connectionist approach (e.g., Elman, 1993; Elman, Bates, Johnson et al., 1996; McClelland and Rumelhart, 1986) which, in converging with usage-based constructivist theories, led to an “emergentist” program (for an account of this development, see O’Grady, 2008). This latter approach has fueled debates amongst professional linguists largely focusing on the acquisition of syntax. It is not the aim of the present monograph to offer (yet another) comparison between emergentist and generativist programs. However, it is essential, in understanding the ontological incommensurability between syntactic theory and observations, to outline how these programs view language development. In this area, criticisms of the LAD and the argument of the “poverty of stimulus” (Chomsky, 1965, 1980b) illustrate how polemics on developing syntax largely draw from linguistic analyses and assumptions of units and categories that conceptually link to writing.

3.2 Shoehorning Orthographic Concepts: Issues in Grounding the LAD

Historically, the study of spoken-language development has generally involved analyses of transcriptions of children’s speech beyond the babbling stage (for a historical account, see, Finestack, Payesteh, Disher et al., 2014). This is because pre-babble productions consist of vocalizations with few oral patterns, which make these sounds difficult to represent with letters (Ramsdell, Oller, & Ethington, 2007). When it comes to language development, however, transcripts present obvious limitations in capturing rather basic factors. For instance, spoken language develops in the context of speaker–listener interactions involving multisensory input, including sensory information from motor processes that are undergoing maturational changes. These essential aspects of developing language are not captured by IPA signs, so it has to be asked whether any theory erected on analyses of transcripts can, in and of itself, constitute an account of language development. On this question, it is not only that IPA signs offer limited information, but that transcripts only apply at
a given developmental stage, and analyses of these data most often involve the presumption that children are manipulating particular units.

3.2.1 Biases and Limitations of Analyzing Language Development through Writing

On recording children’s productions, the IPA is impractical prior to canonical babble and yet it has been established that pre-babbling infants control tone contours while actively communicating with caregivers (Capute, Palmer, Shapiro et al., 1986; Nathani & Oller, 2001; Oller, 2000; Oller, Dale, & Griebel, 2016; Papoušek, Bornstein, Nuzzo et al., 1990; Reimchen & Soderstrom, 2016). Thus, before linguistic descriptions are possible, infants do produce functional or meaningful utterances. Moreover, as discussed in Section 2.3.3, the major shift that occurs at around six to eight months from breath-divided to orally modulated intonations coincides with maturational changes in supraglottal processes and a general increase in rhythmic behaviors. However, it is only when heard oral patterns allow for some application of the IPA that linguistic analysis can begin and, at that stage, analyses often include indications of syntactic categories and units using spaces and markings like < . ; [ ] > (MacWhinney, 2000). In the 70s, some analysts did not bother with the IPA and simply used regular writing, as in the influential monograph by Brown (1973). But such conventions in analyzing children’s utterances can obviously orient a concept of language development. The latter facet of language analysis presents an extension of what Oller and colleagues (Nathani & Oller, 2001; Oller, 2000) called “shoehorning,” which designates a particular observer bias. Oller criticized a tendency in the developmental literature to examine only aspects of children’s speech that can be transcribed with letter signs. This shoehorning has a number of ramifications. Historically, despite defenders of instrumental techniques (e.g., Lynip, 1951), most studies have used the IPA including marks denoting units that require, for their identification, knowledge of an orthographic code. An analysis involving letters, words, sentences, and terms like noun, verb, object (etc.), carries the presumption that children can manipulate such elements. One instigator of the popular corpus CHILDES (MacWhinney, 2000) cautioned contributors about transcribing speech with elements that link to writing (though CHILDES includes orthographic corpora): Perhaps the greatest danger facing the transcriber is the tendency to treat spoken language as if it were written language. The decision to write out stretches of vocal material using the forms of written language can trigger a variety of theoretical commitments. (MacWhinney, 2019, p. 18)


Despite some expressed cautions, at a time when generative theory dominated research on language development, observer-induced units in transcripts of child speech were not viewed as a major problem – until some analysts asked how children themselves divided speech. For instance, commenting on early research on the acquisition of syntax, Morgan and Demuth (1996) described the rising issue of segmentation this way: [T]heories of syntax acquisition commonly adopt the assumption that, prior to acquiring structural knowledge of their language, children can represent input utterances as strings of words. A few theorists have worried over how children might attain proper word-level representations [. . .]. Most often, however, this assumption has been adopted uncritically. (p. 7)

On the other hand, in the last few decades, a body of research has accumulated on how children may segment speech in relation to assumed word units. The difficulty, as mentioned earlier, is that there is no operational definition of the word. In fact, historically, the concept was not always recognized by language theorists. Thus, Saussure like many of his contemporaries did not postulate a level of word formation between morphemes and “syntagms”; Bloomfield (1933, p. 179) cautioned that there are numerous languages where it is impossible to distinguish between words, phrases, and bound morphemes; and Z. Harris (1946), expressing a similar concern, did not include words at all in his analyses of utterances. But this changed with generative syntax where, from the start (Chomsky, 1957), models of sentence generation were taken to involve rules operating on categories of words. Considering these changing viewpoints and the indeterminacy of the word concept (Dixon & Aikhenvald, 2002; Haspelmath, 2011), a consumer of research is left wondering what notion of word guides investigations of children’s speech. On this latter question, investigators are divided both in terms of assumptions and methods. The issue constitutes a critical point of discord between proponents of generativist and constructivist theories of syntax acquisition, and carries implications with respect to the validity of these perspectives. Specifically, in generative theory, it is assumed that syntax operates on innate “substantive universals” (i.e., semantic-syntactic categories like “nouns” and “verbs” as suggested by Chomsky, 1965; and see Pinker, 1984, 1987, 1994a, 1994b). These categories hook onto or bootstrap to words so that “formal universals” (computational rules) can generate sentences. An opposing set of assumptions guides usage-based grammarians. In this program, categories like NOUN and VERB are merely place labels; they “are not innately specified categories but simply useful mnemonics for clusters of items that behave in a similar way” (Ambridge & Lieven, 2011, p. 201; see also Ambridge, 2017). Such clustering of items is seen to reflect formulas and semantic schemas that can be identified by “functionally based distributional
analyses.” For example (from Ambridge & Lieven, 2011, p. 202), a child may generalize across exemplars like I can see you, I can kick you (etc.) giving a slot-and-frame formula or schema I can X you, where X is a developing category of action or non-action VERB (on semantic schemas and analogies, see, e.g., Bowerman 1990; and for further examples and discussion on the issue of categories, see Ambridge & Lieven, 2015). As for the segmentation of slot-and-frame formulas and schemas, this is related to a general notion of “chunking.” Thus, generativist and constructivist perspectives on syntax acquisition are also fundamentally opposed on the issue of how children segment speech, even though both are formulated by referencing transcripts of spoken language.
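
As a purely illustrative sketch, the clustering behind such a slot-and-frame formula can be caricatured in a few lines of code. The procedure and exemplars below are hypothetical simplifications, assuming same-length utterances, and do not represent the distributional analyses performed by the cited authors.

```python
# Illustrative sketch only: extracting a slot-and-frame formula from a handful
# of exemplars. The procedure and data are hypothetical simplifications; real
# distributional analyses must handle utterances of unequal length and much more.
def extract_frame(exemplars):
    """Return a frame with 'X' in positions where same-length exemplars differ."""
    positions = zip(*[utterance.split() for utterance in exemplars])
    return " ".join(tokens[0] if len(set(tokens)) == 1 else "X" for tokens in positions)

exemplars = ["I can see you", "I can kick you", "I can help you"]
print(extract_frame(exemplars))  # -> "I can X you"
```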

3.2.2 The Search for Marks of Words and Phrases, versus “Chunks”

The above viewpoints can guide experimentalists in different directions. For usage-based grammarians, the segmentation issue does not critically rest on finding marks of conventional notions of words. This, however, is not what has guided most experimental work. In the area of speech perception, especially, researchers who set aside or are unaware of the indeterminacy of the word concept have continued to investigate the cues that children (or adults) might use to segment words in the speech stream. This research now spans three decades since the inaugural work of Jusczyk (1986, 1993, 1996). Yet one finds no consensus in this work on what constitutes a word. The problem has created considerable confusion in the literature (as illustrated below) and no consistent cues for word divisions have been found to apply across contexts and languages (for reviews that assume word units, see Ambridge & Lieven, 2011, ch. 2; Behme & Deacon, 2008; Cutler, 2012; Werker & Yeung, 2005). The absence of general marks for words in sensory signals presents a paradox for the hypothesis of a LAD and a presumed bootstrapping of universal substantive categories. On the other hand, perception studies have suggested several segmentation principles. There is, however, a revealing discrepancy between assumptions about segmentation that guide perception research and observations of how children (and adults) actually manipulate units in production. Before outlining these discrepancies, it should be noted that repeated citations in the literature to the effect that Jusczyk and Aslin (1995) have demonstrated early word segmentation by 7–8-month-old children add to the confusion. Some of the hundreds of citations read as follows: The ability to segment fluent speech into word forms has been shown to emerge during the first year of life, around 8 months of age (Jusczyk & Aslin, 1995). (Nazzi, Iakimova, Bertoncini et al., 2006, p. 283)


Despite the complexity of this learning task, infants as young as 7.5 months of age demonstrate the ability to extract words from continuous speech (Jusczyk and Aslin, 1995). (Saffran, Johnson, Aslin et al., 1999, p. 29) Two decades ago, Jusczyk and colleagues conducted groundbreaking research in early word segmentation by showing that English-monolingual infants are able to segment monosyllabic words between 6 and 7.5 months of age, and bi-syllabic words by 7.5 months of age (Jusczyk & Aslin, 1995; Jusczyk, Houston & Newsome, 1999). (Polka, Orena, Sundara et al., 2017, p. 2)

And so on. However, one should consider that the study by Jusczyk and Aslin involved an auditory priming of word items followed by a task on item recognition in passages, indexed by children’s head-turning responses. The priming part used repetitions of isolated lexemes such as feet, bike, cup, dog, “ . . . chosen because they are content words containing stressed syllables,” and included non-words. In other experiments, the priming used passages containing repetitions of feet and bike, or cup and dog. Head-turning responses showed a preferential recognition for the target lexemes over non-words. Jusczyk, Houston, and Newsome (1999) applied a similar experimental protocol with bisyllabic lexemes that aimed to show that young children preferred items with initial stress (although the items used: hamlet, device, kingdom, etc. could have sounded like non-words for the 7-month-old listeners). In such studies, which are ongoing, it is revealing that investigators focus on stand-alone lexemes as represented in dictionaries and assume that these are manipulated or perceived as separate units by children. Experiments tend not to use frequent, familiar expressions like you’re, she’ll, I’m, Mom’s (Mom+is) (and so on), or bisyllables like gi’me, take’m, see ya, stop’it (etc.), where the latter carry lexical stress and can function as stand-alone forms. These types of common expressions would likely be recognized by toddlers (or at least be more familiar than items like hamlet, device, kingdom, etc.). But then investigating whether these items function as “words” would require a definition of the word concept. In fact, it is in viewing what counts as a word in much of the perception literature that one surmises that investigations are oriented in terms of notional orthographic forms (implicitly: words are those units that are divided in a text). A quite different concept arises when considering children’s production. Within a constructivist approach, items such as the preceding common expressions can function as lexicalized formulas, and can be manipulated as such by children and adults (Bates & Goodman, 2001; Dabrowska, 2004; Tomasello, 2000b; Tomasello & Bates, 2001). This conflicts with conclusions from the above perception studies. And in fact, contrary to claims that toddlers are segmenting free-form lexemes or words, corpus-based analyses of speech produced by children indicate that they may not be manipulating pronouns, determiners, prepositions, and other “bound forms,” as separable elements
from lexemes before about 3 years of age. And even then, the development appears indeterminate and piecemeal (Lieven, Pine, & Baldwin, 1997; Pine & Lieven, 1997; Pine & Martindale, 1996; Tomasello, 1992, 2000a, 2000b). Beyond observations of children’s productions, studies of adult corpora relating to certain pathologies (aphasia, dementia, Alzheimer’s disease) indicate that speakers are manipulating innumerable formulas, not necessarily in terms of separable words, but as coherent chunks (Bates & Goodman, 2001; Kindell, Keady, Sage et al., 2017). Collectively, these observations not only contradict interpretations that children perceive word units by 7–8 months, but a fortiori the claim that newborns are able to divide words via an in-utero detection of lexical stress (e.g., Saffran, Werker, & Werner, 2007 who refer to Hepper & Shahidullah, 1994). On this latter claim, some propose that infant perception of stress reflects an inborn word-segmentation ability of the LAD (McGilvray, 1999, p. 66, and see the “unique stress constraint” as a universal principle, Gambell & Yang, 2005; Yang, 2004). Such proposals seemingly ignore that not all languages have “word stress,” which is the case of Korean and French, or else misinterpret stress as being generally word-based (e.g., a language like French has groupfinal lengthening, but no “word-final stress,” contra Endress & Hauser, 2010, p. 181). But more fundamentally, the proposals overlook the fact that the word concept as such is not universal. The most decisive counterevidence to the idea of an inborn word-segmentation ability is given by languages where it is not possible to divide words and where stand-alone units appear as indivisible sentence- or phrase-like blocks of bound forms (a point that is illustrated in Section 10.4.1). Conversely, studies show that the ability to divide text-like words in speech associates with the acquisition of reading in both preliterate children and adults (for a critical review, see Veldhuis & Kurvers, 2012). It should also be acknowledged that the word concept historically relates to culture-specific conventions of writing. For instance, the spaces that divide words were not always present in alphabet writing. In fact, historians have documented that they were invented by medieval monks, such that notions of words stand as cultural constructs, not universal or biologically based segmentations (see references to Saenger 1997 in Section 10.4.1). As for phrases, within generative grammar, the problem of grounding a capacity to divide a noun-phrase and a verb-phrase (NP, VP) is just as vital as the issue of word segmentation and categorization. In this area, one finds yet another clash between perception research and corpus-based analyses. On the one hand, several findings from perception experiments are taken to suggest that young children and even newborns can detect various prosodic cues for phrase boundaries, which some see as an early, perhaps innate, “prosodic bootstrapping” of syntax (e.g., Christophe & Mehler, 2001; Christophe, Millotte, Bernal et al., 2008; Christophe, Peperkamp, Pallier et al., 2004;
Soderstrom, Seidl, Nelson Kemler et al., 2003; for a review, see Ambridge & Lieven, 2011; Gutman, Dautriche, Crabbé et al., 2015). Yet it is generally acknowledged that prosodic groups do not consistently map phrases as assumed in formal syntax, even at major noun-verb boundaries. For instance, short subject nouns and verbs usually appear in a single prosodic group (e.g., He eats/the cake, Paul eats/the cake, etc.). Such inconsistencies are not occasional. In an often-cited study of child-directed speech, Fisher and Tokura (1996) reported that 84 % of the “sentences” directed at a child had short pronoun subjects (as in he’s, it’s, you’re, I’m, etc.), though the problem is hardly restricted to pronominal forms. In short, there are substantial difficulties in linking phrase units of formal syntax and linguistic analysis to utterance structure (and see the issue of linking “noun” and “verb” categories to phrases discussed by Ambridge & Lieven, 2011, ch. 2, and pp. 196–197; Ambridge, 2017). It may be argued that these problems of word and phrase segmentation are not critical within a constructivist approach to syntax that focuses on formulas and schemas. However, in isolating these units, authors often refer to distributional analyses of allophones, prosodic marks, and especially statistical learning, not necessarily with the aim of dividing words or phrases, but as segmentation principles relating to a notion of chunks (also designated as “constructions,” flexible “slot-and-frame formulas,” semantic “schemas,” which do not function as fixed idiomatic expressions of the type described by Wray & Perkins, 2000). There is a particular emphasis in this view on distributions and transitional probabilities (TPs) with the idea that children and adult learners are sensitive to regularities in heard signals and infer boundaries between sounds that rarely follow each other (Aslin, 1993). Numerous perception experiments and models that use small artificial languages have demonstrated the effects of TPs and distributional cues in learning “artificial words” (e.g., Aslin, Saffran, & Newport, 1998; Endress & Mehler, 2009; Pena, Bonatti, Nespor et al., 2002; Perruchet et al., 2004; Saffran, 2001a, 2001b; Thiessen & Saffran, 2003). On the other hand, experiments or proposed neural networks that simulate statistical learning are often tested using IPA corpora or synthetic streams of speech that lack marks for structural prosody. Importantly, perceptual studies that include such aspects have shown that, when artificial words with high word-internal TPs straddle prosodic groups, they are not recognized as units (Mattys, Jusczyk, Luce et al., 1999; Mattys, White, & Melhorn, 2005; Shukla, Nespor, & Mehler, 2007). Thus, the perceptual effects of phonotactic regularities on segmentation appear to have little to no effect when they do not correspond to structural prosodic groups in signals, which are not generally represented in IPA corpora. This is a critical finding in that it reveals the inherent limitations of studies that focus on transcripts – where letters do not represent utterance structure.
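
To make the notion of transitional probabilities concrete, the sketch below inserts boundaries in a synthetic syllable stream wherever the TP between adjacent syllables dips below an arbitrary threshold. The “words,” threshold, and procedure are hypothetical simplifications in the spirit of statistical-learning experiments, not the cited authors’ methods; and, as just noted, a stream of this kind carries none of the structural prosody of natural utterances.

```python
# Illustrative sketch only: segmenting a continuous syllable stream at dips in
# transitional probability (TP), in the spirit of statistical-learning accounts.
# The "words", threshold, and function are hypothetical simplifications.
from collections import Counter, defaultdict

def segment_by_tp(syllables, threshold=0.75):
    """Insert a boundary wherever P(next syllable | current syllable) falls below threshold."""
    pair_counts = defaultdict(Counter)
    for a, b in zip(syllables, syllables[1:]):
        pair_counts[a][b] += 1
    chunks, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        tp = pair_counts[a][b] / sum(pair_counts[a].values())
        if tp < threshold:          # a low TP is taken as a likely boundary
            chunks.append(current)
            current = []
        current.append(b)
    chunks.append(current)
    return chunks

# Three artificial "words" concatenated with no pauses or prosodic marks.
words = {"A": ["pa", "bi", "ku"], "B": ["go", "la", "tu"], "C": ["da", "ro", "pi"]}
stream = [syllable for w in "ABACBCA" for syllable in words[w]]
print(segment_by_tp(stream))  # -> [['pa', 'bi', 'ku'], ['go', 'la', 'tu'], ...]
```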


It should also be noted that experiments that use natural speech samples rather than streams of synthetic sounds have led to different conclusions on how children segment speech. Interestingly, in attempting to replicate findings of statistical learning reported by Saffran, Aslin, and Newport (1996), Johnson and Jusczyk (2001) found that, when using natural speech samples, speech segmentation by 8-month-old children relied more on prosodic groups in utterances and “co-articulation” than on TPs. Throughout this research, however, there is a terminological ambiguity on whether the segmentation principles of “artificial words” apply to chunks or some orthographic notion of a word. Within a constructivist approach, the chunking process underlying formulas and schemas is central, but remains undefined within this approach. Nonetheless, slot-and-frame formulas serve to develop an account of grammatical categories that does not present the type of grounding problems that accompany the postulates of a LAD. In considering the notion of chunks, few constructivist grammarians venture to characterize the nature of these units and how they arise. Among those that do, Bybee (2010, 2011) has claimed that chunking originates in repetition: The underlying cognitive basis for morphosyntax is the chunking of sequential experiences that occurs with repetition (Miller, 1956; Newell, 1990; Haiman, 1994; Ellis, 1996; Bybee, 2002a). (Bybee, 2010, p. 34) Sequences of units or word strings that are often produced together . . . become units or chunks in their own right. (Bybee, 2011, p. 70)

Research on the neural processes underlying sensory chunking is discussed in Part III. However, it is useful to note at this point that Bybee’s view of chunking as arising from repetition deals with a consolidation of heard items in memory and not chunking as such. In fact, chunking links to the processing of motor-sensory sequences. Spoken language involves series of articulated sounds that unfold over time and, as discussed later on, constraints on sequence memory underlie a perceptual chunking of incoming information. In terms of this perceptual process which applies in learning novel expressions, Bybee’s reference to G. Miller (1956) is inapplicable. G. Miller’s idea of chunking refers to a semantic recoding strategy that operates on items in long-term memory, as in his often-cited example where the letters I, B, M, C, I, A, I, R, S, F, B, I are chunked according to familiar acronyms IBM, CIA, IRS, FBI to facilitate recall. For a child learning a language, incoming series of articulated sounds are not chunked in terms of explicit semantic operations on forms in long-term memory (at least not in the initial stages) but reflect limitations on a short-term buffering of sequences. But setting aside the chunking mechanism for the moment, the distributional approach of constructivist theory presents an account of syntactic categories as
emerging from clusters of elements constituting formulas and schemas, as in I can X you, where X is a category that arises with language use. It is important to note that this viewpoint conforms to recent clinical and neuroimaging evidence on representations of syntactic class information. Beyond textbook arguments of generative syntax, such as arguments of the poverty of stimulus, or the “logical necessity” of syntactic units and classes, the evidence in question is seen to invalidate putative substantive universals that conceptually link to orthographic categories.

3.3 Neuroscience Falls upon Nonexistent Substantive Universals: Why This Invalidation Is Different

Section 3.2.2 outlines decades of work on the still unresolved problem of bootstrapping, or how putative syntactic categories of a LAD selectively bind to notional words and phrases. It was not a major consideration for investigators who adopted this theory that bootstrapping problems could stem from the use of orthographic units and categories in elaborating formal grammars. Nor were the reported effects of writing on the ability of children to segment words seen to undermine postulates of a LAD. Overlooking such effects, many were persuaded by arguments of a competence–performance division and the poverty of stimulus (Berwick, Chomsky, & Piattelli-Palmarini, 2013; Chomsky, 1965, 1975a, 1986, 1993; and on the claimed necessity of postulates of innate syntactic categories and units: Anderson & Lightfoot, 2002; Crain & Pietroski, 2001; Gambell & Yang, 2005; Legate & Yang, 2002; Lightfoot, 1989; McGilvray, 1999; Piattelli-Palmarini & Berwick, 2013). Aside from the competence–performance doctrine, criticized previously, the poverty of stimulus argument presents a particular theoretical barrier to counterevidence. The argument basically suggests that, faced with a set of heard utterances, children would have an infinite number of hypothetical combinatorial rules from which to choose and no way of knowing which rules generate unheard grammatical sentences (on Chomsky’s varying formulations of this argument, see Berwick et al., 2013; Cowie, 1999). The solution within a program of generative syntax is to posit that children already know the rules, and the categories of units to which the rules apply. This inborn knowledge would thus provide a basis for selecting, amongst possible combinations, those that create grammatical sentences. Several critics have suggested, to the contrary, that children hear sufficient numbers of utterances and distributional patterns to allow for a data-driven learning of syntax (Behme & Deacon, 2008; CameronFaulkner, Lieven, & Tomasello, 2003; Dabrowska, 2004; Sampson, 2002). The problem, noted by Ambridge and Lieven (2011), is that, within the logic of the poverty of stimulus argument, data cannot serve to invalidate the hypothesis of innate rules: “since there is an infinite number of hypotheses, there can never be
enough input data to rule them out, no matter how much language the child hears” (p. 120, emphasis in original). Conversely, demonstrating that the input is rich enough for an input-based learning of grammar does not provide evidence that children are not using innate knowledge of syntactic categories (Behme and Deacon 2008). Nor does rich stimuli serve to reject a view of input-based learning as an epigenetic effect of an innate LAD (Berwick et al., 2013). Similarly, attempts to refute postulates about syntactic universals by extending typological studies to hundreds of known and extinct languages (e.g., Evans & Levinson, 2009) may not invalidate forthcoming “true universals” at different levels of analysis. As one commentator argued “ . . . it is perfectly possible for new languages to demonstrate diversity at one level and uniformity at another” (Baker, 2009, p. 448). But on the nature of syntactic categories, constructivist and generative accounts of syntax are fundamentally opposed. As described earlier, constructivist accounts suggest that categories of verbal elements emerge through learning how elements are distributed in chunked formulas. In this view, grammatical categories are not in the brain as predetermined properties of words. By comparison, predetermined grammatical categories like nouns and verbs are central to formal operations of generative models of syntax. These categories are seen to be potential substantive universals of words somewhere in the brain and part of a biological LAD (e.g., Chomsky 1965). On these contrasting assumptions of verb and noun categories, recent clinical and neuroimaging studies have provided decisive findings. Interestingly, these findings came about, almost unexpectedly, following certain methodological reevaluations of conventional linguistic analyses. In the late 80s, several clinical studies reported that aphasic individuals can present more severe impairments on verbs than nouns, or the reverse (thus a “double dissociation”). This led to the conclusion that lexemes are represented separately in the brain in terms of their grammatical categories (e.g., Caramazza & Hillis, 1991; Miceli, Silveri, Nocentini et al., 1988; Zingeser & Berndt, 1988). However, subsequent clinical investigations, along with behavioral evidence and studies using brain-imaging techniques, led to conflicting results (for extensive reviews, see Crepaldi, Berlingeri, Cattinelli et al., 2013; Crepaldi, Berlingeri, Paulesu et al., 2011; Moseley & Pulvermüller, 2014; Vigliocco et al., 2011). One recurring methodological problem was the variable overlap of grammatical and semantic categories. While nouns often refer to objects, and verbs to actions, there are frequent instances of overlap. Nouns can refer to actions as in a jump, a look, a punch (etc.), and lexemes with abstract referents like love, charm, smell (etc.) are semantically related across noun and verb categories. In clinical and experimental studies, investigations had focused on lexemes referring mostly to concrete objects and actions. But the tasks and stimuli used were not controlled in terms of the potential semantic
overlap of noun and verb forms, which weakened the conclusion that words were separately represented in terms of their grammatical class. When studies began to control for this confounding overlap with stimuli that dissociated semantic aspects of lexemes from their grammatical category, results emerged where activated cortical areas and response times on lexical tasks reflected semantic rather than grammatical-category effects (Siri, Tettamanti, Cappa et al., 2008; Tyler, Bright, Fletcher et al., 2004; Vigliocco, Vinson, Arciuli et al., 2008). As an example, in their study involving functional magnetic resonance imaging (fMRI), Moseley and Pulvermüller (2014) used a passive reading task where nouns and verbs were presented expressing concrete and abstract referents. They found that the latter did not activate different areas of the brain for nouns and verbs. In other words, it is only when the semantics of lexemes is concrete that different topographies of activity arise. In another report, Vigliocco et al. (2011) discussed an extensive body of work showing that differential neural and behavioral responses attesting to effects of grammatical class only arise when there is a processing of combinations or clusters of elements, as when presenting lexemes with inflections or with distributional contexts. For instance, in one revealing demonstration, Vigliocco et al. (2008) used a priming paradigm to show that syntactic class information is not automatically retrieved as a property of lexemes, but can be a property of elements in a “frame” (p. 176), which can be retrieved with a possible differential delay when participants hear a “phrase.” In their experiment, the primes were verb and noun lexemes presented in isolation or within short clusters the +noun and to+verb, while the target lexemes were all unambiguous verbs. The effect of priming for grammatical category only appeared in the cluster or phrases (not for isolated lexemes), indicating that the syntactic-category information was not processed as an attribute of “words,” but as an attribute of a distributional cluster. From these and other demonstrations, and drawing from an extensive review of clinical and neuroimaging results, the authors concluded that the evidence “ . . . strongly suggests that there are no distinct brain signatures for processing words from different grammatical class” (Vigliocco et al., 2011, p. 423). In terms of the aforementioned approaches to syntax, the findings appear to be compatible with a constructivist view where grammatical categories emerge as a result of items occurring within a distributional cluster or chunk. In this approach, categories are not a combinatorial principle of “words.” On the other hand, the findings refute a generative syntax, where formal rules serving to combine words into sentences operate on assumed substantive categories like noun and verb. To be clear on the interpretation, the invalidation in this case is not based on demonstrations of whether or not nouns and verbs are “true universals.” The above evidence specifically demonstrates the non-existence
in the brain of these syntactic categories as substantive properties, a view that suggests an orthographic concept of syntax. To further comprehend the implications of the above results in a historical light, one can use the metaphors of shoehorning and bootstrapping found in the literature. As discussed previously, some critics (e.g., Nathani & Oller, 2001; Oller, 2000) see conventional language analysis as tending to shoehorn observations of speech to what can be represented by signs of the IPA. Such a bias also extends to syntax. Linguistic analyses of transcripts representing the speech of children have involved the shoehorning of units and categories that require, for their identification, observer-dependent knowledge of an orthographic code. Within generative theories, these units and categories are assumed to be innately known by children, despite fundamental bootstrapping problems. In fact, it has long been known that there are no signature marks serving to divide speech into words, phrases, or sentences, so the problem arises as to how children would assign syntactic categories to forms that have no sensory markings, or where divisions are often difficult (as in single-syllable noun-verb phrases). In this context, findings showing that word, phrase, and sentence segmentation is influenced by training in alphabet writing, along with evidence that putative categories as basic as “noun” and “verb” are not in the brain, critically invalidate theories that assume that children are bootstrapping pre-existing syntactic categories to known units in speech.

3.4 Abandoning the Competence–Performance Divide

From the standpoint of some advocates of an emergentist program and constructivist accounts of language acquisition, language use and processes of vocal communication are structuring agents. This presents an ontological shift with respect to an “autonomous linguistics” and the doctrine of a speech– language division. As a consequence, authors have explicitly recognized that a language theory erected on analyses of script does not as such account for the structure of spoken language, and that formal grammars cannot serve to build an understanding of how language develops or how utterances convey meaning. To illustrate the shift, one can consider the following position statements: The distinctions between form/substance and competence/performance, having served their historical purpose, should be abandoned. (Lindblom, 1999, p. 206) Language is shaped and constrained by a semiological function and by interactive functions of performance. (Langacker, 1995 and 1991) The phenomena of language are best explained by reference to more basic non-linguistic (i.e., ‘non-grammatical’) factors and their interaction – physiology, perception, processing, working memory, pragmatics, social interaction, properties of the input, the learning mechanisms, and so on. (O’Grady, 2008, p. 448)


This renouncement of the Cartesian division between performance and competence, between speech and language as Saussure saw it, bears major epistemological implications for language study. In particular, if it is recognized that spoken language is shaped by constraints relating to physiology, perception, memory, and properties of signals, then the elements and structural attributes of vocal communication may not be defined or investigated by reference to linguistic descriptions using orthographic concepts. Obviously, other methods are required with a view on observable elements and structures in the vocal medium. Yet some who advocate a rejection of the performance– competence division express a reluctance to abandon conventional units of linguistic description (e.g., Jackendoff, 2009). Studdert-Kennedy (2000), for one, suggested that a reference to these units may be inescapable: Every phonetician is familiar with the fact that spectrograms do not divide the acoustic flow of speech into a sequence of discrete, invariant segments corresponding to the segments of linguistic description (Fant, 1962; Liberman et al., 1967). Yet, perhaps because “the sounds of the world’s languages” are physical events amenable to increasingly sophisticated acoustic analysis, speech scientists have been reluctant to accept that “ . . . there is no way to avoid the traditional assumption that the speaker-hearer’s linguistic intuition is the ultimate standard that determines the accuracy of any proposed grammar, linguistic theory, or operational test” (Chomsky, 1965, p. 21). Many speech scientists have continued to hope that advances in speech technology or behavioral analysis may enable them to shed the introspective methods still burdening their colleagues in syntax. (p. 276)

This view expresses the perennial difficulty of conceptualizing spoken language when instrumental observations of utterance structure fail to support writing-induced intuitions of letter-like segments, words, phrases, and sentences. However, abandoning the performance–competence division implies that structures of speech are not separate but constitutive of spoken language, and because they are readily identified in signals, writing biases can be avoided. Defining elements and structures in relation to constraints on physiology, memory, signal processing (etc.) can entail an unconventional concept of language processing in that the units that emerge from biological mechanisms are not likely to reflect the traditional writing units of linguistic description. On the other hand, there is, as should be clear from the previous chapters, a long-standing tendency in fields of language study to overlook empirical observations of speech structure when they fail to support constructs of writing. This problem, relating to epistemology and held views on the relevance of instrumental records, is the subject of Part II.

Postscript – On the Use of the IPA and Terms of Latin Grammar in the Present Work

This book centrally examines the mechanisms that shape elements and structures of spoken-language processing. To avoid confusion, readers should keep in mind
that the structures in question are designated using the terms syllable-size cycles, chunks, or temporal groups, and breath units of speech or utterances, all of which have observable correlates in speech. The physiological mechanisms underlying these frameworks are detailed in Part III, and the preceding terms should be kept separate from those that refer to the writing-induced concepts of phonemes, syllables (as groups of phonemes), words, phrases, sentences, and the conventional category labels assigned to these units such as consonant, vowel, noun, verb, determiner, adjective (etc.). Alphabetic signs (IPA), like other orthographic units and classifications that draw from Latin grammar, are used as convenient labels and symbols for readers. However, within the present work, these signs and labels have little to no epistemological value in understanding processes of spoken language. For instance, IPA transcripts can present distributional regularities suggesting chunks or clusters of elements. Such analyses can be useful and revealing, but they do not serve to define the nature of chunking underlying the clusters, and IPA letters do not capture the timing marks of chunking in action sequences. Familiar terms like consonant and vowel can also be useful as general indicators of closing and opening actions of the oral tract. But they do not apply to non-existent letter-like units in speech (as is documented in this book). Similarly, as discussed earlier, categories such as verb, noun, pronoun, adjective (and others) can serve as familiar place labels for distributional classes of items within chunks of speech. On the other hand, these labels do not represent actual semantic-syntactic categories in the brain. In short, the reader is cautioned that orthographic terms and units can be wholly inadequate and misleading in representing physiological or physical factors so they should not be interpreted as actual units or classes of spoken-language processing. With these cautions in mind, elements and structures that, in the present work, relate to processes are referenced to observable marks in signals or output obtained through the application of instrumental techniques at different levels of observation.

Part II Questions of Epistemology: The Role of Instrumental Observations

4 Recognizing the Bias

If, as expressed by some authors, language phenomena are best explained by neural and motor-sensory processes, signal properties, and contextual interactions, then the question arises as to whether conventional linguistic analysis can guide research on spoken language. A realistic response to such epistemological issues may be that it depends on one’s objectives. If the aim is to account for judgments on the “grammaticality” of sentences in a text, then multivariate statistical analyses of speakers’ perceptions using formal syntax as a predictor may be a suitable approach (if one factors in an individual’s knowledge of a writing code). If the goal is to develop a code of alphabet writing serving to distinguish items in a dictionary, then traditional phonological analyses might suffice. In other words, language analysis involving conventional orthographic signs and categories may be useful in pursuing certain goals. On the other hand, such analyses present obvious limitations to building an understanding of the physiological processes of spoken language, as emphasized previously. Yet in sectors that focus on these processes, conventional units and categories are essentially what guide investigators. As Branigan and Pickering (2016) note: “In fact, most psychologists of language largely have shied away from making claims about linguistic representation and instead adopt the representations proposed by linguists” (p. 6). Historically, this acceptance of linguistic assumptions was bolstered by formal theories where categories and units of language analysis were taken to be part of an inborn competence. For instance, Chomsky propounded that the ability to constitute words and sentences reflects the workings of a language faculty and that conventional linguistic units and categories are “logically necessary” (Chomsky, 1965, p. 3; 1980, pp. 28–29; and in Hauser et al., 2002). The contentiousness of such pronouncements and the arguments for a separate language faculty were discussed earlier. But theoretical arguments aside, the dominant use of writing-induced concepts of analysis in experimental sectors, despite the evidence invalidating these concepts, brings to light prevailing epistemological issues. The belief that speakers come into the world with a capacity to form letter-like phonemes, words, and sentences has withstood a range of counterevidence, from the earliest instrumental observations of speech to studies that have
repeatedly shown that notions of phonemes, words, and sentences arise with knowledge of alphabet writing. Neuroscientists, as noted earlier, frequently criticize the imperviousness of language theories to experimental invalidation. Many have expressed pessimism as to whether empirical findings can ever contribute to reassessing linguistic models and analyses (e.g., Embick & Poeppel, 2015; Ferreira, 2005; Grimaldi, 2012, 2017; Poeppel, 2012; Poeppel & Embick, 2005, and others cited in the Introduction). As to why, in this situation, writing-induced notions of units and categories continue to guide much experimental work, there is a long-standing tradition in sectors of language study to overlook the implications of observed structures in utterance signals, and this tradition may be traced back to certain critical claims. In particular, it will be recalled from Section 3.3.1 that Prague phonologists believed that the criterion of functional distinctiveness, which they analyzed by way of minimal word pairs, was a primary principle serving to guide speech observations. This argument bore basic epistemological implications. It conferred a primacy to analyses of speech using observer-based knowledge of a writing code over empirical observations. Yet, as emphasized earlier, nothing in the criterion of distinctiveness supports the premise that sound features are limited to contrasting word pairs, or that they occur in letter-like packets. These presuppositions find no theoretical justification and essentially derive, historically, from the practice of recording speech via writing conventions. Nonetheless, the held primacy of linguistic criteria in the early periods of language theorization can explain the unmistakable tendency in language study to overlook instrumental evidence that does not support conventional units and elements of language analysis. To clarify this tendency in a historical perspective, the following sections provide a review of the research on basic units conceptualized as phonemes. More than any other concept, the notion of letter-like segments has withstood a body of invalidating observations using various physiological and acoustic techniques. This serves to illustrate a point of epistemology: in sectors of language research, there is a tendency to set aside instrument-based data and to refer instead to transcripts that, as such, presuppose a knowledge of writing signs and units. Contrary to what instrumental records show, analyses of IPA transcripts most often lead to interpretations supporting conventional concepts of language theory at putative “high levels” of processing. Numerous cases illustrate this tendency, which has definite consequences on relating language theory to observations.

4.1 On the Tradition of Overlooking Instrumental Observations: The Case of the Phoneme

In the study of spoken language, perhaps the most obvious textbook example of the disregard for instrumental records involves basic


acoustic observations. As noted in reference to Figure 1.2, readily obtained oscillographic and spectrographic records of utterances reveal structural patterns that appear across languages. Conversely, the records show no generalizable marks serving to isolate phonemes or words. Objectively, these observations invalidate writing-induced units of conventional language analysis, and this was how early instrumentalists like Stetson (1928/1951) interpreted such results. He saw no support for letter-like entities in kymographic waveforms of speech, and language theorists who assumed these units were held to account (p. 137). Yet this interpretation did not prevail. Speech observations are not widely acknowledged as invalidating traditional linguistic concepts. To the contrary, many maintain assumptions of letter- and word-like units at an abstract level, and this tendency to overlook types of evidence is not limited to acoustics. It extends to a broad range of physiological data. One historical example is the observation of “co-articulation.”
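Such records are indeed readily obtained with standard tools. The following is a minimal sketch, assuming numpy, scipy, and matplotlib are available and that the utterance is a mono WAV file; the file name and display settings are illustrative placeholders, not material from the studies cited.

```python
# Sketch: produce an oscillographic and spectrographic record of an utterance
# with standard libraries (illustrative file name and settings).
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, x = wavfile.read("utterance.wav")      # hypothetical mono recording
x = x.astype(float)
t = np.arange(len(x)) / rate

f, ts, sxx = spectrogram(x, fs=rate, nperseg=512, noverlap=384)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, x)                               # oscillogram (waveform)
ax1.set_ylabel("amplitude")
ax2.pcolormesh(ts, f, 10 * np.log10(sxx + 1e-12), shading="auto")
ax2.set_ylabel("frequency (Hz)")
ax2.set_xlabel("time (s)")
plt.show()
```

Inspecting displays of this kind reveals the recurring structural patterns mentioned above, notably periodic rises in energy, but no general marks that would divide the signal into letter-sized units.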

4.1.1 From Instrumental Records of Co-articulation to Transcribed Spoonerisms

In the 60s and 70s, numerous studies used electromyography (EMG), cineradiography, and motion transducers to examine the sequencing of articulatory events in speech (Amerman, Daniloff, & Moll, 1970; Fromkin, 1966b; Kent & Minifie, 1977; Kent & Moll, 1972; Kozhevnikov & Chistovich, 1966; MacNeilage & DeClerk, 1967, among others). One overarching result of this work was that feature-related contractions and motions of the lips, tongue, and jaw were not sequenced in terms of strings of consonant and vowel phonemes, but reflected co-articulation. There were clear implications: co-articulation undermined the assumption that feature-related events occur in sequential bundles where each bundle is represented by a letter seen as a phoneme (Jones, 1929). It should be emphasized that many articulatory motions associated with different features are not timed in terms of any specific sequencing unit and can spread across articulated sounds in different languages. Some even spread across entire utterances. This can occur for velum lowering in producing nasal sounds, which is not constrained to occur in terms of the sequencing of oral motions. Similar spreading can variably occur with labial motions, or motions of the tongue root and pharynx as when producing emphasis in Arabic (see also Ladefoged, 1971, 1973; Moll & Daniloff, 1971, commented in Boucher, 1994). However, this spreading does not apply to major-class “consonant” and “vowel” features or constriction points. For instance, in all spoken languages, closing motions in producing [t] or [k] do not extend along the time axis. They are necessarily sequential, occurring at specific points on the time axis, and thus serve to reveal basic sequencing units of speech. However, observations on the timing of feature-related motions by reference


Figure 4.1 Tracings of radiographic recordings for [ku] and [ki] (original data from Rochette, 1973). Constriction points for the “vowel” (back for [u], front for [i]) and labial motions (rounded for [u], unrounded for [i]) are articulated concurrently with those of the “consonant” [k], suggesting a single unit of articulation and not consecutive consonant and vowel units as represented in the IPA transcript.

to sequential oral constrictions do not support the notion that speakers are planning features in letter-like bundles on a line. As an example, one can consider co-articulation in cases such as [ku] and [ki]. Transcripts of these units imply that labial motions for the vowels [u] and [i] are produced after the constriction [k]. In reality, several reports, some of which date back to the early 70s, show that feature-related motions for the consonant and a vowel co-occur in producing the syllables, as illustrated in Figure 4.1. Frontal motions of the tongue body for a vowel [i] are co-produced with the closing motion giving a frontal (or palatal) [k] in [ki], but back motions for the vowel [u] are co-produced with the closing motions leading to a back (or velar) [k] in [ku]. Moreover, the feature-related motions of lip spreading or rounding also co-occur with the oral closing motion for [ki] and [ku] (e.g., Daniloff & Moll, 1968; Kent & Moll, 1972; Kozhevnikov & Chistovich, 1966). Thus, instrumental data, which have been available for some time, demonstrate that feature-related motions, specifically those that do not extend along the time axis, are not produced or planned in successive phoneme bundles, but co-occur in single “CV” units. This was essentially the conclusion of MacNeilage and DeClerk (1967), who provided perhaps the most extensive co-articulation study. Their investigation used both electromyographic (EMG) and radiographic recordings to compare the timing of articulations in consonant-vowel-consonant contexts (CVC). The results showed that a syllable-like cycle, noted as a “CV form,” presented a “cohesive unit.” Fromkin (1966a) observed this coherence and also saw that “the minimal linguistic unit corresponding to the motor commands which produce speech is larger than the phoneme, perhaps more of the order of


a syllable” (p. 170). Yet this evidence was not taken to signify the collapse of the phoneme concept. Rather, the authors referred to indirect psycholinguistic evidence of transcribed spoonerisms as supporting phonemes (Fromkin, 1970; MacNeilage, 1970, and see Fromkin, 1973). This dismissal of instrumental observations was expressed by MacNeilage and DeClerk when discussing their results on context effects of speech production, which showed that there were no sequential activations of muscles corresponding to phonemes on a line: If it is conceded that the . . . mechanisms outlined here are the main means by which context effects on muscle contraction can be imposed on invariant phoneme commands introduced into the motor system, then it must be concluded that invariant phoneme commands are an unlikely basis for a number of the results observed here. However this does not mean that the idea of invariant phonemic units can be ruled out at all levels of the command system. The existence of spoonerisms in which a single phoneme is permuted (e.g. “tasted the whole worm” for “wasted the whole term”) is evidence of the behavioral reality of the phoneme at some stage of the production process. (MacNeilage & DeClerk, 1967, pp. 38–39, emphasis added)

Far from provoking a reassessment of the idea of letter-like phonemes in motor theories, the invalidating data were set aside. In referring to evidence from transcribed speech errors, many models of production and perception that followed co-articulation studies maintained phonemes and included “lookahead” schemes to account for co-articulation. These theories developed in parallel with a variety of formal rule-based models of phoneme adjustment introduced by Chomsky and Halle (1968). In modeling context effects this way, the implication was that in articulating constriction place features, literally all phonemes had to be abstractly adjusted to neighboring vowels to conform to observed co-articulatory patterns. These unwieldy conversion schemes further led to debates as to whether the supposed adjustments to accommodate the phoneme concept reflected biomechanical properties or were centrally planned (e.g., Ostry, Gribble, & Gracco, 1996 contra Whalen, 1990). The point of interest here is that, in these developments, many investigators shifted their focus to transcribed speech errors with the view that such data corroborated the existence of phonemes. In terms of epistemology, this implied that indirect evidence of spoonerisms, gathered mostly by way of IPA representations, had precedence over EMG and various kinematic observations in guiding investigations of basic units of production and planning. The shift gave rise to a body of work on spontaneous errors that generally sought support for the assumption of phonemes, and other conventional linguistic units (for model-oriented reviews from the last fifty years of speech-error research, including laboratory-induced errors, see Bencini, 2017; Pouplier, 2007; Pouplier & Goldstein, 2010). On the other hand, to this day, three fundamental problems undermine speech-error evidence, including the inadequacy of IPA transcripts in recording errors and the failure, within the literature, to


acknowledge that the distributional patterns of errors rule out letter-like entities as basic planning units. The first problem (outlined by Boucher, 1994) relates to the selective focus of speech-error research on letter substitutions, permutations, and omissions as evidence of the psychological reality of phonemes to the exclusion of inviolate aspects of errors that overrule phonemes as basic sequencing units. Specifically, the rationale for speech-error research is that, in producing utterances, speakers have to plan the sequential order of feature-related motions in terms of units. Errors in sequencing due to anticipation or perseveration can occur, and this has served to reveal planning units of varying length. Thus, anticipation or perseveration errors across elements within a lexeme (e.g., kerrific for terrific), across elements within a group of morphemes or lexemes (e.g., spicky points for sticky points), or across elements belonging to larger groups (e.g., he said “have you studied a lot” and I sot – I said . . .) were interpreted as suggesting that speakers plan a series of speech motions for “word”-length, “phrase”-length, and “clause”-length units, respectively (these examples are taken from Shattuck-Hufnagel, 1983, and for other examples see Fromkin, 1973, and Stemberger, 1983). But if this rationale holds for phonemes as basic sequencing units, then given the frequency of occurrence of phonemes, one would expect a preponderant number of errors where VC is produced for CV, or the reverse. The problem is that no transcribed errors of anticipation or perseveration occur across adjacent “consonant” and “vowel” classes: no error occurs where C is produced for V, or V for C within a syllable cycle (see also Crompton, 1981; Shattuck-Hufnagel, 1992). In other words, the idea of phonemes as basic planning units is not borne out by distributions of speech errors since errors in sequencing consonant and vowel classes only appear across an extent no shorter than two successive syllable-size cycles. Moreover, the frequent suggestion that phonemes might nonetheless be independently planned as basic sequencing units and that syllables simply constitute “frameworks” for these units is equally questionable on the same rationale in that, if phonemes are independently planned, there would be frequent errors of permutation, substitution, shift (etc.) between successive consonants and vowels (on the frames hypothesis as applied to normal and pathological speech, including paraphasias and verbal apraxia, see, e.g., Berg, 2006; Marquardt, Sussman, Snow et al., 2002; Saito & Inoue, 2017; Shattuck-Hufnagel, 1979, 1983; Sussman, 1984; Vousden, Brown, & Harley, 2000). That this does not occur in corpora of transcribed speech errors stands as a largely unacknowledged contradiction to interpretations that spontaneous errors provide “strong evidence” of phonemes as basic sequencing units (Fowler, 1985; Fowler, Shankweiler, & Studdert-Kennedy, 2016). In neglecting this inviolate aspect of speech errors, general interpretations of errorful patterns as evidence of independently planned phonemes essentially


draw from the way in which errors are represented by letters. By using the IPA to record errors, the inevitable impression is that errors involve substitutions, permutations, and omissions of letter-like bundles of features, and that syllables or even adjacent consonants in clusters can be broken down into these letter entities. For instance, Fromkin (1973) argued for the psychological reality of phonemes as basic units by noting that clusters are divided in errors like blake fruid for brake fluid or frish gotto for fish grotto (and see Cutler, 1980). On the other hand, instrumental records do not support such interpretations, and this brings to light the questionable circularity of using letter signs to investigate the existence of letter-like planning units (cf. the observations of Boomer & Laver, 1968; Browman & Goldstein, 1990; Roberts, 1975 cited in Pouplier 2007). Considering this second problem, investigators of speech errors who use instrumental methods have criticized the use of transcripts in no uncertain terms. For instance, based on their EMG observation of tongue twisters, Mowrey and Mackay (1990) concluded that “[t]he results indicate that traditional methods of data collection on which most speech error corpora are based are inadequate. Production models based on these corpora are not supported by the electromyographic data and must accordingly be revised” (p. 1299). In using acoustic and motion-based observations, Pouplier and Hardcastle (2005) expressed a similar criticism: While the field [of speech-error research] has in general always been aware of limitations of transcription as a research tool, it has only become evident on the basis of acoustic and articulatory evaluations of anomalous utterances to what extent and in which way the automatic perceptual prefiltering may have provided an incomplete and in some aspects even misleading picture of the nature of errors. (p. 230)
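The distributional problem noted above also lends itself to a simple computational check on any corpus of transcribed errors. The following is a minimal sketch in Python: the corpus format, the syllable-marking convention, and the example items are hypothetical placeholders rather than data from the studies cited, and the crude letter-class function stands in for a proper feature analysis. The point is only to show the kind of tally that would make within-syllable consonant-vowel exchanges visible if they occurred.

```python
# Sketch: tally transcribed errors by whether a consonant is exchanged with a
# vowel inside one syllable cycle (hypothetical corpus format and items).
from collections import Counter

VOWEL_LETTERS = set("aeiouy")

def seg_class(ch):
    """Crude consonant/vowel label for a letter (illustration only)."""
    return "V" if ch.lower() in VOWEL_LETTERS else "C"

def classify(intended, produced):
    """Compare syllable by syllable ("-" marks syllable boundaries) and label
    the first mismatch found."""
    for si, sp in zip(intended.split("-"), produced.split("-")):
        for ci, cp in zip(si, sp):
            if ci != cp:
                if seg_class(ci) != seg_class(cp):
                    return "C/V exchange within a syllable"
                return "same-class substitution"
    return "no mismatch"

# Hypothetical (intended, produced) pairs, not items from a published corpus.
errors = [("ter-ri-fic", "ker-ri-fic"),
          ("stick-y", "spick-y"),
          ("brake flu-id", "blake fru-id")]

print(Counter(classify(i, p) for i, p in errors))
```

Run over a real corpus, the claim reviewed above predicts that the count of consonant-vowel exchanges within a syllable remains at zero, while same-class substitutions across longer spans abound.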

But compared to transcribed errors, instrumental records of spontaneous speech errors are a rarity. There are, on the other hand, numerous studies of induced “slips of the tongue” which use EMG, electromagnetic articulography (EMMA), electropalatography, and other techniques. In these investigations, slips of the tongue are induced in tasks where speakers repeat syllables with similar consonants. In such contexts, errors usually arise from a perseveration of motions across syllables. A consistent finding that has emerged from these experiments is that slips of the tongue involve errors on the gradation of motion amplitudes or myographic activity controlling changes in muscle lengths. Gradation errors may not lead to perceptible auditory changes and so may not be transcribed (for similar errors in clinical groups and paraphasias, see Kurowski & Blumstein, 2016). As an example, in the extensive work of Pouplier and Goldstein, errors are induced by having speakers repeat sequences such as cop top. In a study using these repetitions, EMMA recordings showed that errorful motions of the tongue dorsum, despite not creating a heard [k],


were present during the articulation of [t]. Thus, perseveration can create interference across syllables underlying speech errors, but this may not be captured by auditory perception and transcripts (Goldstein, Pouplier, Chen et al., 2007; Pouplier, 2007; Pouplier & Goldstein, 2005, 2010). Still, several authors reject these observations of induced slips of the tongue on the argument that these induced errors lead to the production of sounds that are not present in a speaker’s language, which rarely occurs in spontaneous speech errors. For this reason, many feel that slips of the tongue or tongue twisters may represent a different case with respect to “phonological planning errors” (Fowler et al., 2016; Frisch & Wright, 2002; Levelt, Roelofs, & Meyer, 1999; Stemberger, 1989 contra Pouplier, 2007). This objection, which questions the relevance of experimentally induced errors, leaves few instrumentbased observations of spontaneous errors. But one exception is a report by Boucher (1994), which examined spontaneous errors that occurred during a radiographic study of articulatory patterns in French (Rochette, 1973). Of the errors that appeared in the recording sessions, some were cases where a speaker produced an error, leading to unintended well-formed lexical items, and then produced, in a separate utterance, the same lexical item as the intended forms. In other words, these cases presented rare recordings where a verbal item, produced by error and intentionally, could be compared in order to identify errorful motions that create sounds that are present in a speaker’s language (qualifying as putative “phonological planning errors”). Details of the radiographic recordings showed effects of perseveration not previously reported. The recordings also supported criticisms that transcripts can misrepresent essential aspects of speech errors. In one case, a speaker produced a perseveration error transcribed as [tRwaRUt – tRwavUt] (trois routes – trois voûtes). The transcripts performed by trained independent observers suggested a substitution of a phoneme (an entire “feature-bundle”) where a uvular [R] replaced a labial [v], giving two lexemes: the error routes and the intended voûtes. However, the radiographic recordings revealed that the transcripts were misleading: both the error and intended forms contained labio-dental motions for [v], as seen in Figure 4.2. As such, the error in routes reflected a perseveration effect of a preceding highback consonant, the [R] of trois. The effect can be explained as follows: the perseveration of the high-back motion for the consonant [R] in trois (syllable1) caused an overshoot of the backward motion of the tongue body for the vowel [-U-] in voûtes (syllable2) that created the error [RU] of routes. Thus, perseveration of a consonant motion in one syllable affected a vowel motion in a following syllable. A second perseveration error revealed a similar miscue on the amplitudes of motion. In this case, the transcripts [bjɛf̃ ɛ – bjɛf̃ Rɛ] (bien faits – bien frais) suggested the omission of a phoneme [R] in the cluster [fR] of frais leading to

[Figure 4.2 panels: error [Ru] in "routes"; corrected [vu] in "voûtes"]

Figure 4.2 Tracings of radiographic recordings for a spontaneous speech error transcribed as [Ru], but which contained labio-dental contact. Note that both the error routes and intended voûtes are lexemes in the language.

[Figure 4.3 panels: corrected [fR] in "frais"; error [fɛ] in "faits"; normal [fɛ] in "fait"]

Figure 4.3 Tracings of radiographic recordings for a corrected form frais following a spontaneous speech error faits, and a normally articulated faits by the same speaker (adapted from Boucher, 1994). The dotted lines indicate motions following the articulation of the labio-dental [f]. Note that the errorful form faits is produced with motions of the tongue dorsum that are absent in the normally articulated form faits.

the error [fɛ] faits. The speaker also produced, in a separate utterance, an intended lexeme faits, which allowed a comparison with the unintended, errorful form faits. This revealed again that the transcripts were misleading: as seen in Figure 4.3, tongue backing for the uvular [R] was present for the errorful form despite being transcribed as [fɛ] (see the absence of tongue backing for a normal form [fɛ]). The error reflected a perseveration effect: the high-front tongue motion for the vowel complex [-jɛ̃] in bien (syllable1) caused, in the following syllable, an undershoot of the backward motion of the tongue body for the consonant [R] (syllable2), leading to the “omission” error in faits. Thus, perseveration of a vowel motion in one syllable interfered with


a consonant motion in a subsequent syllable. In short, the cases showed that spontaneous errors were miscues on amplitudes or the gradation of motions and did not involve substitutions or omissions of letter-like units, contrary to what was represented in the IPA transcripts. But more importantly, the above examples reflect a planning in terms of coherent syllable units. Specifically, in one case, perseveration of a consonant motion in syllable1 created an errorful overshoot of a vowel target in a following syllable2. In the other case, the perseveration of a vowel motion in syllable1 created an errorful undershoot of a consonant target in a following syllable2. These cases do not support a separate planning of phonemes. They show that, in articulating a syllable, perseveration of a consonant-related motion can create an error in an accompanying vowel target, or perseveration of a vowel-related motion can create an error in an accompanying consonant target, which implies a concurrent planning of consonant- and vowel-related motions in terms of a coherent syllable unit. Thus, contrary to the aforecited interpretations of Fromkin, and MacNeilage and DeClerk, there is no conflict between these spontaneous errors and observations of co-articulation where, for instance, place features for consonants co-occur with place features for vowels, as in Figure 4.1. Both sets of observations imply that consonant- and vowel-related motions are not activated or “planned” successively as separate phonemes on a line, but concurrently in a coherent CV-like cycle, constituting a base sequencing unit. Although the preceding examples involve but two spontaneous errors, the observed perseveration effects on the amplitude of motions reflect the types of gradation errors that have been repeatedly observed in experiments by Goldstein and colleagues. Taken together, this evidence does not support the existence of phonemes. In fact, Browman and Goldstein (1990) dismiss phoneme segments as basic units and maintain that speech errors “provide no evidence for the behavioral relevance of the segment” (p. 419). In sum, instrument-based observations of speech errors have led to different conclusions as compared to analyses of transcribed errors. This discrepancy can be easily explained. The latter method imposes a segmental format on error analyses: in using IPA signs, it seems that errors involve the displacement or omission of features in letter-like packets on a line. Such impressions are not supported by distributional aspects of errors, nor by instrumental records, both of which point to syllable-size cycles as fundamental sequencing units. But the continual use of the IPA leads to a third and perhaps more basic issue that relates to the circularity of investigating the existence of letter-like phonemes via alphabet-style signs. Accepting IPA transcripts as evidence of letter-size planning units at some abstract level presupposes that culture-specific writing signs can, as such, capture units of speech processing. For instance, Studdert-Kennedy (1987) saw that, in using letters to investigate the existence of phonemes, “the data


which confirm our inferences from the alphabet rest squarely on the alphabet itself” (p. 46, emphasis added). In other words, if one accepts an analysis based on the IPA as evidence of representations in speakers’ brains, then one is assuming that letter signs are not merely cultural conventions but reflect actual letter-like entities at a biological level. From this viewpoint, the previously mentioned instrumental observations could be dismissed on the belief that the very invention of letters supposes some prior concept of phonemes. This is the claim of some authors who point out that the invention of the Greek alphabet implies a preliterate and biologically determined awareness of letter-like entities in speech (as in Fowler, 2010, p. 58: “Their inventors must have had the impression that the spoken language had units to which the letters would correspond”). Such views carry profound implications and can have several consequences when applied in clinical or teaching sectors. However, historical records suggest another account of the origin of letter signs in Greek script, one which does not imply any innate knowledge or awareness of letter-like segments in speech.

4.1.2 On the Origin of Alphabet Signs: The Hypothesis of a Preliterate Awareness of Phonemes

Sampson (1985) and Faber (1990), among others, have argued that the invention of the Greek alphabet may not have arisen from a momentous realization of phonemes in speech. Instead, the development likely involved adapting Canaanite syllabaries of West Semitic, as reflected in Old Aramaic or Phoenician script. In these writing codes, only consonant-related features are represented while vowels are inferred (thus the difficulty in qualifying these systems as “syllabaries” and the use of the terms “alphasyllabaries” or “consonantaries”). Supporting the claim that the Greek alphabet system arose from an adaptation of existing scripts are the evident similarities between early Greek signs and Semitic systems, as illustrated in Figure 4.4. In light of the correspondences between Greek and Aramaic signs, historians agree that the Greek alphabet reflects more an adaptation of existing graphic systems than an indigenous creation and that the adaptation was a matter of adjusting signs to suit Ancient Greek (see also Daniels, 2017; Daniels & Bright, 1996; Powell, 2009). For instance, according to Stetson (1928/1951), Semitic languages possessed a limited set of vowels and only four or five CV syllables (where C can be a “consonant cluster”). Vocalic elements could thus be inferred from consonant signs. Greek, on the other hand, presents many more vowels and a larger set of syllable patterns such that vocalized sounds were not easily inferred from consonant signs alone. To ensure differentiation, Ancient Greeks took the step of adding vowel signs, but added these as consecutive elements on a line.


Figure 4.4 Faber’s (1990) list of early Greek letters with their precursors in Old Aramaic script (adapted with permission). As specified by Faber (p. 36), the Greek signs in parentheses are found in archaic materials and were not preserved in Classical Greek. “Group A includes Greek letters adopted directly from a Canaanite source. . . . Where two values for these letters are given, the first one represents the Greek value and the second the Semitic value. Group B contains letters based on Semitic prototypes that were added to the end of the Greek alphabet. The alternate values represent variation in early Greek orthographic traditions. . . . Group C represents indigenous Greek developments.”

As one can surmise, the adaptations of Semitic consonantaries implied a consideration of how visual signs could be changed to accommodate the sounds of Ancient Greek. It is important to note that, in this adaptation of graphic signs, the placement of vowels on the line need not imply an innate awareness of consonant and vowel elements as consecutive phonemes in heard speech. In fact, in other systems related to late Indic and Ethiopic abugidas, vocalic elements are visually represented by adding diacritics or changing the details of consonant signs, producing single graphic units, not separate signs on a line, or by adding a separate vowel sign to the left, the right, above, or below consonant signs (for a history of these systems, see Salomon, 1996, 2000). Such eclectic representations hardly suggest a natural predisposition to being aware of consecutive consonant and vowel units or letter-like phonemes in speech. Still, some authors contend that morphology and preliterate abilities to manipulate segments in language games may have influenced the rise of


a phoneme awareness underlying letter signs, if not for Ancient Greeks then in other cultures (e.g., Mattingly, 1987; Sproat, 2006). For instance, Sproat (2006) argues that, even though an awareness of segments most often requires the support of script, some preliterate notion of phonemes could have played a role in the late Indic Brahmi system, where there are segmental signs. The view is that these signs may have resulted from the oral tradition of reciting the Vedas, which gave rise to manuals of recitation that influenced Brahmi writing. As to how an awareness of letter-like segments in speech could have emerged in an oral tradition, Sproat (2006) suggests that “one can speculate that such phonological sophistication may have arisen out of the language games that were played by Vedic practitioners for the purpose of preserving the sacred texts in an oral tradition” (p. 69). On the other hand, historians point out that a writing system existed during the Vedic age, and thus could have influenced the use of segments in Brahmi (Patel, 1993) and language games. Salomon (1996) also notes that, although the origin of Brahmi is controversial and involves numerous indigenous marks, about half of the writing signs of early Brahmi can be associated with Semitic precursors, suggesting, as in the case of the Greek alphabet, influences of previously existing scripts (for more details, see Share & Daniels, 2016, and the references therein). Of course, studies of ancient writing do not as such serve to determine whether or not the originators of alphabetic signs had a notion of letter-like phonemes in speech. The correspondence between Greek and Aramaic script illustrated in Figure 4.4 simply shows that one need not refer to a preliterate awareness of phonemes to account for the graphic invention of the Greek alphabet. A more relevant way of evaluating the hypothesis of a preliterate knowledge of segments is to refer to a body of research on whether or not preliterate individuals, or individuals who are literate in non-alphabetic systems, are explicitly aware of letter-like units in speech. This research is discussed subsequently. But as noted in Section 3.2.2, studies have repeatedly shown that the development of constructs of letters and words links to training on a writing system. Yet such findings have had little impact on assumptions of linguistic analyses involving transcripts, and this is perhaps best explained by the effects of writing. In fact, for anyone with decades of training on an alphabet system, it can be difficult to acknowledge that letters are representations, not sounds. Said differently, it is difficult to objectify one’s cultural constructs (this is further discussed in Section 4.2). Such effects are important in reviewing studies of phoneme or phonological awareness in that, in this research, speech stimuli and test material are commonly represented via letters of the IPA without reference to structures of sounds. Such a circular approach whereby speech material represented with letters is used to ascertain an awareness of letter-like constructs often underlies arguments for an innate, preliterate


awareness of phonemes. However, in reviewing these arguments, one should bear in mind that they entail a questionable claim about the naturalness of alphabetic signs and units. Specifically, in light of the variety of writing systems that have appeared throughout history, claiming that the ancestry of the Greek alphabet reflects a preliterate awareness of letter-like units carries the idea that alphabet writing is somehow more in tune with the real units of spoken language than other systems. This idea is most often implicit, but occasionally expressed, as when suggesting that the use of letter signs by ancient scribes shows a level of “phonological sophistication” (Sproat, 2006) or that alphabet writing is more “transparent” than other sound-based systems (Rimzhim, Katz, & Fowler, 2014 contra Share & Daniels, 2016). Critics remark that such value judgments are not limited to historical works, where authors have propounded an evolutionary account “by which writing evolved from the primitive stages [logographic and syllabic] to a full alphabet” (Gelb, 1963, p. 203). They are also found in research reports, as Share (2014) has noted. For instance, Dehaene (2009) mentions that “[t]he Phoenician system, however, was not perfect. It failed to represent all vowels” (p. 193), and Hannas (2003) expresses the opinion that, compared to alphabet writing, Asian logographic systems may have hampered creativity (for further examples of such value judgments that extend to clinical investigations and neuroscience, see Daniels & Share, 2018; Share, 2014). It is the case that some writing codes may be more efficient than others on a selected dimension, though comparisons are highly problematic. As several historians warn, writing systems are variably designed to represent not only sound features, but meaning and prosodic patterns (Coulmas, 2003; Daniels, 2017; Daniels & Bright, 1996). For instance, alphabetic systems may code certain sound features more efficiently than logographic systems, but studies indicate that readers of logograms can understand a text more quickly than readers of an alphabet (Lü & Zhang, 1999). Thus, systems that may be more efficient on one dimension can be less efficient on another, and designing test material for cross-system comparisons is impractical. Moreover, suggesting that sound-based systems may be more efficient and transparent than other systems because of a 1:1 correspondence with IPA signs (e.g., Rimzhim et al., 2014) presumes the universality of alphabet writing. In this approach, systems are evaluated in terms of an alphabet standard, the IPA, which was originally adopted for teachers of European languages (recall Section 1.3.3). There is little question that the IPA is not “international” but centered on the writing tradition of certain cultures. Such a standard does not take into account that sound-based systems do not always represent sound features with letters (e.g., the case of syllabaries and consonantaries) and that writing signs are not only designed to represent feature distinctions for items in dictionaries (e.g., there are no IPA equivalents for sounds symbolized by punctuation marks and


other signs like ). In fact, the efficiency of any type of sound-based system in representing units and features cannot be assessed without first determining the units and features that exist in spoken-language communication. Thus, some sound-based systems represent units with symbols for syllables, others use letters, and still others use a mixture of syllabic and morphological signs. How can one determine which system reflects speech and a speaker’s “phonological awareness”? Such questions are not resolved by experimental tasks where presented contexts and speech stimuli are transcribed in IPA or assumed to contain letter-like units. This presents a central epistemological problem, especially for research on reading.

4.1.3 Testing Phoneme Awareness: Issues in Defining Reference Units

In considering the testability of claims of a preliterate awareness of phonemes, one faces a paradox. It has long been demonstrated that one cannot divide utterances into letter-like phonemes. Nonetheless, various indirect methods have been used to investigate whether speakers are aware of phonemes at a more abstract level, such as the detection of rhymes and alliterations, and similar indirect approaches appear in research on word awareness (see Veldhuis & Kurvers, 2012; and on the idea that rhymes and alliterations show an awareness of phonemes, Section 4.2). But this does not mean that sensory and motor aspects of speech do not offer direct support for an awareness of units. As discussed in Part I, Stetson (1928/1951) provided some of the first observations of syllable-size cycles. Further observations of co-articulation, as reviewed earlier, have also led researchers to conclude that there are coherent CV-like units of speech sequencing. Interestingly, it has been acknowledged in the development of text-to-speech systems that one cannot concatenate speech sounds into anything smaller than a “diphone,” which is a unit that captures coarticulation (see Hinterleitner, 2017; Ohala, 1995). Beginning in the 40s, developers of dictation systems came to realize that there were no invariant units in speech acoustics corresponding to IPA letters and phonemes. For readers who are unfamiliar with the invariance problem, Figure 4.5 from Liberman et al. (1967) provides a classic example. The figure shows the essential spectral patterns required to synthesize /di/ and /du/. One can appreciate that any attempt to segment and relate perceptual indices like formant transitions to a consonant letter is problematic as the indices vary with vocalic formants. Thus, the unit of feature categorization is not perceptually localized in a phoneme-size segment of speech but extends minimally to a syllable-size cycle of co-articulated elements. Recent neuroimaging evidence also shows that syllable-cycles in speech entrain endogenous neural oscillations, confirming that the cycles constitute processing frames (more on this in Chapter 8). In short, there is substantial evidence in various sectors for basic syllable-size


Figure 4.5 Basic spectrographic values that are sufficient to synthesize /di/ and /du/, from Liberman et al. (1967).

units in speech articulation, perception, and neural processing, and this evidence can be of value in research on phonological awareness. At the same time, these observations illustrate the aforementioned paradox of attempts to apply the phoneme concept in the fields of reading instruction and language acquisition. Specifically, in supposing that speakers become aware of phonemes in speech (and other units like words and sentences), the problem arises as to how this awareness comes about when the units are not marked in signals and, by extension, to how individuals come to link letters to speech in learning to read. This paradox has led to contortions in defining terms like “phoneme awareness” and “phonological awareness,” which are central to research on reading as it relates to alphabet systems (for a recent review of applications of phonological awareness, see Gillon, 2018; for criticisms of definitions, Geudens, 2006; Scholes, 1998). To illustrate this terminological difficulty, one can consider the oft-cited definition from Stahl and Murray (1994), who “use the term phonological awareness rather than phoneme awareness because in many cases it is seen that learners of alphabet code are referring to units larger than a single phoneme”: The relationship between phonological awareness and early reading has been well established since the 1970s (see Adams, 1990, for a review). Phonological awareness is an awareness of sounds in spoken (not written) words that is revealed by such abilities as rhyming, matching initial consonants, and counting the number of phonemes in spoken words. These tasks are difficult for some children because spoken words do not have identifiable segments that correspond to phonemes; for example, the word dog consists of one physical speech sound. In alphabetic languages, however, letters usually represent phonemes, and to learn about the correspondences between letters and phonemes, the child has to be aware of the phonemes in spoken words. (p. 221, and see footnote 1)

One should remark that, in the definition, it is implied that speech sounds can as such provide a physical basis for segmentation (“one physical speech


sound”), although the authors recognize that signals present no identifiable segments corresponding to phonemes. There are, as just noted, observations that define sequential units of speech that are not considered. But then the claim is that learners come to link letters to phonemes by an awareness of these units in speech, as attested by indirect tasks of rhyming, alliteration detection, phoneme counts (etc.). Implicit in this circular definition is the idea that letters hook onto known phonemes, and the issue, then, is how this knowledge arises. On this pivotal question, the absence of phoneme segments in speech signals leaves few options: either an awareness of phonemes is viewed as arising with the learning of certain types of graphic signs, in which case phonemes are culturally acquired constructs, or such awareness is seen to arise from a preliterate knowledge of letter-like entities that happens to be reflected in alphabet-writing cultures (a similar choice was stated some two decades ago by Scholes, 1998). Some reports suggest variable effects of morphology, language games, and experience with sign distributions as script-independent sources of phoneme awareness (e.g., Fletcher-Flinn, Thompson, Yamada et al., 2011; Kandhadai & Sproat, 2010; Mann, 1991; Mattingly, 1987). However, as illustrated in the following, the reports are based on analyses of transcribed speech where sound manipulations or detection are interpreted by reference to letters of the IPA, and this presents, again, circular evidence (letter representations of speech are used to examine abstract letter-like phonemes, as in the case of speech errors). In evaluating the evidence of phonological awareness, one can refer to numerous studies involving populations of preliterate children or adults who do not know alphabet writing. These studies derive from independent research groups and use various tasks that focus on how individuals can manipulate and identify assumed phonemes in speech stimuli. For some, the studies collectively constitute “the most thoroughly developed body of research on phonological processing skills” (Geudens, 2006, p. 25). One widely acknowledged finding of this work is that illiterate children and adults manifest little to no awareness of phonemes in speech as compared to “syllables.” For instance, children who are learning to write and illiterate adults or individuals who have acquired non-alphabetic systems are good at tapping out syllables of heard lexemes, but not phonemes, and such an ability appears to be independent of general cognitive skills (Liberman, Shankweiler, Fischer et al., 1974; Morais et al., 1979). Another widely acknowledged finding is that illiterate adults or adults who only know a non-alphabetic system have great difficulty in “deleting consonants” of heard lexemes (Bertelson & De Gelder, 1991; Bertelson, Gelder, Tfouni et al., 1989; Morais et al., 1979; Read, Yun-Fei, Hong-Yin et al., 1986). In comparing the effects of learning alphabetic orthographies with those of learning Chinese characters, Miller (2002) found that differences in how children develop phonological awareness are largely predicted by differences in the writing


systems: “Thus, orthographic structure appears to be a major source of the conscious understanding of language that children develop in the course of learning to read and write their native language” (p. 17; see also Morais, Kolinsky, Alegria et al., 1998, and for the orthographic predictability of phonological awareness in learners of an abugida, Reddy & Koda, 2013). In sum, in tasks where one would expect a general ability to segment phonemes, if it is assumed that these units emerge from a preliterate competence, one finds, to the contrary, that illiterate individuals show almost no explicit awareness of phonemes. Rather, such awareness appears to accompany the learning of graphic letter-like signs or glyphs. There are opposing interpretations that refer to other tasks and graphic material, as outlined in the following. However, throughout the body of research on phonological awareness, one encounters a pervasive difficulty in defining reference units in speech stimuli or contexts used in various tasks. Most often, units in speech are conceptualized with reference to transcripts and orthographic units. More specifically, studies often evaluate phonological awareness through tasks where the number of phonemes or syllables in speech stimuli is assumed to reflect the number of IPA letters or “syllables” (viewed as groups of letters) used to transcribe the stimuli. The division of syllables in this case, either before or after a letter, follows no explicit principle. The difficulty, however, is that representing speech with sequences of letters arranged as CV, VC, CVC (etc.) hardly serves to define units in the actual speech contexts that are used in the tasks. Syllabification patterns, for instance, may not be represented as static Cs and Vs and can vary depending on how the speech stimuli are produced (see the examples of Cummins, 2012, on the varying numbers of beats in producing Carol [one or two], naturally [two or three], etc.). More generally, in evaluating the phonological awareness of elements and units in spoken language, a reference to speakers’ production appears essential (as emphasized by Geudens & Sandra, 2003). This is especially critical for languages where one cannot assume that speech patterns follow those of the observer’s speech or orthographic concepts, as in the case of Asian languages (cf. Mann, 1986, 1991). For example, Catford (1977) noted that a lexeme like sports can be judged by Japanese speakers to have four syllable beats or moras, principally because this is how such items are produced by native speakers (approximately [sɯporɯtsɯ] with some variants). On the particular concept of “syllable,” it needs to be acknowledged that the linguistic/phonological categories of “consonant” and “vowel,” as well as notions of syllables as groupings of Cs around a V nucleus, link conceptually to orthographic codes of alphabet writing. The notions are not reflected in speech. In fact, across languages, motions related to consonants can, as such, constitute syllable-like cycles or beats such that there are “vowelless syllables.” This not only applies to syllabic consonants [n, l, m, r] as in the English forms


button [bʌtn], bottle [bɑdl], bottom [bɑdm], butter [bʌdr] where [tn, dl, dm, dr] can constitute syllable beats (for examples in other languages, see Ridouane, 2008). It can extend to fricatives and fricatives next to stops or “affricates,” as in keeps [kips], batch [batʃ], badge [badʒ] (and so on, consider [ps] and [ts] as separate beats in it keeps going, a tourist’s pass), and it also applies to “final stops” as in CVC where the stop can be uttered with a voiced schwa or voiceless release (for photoglottographic and acoustic evidence of these release motions, see Ridouane, 2008). Thus, there are observations that show that a sequence represented as “CVC” or “VC” is made up of two close-release cycles, with each cycle constituting a unit of production that can be represented as a separate unit in memory. This will be demonstrated in Part III. On the latter view of syllable pulses or cycles, it is important to mention that several studies on phonological awareness in children have reported a cohesion in CV units, compared to VC, depending on the types of consonant motions that are produced (see the reviews of Geudens, 2006; Treiman & Kessler, 1995, and references to observations by Geudens & Sandra, 2003; Schreuder & Bon, 1989). For example, in tasks where beginning readers were asked to divide presented lexemes containing CV and VC by slowing down their speech, it was found that the learners had much more difficulty in dividing stop consonants in CV than in VC, whereas syllabic consonants [n, l, m, r] were easier to segment. This varying cohesion can be explained by Stetson’s principle of syllable-like cycles presented earlier (considering also that the cycles reflect the contraction and relaxation of muscles, as illustrated in Chapter 7). In producing stops like [p, t, k, b, d, g], the closing motion creates relatively high oral pressure that entails a pressure-releasing motion, or as Stetson noted, one cannot halt a pressure-building motion without producing a following voiced or voiceless motion that releases pressure. Consequently, in articulating CVC with stops in slowed speech, there will be a tendency to separate a CV close-release cycle from a subsequent cycle for -C where the release is voiceless or a voiced schwa. Other types of closures, such as for low-pressure syllabic consonants, do not necessarily entail a releasing motion and can thus be separated as a stand-alone cycle. This principle is reflected in the above observations of children’s slowed speech. The children had difficulty in dividing initial stops in CV because it is inherently difficult to halt the pressure-building motion without producing the following (voiced) release, whereas dividing stops in VC is easier, not because the children were aware of a phoneme in this context, but because -C is a separate cycle and they were not attempting to halt the pressure-building motion. It was also easy for them to separate syllabic closures basically because they could halt these low-pressure motions. In sum, units of production stand as syllable-like contraction-relaxation cycles that are reflected in pressure fluctuations and periodic rises in acoustic


energy. Preliterate children or adults readily perceive these periodic units as demonstrated by their ability to count syllable beats in speech or separate these units in slowed speech. But the units do not reflect putative phonemes or groupings of phonemes as represented by C and V letters. And yet it is this latter concept that most often guides evaluations of phonological awareness. Thus, children in the above tests using slowed speech might be seen to have poor phoneme awareness in segmenting CV where C is a stop, but better awareness when C is a syllabic consonant or when a stop appears after a V. It will be noted that one cannot explain this differential performance by reference to C and V representations of phonemes. In fact, applying such concepts can lead to a systematic misinterpretation of children’s awareness of units as involving letter-like entities when other units and speech processes are involved.
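The link between syllable-like cycles and periodic rises in acoustic energy can be illustrated with a simple measurement routine. The following is a minimal sketch in Python, assuming numpy and scipy are available and that the recording is a mono WAV file; the cutoff frequency, peak threshold, and minimum gap are illustrative values chosen here, not parameters taken from the studies cited above.

```python
# Sketch: count candidate syllable "beats" as periodic rises in the amplitude
# envelope of an utterance (illustrative parameter values).
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, find_peaks

def count_beats(path, cutoff_hz=8.0, min_gap_s=0.12, min_height=0.2):
    rate, x = wavfile.read(path)
    x = x.astype(float)
    if x.ndim > 1:                       # average channels to mono if needed
        x = x.mean(axis=1)
    env = np.abs(x)                      # rectify the waveform
    b, a = butter(2, cutoff_hz / (rate / 2), btype="low")
    env = filtfilt(b, a, env)            # smooth to a syllable-rate envelope
    env = env / (env.max() or 1.0)       # normalize to 0..1
    peaks, _ = find_peaks(env, height=min_height,
                          distance=int(min_gap_s * rate))
    return len(peaks)

# Hypothetical use: a recording of "naturally" would typically yield two or
# three beats depending on how the item is produced (cf. Cummins, 2012).
# print(count_beats("naturally.wav"))
```

Counting envelope peaks in this way gives a notation-free estimate of the number of produced cycles, which is the sort of reference measure against which responses in beat-counting or segmentation tasks could be evaluated.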

4.2 The Looking-Glass Effect: Viewing Phoneme Awareness by Reference to IPA Transcripts

Some authors have expressed their reluctance to recognize that phonemes are cultural constructs and that phoneme awareness arises with the use of particular writing signs (e.g., Fowler et al., 2016; Geudens & Sandra, 2003; Kazanina, Bowers, & Idsardi, 2018; Sproat, 2006, and others cited in this chapter). Yet there are studies that specifically demonstrate the illusory effects of writing. In research on reading, for instance, a study by Ehri and Wilce (1980) reported that children were aware of “extra phonemes” when similar-sounding items were written with differing numbers of letters. The effect appeared in a segmentation task involving orally presented lexemes. In the task, the learners tended to detect an additional element in pitch, but not rich, as in other pairs (e.g., catch-much, new-do; and similar artifactual effects were observed with written pseudo-words). This is not uncommon, as Goswami (2002) noted: “Even most college students cannot make judgments about phonemes without the support of letters” (p. 47). Analogous effects have been reported with respect to several writing systems. One example is a study by Prakash, Rekha, Nigam et al. (1993) involving Hindi speakers learning an abugida. Using a “phoneme deletion” task, the authors observed that speakers were aware of a separate consonant phoneme in heard lexemes when an inherent (unpronounced) vowel accompanying the consonant was written with a glyph. In sum, illusory “extra phonemes” are not limited to letter signs and can involve the support of different types of graphic signs (see the case of writing in Braille, discussed by Sproat, 2006). But in these reports, interpretations that the extra elements are phonemes are based on an analysis of presented speech stimuli by reference to the IPA. This looking-glass approach bears general


implications for the validity of phonological awareness tests that are both designed and analyzed in terms of alphabetic signs. To briefly describe the latter problem, it should be noted that these evaluations depend entirely on how the analyst conceptualizes speech sounds. In the aforementioned analysis, as is the dominant approach in phonological awareness tasks, responses are evaluated by referring, not to structures of speech sounds, but to representations of the speech sounds in terms of IPA letters and groupings of letters seen as syllables. Thus, in the study by Ehri and Wilce, the learner’s ability to detect elements was viewed as an “artifact” of writing in that the extra elements in pitch as opposed to rich did not match the number of letters in the IPA representations [pIč] and [rIč]. But then this assumes a priori that the IPA letters reflect true units of speech. In considering such analyses, it should be mentioned that, in the earliest developments of the IPA, attempts to represent sounds using “one letter per phoneme” (Jones, 1929) led to debates on how many phonemes there are in diphthongs, semivowels, stops, geminates, and affricates, including articulations written as , which are variably transcribed as two units [tʃ] or one [č]. In a segmentation task aimed at evaluating phoneme awareness, elements like [tʃ]/[č] in pitch and rich are likely to be uttered as a separate syllable-like cycle, but interpreting these articulations as a phoneme (or two) is entirely dependent on the choice of IPA signs to represent speech. In considering such cases examined by Ehri and Wilce, one is led to wonder which inference presents an “artifact” of script: is it the participant who is conceptualizing an extra sound in the written form pitch, or the observer who is conceptualizing numbers of phonemes in the transcript [pIč] or [pItʃ]? And how would one go about deciding how many units there are in the stimuli? In fact, there is substantial evidence of units in speech, as noted earlier. But these do not align with constructs of letters and thus cannot be observed through letter-representations of the IPA. Ehri and Wilce (1980) concluded from their findings that phoneme awareness is a “consequence of printed word learning” and suggested that letter constructs arise from an interactive grapheme-sound process (p. 380; also Ehri, 1998, 2014; Ehri, Wilce, & Taylor, 1987). These conclusions are instructive, but not only for research on reading. They also draw attention to the broad problem of the influence of constructs of writing used by specialists to analyze speech and speech processes. As mentioned in Section 1.1, several historical works have documented that writing has profoundly affected the way spoken language is conceptualized (Coulmas, 1989, 2003; Harris, 1986, 1990, 2000; Linell, 2005; Olson, 1993, 2017). Studies like these attest to the fact that, as a consequence of decades of training on a writing system, people develop constructs of elements and units that do not reflect speech signals. The constructs can influence an individual’s perception to the point that it becomes difficult for them to recognize that letters


are representations and not sounds. Of course, in a practical sense, acquiring constructs of letters can be an imperative in learning to read and write. On the other hand, the approach taken in the literature on phonological awareness does not generally acknowledge the influence of writing in linguistic analysis. Yet evaluating phonemic and phonological awareness by referring to the writing constructs of any culture carries an inherent centrism, as emphasized previously. The biasing effect that IPA signs have on viewing speech as containing phonemes is readily illustrated when one considers the writing constructs of other cultures. As a typical example, many reports cited in this section repeatedly mention the production and detection of rhymes and alliterations, as well as consonant deletions in some cases, as evidence of an awareness of phonemes – even for speakers who do not know alphabet writing. However, using a sound-based system of another culture, for instance Japanese katakana, can lead to quite different interpretations. Figure 4.6 lists katakana signs that an analyst might use to note rhyming sounds (the signs in columns a and b), alliterations (the paired signs in rows 1 and 2), and “consonant deletions” (signs in row 3). Using these notations instead of IPA, rhyming, alliteration, and deletion may not appear to involve sound features of phonemes but sound features of syllables (or moras). The point here is that, by using script to represent the sounds, the interpretation that speakers are manipulating phonemes or syllables is based wholly on a conceptualization of the sounds by reference to writing signs. The critical issue, however, is what writing system, if any, serves to capture the basic units of speech. Who is right, and more importantly, with reference to what?

Figure 4.6 Katakana signs representing how similar-sounding syllables can be variably interpreted depending on the choice of writing systems. The three pairs of signs in each column (a) and (b) are rhyming syllables (identical “vowels”); pairs of syllables on lines (1) and (2) are alliterations (identical “consonants”); the two syllables on line (3) differ in terms of onsets (as in “consonant deletions”). Note that the view that rhymes, alliterations, and deletions involve separate consonant and vowel “phonemes” rather than different syllables rests upon the choice of letter or syllable signs to represent sounds.

In overlooking observations of co-articulation, acoustic perception, neural entrainment, and in referring instead to IPA representations in investigating an awareness of units, one is in effect stating a priori the superiority of culture-specific constructs of letters. Thus, in choosing to use letters rather than the syllabary representations of Figure 4.6, rhymes (transcribed as [ku, tu, u], and [ki, ti, i] for signs in columns a and b), alliterations ([ku, ki], [tu, ti] for signs in rows 1 and 2), and deletions ([u], [i] for signs in row 3) will appear to involve letter-like entities, or phonemes. But as discussed earlier, there is no epistemological justification for choosing letters to represent sound features of speech. Historically, the choice to use a European-style alphabet for the IPA was circumstantial. Nonetheless, some authors have taken the stand that such signs reflect innate units of speech processing. For investigators involved in research on reading, this discussion may seem superfluous. Many, if not most, interpret the body of work on phonological awareness as supporting the view that intuitions about phonemes in speech arise with the support of certain graphic signs, where graphemes need not be visual but can include such systems as Braille (Ehri, 1998; Morais, Alegría, & Content, 1987). However, others contend, to the contrary, that there is a script-independent awareness of phonemes in speech, that these units are logically necessary, and that they are present somewhere in the brain (e.g., Fowler, 2010; Fowler et al., 2016; Kazanina et al., 2018; Peeva, Guenther, Tourville et al., 2010; Studdert-Kennedy, 1987, 2005). Fowler et al. (2016), for instance, present nine lines of indirect evidence and argumentation in support of the idea that phoneme awareness arises from an inborn competence. Lundberg (1991) and Mann (1991) are cited as offering somewhat unique evidence from research on reading. The following is a brief critique. Lundberg (1991) summarizes the results of two of his studies, but provides few details. In one study, he tested 51 of 200 school children, 6–7 years old, who were classed as “illiterate.” Some of these children performed well on phonological awareness tasks. On the most difficult tasks, four could reverse the phonemes in short lexemes, nine could segment phonemes, and three did well in a synthesis task. Lundberg also refers to a second study (Lundberg, 1987) involving 390 6-year-old children of whom 90 percent performed well on rhyme recognition and some could perform tasks such as phoneme deletions. From the results, the author concluded that in “rare or exceptional cases we do find nonreaders with well-developed phonemic awareness” (Lundberg, 1991, p. 50). As for Mann (1991), a longitudinal study of Japanese children learning a syllabary and logographic kanji was taken to suggest a developing awareness of phonemes, independent of any knowledge of alphabet signs. Both Mann (1991) and Lundberg (1987) cite, in addition to observations on school children, Bradley and Bryant (1985) on language games as cases of text-independent phoneme awareness (their references also include Sherzer, 1982;
McCarthy, 1982; and for similar arguments, see Bryant & Goswami, 2016). It should be clarified, however, that the methods used in these studies do not collectively warrant the conclusion that phoneme awareness arises independently of graphic signs. In considering Lundberg’s results, one might initially ask how 51 of 200 Swedish children, between 6 and 7 years of age, could be classed as illiterate or as having little contact with text when, as the author notes (Lundberg, 1999), Swedes, according to a 1995 OECD survey, presented the highest scores on all measures of literacy and Swedish children have the greatest access to books at home. But the more critical point is that Lundberg’s conclusions are based on phoneme-awareness tasks that involve visual aids to segmentation that act as visual substitutes for letters. These visual accessories were used across tasks on phoneme synthesis, segmentation, and phoneme reversals. For instance, a word like [pIt] can be orally presented by sounding out [pə]-[I]-[tə] with schwas for stops (Lundberg, Olofsson, & Wall, 1980), and each syllable of this sequence is associated with a visual marker. Lundberg uses pegs on a board; others use chips or cards with letter signs. After some practice in associating syllables with the markers, the order of markers can be modified, and the learner can be asked to read these out or create new combinations. For example, in a phoneme deletion or reversal task, the learner can be asked to read out sequences of pegs or cards or to create new words. Success on such tasks suggests a “text-independent” awareness of phonemes in name only since the awareness relies on visual substitutes for letters. Moreover, in sounding out words like [pə]-[I]-[tə], one is not working with phonemes but syllable cycles, or, if one is assuming phonemes in these stimuli, then reversals would imply [əp] and [ət]. In short, Lundberg’s results hardly support the conclusion that phonemic awareness arises without the support of graphic signs. Finally, analyzing speakers’ manipulation of rhymes (or rimes) or consonant onsets in terms of the IPA does not as such show an awareness of phoneme segments as opposed to syllables. As noted earlier in reference to Figure 4.6, such interpretations are wholly based on letter representations of sounds. Morais and colleagues repeatedly point out that the ability to detect or manipulate rhymes and alliterations can reflect a sensitivity to sound similarities at a “non-segmental level of analysis.” But this does not entail an awareness of letter-like phonemes, as it seems when representing rhyming or alliterated sounds with IPA signs. In fact, preliterate individuals who are good at detecting and producing rhymes and alliterations can be unable to count phonemes or delete consonants in heard lexemes (e.g., Morais, 1991; Morais et al., 1987, and for illiterate adults, see de Santos Loureiro, Braga, do Nascimento Souza et al., 2004). As for Mann’s (1991) interpretation that Japanese children manifest phoneme awareness independent of alphabet writing, this was criticized by
Spagnoletti, Morais, Alegria et al. (1989), who point out that, among other omissions, Mann did not consider that Japanese children learn syllabaries by using a matrix in which characters in columns share the same sounds represented as consonants except for one column, which presents isolated vowels (an example of a grid can be found in Spagnoletti et al., 1989). In short, the performance of Japanese children on phoneme awareness tasks is not independent of visual representations in script, or glyphs in learning non-alphabetic systems. Also, the grids, as such, show that letters are not necessary to analyze functional sound features in that similarities or differences in sounds can also be represented by signs in a syllabary. In other words, nothing in the perception or manipulation of sound features suggests that the features occur in letter-like packets or phonemes on a line, which is a notion that arises when analyses are based on alphabetic transcripts (contra Uppstad & Tønnessen, 2010). A similar omission by Mann appears in referring to McCarthy (1982), who describes a language game in Bedouin Hijazi. In this game, speakers appear to transpose consonants in a transcribed example “baatak,” “taakab,” “taabak,” “baakat” (etc.), while repeating prosodic patterns and syllables. The author fails to mention that Bedouin Hijazi is written with an Arabic consonantary where vowels are inferred. Thus, speakers are engaging in a sound game that suggests a link with signs of Classical Arabic. However, it is in representing this game with alphabetic signs rather than signs of a consonantary or syllabary that the notion arises that the game involves manipulations of letter-like phonemes. Again, it has been established that preliterate individuals who can detect and manipulate rhymes and alliterations, as in this game, can be unaware of how many phonemes there are in heard speech. These reservations can similarly extend to other reports of language play, all of which present questionable interpretations based on letter representations of sounds in speakers’ production or presented stimuli (contra Sherzer, 1982, and references therein).

4.2.1 “Phonological” Evidence of Phonemes Versus Motor Processes

Aside from the cited studies and research on spoonerisms (discussed previously), Fowler et al. (2016) refer to other types of indirect linguistic observations in arguing for an inborn phoneme awareness. Indeed, for some, phonemes exist and “linguistic evidence continues to be the best evidence for their existence” (Kazanina et al., 2018, p. 561). All of this evidence, however, is indirect in that it rests on transcripts. In other words, the arguments are not based on observations of sounds or speech processes but on representations of sounds using letters. For instance, Fowler et al. (2016) mention the classic illustration in Figure 4.6 by Liberman et al., which is representative of direct observations of acoustic-perceptual attributes that invalidate a phoneme concept, as some have recognized (e.g., Diehl, Lotto, & Holt, 2004; MacNeilage &
Ladefoged, 1976). Co-articulation is also discussed, but not in terms of invalidating phonemes, nor as evidence of cohesive CV-like units (as outlined in Section 4.1). It is acknowledged that these observations demonstrate “that [phoneme] segments do not exist in public manifestations of speech” (Fowler et al., 2016, p. 126). But this evidence is set aside, and the authors focus instead on indirect observations based on script. Analyzing speech as strings of IPA signs, though, can lead to a conceptualization of speech processes as involving operations on letter-like entities. On the other hand, as in the case of spoonerisms, quite different accounts of speech phenomena can be formulated when examining speech using instrumental methods as opposed to letters on a page. The aim here is to offer another illustration that what some authors view in transcripts as evidence of phoneme manipulation can be explained by processes that do not imply these units. In arguing for an innate phoneme awareness, Fowler et al. (2016) present the case of recitations of Homeric verse by ancient Greek bards in reference to a historical work by Parry (1987). Briefly described, the rhythm of versification required a balancing of vowel length, which can vary according to the consonant that follows. Fowler et al. rely on Parry’s transcripts, based on alphabet writing of Ancient Greek, and claim that the “determination of ambivalent vowel length required counting consonants” (p. 130). This is taken to suggest that illiterate bards possessed an awareness of phoneme segments. In a related example, the authors refer to systematic processes of vowel lengthening before voiced stops, as in cap-cab, which can apply productively across languages and novel expressions (the case of aspirated stops before a stressed vowel is also mentioned). In the authors’ view, such productive changes imply operations on phonemes, as some phonologists describe lengthening via formal rules that adjust phonemes represented with letters (roughly, V becomes +long before a +voice C). But how much a vowel needs to change in milliseconds to be considered “long” is not specified in phonological analyses of static IPA signs. In short, in describing changes in features using letters, it seems that adjustments operate in terms of letter-like phonemes. A different account that does not assume operations on phonemes can be provided if one considers instrumental records. Observations of speech suggest two principles that underlie the patterns cited by Fowler et al. The first principle refers to the effect of oral pressure on voicing. Essentially, voicing or vocal-fold vibration is dependent on air flowing through the glottis. Producing stops with varying force creates variations in intra-oral pressure that can modulate the transglottal flow that powers vocal-fold vibration (Boucher & Lamontagne, 2001). The second principle relates to compensatory adjustments within rhythm “chunks.” Several early reports have described these compensatory effects (Catford, 1977; Kozhevnikov & Chistovich, 1966; Lindblom, Lyberg, & Holmgren, 1981;
Martin, 1972; Shaffer, 1982). Chunking is a central principle developed in Part III. At this point, it suffices to mention that it is a domain-general process whereby motor sequences are perceived and produced, not one item at a time, but in blocks. Motions within chunks manifest a degree of relative-timing constancy, an effect that has been attributed to central motor programs (discussed by Kelso, 1997; see also Schmidt & Lee, 2014). Thus, while the overall duration of chunks varies, the relative timing of motions within these blocks remains stable and “self-adjusts.” Figure 4.7 provides an illustration of how these two principles interact so as to draw a parallel with the case of rhythm in versification and vowel length effects noted by Fowler and colleagues. Panel A in the figure presents a record of lip motions, oral pressures, and vocal-fold vibrations using electroglottography (EGG, and rectified/smoothed EGG) during the production of Bobby. In this case, Bobby is a chunk within the utterance I saw Bobby. The first principle to observe is that the production of closing motions with varying force is reflected in oral pressure that influences vocal-fold vibrations, and this is seen in the correspondence between rising pressure and diminishing vibrations in the EGG curves. Voicing is thus modulated by oral pressure, and high-pressure closing motions for consonants can interrupt voicing altogether (Baken & Orlikoff, 2000). The second principle bears on the production of stress rhythm that involves varying the force of closures, as reflected in the greater pressure rises for [bɑ] compared to [bi] in Bobby. Forceful motions are generally longer than less forceful motions, and this leads to compensatory adjustments within chunks: long motions in producing a forceful (stressed) syllable entail more rapid motions in neighboring syllables, as seen in the changing velocities of labial closures in panel A. These compensatory changes, however, do not imply operations on phonemes or an awareness of such units. Panel B shows how such compensatory timing can apply productively across languages and dialects. In producing papa in French (which has no lexical stress rhythm) and in varieties of British English (with a stress on the second cycle) and American English (with a stress on the first cycle), compensatory changes can appear within the chunks. It should be clear that such changes are not the result of speakers adjusting the millisecond values of letter-like phonemes. Panel C illustrates the effect of producing closing motions involving varying force. Again, producing closures creates rises in oral pressure, which reduces the airflow through the glottis and thus vocal-fold vibrations or voicing. A high-pressure stop like [t] in the beat will cut off the flow and vibrations sooner than low-pressure stops like [d] in the bead or [n] in the bean, where there is negligible oral pressure. The acoustic effect, as displayed in panel C, will appear as a “lengthening” of periodic energy for a “vowel” when followed by a low-pressure closure, an effect that has little to do with rules of phoneme adjustment.
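
The contrast between the two accounts can be made concrete with a toy computation. The sketch below is not from the source and is not a model used in the cited studies; the function names, the fixed chunk duration, and the linear pressure–voicing relation are illustrative assumptions only. It shows how a “vowel length” difference of the beat–bead type can fall out of (1) compensatory relative timing within a chunk of roughly constant duration and (2) oral pressure cutting off voicing, without any rule that adjusts the duration of a vowel phoneme.

# Toy illustration (all numeric values are assumptions, not measurements).
CHUNK_MS = 600.0  # assumed near-constant duration of a rhythm chunk (isochrony)

def syllable_durations(relative_force):
    """Distribute the fixed chunk duration over syllables in proportion to the
    force of their closing motions: a forceful (stressed) closure takes longer,
    so neighboring syllables compress (compensatory relative timing)."""
    total = sum(relative_force)
    return [CHUNK_MS * f / total for f in relative_force]

def voiced_portion(syllable_ms, closure_pressure):
    """Estimate the periodic ("vowel") part of a syllable: higher intra-oral
    pressure for the following closure reduces transglottal flow earlier and
    cuts off vocal-fold vibration sooner (a linear toy approximation)."""
    return syllable_ms * (1.0 - 0.5 * closure_pressure)

# "the beat" vs. "the bead": the same two-syllable chunk with stress on the
# second cycle, but [t] is produced with high oral pressure, [d] with low.
for word, pressure in [("beat", 0.9), ("bead", 0.2)]:
    _, dur_word = syllable_durations([1.0, 2.0])
    print(f"the {word}: syllable = {dur_word:.0f} ms, "
          f"voiced portion = {voiced_portion(dur_word, pressure):.0f} ms")

In this sketch the only quantities are chunk durations, closure forces, and oral pressures; nothing in it presupposes letter-like segments or rules that assign millisecond values to phonemes.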

Figure 4.7 Illustration of dynamic characteristics of speech production. (A) Labial motions (at mid-line), intra-oral pressure, and effects on vocal-fold vibrations given by electroglottographic (EGG) recordings and rectified/ smoothed EGG, during the production of Bobby in the carrier phrases I saw . . . . Note that rising pressure corresponds to correlated decreases in vocal-fold vibrations. (B) Acoustic waveforms of papa produced in French, and in British and American English where varying stress leads to compensatory changes within an isochronic chunk (dotted lines). (C) Acoustic waveforms showing that high-pressure closing [t] in the beat can reduce vocal-fold vibration (as in A) compared to low-pressure [d] in the bead and [n] in the bean. This reduction of vocal-fold vibrations via oral pressures creates “vowel length” effects.

To summarize, in the context where one is producing isochronic rhythm chunks, as would apply in versification, vowel length varies with the force of closing motions, which naturally modulates voicing and creates compensatory timing changes within chunks. Thus, the balancing of rhythmic chunks can well reflect a detection of the audible effects of different motions on the timing of syllables within chunks, which concurs with the attested ability of illiterate individuals to count syllable beats (but not phonemes) in heard speech. In other words, this balancing of rhythm units hardly supports the claim of a preliterate awareness of phonemes, or rules that adjust millisecond values of vowel length by “counting phonemes.” Furthermore, depending on how speakers produce stress rhythms, with more or less force (or no force differential, as in French), compensatory changes can productively apply across languages and expressions. This can extend to several changes that occur when varying the force of oral closures, such as the systematic production of aspirations for high-pressure consonants. These principles of motor processes do not imply batteries of rules that adjust all phonemes on the way out to speech. In fact, the latter concept arises from viewing speech via strings of IPA letters that fail to capture any of the inherent dynamic properties of articulated sounds. The point is that conventional analyses of spoken language using letters do not constitute evidence of an awareness of phonemes or operations on these units. Other accounts can explain systematic aspects of speech where postulates of phonemes are neither necessary nor unavoidable.

4.2.2 On Arguments of the “Logical Necessity” of Phonemes and the Success of Alphabet Systems

Several authors have argued that the units used in language analysis, such as phonemes, words, and sentences, not only derive from an inborn language competence but are also “logically necessary” (Chomsky, 1965, 1980a; Donegan, 2015; Fowler, 2010; Fowler et al., 2016; Studdert-Kennedy, 2005; and others cited in Section 3.3). For instance, Fowler et al. (2016) quote Jacob, a biologist, who commented that the hierarchical organization of language and genetics “appears to operate in nature each time there is a question of generating a large diversity of structures using a restricted number of building blocks. Such a method of construction appears to be the only logical one (Jacob [1997], p. 188).” Likewise, Studdert-Kennedy (2005) states that phonemes are “mathematically” necessary by referring to principles of genetics and physics: Fisher (1930) . . . recognized that a combinatorial and hierarchical principle is a mathematically necessary condition of all natural systems that “make infinite use of finite means,” including physics, chemistry, genetics, and language. He dubbed it “the particulate principle of self-diversifying systems.” Briefly, the principle holds that all such systems necessarily display the following properties: (i) Discrete units drawn from
a finite set of primitive elements (e.g., atoms, genes, phonetic segments) are repeatedly permuted and combined to yield larger units (e.g., molecules, proteins, syllables/words) above them in a hierarchy of levels of increasing complexity; (ii) at each higher level of the hierarchy . . . units have structures and functions beyond and more diverse than those of their constituents from below; (iii) units that combine into a [higher] unit do not disappear or lose their integrity: they can reemerge or be recovered through mechanisms of physical, chemical, or genetic interaction, or, for language, through the mechanisms of human speech perception and language understanding. (p. 52)

For some readers, such arguments might appear syllogistic: (1) atoms and genes combine hierarchically to form molecules and proteins, which reflects a “mathematically necessary condition”; (2) phonemes combine hierarchically to form syllables and words; therefore (3) phonemes reflect a mathematically necessary condition. The problem is that phonemes (like words and sentences) link conceptually to writing. They are cultural constructs, and are no more biological or physical than if one conceptualizes speech units in terms of Arabic consonantaries or Japanese syllabaries. Yet the authors contend that phonemes, as represented by letters, are the real units of speech arising from a biological competence for language and suggest that this can explain the success of letter-based writing systems. Thus, the alphabet is “the most widespread and efficient system of writing” (Studdert-Kennedy, 1987, p. 68). Fowler (2010) also sees the success of alphabet writing as attributable to its inherent efficiency in representing phonemes: First, there are alphabetic writing systems. Their inventors must have had the impression that the spoken language had units to which the letters would correspond. Yet they had no alphabetic writing system to give them that impression. Second, there is the remarkable success of alphabetic writing systems. That is, most people taught to read become very skilled. Reading becomes sufficiently effortless that they read for pleasure. Is it likely that these writing systems would be so effective if they mapped onto language forms that individual members of the language community did not command? (p. 58)

As documented in Section 4.1.2, the invention of the Greek alphabet does not necessarily imply a preliterate awareness of phonemes (or the efficiency of alphabet systems). Contrary to this belief, numerous studies on phoneme awareness indicate that notions of letter-like phonemes arise with the support of certain graphic signs and not others. Further, the above claim, which judges different writing systems on their efficiency in representing speech sounds, assumes a priori that sound units are letter-like. But phonemes are not manifested in speech, as the authors admit. Syllable-like cycles, on the other hand, are in speech. Thus, in comparing the efficiency of systems on sound representation, it should be weighed that the aforecited body of work on phonological awareness has repeatedly shown that, for illiterate or preliterate children and adults, syllables in speech are much easier to count and segment than phonemes. By
these observations, then, letter signs are hardly efficient. As for the statement that alphabet systems are easier to learn than other systems, this is unfounded. For instance, Taylor (1988) noted that, compared to learning an alphabet code, which can take years of formal schooling, Japanese katakana is so easy to learn that most children “pick it up” before entering school. There is also the issue of whether alphabet writing might not exacerbate certain types of dyslexia compared to other systems. This is suggested by cases of bilingual individuals who know both alphabet and Japanese writing systems and who show signs of dyslexia in reading an alphabetic text, but not in reading Japanese script (Himelstein, 2011; Wydell & Kondo, 2003). There is also prevalence data showing a lower incidence of dyslexia in Japan than in populations that use an alphabet (Johansson, 2006). Finally, as for the opinion that the spread of alphabet writing is due to some inherent efficiency in the system, many geopolitical, economic, and cultural factors cast doubt on this opinion. Olson (1993) offers a relevant example: At the end of World War II, Douglas MacArthur, commander of the Allied forces, was urged by a panel of Western educationists to completely revise the educational system of Japan and abolish ‘Chinese derived ideograms’ if he wanted to help Japan develop technological parity with the West! . . . In fact, an authority on Chinese science, J. Needham (1954–59, 1969) has recently concluded that the Chinese script was neither a significant inhibitory factor in the development of modern science in China nor an impediment to scientists in contemporary China. (p. 2)

Cases like these illustrate that the belief that orthographic concepts arise from an inborn competence and are best reflected in alphabet systems is not without real-world consequences (other critiques along these lines can be found in Lurie, 2006; Share, 2014).

4.2.3 Effects of Writing on Speakers’ Awareness of Words, Phrases, Sentences

Much of this discussion on phoneme awareness can similarly extend to investigations on awareness of words, phrases, and sentences. As with phonemes, preliterate children and adults (or individuals literate in non-alphabetic systems) do not consistently segment words in continuous speech until they acquire an orthographic code. But the types of tests used in research on reading and phonological awareness do not address the nature of the segmentations that appear to come naturally to illiterate speakers. As reviewed above, the general finding that illiterate individuals have much more difficulty counting or manipulating phonemes than they do syllables does imply an awareness of units, although these do not conform to assumed letter-like entities in test material. Similarly, as Veldhuis and Kurvers (2012) note, it was evident from
early investigations that illiterate individuals have great difficulty segmenting utterances into words, including lexemes when they are accompanied by bound forms like determiners and prepositions. These results also imply an awareness of units, although, as Kurvers and others remark, the units do not align with words but “multiword” clusters or chunked formulas (consider also Kurvers, 2015; Onderdelinden, van de Craats, & Kurvers, 2009). The difficulty is that one may not understand these responses given by illiterate people nor define what it is that they are aware of by reference to analyses of speech using IPA signs and orthographic units that, by their design, do not represent structures like syllables and chunks. As outlined in Section 3.2.2, investigators often overlook the point that units of conventional language analysis draw from writing, even though it is widely acknowledged that there is no general isomorphism between letter signs or words, and sound patterns in utterances. In particular, prosodic groups in speech do not align with assumed syntactic constituents (e.g., Nespor & Vogel, 1986; Jackendoff, 2007b). The mismatch appears in both adult and child-directed speech so that the failure to ground units like words, phrases, and sentences in sensory speech leads to a paradox for formal theories (as outlined in Section 3.2.2). Beyond this segmentation issue, however, there is the more basic problem of the lack of working definitions for word, phrase, and sentence (Dixon & Aikhenvald, 2002, among others cited in the following). This indeterminacy of basic units has profound consequences for language study. Sub-disciplines of this field of study revolve around the word concept in that “[m]orphology deals with the composition of words while syntax deals with the combination of words” (Haspelmath, 2011, citing Dixon & Aikhenvald, 2002, p. 6). In fact, the units cannot be defined in terms of a general criterion other than orthographic conventions like spaces in text. This latter point can be illustrated by reference to any of several suggested definitions. For instance, one finds entire books devoted to defining what a word is, using the varying criteria of “replaceability” (Kramsky, 1969, p. 68), “listedness” (Di Sciullo & Williams, 1987, p. 3), “minimal free-form” (Bloomfield, 1933), and “cohesion,” among other principles (reviewed by Dixon & Aikhenwald, 2002). But in consulting these works, a reader will encounter a procedure where authors evaluate various distributional analyses and definitions through references to a given notion of word, most often illustrated by how forms are written. Thus, an author might state the case that, in Latin, the morphemes “I sing” constitute one word, citing the written form canto, while the same morphemes are two words in English, citing the written form I sing, and then proceed to judge various criteria on their capacity to account for the divisions. This example is from Kramsky (1969, p. 38), who sees a justification for the divisions in that, contrary to canto, intervening forms can appear within I sing such as I often sing. But then he remarks that in
Turkish, Modern Persian, and Czech, intervening forms do not lead to separate “words,” which is again illustrated by how forms are written in these languages. Such examples illustrate the circularity of distributional analyses. They also lead to the view, as Pergnier (1986) notes, that “[t]he word is a concept that has always been defined by reference to the practice of writing” (p. 16, translated from the original). The same can be said about defining phrase and sentence, which has been a long-standing problem (see, for instance, Fries, 1952, who compares over a hundred definitions). For some analysts, entire subject-verb-object groups can constitute words rather than phrases or sentences (Bally, 1950; Polivanov, 1936). Researchers who focus on distributional analyses of transcripts representing languages that have a tradition of alphabet writing may not be concerned with the aforementioned issues of definition. Yet the problem inevitably surfaces whenever writing systems of different languages present orthographic concepts that do not match those of European systems. As an example, one author, who developed an influential model of lexical access, admitted a basic difficulty in applying the word concept across languages, perhaps not realizing that the concept links to writing: An adequate account of “spoken word access process” is going to require, if nothing else, a convincing definition of the notion of “word.” Without knowing what is being accessed, it will be hard either to construct a theory of how access is conducted, or to determine what are the appropriate tests of such a theory and its competitors. (Marslen-Wilson, 2001, p. 699)

In sum, distributional analyses of IPA script alone have not succeeded in defining generally applicable units of spoken language. Other methods are required with a view to achieving a compatibility between conceptual units and observable structures in utterances.

5 (Re-)defining the Writing Bias, and the Essential Role of Instrumental Invalidation

5.1 On the Persistence of Scriptism in the Study of Spoken Language

Several authors have argued that linguistic analysis, contrary to the claims of leading language theorists, focuses on script and has never been about spoken language (e.g., Coulmas, 1989, 2003; Harris, 1986, 1990, 2000; Linell, 2005; Olson, 1993, 2017). Historically, influential theorists have indeed stated that the object of language study is spoken language and not writing. For instance, Saussure (1916/1966) specified that “[l]anguage and writing are two distinct systems of signs; the second exists for the sole purpose of representing the first” (p. 23). For Bloomfield (1933) “[w]riting is not language, but merely a way of recording language by means of visible marks . . .. All languages were spoken through nearly all of their history by people who did not read or write” (p. 21). Contrary to these tenets, writing-induced concepts have always been used in language analysis and theory, as Port (2010) has noted: Almost all linguists have followed Saussure (1916) in claiming to be studying spoken language not written language. But the fact is that almost all modern linguists, like Saussure, never really escaped from letter-based characterizations of language. Audio (and video) recordings are rarely found in the linguistics classroom or in most linguistics research. When we think of “words,” “speech sounds,” and “sentences” in our descriptions of language, we are importing the conventions of our writing system and trying to use them uncritically as hypotheses about psychological representations. (p. 322)

For Coulmas (1989, 1996) and others, it is the contradiction between the stated object of study and the tradition of analyzing speech through transcripts that defines the “written language bias.” In the author’s view, the bias is “the tendency to analyze languages using writing-induced concepts as phoneme, word, literal meaning, and sentence, while at the same time subscribing to the principle of the primacy of speech for linguistic inquiry” (1996, p. 455). One should note that, by this definition, the writing bias has little to do with the beliefs of a cultural group (cf. Levisen, 2018), but has everything to do with
a tradition of conceptualizing spoken language through transcripts. The definition, however, does not address the question of why concepts of writing continue to guide research on spoken language. As documented in Parts I and II, much of the conceptual apparatus serving to analyze spoken language was elaborated at a time when transcripts were practically the only method available to record speech. Currently, recording techniques extend to acoustics, kinematics, and physiology, including neural responses. None of these techniques reveal the types of orthographic units assumed in linguistic methods. On the other hand, anyone who views a spectrogram of an utterance can identify structural patterns and attributes corresponding to perceived features, syllable-like cycles in energy, and groupings. Yet even in early instrumental work on speech, as seen in the case of Stetson (1928/1951), it was argued that observations had to be circumscribed by phonological criteria as determined by linguistic-type analyses. In effect, this meant that analyses involving writing signs and units had a theoretic precedence in orienting empirical research. The argument, however, failed to acknowledge the ontological incommensurability between analyses of culture-specific representations of sounds on a page and instrumental records of speech signals as physical events. This incommensurability has presented a prevailing problem for researchers. Currently, features and structures serving vocal communication can be readily observed through instrumental visualizations of signals and need not be driven by analyses of transcripts. But it must be asked how it is that empirical observations have not led to a reassessment of the practice of conceptualizing spoken language through signs and units of writing. Indeed, orthographic concepts continue to guide language research across sectors extending to neuroscience (as discussed in Parts III and IV of this monograph). In historical accounts, the writing bias has been characterized as the failure to recognize that writing has provided the concepts for thinking about the structure of spoken language rather than the reverse. However, in the context where these concepts are maintained in the face of empirical invalidations, the bias has another facet. It suggests a difficulty in conceptualizing elements and structures of spoken language with reference to visual records other than those of writing. To clarify the problem, it will be recalled from Chapter 4 that studies have shown that similar-sounding forms written with varying numbers of letters can create illusions where people report hearing different sounds. Such effects attest to the influence that years of training on sound–letter associations can have on an individual’s conceptualization of speech. Indeed, for experienced readers, some conscious dissociation of signs and sounds can be required to acknowledge that letters are representations, not sounds. In sum, it can be difficult to objectify the effects of cultural constructs of letters, words, sentences (etc.) in relation to sound structure. Traditional language analysis and
descriptions of speech stimuli in research suggest such effects. For instance, it was noted that, in studies of speakers’ awareness of units in speech, investigators frequently describe their stimuli or test material using letter signs and orthographic units without any reference to actual sensory signals. Such an approach is quite common and reveals a continuing problem of conceptualizing spoken language in terms of what can be visualized using various techniques. In the context where many investigators refer to transcripts, the writing bias may more aptly be defined as this difficulty of conceptualizing elements and structures of spoken language through instrumental records of speech rather than through writing constructs. There is a body of critical commentary on the writing bias that spans decades, in which authors repeatedly warn that units and categories of conventional language analysis reflect the influence of orthographic code (to avoid belaboring the point, the reader can refer to numerous critics who discuss the writing bias on different levels of language analysis: Aronoff, 1992; Beaugrande & Dressler, 1981; Cabrera, 2008; Chafe, 1979; Derwing, 1992; Faber, 1990; Ginzburg & Poesio, 2016; Goody, 1977; Halliday, 1985; Harris, 1980, 1990, 1998, 2000; Hopper, 1998; Kress, 1994; Ladefoged, 1984; Linell, 2005; Ludtke, 1969; Miller & Weinert, 1998; Nooteboom, 2007; O’Donnell, 1974; Olson, 1996, 2017; Ong, 1982; Parisse, 2005; Port, 2006, 2007, 2010; Port & Leary, 2005; Scholes, 1995, 1998; Share, 2014; Share & Daniels, 2016; Taylor, 1997). Historians have outlined that writing and speech differ on several dimensions. Some of these differences are listed in the following (see also Coulmas, 2003), omitting perhaps the most obvious ontological difference: writing is a cultural phenomenon whereas spoken language is biologically grounded.

Spoken language              Writing
• mostly continuous          • mostly discrete
• evanescent                 • permanent
• situated                   • autonomous
• multimodal                 • unimodal
• oral-aural                 • visual or tactile
• naturally structured       • artificially formatted

Building an understanding of the workings of spoken language on these dimensions obviously requires different instrumental methods. This should not be construed as suggesting that conventional analyses of IPA corpora are not useful or that instruments may not, as such, bias observations. Certainly, any technology will restrict the type of data that can be gathered and, consequently, can orient one’s view of the processes that underlie a behavior. But there is a fundamental difference. The use of the IPA in recording speech rests upon
knowledge of a culture-specific code. It thus introduces a bias in representations of speech while also creating an object of analysis that is ontologically distinct from physical manifestations of utterances. It should also be borne in mind that, in terms of the previous definition, the writing bias is not simply a “tendency to analyze languages using writing-induced concepts.” It has also involved a disregard for instrumental evidence as a means of countering cultural biases. The consequences of overlooking this evidence can be seen in the ongoing critiques of the centrism of language analyses and theories, as illustrated in what follows. In some cases, entire research programs have faltered on the realization that units and categories of linguistic analysis reflect culture-specific constructs. Other programs, which not only maintain the constructs but claim their innateness, have elicited sharp criticisms.

5.2 The Need to Address Complaints of Cultural Centrism and Ethical Concerns

One telling example of a critique bearing on a decades-long research program was expressed by Evans and Levinson (2009). Following an extensive review of work on grammatical universals, the authors concluded that there was a lack of evidence supporting notions of universal syntactic forms, and they wondered how such notions arose to begin with. Their answer took the form of a familiar complaint: “How did this widespread misconception of language uniformity come about? In part, this can be attributed simply to ethnocentrism” (p. 430, original emphasis). Tomasello (2003) expressed a similar opinion: Just as there was a time when Europeans viewed all languages through the Procrustean lens of Latin grammar, we may now view the native languages of Southeast Asia, the Americas, and Australia through the Procrustean lens of Standard Average European grammar. But why? On one reasonable view, this is just Eurocentrism plain and simple, and it is not very good science. (p. 18)

On the other hand, how units and categories of Latin grammar came to guide programs of research is not explained by the charge of ethnocentrism, as mentioned earlier. In examining the field of language study, the primary reason for the misconception may well draw from the types of analysis that were instituted by philologists. As documented in the preceding chapters, by developing language theories through the study of texts, early linguists did not acknowledge the biasing effects of their notational devices. At the time, this had little consequence outside a narrow circle of academics who were attempting to reconstruct proto-Indo-European. But the goals and applications of language study changed. In early structuralist theory, letter symbols were seen by some to have a psychological status. By the mid-60s, formal models operating on letters and words took on a biological status. Thus, while language
theorists may have been using orthographic units and categories as notational devices, many came to view these devices as part of an inborn competence. It is this dérive of conventions in language analysis that has elicited unheeded warnings, such as expressed by Parisse (2005): For more than 20 years now, an important body of work (Halliday, 1985; Harris, 1990; Hopper, 1998; Linell, 1982; Miller and Weinert, 1998; Olson, 1996; Ong, 1982) has criticized the dangerous confusion between spoken and written language which can be found not only in the generative approach but also in many other linguistic approaches. Avoiding this confusion leads to rethinking the underlying theoretical choices of numerous linguistic works, and, especially, those that are rooted in generative linguistics. The main idea behind this criticism is that linguistic research was developed with written material, whose intrinsic properties strongly constrain the results that can be obtained, and that it was erroneously concluded that these results apply to spoken as well as to written language. (p. 389)

In fact, the confusion between spoken and written language has had broad consequences in applied sectors. The preceding chapter outlined several examples of how the idea of an innate awareness of letter-like phonemes can lead to judgments on the superior efficiency of European-style writing systems. Such beliefs have led to recurrent criticisms of “alphabetism” in reading research (e.g., Daniels & Share, 2018; Scholes, 1998; Share, 2014; Share & Daniels, 2016). In reviewing arguments of efficiency, Baroni (2011) concluded that “the alleged superiority of the alphabet to other writing systems (syllabic and logosyllabic ones) is an ethnocentric prejudice” (p. 127). Share and Daniels (2018) also express such a conclusion with respect to theories of reading processes and disorders: Most current theories of reading and dyslexia derive from a relatively narrow empirical base: research on English and a handful of other European alphabets. Furthermore, the two dominant theoretical frameworks for describing cross-script diversity – orthographic depth and psycholinguistic grain size theory – are also deeply entrenched in Anglophone and Eurocentric/alphabetocentric perspectives, giving little consideration to non-European writing systems and promoting a one-dimensional view of script variation, namely, spelling–sound consistency. (p. 101)

These types of complaints are not restricted to reading research. They extend to the application of writing-induced concepts like phonemes and words in psychometric tests and clinical assessments. Scholars have acknowledged that the word is a “decidedly Eurocentric notion” (Dixon & Aikhenvald, 2002, p. 2) that does not extrapolate across cultures and “polysynthetic” languages. In fact, this typological classification of languages as synthetic or analytic is implicitly based on whether or not a language has written “words” as in alphabet systems (Cabrera, 2008). Even so, the notion is prevalent in evaluations of general cognitive abilities. For instance, memory span is most often evaluated in terms
of an ability to recall recited lists of words such as strings of digits. Yet several large-scale controlled studies have revealed differential performances on tests of digit span across populations of Chinese, Japanese, Malay, and American children (e.g., Chan & Elliott, 2011; Stigler, Lee, & Stevenson, 1986). There is also the question of what to do with languages that have no word forms for digits (Frank, Everett, Fedorenko et al., 2008). Failing to acknowledge that phonemes and words link to particular writing codes can lead to viewing awareness of such units as the norm, and to evaluating, by reference to such a standard, the cognitive abilities of individuals who do not know an alphabet (cf. Petersson, Reis, & Ingvar, 2001; Reis & Castro-Caldas, 1997). In clinical assessments as well, studies of dyslexia show the difficulty of assuming a general awareness of phonemes and the inherent problem of using translated tests to identify reading deficits across languages and writing systems (Elbeheri, Everatt, Reid et al., 2006; Everatt, Smythe, Ocampo et al., 2004; Johansson, 2006). On the notion of word, peer debates have appeared on the ethics of diagnosing “word finding” deficits and signs of aphasia in speakers of languages where there are no words (Bates, Chen, Li et al., 1993; Bates, Chen, Tzeng et al., 1991; contra Zhou, Ostrin, & Tyler, 1993). Such difficulties in psychometric and clinical evaluations suggest an oversight, or a failure to recognize that letter-like phonemes and words are cultural constructs, not biological elements of spoken language. However, this discussion should not be taken to imply that there are no universal units or structures of spoken-language processing. As Tomasello (2003) remarks, “Of course there are language universals. It is just that they are not universals of form – that is, not particular kinds of linguistic symbols or grammatical categories or syntactic constructions but rather they are universals of communication and cognition and human physiology” (p. 18). This view basically reflects the orientation of the following chapters of this book, which begin by examining universal structures of vocal communication. The identification of these structures need not assume the kinds of forms used in linguistic analyses because they are directly observable in utterances as articulated sounds, syllable cycles, groupings, and breath units. In other words, the approach is data driven, as exemplified in Chapter 2, which outlined the processes underlying the rise of syllable patterns and symbolic language. This view focuses on how constraints on motor speech shape observable elements and structures that serve to communicate meaning. Such a perspective can address the writing bias in that it does not refer a priori to units of linguistic analysis, but to structural aspects of speech and how they relate to processes that are common to all speakers.

Part III The Structure of Speech Acts

6 Utterances as Communicative Acts

6.1 Describing Speech Acts and Their Meaning

Critics like Linell (2005), who have exposed the writing bias in language science, advocate that spoken language should be studied as communicative acts. This is not possible if language study is limited to transcripts. Setting such methods aside, firsthand observations of someone speaking suggest similarities with other bodily actions. For instance, both types of acts involve performing motions in a changing environment, which requires a processing of context information including feedback from one’s own actions. Neuroimaging studies have shown that, for speech, there is this simultaneous processing of utterances and multimodal context information (Hagoort & van Berkum, 2007). Indeed, some authors view spoken language as an action system (Glenberg & Gallese, 2012; Koziol, Budding, & Chidekel, 2012). The difference is that “speech acts” are motions that serve to modulate air pressure in vocal communication. They are, in a sense, utterance-based interactions between speakers, although this tends to depart from linguistic definitions (Searle, 1969). It may be argued that speech acts and utterances are particular objects of study in that their intent and meaning may not be deciphered without some knowledge of the code used by the speakers. But this is not always true; otherwise, the learning of a spoken language would not be possible. As children, all speakers learn to interpret utterances and novel expressions in the context of speech acts. However, providing even a single example of this in a book is counterintuitive, for obvious reasons. Much of the contextual information that accompanies speech cannot be graphically represented. Even video recordings bear limitations. In fact, studies show that actual speech acts, where individuals engage in face-to-face conversation, activate the brain differently than video-recorded speech (Redcay, Dodell-Feder, Pearrow et al., 2010). With these limitations in mind, the following example will attempt to illustrate a communicative act by relying on a reader’s familiarity with situated speech and how context impacts utterance meaning. It should be mentioned that the aforementioned speech-act viewpoint is by no means original and it partly relates to proposals on the role that “sensorimotor”
processes and episodic memory play in developing semantic representations (e.g., Matheson & Barsalou, 2018; Yee, Jones, & McRae, 2018). However, as argued in the following, these proposals are most often based on research that assumes conventional units, such as words, and where semantic representations are discussed without regard to constraints on a modality of expression (Binder & Desai, 2011; Binder, Desai, Graves et al., 2009, and others discussed further on in this monograph). In other words, the proposals do not refer to the constraints on actions of speech, nor consider how these constraints contribute to shaping semantic processes, which is the view taken in this book. To be clear on the difference, it is the case that episodic memory can be shaped to some degree by the way multimodal sensory events are experienced. Sensory stimuli (visual, acoustic, proprioceptive, etc.) that are temporally and spatially proximate are more likely to bind together in a perceptual gestalt or “window” of experience (Hillock-Dunn, Grantham, & Wallace, 2016; Wallace & Stevenson, 2014; and for how this may play out for episodic memory, see Ekstrom & Ranganath, 2018). But when multimodal experiences, or a speaker’s semantic memory of these experiences, become associated with chunked sequences of articulated sounds, they adopt the format given by constraints on the modality of expression. It is this structuring of semantic representations through speech acts that is the main thesis of the present work and what distinguishes it from an “embodied” approach described later on in this monograph. To illustrate the difference in viewpoint, Figure 6.1 represents a speaker–listener interaction where, instead of an analysis of transcripts using orthographic concepts, one refers to signal patterns in the context of a speech act. A series of snapshots is used to situate a scenario where a speaker offers something to eat or drink to a listener (the latter is represented by the vantage point of the camera). This type of scenario can apply broadly to a language-learning situation involving behaviors of directed attention (using eye gaze and pointing; Tomasello, 2003; Yu & Smith, 2013). To exemplify the learning of meaningful expressions in such a situation, and to avoid the effects of received orthographic concepts, an artificial language is used that is modeled on French sounds. The exchange begins with a woman producing part of an utterance (snapshot 1) and then a second part (2) as she turns her head to the left to look at two containers on a table. Merely transcribing the speech segment does not specify the structure of the utterance, nor the various context indices related to the speaker’s voice and tone. This information is present in the acoustic spectra displayed in the figure. Moreover, the harmonic patterns and syllable-size energy pulses or cycles in the acoustic records serve to identify structural patterns. Specifically, inter-syllable delays mark two groups or “chunks” in the first segment of speech (indicated by arrowheads in the spectrogram). The first group contains two syllable cycles while the second group contains four

Figure 6.1 A general scenario where a speaker offers food to a listener while producing utterances (the vantage point of the camera represents the listener). Corresponding spectral analyses of the utterance signals are segmented and aligned to the corresponding snapshots for illustrative purposes.

with a final rise in pitch. Then, as the woman turns her head to look at a piece of fruit (3), she produces a single group containing two syllable cycles, which also ends with a rise in pitch. After a pause, the listener responds by pointing to a container (4) and the woman then utters an interjection while turning her head (5) and produces two groups of two and four syllables with a final falling pitch. At the end of the exchange, the woman fills a glass and presents it to the listener (these actions and the resulting experience of the listener who drinks are represented by the single snapshot 6). This scenario illustrates a simple principle: structures composed of syllable-size cycles and groups are present in the signals linking a speaker and a listener, and these structures serve to divide multimodal sensory experiences that accompany utterances. In this way, the structures of speech acts are the format of sound–meaning associations in semantic memory. Not words, phrases, or sentences. For instance, setting aside the interjection, the recurrent syllables and three distinct chunks can be related to the offering of three objects (even if processing meaning does not rest solely on referencing entities in the context of speech). Moreover, the overall outcome of the sequences of chunks, or the “intent” of the exchange, can as such be a multisensory effect of serving or ingesting a liquid. In sum, an analyst or learner can infer the meaning of the chunked sequences of sounds as well as the meaning/intent of sequences of chunks in an utterance without assuming conventional units of analysis like words, phrases, or sentences. A similar “pragmatic understanding” has been shown to apply in the case of children learning novel expressions where meaning units and categories can emerge within the context of communicative acts (e.g., Akhtar & Tomasello, 2000; Tomasello & Akhtar, 1995). It should be mentioned that not all information in produced speech is intentional, such as the acoustic attributes that relate to a speaker’s voice identity, affective state, or position in space (etc.). Although this context-related information may not be voluntarily controlled, it can definitely impact interpretations of utterances. Any situated speech act will illustrate this. For instance, a simple utterance like Take out the trash is relatively meaningless when read or heard out of context. But hearing the familiar voice of a spouse or close friend can bring to mind a specific object, its characteristics, its location, and a potential outcome of the communicative act (etc.). Thus, utterances contain information that can serve to activate multimodal sensory experiences in semantic and episodic memory that can underlie the comprehension of verbal expressions (cf. Barsalou, 2009). Finally, examples such as the preceding scenario should make it clear that no amount of distributional or semantic analysis of transcripts can serve to understand how or what meaning is conveyed by speech without some reference to context information. Of course, in learning the (artificial) language in this example, numerous sensory exemplars would serve to refine the interpretation of acoustic indices

Utterances as Communicative Acts

121

and verbal expressions (e.g., Goldinger, 1996, 1998, 2007; Palmeri, Goldinger, & Pisoni, 1993). Common sense also suggests that, over time, episodic memory of multimodal sensory experiences, which initially grounds a semantic memory of verbal forms, can be curtailed in favor of explicit strategies for inferring meaning (such as using verbal descriptions and analogies to understand novel verbal expressions; as described in Part IV). Some readers will likely draw parallels between this view of speech chunking and constructional formulas, as outlined in Chapter 3. For instance, the aforementioned sensory chunks can serve to constitute slot-and-frame formulas and semantic schemas (in the example: [beri X po], [X po], [beri X]; and see Section 3.2.2). Likewise, some may see the analysis as related to models where co-occurring verbal forms within the context of utterances serve to derive the meaning of expressions and “vectors” of semantic schemas (e.g., Hintzman, 1986; Kwantes, 2005). But there is a basic difference in how the object of study is conceptualized. Constructivist accounts, like most linguistic theories of language acquisition, rely on distributional analyses of text corpora where analyses often assume, a priori, units like words and letter-like phonemes. There are few references in these accounts to how learners come to produce utterances, or how they come to associate meaning with speech structure. To clarify with an example, Dabrowska (2004) summarized a general approach to learning constructional formulas and semantic schemas as follows (for related views with some variants, see Ambridge & Lieven, 2015; Erickson & Thiessen, 2015; Monaghan & Rowland, 2017): To acquire units which will enable children to produce and understand novel expressions, language learners must do three things: 1. Segment the phonological representation, that is to say, identify chunks of phonological material that function as units ([ju:wɑnt’mIlk] = [ju:] (3c) *Johnathan ate/the cake

Actually, segmentations as in (2b) or (2c) seem just as likely as (2a). But grouping is most likely when items are long, as in (3a) or (3b), compared to the division in (3c), which seems odd. These intuitive divisions were investigated using utterances produced from memory that contained proper nouns of increasing length, as in the above examples (Boucher, 2006; Gilbert & Boucher, 2007). The results showed that, as the nouns increased in length,


groupings were created by inter-syllable delays between the noun and verb items. Size effects on grouping do not relate to morpho-syntax, and suggest instead constraints on sequence processing. As explained in the following chapter, the effects reflect a basic sensory chunking of motor-sequence information that is not limited to speech.
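To make the notion of temporal grouping concrete, the following minimal sketch (in Python) locates group boundaries from inter-syllable delays alone. The onset times and the threshold ratio are purely illustrative assumptions and are not parameters taken from Boucher (2006) or Gilbert and Boucher (2007); the point is only that boundaries can be recovered wherever the delay to the next syllable onset clearly exceeds the typical inter-onset interval.

```python
# Illustrative sketch only: detect chunk boundaries from inter-syllable delays.
# The onsets and the 1.5 threshold ratio are hypothetical, not values from the
# cited studies.
from statistics import median

def chunk_boundaries(syllable_onsets, ratio=1.5):
    """Return indices i such that a group boundary follows syllable i."""
    intervals = [b - a for a, b in zip(syllable_onsets, syllable_onsets[1:])]
    typical = median(intervals)
    return [i for i, gap in enumerate(intervals) if gap > ratio * typical]

onsets = [0.00, 0.22, 0.45, 1.10, 1.31, 1.52, 1.75, 2.48, 2.70]  # seconds
print(chunk_boundaries(onsets))  # [2, 6]: boundaries after the 3rd and 7th syllables
```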

8 Relating Neural Oscillations to Syllable Cycles and Chunks

8.1 The Entrainment of Low-Frequency Oscillations and Speech Processing

It has been repeatedly observed that endogenous neural oscillations, as recorded by various electrophysiological techniques, selectively entrain to periodic attributes of sensory signals, including those of utterances. The following account, which relates neural oscillations to utterance structure, draws from Boucher et al. (2019) who outline specific issues in ascertaining the role of neural entrainment in speech processing.

As mentioned earlier (in Section 6.4), seminal papers by Lakatos, Schroeder and colleagues (Lakatos et al. 2008; Schroeder & Lakatos, 2009; Schroeder et al., 2008) have suggested that endogenous delta (< 3 Hz), theta (4–10 Hz), and gamma (> 40 Hz) oscillations can phase-lock to utterance structures. Specifically, it was suggested that theta oscillations in the auditory cortex can align with modulations in the energy envelope corresponding to “syllables” and could entrain low-end gamma oscillations relating in part to short-time acoustic indices (see Section 6.4). Compared to theta frequencies, however, delta-band entrainment has been more problematic. As Schroeder et al. (2008) noted: “Given the influence of delta phase on theta and gamma amplitudes, it is at first paradoxical that little of the energy in vocalizations is found in the delta range, but prosody (intonation and rhythm) is important in speech perception and it is conveyed at rates of 1–3 Hz, which corresponds to the lower delta oscillation band” (p. 108). Thus, delta oscillations are seen as integrating information from different levels, but the structures that entrain delta did not seem clear with respect to energy patterns.

Many of the studies that followed focused on the entrainment of theta to syllable-size pulses (e.g., Giraud, Kleinschmidt, Poeppel et al., 2007; Luo & Poeppel, 2012). In these studies, observations of entrainment most often involved measures of cerebro-acoustic coherence between neural oscillations and modulations of energy in speech. In applying such measures, the idea that delta may similarly entrain to long periodic patterns in energy was pursued by some. For instance, Gross et al. (2013) reported that delta may be entrained by “slow speech envelope variations” (p. 6). In fact, the patterns that entrain delta were unclear, being variably attributed to “intonation, prosody, and phrases” (Cogan & Poeppel, 2011; Giraud & Poeppel, 2012; Park et al., 2015), “accent phrases” (Martin, 2014), “metrical stress and syllable structure” (Peelle, Gross, & Davis, 2013), “long syllables” (Doelling et al., 2014), “words” (Kovelman, Mascho, Millott et al., 2012), and “sentences” (Peelle & Davis, 2012; Peelle et al., 2013). Others contended that delta might not entrain solely to energy modulations, but also to co-modulations in frequency (Henry, Herrmann, & Obleser, 2014; Obleser, Herrmann, & Henry, 2012), somewhat like “melody” in music. The issue is important in that, inasmuch as delta, theta, and gamma frequencies can bear on the processing of different types of information in speech (Cogan & Poeppel, 2011), defining entraining structures in utterances is central to any account of the role that neural oscillations play in speech processing.
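Before turning to the role of these windows, it may help to see why the entraining structure for delta is hard to locate in the signal itself. The sketch below, which assumes a mono recording and uses standard signal-processing routines, is not the analysis pipeline of any study cited above; it simply inspects the modulation spectrum of the energy envelope, where syllable-rate (theta-range) modulations are typically prominent and periodicity below 3 Hz is comparatively weak.

```python
# Illustrative sketch: modulation spectrum of the speech energy envelope.
# Assumes a mono recording; "utterance.wav" is a hypothetical file name.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, welch

rate, x = wavfile.read("utterance.wav")
x = x.astype(float)
envelope = np.abs(hilbert(x))                       # amplitude envelope
freqs, power = welch(envelope - envelope.mean(), fs=rate, nperseg=4 * rate)
theta = power[(freqs >= 4) & (freqs <= 10)].mean()  # syllable-rate modulations
delta = power[(freqs > 0) & (freqs <= 3)].mean()    # delta-range modulations
print(f"theta-range envelope power: {theta:.3g}; delta-range: {delta:.3g}")
```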

8.1.1 On the Role of Theta- and Delta-Size Processing Windows

In circumscribing the function of neural oscillations, one often-cited account (Giraud & Poeppel, 2012) suggests that oscillations in the auditory cortex shape theta- and gamma-size sensory windows that organize the spiking activity of cortical neurons in response to speech (as in Lakatos et al. 2005, and Figure 6.3). For these sensory windows, it has been shown that theta oscillations are involved in processing acoustic features and that entrainment of theta correlates with judgments of speech intelligibility (Howard & Poeppel, 2010; Peelle & Davis, 2012; Peelle et al., 2013). To illustrate how this operates, one can refer to temporal indices, such as voice-onset-time (VOT) and formant transitions, which cue heard distinctions such as /ta-da/ and /da-ga/ respectively. These acoustic indices can vary with speaking rate. For instance, the VOT of /da/ produced at slow rates of speech can become similar to the VOT of /ta/ at fast rates (Boucher, 2002). Yet a listener’s categorization of produced /ta-da/ is relatively unaffected by changing speech rates because the indices vary proportionally by reference to syllable-size sensory frames. The effect is illustrated in Figure 8.1. The plot to the left of the figure presents the absolute values of VOTs for [ta] and [da] produced at varying rates of speech (where rate is measured in terms of syllable durations in spectrograms; for details, see Boucher, 2002). One notices that the two sets of VOTs overlap in fast speech such that no stable VOT boundary appears to support a listener’s categorization of the syllables, even though correct categorization is maintained across changing rates. However, the plot to the right shows that relative VOT measures within syllables present categorical distributions and an inferable stable boundary, which suggests that categorical perception may operate by reference to syllable-size frames. In fact, Boucher (2002) showed that removing parts of the “vowel” in [da], which, in effect, reduces the syllable

Figure 8.1 VOTs for [ta] and [da] produced at varying rates of speech, plotted as a function of syllable duration in ms (adapted from Boucher, 2002). (a) Absolute (millisecond) values of VOT for [ta] and [da]. Note that these values tend to overlap at fast rates of speech. (b) Measures of the relative timing of VOT within syllables where separate distributions (reflected in the two regression lines) support a categorical perception across rate changes. This suggests that categorical perception of VOT can operate by reference to syllable-size sensory frames. See the text for further evidence of this frame-based processing.

frame, can bias the categorization of intact VOTs as their ratio approaches the boundary points. Categorical perception of temporal cues relative to a syllable frame also applies to transitions (Boucher, 2002; Toscano & McMurray, 2015). It is this type of integration of acoustic indices within syllable-size sensory windows that links to entrained theta oscillations. This was demonstrated for formant transitions by Ten Oever and Sack (2015), who showed that entraining theta oscillations at different speech rates can bias the categorization of syllables perceived as /da/ or /ga/ (see also Ten Oever, Hausfeld, Correia et al., 2016, and, for scaling effects of rate on low-frequency oscillations, see Borges, Giraud, Mansvelder et al., 2017). Thus, entrained theta-band oscillations serve to integrate time-related feature information over syllable-size sensory frames. The question that follows, then, is what type of information is integrated by the entrainment of delta oscillations.
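Returning to the integration of VOT within syllable-size frames, the ratio criterion illustrated in Figure 8.1 can be restated in a few lines of code. In the sketch below, the boundary value (a VOT of 15 percent of the syllable) and the sample measurements are hypothetical choices for illustration, not data from Boucher (2002); the sketch only shows how a relative criterion remains applicable where absolute VOTs overlap across speaking rates.

```python
# Illustrative sketch: categorize [ta]/[da] from VOT relative to syllable
# duration. The 0.15 boundary and the measurements below are hypothetical.

def categorize(vot_ms, syllable_ms, boundary=0.15):
    return "ta" if (vot_ms / syllable_ms) > boundary else "da"

tokens = [
    ("slow [da]", 45, 420),   # fairly long VOT in absolute terms, small ratio
    ("fast [ta]", 40, 160),   # similar absolute VOT, large ratio
    ("slow [ta]", 80, 420),
    ("fast [da]", 15, 160),
]
for label, vot, syl in tokens:
    print(f"{label}: VOT={vot} ms, ratio={vot/syl:.2f} -> /{categorize(vot, syl)}/")
```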

8.1.2 Reviewing Claims of a Non-sensory Entrainment of Delta to Content Units

Research by Gross et al. (2013) and Park et al. (2015), using MEG and measures of cerebro-acoustic coherence, has shown that delta frequencies can be more strongly entrained by forward- than backward-presented speech (but


cf. Zoefel & VanRullen, 2016). Since only forward speech bears interpretable meaning, the results were taken to show that delta is influenced by knowledge of linguistic content (Peelle et al., 2013). But the reports did not point to any specific element or unit of content that could entrain delta oscillations. On this issue, a study by Meyer, Henry, Gaston et al. (2016) attributed the content effects to “phrases.” That study used German utterances where the meaning of the initial parts influenced listeners’ parsing of a following syntactic phrase presenting ambiguous prosodic marks. EEG coherence measures indicated that listeners’ tendency to infer phrases biased delta-band coherence, and this led the authors to conclude that delta oscillations are influenced by content units like syntactic phrases independent of prosodic marks in sensory signals. Another study by Ding, Melloni, Zhang et al. (2016) extended this viewpoint and attributed the effects of linguistic content to words, phrases, and sentences. Using MEG and electrocorticographic techniques, the tests involved presentations of synthesized Chinese utterances in which prosodic patterns were removed, except for equally timed syllables, each of which was seen to constitute “words” arranged in two-syllable “phrases” and four-syllable “sentences” (among other stimuli). Measures of spectral power (and not phase alignment) showed peaks corresponding to the separate sets of units, and this occurred for Chinese-speaking listeners, whereas only syllable-related peaks appeared for English listeners. These results led the authors to conclude, as in the experiment by Meyer et al. (2016), that knowledge of content units like phrases and words influenced low-frequency neural oscillations independently of sensory marks. The problem, however, is that the preceding reports, which involved speech stimuli with unusual or absent prosody, do not support views of a nonsensory entrainment of delta to semantic-syntactic units. Indeed, such interpretations conflict with available observations and can be rejected on fundamental grounds. First, reports of the differential entrainment of delta to forward and backward speech do not support the conclusion that entrainment is influenced by knowledge of linguistic content. These results show, more specifically, that delta responds to order information, and they do not address the issue of entrainment to content units. Likewise, observations by Meyer et al. (2016) and Ding et al. (2016) of a delta-band phase coherence (or band-power coherence) in the absence of acoustic marks do not warrant the interpretation that delta entrains to content units independently of sensory marks of prosodic groups. In fact, neural responses to learned prosodic patterns in a listener’s spoken language need not involve sensory marks. For example, it has been established that, in silent reading, a reader’s EEG responses, including neural oscillations, align with prosodic structures, as if texts were covertly spoken (Steinhauer, 2003; Steinhauer & Friederici, 2001, and for neural oscillations see Magrassi et al., 2015). Consequently, reading tasks show top-down effects of prosodic structure


on neural responses. This suggests that listeners are processing verbal material in terms of a knowledge of prosodic patterns that is generally acquired from a lifetime of motor-sensory experience in speaking a language. Considering this experience, when test participants hear syllables and prosodic patterns in their native language, even with unfamiliar synthetic timing and tones (as in Ding et al., 2016), or when they read a text, their experience with prosody can well impact neural oscillations in the absence of acoustic marks (Magrassi et al., 2015). But then one may not conclude from this that entrainment of neural oscillations is independent of a sensory prosody and relates, instead, to units of linguistic content like words and phrases.

More fundamental problems, however, undermine attempts to relate neural oscillations to these conventional units. As already mentioned, there is no working definition of “word” nor “phrase” in speech, nor any general marks by which to isolate these assumed units in utterances. It will be recalled that the ability to identify and consistently divide words in continuous speech arises with the learning of alphabet-style writing, and that not all cultures identify words and phrases (e.g., Hoosain, 1991, 1992; Miller & Weinert, 1998). Recognizing this has major implications in that the ethnocultural basis of the concepts of word and phrase undermines the idea of a biological tracking of these units by endogenous oscillations in the brain. But additional problems arise from the fact that words and phrases can be so short as to render any assumption of a match between these units and low-frequency oscillations untenable. For example, in highly frequent forms such as I’m done, Take’m, Sue’s gone, We’ll see (etc.), it is difficult to determine if parts of the syllables ’m, ’s, ’ll constitute words, or if I’m . . ., Sue’s . . ., We’ll . . . (etc.) are phrases. Whatever these units are, however, one has to ask how elements with durations of about 50–200 ms could be tracked by oscillations in the delta range (< 3 Hz). In short, it appears that the type of information that is processed in delta-size windows is not commensurate with notional units of linguistic content like words and phrases, which conceptually link to writing. Another hypothesis is adopted in the present work, one that relates delta oscillations to a processing of motor-sensory order information.

8.2 Delta-Size Windows and the Sensory Chunking of Speech

8.2.1 Chunks and Their Signature Marks

As mentioned earlier, the findings of Gross et al. (2013) and of other researchers, which show that delta waves are variably entrained by forward and backward speech, do not as such provide evidence for the effects of linguistic content. Rather, the findings more directly indicate that delta entrainment is influenced by order information in speech stimuli. It is essential to consider the specificity of order or serial information in articulated sounds, as


opposed to other sounds, since this can underlie sensory chunking. Compared to general acoustic perception, speech perception entails sounds that reflect motor-sensory events. Moreover, although both speech and general sound perception imply a processing of signals that unfold over time, speech communication critically requires a memory of sequential order. Yet immediate sequence memory has a limited span, and this can impose a chunking of incoming signals. This principle was first noted by G. Miller (1962) who saw that speech communication does not operate by interpreting signals one element at a time, but in blocks. More recent accounts refer to a domain-general “sensory chunking,” a process that is typically reflected in grouping patterns that arise in learning motor sequences. This phenomenon occurs not only in humans but also in non-human primates and other species (Terrace, 2001). Across motor behaviors, however, similar chunking marks appear, as can be observed in common tasks of sequence recall. For example, in reciting novel lists of letters or digits, temporal groupings spontaneously emerge in terms of characteristic inter-item delays (Terrace, 2001). In speech, and other behaviors, these delays generally appear as a lengthening and/or a short pause marking the item at the end of a group (for illustrations, see Gilbert et al., 2015b). This temporal chunking of sequences appears even when repeating verbal stimuli presented without prosody (Boucher, 2006), and listeners preferentially attend to inter-item delays in chunking speech compared to other marks like pitch contours (Gilbert & Boucher, 2007). The resulting chunks, when repeated, show an internal stability in the relative timing of grouped items (as described in Sections 4.2.1 and 7.3.4). That is, chunks can vary in their overall duration, but items within learned chunks present durational ratios that have a degree of constancy, indicating that they form coherent action blocks. Some theorists of working memory view sensory chunking as essentially arising from constraints on the span of a “focus of attention” (Cowan, 2000). Several studies have associated delta oscillations with top-down effects of attention and with the perception of temporal patterns in speech and music (for a review, see Ding & Simon, 2014). Yet this research does not address the issue of how the entrainment of delta could link to sensory chunking in processing speech. Some reports, however, have provided evidence of a link. In a set of studies using ERPs, Gilbert et al. (2014, 2015b) demonstrated that groups bearing characteristic marks of chunking evoke a rising negativity and that inter-item delays marking the ends of temporal groups create large positive shifts. Although Gilbert et al. did not examine neural oscillations, the experiments did serve to clarify attributes of sensory chunking. Specifically, the tests done by Gilbert et al. involved both utterances and sequences of nonsense syllables with similar temporal and tonal patterns. These patterns were independently manipulated so as to observe the separate effects of timing and pitch


Figure 8.2 Pitch (F0), and dB energy patterns (rectified and smoothed signals) of utterance stimuli along with corresponding ERPs of regions of interest (mean of FC3, FC1, FCz, C3, C1, Cz; from Gilbert et al., 2015b). Note that there are two sets of stimuli with different temporal groups of four, three, and two syllables, but similar F0 patterns: black line – length changes create the temporal groups (4)-(3) (2); grey line – the changes create groups (3)-(4) (2). Gilbert et al. also showed that responses to temporal chunks are unaffected by varying pitch patterns, and such effects appeared for both utterances and nonsense syllables.

marks. This is useful in that inter-item delays that mark chunks in speech often correspond to the endpoints of “intonational phrases.” The results from Gilbert et al. showed separate positive shifts for temporal groups, but not tonal phrases. In other words, the study demonstrated a processing of speech in terms of signature marks of chunking. The effect is displayed in Figure 8.2 (representing a subset of results from Gilbert et al., 2015b). The top two panels of the figure show the energy and tonal contours of the speech stimuli and, as can be seen, the neural responses indicate a processing of chunk marks. Importantly, such effects of chunk marks were also observed for nonsense syllables, suggesting that sensory chunking can operate when processing sequences of articulated sounds regardless of linguistic content. Finally, in a related experiment, Gilbert et al. (2014) used contexts like those of Figure 8.2 and showed that the size of the chunks in the stimuli affected


listeners’ recognition memory of heard items, and this was also reflected in amplitude changes in ERPs elicited by the heard items (N400 responses). It should be noted that chunks in speech and recall present periodicities in the range of delta waves and one report confirmed that presenting digits in delta-size chunks can influence memory (Ghitza, 2017), although how this relates to the windowing effect of attention focus remains unclear (cf. Section 6.4.2). In short, these findings suggested that sensory chunking operates on sequences of articulated sounds rather than putative semantic-syntactic units of linguistic analysis (such as words and phrases), and that this chunking can match delta-size periodicities.

8.2.2 Neural Entrainment in Speech Processing

In investigating the potential link between chunking and low-frequency oscillations, it is essential to note that marks of sensory chunks involve inter-item delays such that one may not observe entrainment to such temporal marks by measuring cerebro-acoustic coherence between oscillations and energy contours. For this reason, a report by Boucher et al. (2019) used measures of inter-trial phase coherence (ITPC) in EEG (Delorme & Makeig, 2004) to explore whether an entrainment of delta-band oscillations could underlie a chunking of motor-sensory information in speech sounds. That delta entrainment could bear specifically on the processing of motor-sensory aspects of speech (as opposed to processing acoustic or syntactic-semantic content information) was determined by observing ITPC for three types of stimuli, namely sequences of pure tones, meaningless syllable strings, and utterances. These stimuli were elaborated with similar patterns of timing, pitch, and energy with each pattern presenting particular periodicities. For instance, energy fluctuations marking syllable pulses presented periods of 185 and 315 ms (5.4 and 3.2 Hz respectively), and chunks marked by inter-item delays had periods of about 685 ms (1.4 Hz), and so on. The prediction was that delta entrains specifically to chunk marks in speech sounds, not in heard tones, irrespective of whether the contexts are utterances or meaningless strings of syllables.

The main findings of this investigation are displayed in Figure 8.3 (for more details, see Boucher et al., 2019). The results in Figure 8.3 present the ITPC values obtained for sites of interest following a spatial filtering of signals. The ITPCs confirm distinct frequency-related entrainment of oscillations for the speech and tone stimuli. Specifically, one can see that delta-band ITPC appears at temporal sites C5 and C6 for the speech stimuli (both utterances and nonsense syllables), but that this does not occur for the tone stimuli. By contrast, theta-band ITPC appears at the central site (Cz) across the three types of stimuli. Although an EEG offers only imprecise information on localization, these observations suggest that delta- and theta-size sensory frames have differing roles in speech processing.


Figure 8.3 Inter-trial phase coherence (ITPC) for three types of stimuli as a function of the frequency of oscillations at sites C5, Cz, C6 (from Boucher et al., 2019, with permission). Note that entrainment of delta-band oscillations (δ), as indexed by ITPC, occurs at C5 and C6 only for the speech stimuli (i.e., utterances and nonsense syllables). By contrast, entrainment in the theta-band (θ) appears across the tone and speech stimuli.
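For readers unfamiliar with the measure, ITPC is simply the length of the mean unit phase vector across trials: values near 1 indicate that oscillatory phase at a given latency and frequency band is reproducible from trial to trial, and values near 0 indicate random phase. The following generic sketch shows the form of the computation; it is not the pipeline used by Boucher et al. (2019), and the simulated data are placeholders.

```python
# Generic sketch of inter-trial phase coherence (ITPC); placeholder data.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def itpc(trials, fs, band):
    """trials: array (n_trials, n_samples) from one electrode."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    phase = np.angle(hilbert(sosfiltfilt(sos, trials, axis=1), axis=1))
    return np.abs(np.exp(1j * phase).mean(axis=0))  # one value per sample, in [0, 1]

rng = np.random.default_rng(0)
eeg = rng.standard_normal((40, 1000))         # 40 placeholder trials, 2 s at 500 Hz
print(itpc(eeg, fs=500, band=(1, 3)).mean())  # delta-band ITPC; low for random noise
```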

As noted, theta entrainment can create sensory windows that serve to integrate acoustic information (as discussed in reference to Figure 8.1 and the findings of Ten Oever & Sack, 2015), and the results show in fact that theta entrains to speech or non-speech sounds, such as sequences of pure tones. This is not the case for delta oscillations. These waves do not entrain to melodic-like patterns of tones, but to temporal patterns of articulated sounds. More specifically, delta entrains to signature marks of sensory chunks, which suggests that delta-size windows serve to integrate order information of heard articulated sounds into coherent blocks. This accords with a view of sensory chunking as a domain-general process involved in immediate memory of motor-sensory sequences. Thus, the type of information that is processed in delta-size sensory frames is essential to processing utterance meaning inasmuch as semantic interpretation requires a processing of sequences of articulated sounds. However, as the cited results show, delta entrains to temporal marks of sensory blocks and is not as such entrained by assumed content units such as words or phrases (contra, e.g., Meyer, 2017).

It should be mentioned that the finding of a neural entrainment to temporal marks, rather than to energy or frequency modulations, suggests certain advantages of measures, such as ITPC, over a direct tracking of cerebro-acoustic coherence. The latter measure assumes that entrainment operates in terms of corresponding modulations of sensory and neural signals, whereas the aforecited observations indicate that entrainment can be selective and reflect top-down effects. As for how oscillations can physically entrain to groups rather than to modulations in energy or frequency, one can hypothesize that, in processing heard sequences of articulated sounds, the storing of incoming motor-sensory order information in a limited buffer can impose an on-line segmentation of sequences commensurate with a delta-size sensory chunking (see also the concept of focus of attention, Cowan, 2000). The finding that delta entrains to patterns of articulated sounds, and not to patterns of pure tones, supports this view and suggests a basic link between delta-size windows and a domain-general chunking that arises in processing motor-sensory sequences (Graybiel, 1998, 2000, 2008; Terrace, 2001).

From a fundamental perspective, these observations linking neural oscillations to structures of motor speech are revealing of the nature of the brain-speech interface. Needless to say, neural oscillations are universal. They are part of the biological makeup of a speaker’s brain. The preceding findings, among other related reports, bear out that these oscillations entrain to structures of articulated sounds including syllable-size cycles and chunks, which are also universally present in utterances. The interface thus appears as a direct link between periodic neural activity and structural properties of motor speech. These structures, however, are not attributes of neural oscillations. As outlined in Sections 7.3.1 and 7.3.2, syllable-size cycles of motion are intrinsically shaped by attributes of motor processes such that these action units and constraints on the chunking of these units cannot be understood as inherent products of the CNS.

In this light, one needs to consider other structural constraints on motor speech. In particular, given that syllable-size cycles in acoustic energy entail the production of pulses of air, respiratory functions can intrinsically influence the number of cycles and chunks that appear within breath units of speech. For all speakers, inspiratory motions mark the flow of articulated sounds creating natural divisions between “utterances.” Constraints on speech breathing not only affect the F0 and intensity patterns of utterances (Wang, Green, Nip et al., 2010), but also their complexity in that complexity is likely to be influenced by the numbers of meaningful forms that can be produced within a breath span of speech. Such effects of speech breathing are examined in the next chapter, parts of which draw from Boucher and Lalonde (2015).

9 Breath Units of Speech and Their Structural Effects

It is worth noting, in relation to the previous chapter, that some studies that refer to measures of cerebro-acoustic coherence have reported an entrainment of neural oscillations to the energy contours of heard “sentences” (e.g., Peelle et al., 2013). However, there are definitional and technical issues in evaluating such reports. As discussed in the following, sentences are notional units with no consistent markings in speech, which is not the case for breath-divided utterances. The latter units vary extensively in length, and research is unclear as to whether low-end delta oscillations entrain to units other than chunks, or some periodic aspect of speaker-listener interaction (as suggested in some “hyperscanning” research; Mu, Cerritos, & Khan, 2018). In any case, the results show that low-frequency delta oscillations do not phase-lock to conventional content units of linguistic analysis such as words or phrases, and this most likely extends to sentences. On the other hand, it remains to be shown whether neural oscillations align with motor-sensory marks of breath units. Some studies of conversational turn-taking do suggest a coordination between speakers and listeners that operates on utterances viewed as breath units. It is worth clarifying the implications of these studies.

9.1 Utterances as Breath Units versus Sentences in Speaker–Listener Interaction

The effects of breath units during speaker–listener interactions and turn-taking have been observed in situ by way of plethysmographic belts. These serve to record respiratory motions of the abdomen and rib cage during conversations. Based on such techniques, it has been reported that speaker–listener entrainment on respiratory cycles is infrequent in conversational contexts, but does occur locally (McFarland, 2001). For instance, studies of turn-taking show that successful turns in conversation largely occur immediately after an inhalation when there is a speaker–listener alignment of breath cycles, whereas unsuccessful attempts to take a turn tend to be initiated late in the exhalation phase, specifically when there is no speaker–listener breath coordination (McFarland, 2001; McFarland & Smith, 1992; Rochet-Capellan & Fuchs, 2014). Such observations suggest that active participation in a conversational act can involve a tracking of the periodicity of respiratory cycles, and this brings into focus the question of whether utterances as breath units, rather than notional units like sentences, present a basic structure of processing along with syllable pulses and chunks.

Some studies have attempted to deal with the issue of the relationship between sentences and utterances by examining their correspondence in speech (Henderson, Goldman-Eisler, & Skarbek, 1965; Wang et al., 2010; Winkworth, Davis, Adams et al., 1995, among others). However, the presumption in performing such comparisons is that both sentences and utterances present marks that can be observed in speech. In fact, there are universal defining marks for utterances. In speech research, utterances are traditionally defined as breath units of speech or vocalization delimited by inspirations (see Vaissière, 1983 who provides historical references on this definition of utterances). These units also tend to show amplitude and F0 declination as a function of the expenditure of air and decreasing subglottal pressure up to inspiration points. On the other hand, there is a persistent difficulty in defining sentences. Attempts to characterize these units invariably refer to marks in texts. Critics note that the concept of a sentence as such arises with training in alphabet writing (Kress, 1994; Miller & Weinert, 1998). In sum, as with other conventional units of language analysis like the word and phrase, the sentence has no defining cross-linguistic attributes and is conceptually bound to European-style writing. It follows that attempts to evaluate the correspondence between utterances and sentences in speech miss the point that the latter are not speech units but cultural constructs.

This question of the ontological status of the sentence is not without consequence, especially in considering the pivotal role of the sentence in language study. As discussed in Part I, much of syntactic theory rests on analyses of speech using orthographic concepts. Viewed through such concepts, spoken language appears as a grammatical ability to combine words and phrases into sentences expressing “complete thoughts,” which seem wholly unrelated to utterances as breath units of speech. But an approach that focuses on analyses of sentences on a page carries the risk of misinterpreting growth-related changes in utterance length as reflecting the development of mental grammars. A historical example of this is the linguistic measure of “mean length of utterance” (MLU), which has served to elaborate developmental scales of spoken language.

9.2 On Interpreting Measures of “Mean Length of Utterance” (MLU)

MLU is a measure that involves counting “morphemes” in transcripts of spontaneous speech and relativizing the counts over a given number of utterances. The measure was originally devised by Brown (1973) and is currently used as a standard index, often in conjunction with other developmental scales. In proposing this index, Brown referred to language theory (Chomsky, 1957, 1965) and viewed language as a competence, an ability to combine given elements and units of syntax. The basic assumption underlying measures of MLU is that, since many aspects of developing syntax imply additions of meaningful elements in utterances, counting the average number of morphemes in utterances serves to index a “cumulative complexity” that reflects “knowledge” (Brown, 1973, pp. 53, 173). Applications of the MLU index extend to clinical populations, and recent variants of the measure also refer to word counts (e.g., Ezeizabarrena & Garcia Fernandez, 2018, and others cited in the following).

However, Brown (1973) never provided a definition of “utterance.” He also used regular orthographic transcripts in measuring MLUs in children’s speech where the terms “utterance” and “sentence” were used interchangeably, as in many current applications of MLU. In an early review of Brown’s book, Crystal (1974) complained that there were several problems in applying the measure in that “speech units” did not match sentences:

[W]hen Brown writes his chapter on coordination, decisions about whether two speech units are parts of the same sentence or are linked separate sentences will be very much dependent on utterance criteria being made explicit. It is not as if the problem is merely an occasional one: it shouts out at the analyst all the time. (p. 295)

For instance, Crystal (1974) asked how many sentences there are in the talk of a three-year-old who produces utterances like “and it goes up / up the hill / and it goes up / in the hill / and it takes us on up the hill / and it goes up the hill / up up up / . . . ” (p. 296; and for a review of reliability and validity problems of MLU counts, see Boucher & Lalonde, 2015). The failure to acknowledge the ontological incommensurability between sentences as orthographic units and utterances as breath units of speech led to an odd situation where there was a disregard of rather obvious breath-related effects on MLUs, even in Brown’s report. As one example, in his longitudinal study of three children, Brown (1973, p. 55) noted a steady increase in MLUs, except on one occasion where there was a sudden drop in the counts for one child because she had a cold. Though this did not discredit MLU as an index of “cumulative complexity,” it did reveal that utterances could reflect breath units and could, consequently, vary in conditions that hamper speech breathing (such as shortness of breath and congestion due to a cold) or, more generally, with the growth of respiratory functions. Consideration of the latter would seem essential in interpreting MLU. Yet the effects of children’s growing breath capacities have not drawn much attention from users of Brown’s index, despite several indications of a link with rising MLUs.
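Whatever one makes of the utterance problem, the arithmetic of Brown’s index is itself straightforward. The sketch below computes an MLU over a toy transcript in which utterances are given one per line and morpheme boundaries are pre-marked with “+”; the segmentation shown is purely illustrative and does not follow any particular coding manual.

```python
# Illustrative sketch of an MLU count; the morpheme segmentation is hypothetical.

def mlu(utterances):
    morphemes = sum(len(u.replace("+", " ").split()) for u in utterances)
    return morphemes / len(utterances)

sample = [
    "and it go+es up",
    "up the hill",
    "and it take+s us on up the hill",
]
print(round(mlu(sample), 2))  # (5 + 3 + 9) / 3 = 5.67 morphemes per utterance
```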


For instance, some early applications reported near perfect correlations between morpheme counts in MLUs and syllable counts (or counts of “words,” at r = 0.91 to 0.99: Arlman-Rupp, van Niekerk de Haan, & van de Sandt-Koenderman, 1976; Ekmekci, 1982; see also Hickey, 1991; Parker & Brorson, 2005; Rom & Leonard, 1990). Because syllables reflect pulses of air, strong correlations between morpheme and syllable counts in MLUs implied that Brown’s index of cumulative complexity could be influenced by the growth of respiratory volumes. But Brown (1973, p. 409) rejected a priori that MLUs could vary according to the numbers of syllables. In hindsight, it is difficult to understand how the seemingly obvious effects of breathing or syllable numbers on MLUs could be overlooked. One explanation may be that, for Brown, as for many cognitive psychologists of the day, the idea that mental operations on morphemes could relate to motor-sensory functions did not fit the dominant belief that language competence was separate from processes of performance, which included functions of speech breathing.

But there was also the difficulty of demonstrating these growth-related effects of respiratory processes on MLUs. Brown’s index was originally limited to young children with MLUs of no more than four morphemes (“Stage V”), which occurs at about forty months (Miller & Chapman, 1981). At that age, children may not execute the standard maneuvers involved in measuring respiratory capacities (Desmond, Allen, Demizio et al., 1997; Merkus, de Jongste, & Stocks, 2005). Consequently, the effects of changing respiratory functions on Brown’s index could not be verified. However, applications of the MLU index have extended to older speakers including adults (e.g., Behrens, 2006; Charness, Park, & Sabel, 2001; Huysmans, de Jong, Festen et al., 2017; Rondal & Comblain, 1996, among others cited in Boucher & Lalonde, 2015). Given these applications, interpretations of MLU as a normative index of grammatical development have faced a body of evidence showing that developing MLUs accompany age-related changes in breath capacities (Blake, Quartaro, & Onorati, 1993; Chan, McAllister, & Wilson, 1998; Conant, 1987; Klee, Schaffer, May et al., 1989; Miller & Chapman, 1981; Parham, Buder, Oller et al., 2011; Scarborough, Rescorla, Tager-Flusberg et al., 1991; Scarborough, Wyckoff, & Davidson, 1986).

The extent to which the growth of breath volume influences MLUs can be evaluated by reference to traditional measures of vital capacity (VC), which is the maximal volume of air that can be forcefully exhaled after a maximal inspiration, as can be measured using spirometric or pneumotachographic techniques. The relationship between VC and MLU can be understood by considering that habitual speech involves cycles of inspiration and expiration within a limited range of VC percentages where there is minimal contractile force of expiratory muscles (Hixon, Goldman, & Mead, 1973; Ladefoged, 1967). Because inspiratory and expiratory volumes in producing utterances present


Figure 9.1 Age-related changes in speech breathing recorded via plethysmographic belts (original data from Hoit et al., 1987, 1990; in Boucher & Lalonde, 2015, with permission). (a) Vital capacity (VC) of male speakers aged 7 to 25 years, and respiratory volumes in speech: the dotted lines show the mean initiation volume (top line) and termination volume (bottom line) in producing utterances at conversational loudness. (b) The proportion of VC used in producing utterances across age groups [(initiation-termination volumes)/VC].

these stable ratios of VC, absolute age-related increases in VC (in liters) afford the use of greater volumes of air when speaking and, consequently, longer utterances. To further clarify the general effect, Figure 9.1 summarizes the developmental data on VC and speech breathing reported by Hoit and colleagues (Hoit & Hixon, 1987; Hoit, Hixon, Watson et al., 1990). The dotted lines in panel (a) indicate age-related changes in VCs and the range of inspiratory and expiratory volumes observed in usual speech. Panel (b) shows that respiratory volumes used in speech present a stable proportion of VC across age groups. It can be seen that, while this proportion of VC remains relatively stable, there is a growth of VC in liters implying a substantial increase in the volumes of air used in producing utterances, and this impacts MLUs. Of course, many other changes in the vocal apparatus can influence MLUs. To get an idea of the relative impact of these factors, one can compare the effects of the growth in VC to other age-related changes in speech articulation and respiratory dynamics. For instance, motor control of articulation, as reflected in fluent adult-like speech rates, develops through to early adolescence (Kent & Forner, 1980; Pettinato, Tuomainen, Granlund et al., 2016; Smith, 1978; Smith & Zelaznik, 2004). But these differences have minor


effects on MLUs in that, even if children speak at slower rates, they expend less air in producing syllables. For instance, Hoit et al. (1990) found that seven-year-old males produce a mean of 35 mL/syllable compared to adults who produce 65 mL/syllable on average (see also Hoit & Hixon, 1987). However, children, compared to adults, have smaller vocal tracts and their vocal folds require less transglottal pressure to vibrate in producing audible speech (Hirano, Kurita, & Nakashima, 1983; Kent & Vorperian, 1995). As for speech breathing, Figure 9.1 shows that children initiate and terminate utterances at lower volumes than adults (by about 5–10 percent according to Russell and Stathopoulos 1988). This has been associated with age-related differences in the compliance of the lungs and thorax (Agostoni & Hyatt, 1986). But the effect of these developments on MLUs, whether it be the articulation of syllable pulses with slightly more air, or the 5–10 percent change in initiation volumes between seven and twenty-five years, can be weighed against the fact that VC rises by 330 percent over the same period. In sum, the growth of VC presents a major change that can clearly impact developing MLUs. Moreover, since children and adults use similar ratios of VC in producing utterances, VC measures can serve to evaluate this effect. Such capacity measures, and how they relate to developing MLUs, were part of a study by Boucher and Lalonde (2015), discussed subsequently.
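Before turning to that study, a back-of-the-envelope calculation conveys the scale of the respiratory factor. In the sketch below, only the mL-per-syllable figures (35 for seven-year-olds, 65 for adults) come from Hoit et al. (1990); the VC values and the assumption that roughly 20 percent of VC is expended per utterance are illustrative placeholders rather than measurements.

```python
# Back-of-the-envelope sketch; VC values and the 20% usage figure are assumptions.

def syllables_per_utterance(vc_litres, ml_per_syllable, vc_fraction=0.20):
    return (vc_litres * 1000 * vc_fraction) / ml_per_syllable

child = syllables_per_utterance(1.5, 35)   # hypothetical VC of a seven-year-old
adult = syllables_per_utterance(4.8, 65)   # hypothetical adult VC
print(f"child: ~{child:.0f} syllables per breath; adult: ~{adult:.0f}")
```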

9.2.1 Utterance Complexity, Lexical Diversity, and MLU: Linking to Developing Motor Structures

It should be emphasized that consideration of maturational changes in motor structures, including rising breath capacities, can be essential in devising an account of the time course for a rising cumulative complexity in utterances as indexed by measures of MLU. In fact, growth-related increases in breath capacities can also influence the course of a developing diversification of lexical chunks, as documented in the following. However, this view should not be seen to ignore the effects of a maturing CNS, or be taken to imply that the growth of VCs “causes” a rise in an MLU index. Certainly, maturational factors in the CNS, such as the rising myelination of nerve fibers (e.g., Dubois, Dehaene-Lambertz, Perrin et al., 2008; Pujol, Soriano-Mas, Ortiz et al., 2006), influence the time line of language development. But changes at this level may not as such account for developmental milestones, like why babble arises specifically at six to eight months, or why MLUs cease to rise beyond about twenty to twenty-two years of age. Multiple factors intertwine, obviously, and the brain’s growing ability to manipulate verbal expressions is not separable from the growth of motor-sensory processes of expression. However, some factors appear to be more critical than others in explaining the particular chronology of developmental milestones.


For example, babbling is typically delayed in hearing-impaired children (Oller & Eilers, 1988), indicating that auditory stimulation is a factor. Nonetheless, identifiable changes in vocal processes such as the decoupling of the nasopharynx – not the development of the auditory system or associated brain processes – basically account for the emergence of orally modulated sounds and babble at six to eight months (see Chapter 2, Section 2.3.3). Said differently, for normally developing populations, the growth of speech-motor processes is a necessary though not sufficient factor in explaining the rise of babble at a specific age. The same logic holds for developing MLUs. Any number of neurological or respiratory conditions can impact the length of utterances, defined as breath units of speech. Still, in normal populations, the growth of speech breathing structures is a necessary though not sufficient factor that can account for characteristic MLUs appearing at age-specific time points.

In investigating these growth-related effects in speech breathing, Boucher and Lalonde (2015) examined the VCs and MLUs of fifty male speakers aged from five to twenty-seven years. MLU counts in the study were performed in morphemes and syllable beats. The results showed similar age-related progressions in VC and MLU measures, as displayed in Figure 9.2. In viewing these results, one notes the similarity in the distributions of values for VC and MLU and, in fact, Figure 9.3 shows that VC strongly correlates with MLU measured in morphemes (r = 0.81; p < 0.001) and syllables (r = 0.79; p < 0.001). MLU in morphemes and MLU in syllables are as such highly correlated measures (r = 0.92; p < 0.001). There is little question, in examining the preceding figures, that maturing respiratory functions present a factor that can shape the time course for expanding MLUs.

Figure 9.2 Vital capacities in liters (a), and MLU measured in morphemes (b), and syllables (c), for 50 speakers aged 5 to 27 years (Boucher & Lalonde, 2015, with permission). See also Figure 9.3.

Figure 9.3 MLU measured in morphemes (left), and syllables (right), as a function of vital capacity (from Boucher & Lalonde, 2015, with permission).

On the other hand, some might argue that Brown’s MLU as an index of cumulative complexity only approximates syntactic complexity. In particular, it may be objected that children do not have the extended vocabulary of adults and that simply counting morphemes does not capture the increasing diversity of long formulas and lexicalized forms used by older speakers. But it should be remarked that the value of the MLU index finds indirect support in numerous reports showing correlations between MLUs and several types of measures of vocabulary development (e.g., DeThorne, Johnson, & Loeb, 2005; Ezeizabarrena & Garcia Fernandez, 2018; Hickey, 1991; Jalilevand & Ebrahimipour, 2014; Nieminen, 2009; Rice, Redmond, & Hoffman, 2006; Rollins, Snow, & Willett, 1996; Scarborough et al., 1991; Tavakoli, Jalilevand, Kamali et al., 2015). In fact, it stands to reason that any addition of forms within utterances, as indexed by MLUs, entails a complexification of conceptual relations that may be expressed, and that this would impact other measures of vocabulary or formula-based counts. But the literature on language testing has not offered any rationale for the observed relationship between MLUs and developmental changes in lexical diversity or “vocabulary” (DeThorne et al., 2005). In examining this latter effect, a basic problem arises with the MLU index in that it is difficult to assume that younger children are manipulating count units, such as “morphemes,” as separate elements. Thus, to evaluate how increases in MLUs contribute to utterance complexity, one has to consider how speakers “chunk” speech.

9.2.2 Chunks in Breath Units of Speech and the Development of Vocabulary

Conventional measures of child speech involving the analysis of transcripts using units like morphemes, words, clauses (etc.) can be inherently difficult to interpret in that one cannot assume that children (or even adults) are manipulating these units. The difficulty can be compounded when analysts refer to units and categories of Latin grammar in describing children’s speech. As mentioned in Section 3.2, authors have questioned whether young children divide bound forms as “pronouns” and “determiners,” from neighboring “verbs” and “nouns” (among other putative units; Ambridge, 2017; Ambridge & Lieven, 2015; Lieven, Behrens, Speares et al., 2003; Lieven et al., 1997; Pine & Lieven, 1997; Pine & Martindale, 1996; Tomasello, 2000a, 2000b, 2003). In fact, the problem extends to adult speech as several proponents of usage-based grammars have reported that flexible slot-and-frame formulas, and countless high-frequency expressions, such as wanna, gi’me, shoulda, look out, take care (etc.), can function as lexicalized forms (Bates & Goodman, 2001; Bybee, 2006, 2010; Dabrowska, 2004). Still, even in failing to identify all separable elements, one can surmise that MLU counts can index a cumulative complexity of utterances in the development of spoken language, as Brown (1973) suggested. For this to be true, though, it would have to be shown that the MLU index correlates with the number of verbal units or chunks in utterances, although this would still leave the problem of how developing MLUs contribute to lexical diversity or vocabulary development. To explore these effects, Boucher and Lalonde (2015) performed counts of “nominal” lexemes in relation to MLUs for the aforementioned speakers aged five to twenty-seven years. Identification of these forms is straightforward in that they frequently appear in formulas of Determiner+noun in French and less frequently as stand-alone proper nouns used to name people and places (see Boucher & Lalonde, 2015, for illustrations and count criteria). Thus, although measures of nominal lexemes may not narrowly align with chunked formulas, they generally approximate chunks with a Determiner, which is most often a monosyllable in French. Two sets of ratios were calculated based on type and token counts of nominal forms, which were also classified in terms of their length in syllables. It should be remarked that, compared to tokens, type counts


Figure 9.4 Overall percentages of nominal forms of 1, 2, and 3 syllables or more (types per utterance) used by 50 speakers aged 5–27 years. The ratios indicate that, with increases in MLUs, there is a relative diversification of speakers’ vocabulary favoring long forms of 3 syllables or more (from Boucher & Lalonde, 2015, with permission).

serve to estimate the diversity of items used by speakers or their “active vocabulary.” As could be expected, the number of chunks approximated by token counts of noun lexemes per utterance correlated with increasing MLUs in syllables (r = 0.67, adjusted p < 0.01). Thus, across speakers, token/utterance counts showed that age-related rises in MLUs associate with the production of increasing numbers of nominal forms in utterances. But type/utterance counts revealed another effect of developing MLUs, which became clear when nominal forms were classed in terms of their length (1 syllable, 2 syllables, and 3 syllables or more). Applying this classification, type counts indicated that changes in MLU associate with a relative diversification of long nominal forms. This effect is displayed in Figure 9.4.

Rises in MLUs across speakers appear to accompany a relative rise in the diversity of long noun forms: specifically, the type counts show that, as breath units of speech become longer in the course of development, speakers’ active vocabulary of nouns shows a rising proportion of long forms of 3 syllables or more (r = 0.64, p < 0.001), as compared to 1-syllable forms (r = –0.60, p < 0.001), whereas the proportion of two-syllable forms remains relatively stable (r = –0.13, ns). This suggests a link between the production of increasingly long utterances and changes in vocabulary favoring multi-syllabic forms.

To further clarify the link, Boucher and Lalonde examined whether the correlation between MLUs and type/utterance counts of long noun forms could hold if the effects of growing VCs were factored out. Partial correlations showed that controlling for the effects of increasing VC removed the correlation between MLUs and the diversification of long forms, suggesting that the growth in respiratory functions has a strong intervening effect on the correlation. One simple explanation for this can be inferred from the preceding results. As one can surmise, the developing capacity to produce increasingly long utterances inherently increases the possibilities of combining elements in long multisyllabic chunks that can consolidate as lexical units or formulas through language use.

Of course, several changes relating to physiological growth can contribute to increasing the numbers of units in utterances and thus to rises in utterance complexity – although a tripling (or more) of VCs between the ages of seven and twenty-five years certainly presents a major factor (Figure 9.1). Taking this factor into account, however, leads one to view language development as involving maturing motor-sensory processes rather than as a separate unfolding of a mental grammar. The latter concept can arise, as it did for Brown (1973), when observers narrowly focus on the analysis of children’s speech using transcripts and orthographic concepts. Some may object that this account does not take into consideration innumerable variables that relate to changing cognitive abilities and social environment, which are obviously involved. But it should be recognized that, whereas cognitive and social factors vary widely across individuals, identifiable maturational changes in motor-sensory processes, which are present in all speakers, specifically mark developmental milestones of spoken language. These maturational changes, however, may not be identified using linguistic-type analyses of transcribed speech.
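The logic of that partial-correlation check can be sketched as follows: regress both MLU and the proportion of long noun types on VC, then correlate the residuals. The code below uses placeholder arrays rather than the study’s data and is intended only to show the form of the computation.

```python
# Illustrative sketch of a partial correlation; the arrays are placeholders.
import numpy as np

def partial_corr(x, y, control):
    def residuals(v):
        slope, intercept = np.polyfit(control, v, 1)
        return v - (slope * control + intercept)
    return np.corrcoef(residuals(x), residuals(y))[0, 1]

vc         = np.array([1.2, 1.8, 2.5, 3.1, 3.9, 4.6])   # placeholder values
mlu        = np.array([4.0, 5.5, 7.0, 8.2, 9.9, 11.5])
long_forms = np.array([0.10, 0.14, 0.19, 0.22, 0.27, 0.31])
print(np.corrcoef(mlu, long_forms)[0, 1])   # raw correlation (high)
print(partial_corr(mlu, long_forms, vc))    # correlation after controlling for VC
```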

9.2.3 On Explaining Developmental Milestones

In viewing children's utterances as words and sentences on a page, it may be forgotten that speech involves actions that modulate air pressure and that brain processes serving to elaborate sequences of articulated sounds would necessarily develop in conformity with constraints on motor processes, including constraints on speech breathing. Relegating such constraints to a theoretical "broad" language faculty, separate from a syntactic capacity (as in Hauser et al., 2002), or to performance processes (Chomsky, 1965), does not eliminate their structural effects on spoken language. Nor can one gain an understanding of these structural effects guided by theoretical divisions that deny their relevance. But conventional linguistic analyses can also mislead investigators. For instance, it should be clear that an analysis of children's speech using the IPA may not serve to account for developmental milestones, such as the emergence of babble, age-related rises in MLUs, or the diversification of vocabulary that accompanies increasing MLUs. Again, viewing these changes via letter signs and "meaning units" such as words, phrases, and sentences can foster the concept that the changes reflect emerging mental skills in ways that are
unconstrained by motor-sensory processes. The sentence unit and Brown's MLU index offer instructive cases in point. In linguistic analysis, the speech of children is often interpreted as containing sentences, which can preclude a view of the impact that maturing respiratory capacities have on breath units of speech or utterances. In failing to note these breath units, the underlying contributing effects of major changes in respiratory volumes on the "additive complexity" of utterances become unclear, or else are attributed to syntactic processes on "sentences." On the other hand, acknowledging the relationship between breath capacities and utterances provides a rationale for reported links between MLUs and measures of developing vocabulary. As the aforementioned findings suggest, the capacity to produce increasingly long breath units of speech accompanies a relative diversification of long multisyllabic forms in speakers' active vocabulary. This does not imply that the normal growth of MLUs is a "sufficient" factor in explaining this rising use of long forms. Certainly, the development of vocabulary or multi-item formulas requires exposure to a sociolect and accompanies changing neurocognitive functions. Yet compared to social and cognitive factors, maturational changes in MLUs appear as a "necessary" variable in accounting for a diversification of vocabulary toward long multisyllabic items. To clarify this point, consider that proponents of a usage-based approach have indicated that children acquire a grammar and a lexical repertoire via a probabilistic exposure to verbal forms (e.g., Bybee & Beckner, 2009; Dabrowska, 2004; Ellis, 2002; Tomasello, 2003). However, it would be odd to suggest that a speaker's increasing repertoire of long multisyllabic forms, which accompanies developing MLUs, derives from an increasing exposure to long lexemes and formulas, and a decreasing exposure to short items. Obviously, some structural effect on the production of multisyllabic units is involved. On the other hand, the diversification of long forms in speakers' vocabulary is not better explained by invoking an unfolding cognitive competence in manipulating grammatical concepts and "meaning" units. In fact, conceptual semantic–syntactic attributes of units do not serve to account for a diversification of lexical items in terms of their size, as measured by the numbers of syllable pulses. In explaining such aspects of spoken-language development, the cited results suggest one account that relates to the growth of processes of modality: the capacity to produce increasingly long utterances affords the possibility of combining elements in long formulas and expressions that can lexicalize with language use. By this account, though, the time course for developing aspects of spoken language relates to the contributing effects of maturing motor processes, not to the unfolding of a competence that is separate from constraints on the motor-sensory modality of expression. The same may be said of other
developments such as the momentous rise of babble and symbolic language (as discussed in Chapter 2).

9.3 The Structure of Spoken Language: An Interim Summary with a View on Addressing the Issue of Scriptism

The relevance of the previous chapters on speech structure can be understood in light of how syllable cycles, chunks, and utterances as breath units of speech address the problem of scriptism, or the tendency to view spoken language through writing. As noted in earlier parts of this book, several authors of historical works point out that, across cultures, writing has generally provided the concepts for thinking about the structures of spoken language rather than the reverse (Coulmas, 1989; Harris, 1986, 1990, 2000; Olson, 1993, 2017, among others cited in Section 1.1). For these authors, the structures have long been forgotten by language theorists. Current methods of analysis apply to script and incorporate a concept of language handed down from traditional Latin grammar. By this concept, language is seen to consist of a list of readymade units, such as words in a dictionary, and sets of rules laid out in a grammar serving to assemble the units into sentences (Love, 2014 in reference to Harris, 1980). Such a viewpoint is indeed reflected in current language theories that focus on grammar and that draw upon analyses of sentences on a page, most often using ready-made orthographic units (as in Figure 3.1). In sum, the aforecited historical works have brought to light that formal linguistic analyses and theories are not about the structures of spoken language. On the other hand, these works do not aim to define what these structures are. Nor do they consider that linguistic methods were developed at a time when writing was practically the only technology available to record speech. Instrumental techniques have long since served to investigate speech beyond transcripts, from acoustics to physiological and neural levels of processing. However, investigators have faced basic problems in reconciling observed structures of speech with traditional concepts of analysis. On the one hand, as is now generally acknowledged, features of articulated sounds are not bundled together in letter-like units, and no defining physical marks divide speech into words, phrases, and sentences. There is, on the other hand, a more fundamental problem in linking observations to language theory. The tradition of viewing spoken language through script has led to formal models that are disembodied and decontextualized. As a consequence, there is no consensus in research on how the brain interfaces with spoken language as described in formal analyses and theories. It was noted in the Introduction that many neuroscientists wonder why this is still a problem. Others point out that attempts to devise an interface fail to recognize the ontological incommensurability between observations of biological processes and theories erected on concepts of writing. And yet there are observable

structures in speech, such as those described in the preceding chapters. The difficulty is in conceptualizing spoken language in terms of these structures rather than by reference to writing-induced constructs. In commenting on the interface problem, several authors have explicitly questioned the validity of conventional linguistic concepts and have asked: what are the units of spoken-language processing? (see Section 7.1). On such questions, some, like McClelland et al. (2006) and Lindblom et al. (1984), remarked that it is not sufficient to merely posit units or structures. One also has to define how they arise and their role. The observations of the previous chapters, among other related findings, bear out that neural oscillations specifically entrain to syllable-size cycles and chunks within utterances. As it appears, the brain directly interfaces with observable structures of motor speech. Moreover, these structures are biologically grounded, they have physical marks, and they can be explained by reference to circumscribable attributes of speech acts. These attributes, which contribute to shaping structures, are not restricted to neural processes but notably include intrinsic properties of contractile tissues, human-specific oral and nasopharyngeal systems, constraints on respiratory functions, and so forth, as described in the previous chapters. As for the role of the observed structures in communication, syllable cycles, which develop in early babble, support arbitrary sound-meaning associations and symbolic language (Chapter 2). In combination with the chunking process, the cycles constitute motor-sensory frames where articulatory features show a degree of relative-timing constancy, as seen in timing ratios of VOT and transitions (illustrated in Chapter 8). Such constancy in motor and acoustic features supports production–perception parity, which stands as an essential condition of vocal communication (defined in Section 6.2). The chunking of sequences of syllables, described in several sections of this book, has been related to functions of basal ganglia circuits and is seen to constitute blocks of action. Also, because sounds involve modulations of air pressure, the number of blocks or chunks that can be produced is necessarily related to respiratory functions and breath-divided "utterances." This structuring of learned action sequences into chunks and breath units is fundamental to an understanding of the processing of utterances as acts of communication. In particular, the domain-general chunking that is observed in learning motor sequences, when applied to speech processing, can well underlie the type of clustering of verbal elements that is described as slot-and-frame formulas and semantic schemas in constructivist accounts. This suggests that contextual sensory experiences, which are constitutive of episodic and semantic memory, can associate with blocks of action in speech. The mechanisms that serve to bind sensory experiences to these blocks are discussed in Part IV. However, this embodied and contextualized approach to understanding the semantics of speech implies a conceptualization of utterance structure that has little to do
with conventional language analysis. On this point, it was noted that one historical consequence of scriptism has been the notion that natural languages consist of a list of words, as in a dictionary, and a set of rules laid out in a grammar. However, readers will likely acknowledge that the “meaning” of utterances in varying speech situations can hardly be gleaned from a grammar and dictionary entries. This seemingly obvious point is central in examining research on the semantic representations of verbal expressions and in establishing a perspective on how observable structures of spoken language convey meaning.

Part IV The Processing of Speech Meaning

10 The Neural Coding of Semantics

10.1 Units of Writing, Structures of Utterances, and the Semantics of Speech

As discussed in Part III, observations showing a frequency-specific entrainment of neural oscillations to syllable cycles and temporal chunks suggest a direct interface between brain activity and structures of motor speech. The entrainment essentially indicates a neural processing of utterances in terms of syllable- and chunk-size sensory frames. Research on semantic processes does not generally refer to such structures or processing frames and, instead, meaning is most often conceptualized by reference to conventional units of language analysis. In presenting another viewpoint, the two chapters that follow divide the subject matter of semantic processes into two bodies of work that deal with the neural coding of semantic aspects of verbal forms and their context-based activation in spoken language. The two bodies of findings reflect different research orientations. As documented here, the first set of findings focuses on lexico-semantics and the issue of whether the semantic properties of lexemes reflect abstract concepts or are essentially grounded in cortical networks that reflect perceptual, motor, and emotional systems. The results from the second set, which has not been at the center of a grounding debate, emerge partly from clinical reports. These reports draw attention to the role of the subcortical brain with respect to the context-related activation of verbal forms in utterance comprehension. Although both sets of results intertwine, their separate treatment in the literature, and the dominance of the lexico-semantic approach, may have hindered an understanding of the on-line processing of context information in verbal communication. To clarify certain central issues in research on semantics, it is useful to begin by defining the terms "context effects" and "semantics," and by examining the common assumption that mechanisms of speech comprehension operate on "words" or lexemes. The latter assumption underlies several limitations of the data obtained using lexico-semantic tasks and, throughout the following, the term "verbal forms," or simply "forms," is used to refer to units that may not be commensurate with
notions of lexemes or words as reflected in text and dictionaries. While it is implicitly assumed that tasks designed by reference to lexemes can capture semantic processes of spoken language, people do not speak in lexemes. Obviously, a focus on these conceptual units overlooks the innumerable "bound forms" that generally accompany lexemes in utterances. These bound elements can be relatively meaningless outside of a unit that extends beyond individual lexemes to a chunk of speech. But more fundamentally, as repeatedly emphasized in this book, there is no working definition of lexemes other than a reference to marks in writing (see Section 4.2.3; Dixon & Aikhenvald, 2002; Haspelmath, 2011). In other words, these units are conceptually linked to orthographic code. They do not apply across spoken languages, certainly not to so-called wordless languages, as will be illustrated in a subsequent section. The issue of units is central to any discussion of semantic processes. In fact, it is difficult to find a definition of the object of study of semantics as it relates to spoken language. Inevitably, one has to refer to some frame or unit of processing, and this can lead back to the problem of assuming conventional linguistic units. Yet most psycho- and neurolinguistic investigations assume these units. As one investigator remarked, "the study of language and the brain has predominantly focused on isolated levels of linguistic analysis (e.g., phonology, morphology, semantics or syntax) as they pertain to related units of analysis (e.g., phonemes, syllables, words, sentences or discourse)" (Skipper, 2015, pp. 101–102). Most often, studies of semantic processes make use of tasks that involve lexemes, generally reflecting units of the type found in dictionaries. This lexico-semantic method has served to gather a vast amount of data on semantic processes, but it is not the intention here to offer another review of research that pursues this approach (compendia and critical reviews can be found in Kemmerer, 2015a, 2015b; Meteyard & Vigliocco, 2018; Rueschemeyer & Gaskell, 2018; Speed, Vinson, & Vigliocco, 2015, and others cited in subsequent sections). Rather, the present chapter draws attention to the shortcomings of a lexeme- or word-based approach and develops another perspective. The shortcomings have to do with the ongoing debates on the nature of semantic representations, the segmentation and neural encoding of multimodal context information, as well as the validity of using culture-specific units of writing in investigating biological mechanisms of semantic interpretation. It is argued in the following that these problems bring to light the need to refer to sensory chunks that serve to integrate, on a neural level, multimodal context information with the structures of motor speech.

10.2 The Lexico-Semantic Approach: Context Information as "Nonessential"

Data obtained through tasks that involve picture naming, lexical decisions, word recognition, and various priming paradigms where words/lexemes are presented
or solicited in isolation have shaped much of the current knowledge on semantic processes. These tasks continue to be widely used despite criticisms to the effect that isolated lexemes do not capture the way speakers communicate meaning or interpret utterances. On this point, some studies have examined cortical activity related to the semantics of discourse in natural settings (see, for instance, Duranti & Goodwin, 1992; Huth, de Heer, Griffiths et al., 2016; Willems, 2015). However, this only represents about 1 percent of all studies, according to Skipper (2015), who performed a search in twenty leading neuroscience journals. Most investigations adopt a lexico-semantic approach using tasks that involve stand-alone forms as found in dictionaries (Meteyard & Vigliocco, 2018), and most have focused on cortical activity. As mentioned, this approach dispenses with other "words" (e.g., "function words," flexional elements, affixes, compounds, and so on). Finally, and more generally, the lexico-semantic approach tends to overlook the ubiquitous influence of speech context on the semantic interpretation of verbal expressions. These context effects refer to both situational utterance-external information (e.g., multimodal sensory events that accompany speech, including voices, faces, gestures, etc., as well as general information about a scene or setting) and distributional utterance-internal information relating to the co-occurrences of verbal elements (cf. Duranti & Goodwin, 1992). In explaining the rationale behind the lexico-semantic approach, Meteyard and Vigliocco (2018) note that the meaning of lexemes needs to map onto a memory of experiences and concepts so that one can share these experiences through verbal interaction. It is understandable that much of the work in lexico-semantics has been aimed at identifying semantic properties that are "context invariant." For one thing, the ability to convey meaning across varying contexts may be viewed as essential to verbal communication. In other words, communication of meaning using speech logically entails a process by which verbal expressions serve to convey information in memory relating to experiences or concepts shared by both a talker and listener (an extension of the Parity Condition defined in Section 6.2). But since concepts and experiences vary across individuals and settings, some see communication as bearing on some core semantic concepts of verbal forms with context-varying information appearing as "extraneous stuff that happens when a word is used" (Meteyard & Vigliocco, 2018, p. 13; see also Mahon & Hickok, 2016; Reilly, Peelle, Garcia et al., 2016). It can be asked, however, if it is at all possible to control for all "nonessential" context information, and whether this approach does not run counter to the fact that semantic memory normally develops through the use of verbal forms in speech contexts.

10.2.1 Lexico-Semantics and Traditional Models of Language Processing

It is certainly the case that presentations of isolated lexemes to speakers who have learned these forms will serve to activate semantic attributes in
memory. Standard test batteries involving lexico-semantic tasks also have a clinical value. Naming tests stand out as a traditional method for observing "word-finding" deficits associated with various language disorders and have a long history in aphasiology (Ross, 2010). However, a debate emerged in the 1990s on the nature of semantic representations. As outlined in the following, the central disagreement in interpreting clinical and neuroimaging data bears on whether the semantics of lexemes are abstract concepts or are grounded in cortical perception and action systems (for differing viewpoints on this debate, see Mahon & Caramazza, 2008; Martin, 2016; Meteyard & Vigliocco, 2018; Speed et al., 2015). The debate, though, is largely circumscribed to observations culled from lexico-semantic tasks, most of which, by their design, minimize context effects. This tends to limit the discussion of speech comprehension processes to the semantics of lexemes and to tasks that overlook context effects. Specifically, lexico-semantic tasks are usually devised by minimizing several context variables relating to language use. Typically, this is reflected in tasks where presented words or lexemes are selected by controlling for such factors as word familiarity or frequency of usage, neighborhood effects, phonological complexity, imageability (among other factors). It should be borne in mind that such controls imply that context effects are generally present and underlie a semantic memory of verbal forms, as well as a memory of situational episodes of speech. In reality, one never hears or produces verbal expressions without accompanying speaker-related information inherent to a voice, articulatory patterns, prosody, dialectal variants (etc.). Distributional information relating to utterance contexts is also present, even when lexemes are heard in isolation (as when responding to a question). In neurolinguistic investigations, overlooking such context information by focusing on lexical tasks has had a major influence on how neural mechanisms involved in speech understanding are conceptualized. As Skipper (2015) remarked, one central consequence has been that research has persistently underestimated the number of networks involved in spoken-language comprehension. For instance, the classic model (Geschwind, 1970), where language is located in Wernicke's area with white-matter connections to Broca's area, taken to relate to production, offers no account of context effects on speech meaning. Nor are the effects explained by the more recent dual-stream model, where language is viewed in terms of two streams arising from the auditory cortex, with one mapping sound to meaning and the other mapping sound to motor-sensory attributes (e.g., Hickok & Poeppel, 2007; Poeppel & Hickok, 2004; for a critique, see Skipper et al., 2017). More generally, the dominant idea in these models – that spoken-language comprehension can be more or less localized in cortical systems – fails to incorporate a body of observations on the role of subcortical processes.
In particular, it has to be considered that aphasia can arise with lesions to the thalamus, and that several language-related dysfunctions associate with other subcortical systems (e.g., Bohsali & Crosson, 2016; Crosson, 2013; Crosson & Haaland, 2003; Kuljic-Obradovic, 2003; Murdoch, 2010a; Murdoch & Whelan, 2009; Otsuka, Suzuki, Fujii et al., 2005). As some authors have remarked, conventional test batteries that involve such tasks as picture naming may not capture the role of the subcortical brain in the context-based processing of language, as other types of test material reveal (Guell, Hoche, & Schmahmann, 2015; Murdoch & Whelan, 2009; and see Chapter 11). Yet this clinical data has not been the focus of the debate that opposes embodied and disembodied accounts, which instead appears to revolve around evidence gathered from lexico-semantic tasks. A brief discussion of these views serves to illustrate how data collected by methods that overlook context effects on speech meaning may, as such, have contributed to the difficulty in defining the nature of semantic representations.

10.2.2 Embodied versus Disembodied Semantics

There is a growing body of brain-imaging studies showing that the cortical systems involved in perception, action, and emotion are activated upon presentations of lexemes with related content information (in interpreting these findings, Martin, 2016, recommends the use of the term “system” to guard against the view that one is referring to motor and sensory cortices). As an example, it has been shown that lexemes designating actions like kick, lick, and pick activate somatotopic regions and areas of the motor cortex relating to the control of the legs, face, and arms respectively (Hauk, Johnsrude, & Pulvermüller, 2004; for other examples of this type, see Moseley, Kiefer, & Pulvermüller, 2015; Pulvermüller, 2013). Imaging studies also show that reading words or lexemes designating odors, taste, and sounds activate specific systems of sensory perception (Boliek, Hixon, Watson et al., 2009; Boulenger, Hauk, & Pulvermüller, 2008). Several reports involving action priming in word recognition tasks further illustrate a link between action, emotion, and word meaning. For example, Mollo, Pulvermüller, and Hauk (2016) showed that presentations of finger-related lexemes (such as pick) and leg-related lexemes (as in kick) associate with faster reaction times when key-based responses performed with a finger or a foot are congruent with word meanings (see also the experiments of Dalla Volta, Gianelli, Campione et al., 2009; Pulvermüller, 1999; Shebani & Pulvermüller, 2013). In a similar paradigm, the processing of lexemes that designate objects requiring motions toward or away from the body (as in cup and key) is facilitated when primed by a motion in the same direction (Rueschemeyer, Pfeiffer, & Bekkering, 2010; other related examples are discussed by Meteyard & Vigliocco, 2018). It is useful to note that these links may

not strictly entail cortical processes. Several studies involving clinical populations have shown that individuals presenting with impaired motor abilities associated with Parkinson's disease or ALS show deficits in comprehending action words and perform differently on nouns and verbs in lexical decision tasks (Bak, O'Donovan, Xuereb et al., 2001; Boulenger, Mechtouff, Thobois et al., 2008; cf. Neininger & Pulvermüller, 2003). As for abstract lexemes, reports have revealed that brain regions associated with the limbic system, including the medial prefrontal cortex and the amygdala (not excluding non-limbic areas such as the superior temporal gyrus), are activated by presentations of social-emotional lexemes such as honor, brave, impolite (and so on; Martin, 2015; Roy, Shohamy, & Wager, 2012). Some studies that have examined, more specifically, the processing of "abstract" and "concrete" lexemes have adopted the approach that these categories have different emotional valences (i.e., whether lexemes have positive, negative, or non-emotional associations). For instance, experiments by Kousta, Vigliocco, Vinson et al. (2011) have shown that, on tasks of lexical recognition, faster responses to abstract lexical items, compared to concrete items, are mediated by the emotional valence of lexemes. Controlling for this valence, an imaging study by Vigliocco, Kousta, Della Rosa et al. (2013) revealed that abstract lexemes, by comparison to concrete lexemes, present a greater activation of the anterior cingulate cortex, a region associated with limbic emotional processing (see the authors' reference to Etkin, Egner, Peraza et al., 2006). For many, such findings support an embodied cognition approach where the semantics of words is seen to be grounded in brain systems of action, perception, and emotion (Barsalou, 1999, 2008; Decety & Grèzes, 2006; Gallese & Lakoff, 2005; Matheson & Barsalou, 2018; Yee et al., 2018). In this view, when speakers interpret verbal units, their semantic memory of experiences is reactivated and thus "simulates" experiences (Barsalou, 2008, 2009; Barsalou, Santos, Simmons et al., 2008; Gallese & Sinigaglia, 2011). Critics, however, object that many verbal expressions do not refer to experiences in any sensory modality (Caramazza, Hillis, Rapp et al., 1990; Dove, 2015; Goldinger, Papesh, Barnhart et al., 2016; Hauk, 2015; Hauk & Tschentscher, 2013; Mahon, 2015). The issue of distinguishing between the sensory "format" of mental representations and abstract semantic concepts has been repeatedly raised by Caramazza, Mahon, and others in arguing against the grounding hypothesis, and for a return to a traditional cognitive approach that provides "a computational account – that is, one that would describe distinct components of cognition by the nature of their representations and the algorithms that operate on them" (Leshinskaya & Caramazza, 2016, p. 991). For these authors, semantic concepts are disembodied and abstract, and part of an independent processing component at the neural level (although the component can interact with sensory and motor systems, as in the "grounding by
interaction" proposal by Mahon & Caramazza, 2008). As an example, Leshinskaya and Caramazza (2016) discuss the case of the concept square. Such a concept, they argue, can relate to a number of lexemes and has generality, but it is distinct from the perceptual experience of square. Some concepts may not relate to any sensory experience (e.g., truth, idea) and thus may not reflect episodic memory or action- and sensory-based representation. According to this account, then, it is the generality of concepts that allows propositional and hierarchical links between lexemes or verbal descriptions, as in cat, predator, lion, king of the jungle (for further related arguments: Caramazza et al., 1990; on theories of neural hubs that may underlie propositional and hierarchical semantics: Martin, 2016; more on this in Section 10.3). Proponents of the preceding "disembodied" view of semantic attributes argue that many of the findings used to substantiate an embodied cognition account can be explained by a disembodied approach (Leshinskaya & Caramazza, 2016; Mahon, 2015; Mahon & Caramazza, 2008). Both clinical and neuroimaging data are marshaled in support of this approach, some of which draws from early analyses of semantic memory impairment, such as Warrington (1975). In her clinical study, Warrington showed that individuals who could match different pictures of an object could not name the object, nor match names to verbal descriptions of the object. This is taken to suggest that semantic aspects of lexical items and perceptual systems can be differently impaired and thus reflect different neural components (Leshinskaya & Caramazza, 2016). However, the approach advocated by Caramazza and Mahon presents several problems in relation to the role of context information on semantic memory. These issues are outlined subsequently with a view to how context bears centrally on the question of the nature of semantic representations.

10.3 How Semantic Representations of Verbal Expressions Develop: On "Modes of Acquisition"

One crucial limitation of the disembodied view is that it does not account for how semantic concepts of verbal forms develop through language use. This limitation is also present to some extent in the embodied approach. Neither account considers the structural effects of maturing speech-motor processes on the development of verbal expressions and sound–meaning associations (as outlined in Section 2.3.3, and Chapter 9). And both views focus on data gathered from lexico-semantic tasks that minimize context effects. Yet semantic concepts (and units) are not given a priori, and some authors emphasize that context information of language use has a constitutive role in acquiring the meaning and function of verbal forms (e.g., Della Rosa, Catricalà, Vigliocco et al., 2010; Meteyard & Vigliocco, 2018; Skipper, 2015). In fact, developmental observations show that

semantic properties of verbal forms largely arise from context information, which, as defined earlier, relates to both utterance-external sensory experiences that accompany speech and utterance-internal distributions of verbal forms. In terms of the situational-experiential information, there is evidence (contra, e.g., Leshinskaya & Caramazza, 2016) that the meanings of verbal expressions develop from a memory of events and objects in the context of speech. This is partly attested by the “concreteness” of early vocabulary development. Verbal forms that refer to perceptible entities are the earliest and by far the most frequent forms used by children, and these generally serve to designate objects present in the speech context (Gogate, Bahrick, & Watson, 2000; Hanhong & Fang, 2011; Schwanenflugel & Akin, 1994; Schwanenflugel, Stahl, & McFalls, 1997). Definitions of “concrete” and “abstract” classes of lexemes vary in the literature. However, there is agreement on the idea that concrete forms relate to perceivable aspects of scenes while abstract forms reflect verbally derived concepts from utterances – even though the latter abstract category can also apply to a number of concrete items that are not experienced by a speaker (cf. Paivio, 2007; and on the emotional valence of abstract lexemes: Della Rosa et al., 2010; Kousta et al., 2011; Ponari, Norbury, & Vigliocco, 2017; Vigliocco et al., 2013). This ambiguity in the classification of forms can be understood by noting that the terms concrete and abstract, when applied to developing vocabulary, reflect two “modes of acquisition” rather than distinct categories of items (Wauters, Tellings, Van Bon et al., 2003). The modes basically designate different mechanisms. Specifically, episodic memory of experiences can underlie the grounding of semantic aspects of verbal items that have concrete referents, whereas explicit semantic memory may be more involved in constituting abstract items from verbal contexts, even though forms categorized as concrete can also be verbally derived. To clarify with an example, Della Rosa et al. (2010) mention the case of Australian children who can acquire the meaning of kangaroo by hearing the term in a context where they perceive the animal. By contrast, children living in Alaska might acquire the meaning of kangaroo through verbal descriptions or information provided by co-occurring forms in utterances (e.g., a kangaroo looks like a deer, it jumps like a rabbit, it has a large tail, etc.). It is the view here that these two modes of acquisition, which can fluctuate across communities of speakers, provide essential principles for understanding context effects on the development of vocabularies. They also serve to avoid some confusion that prevails in debates on the nature of semantic representations. One basic principle is the primacy of grounded expressions over verbal derivations in the development of semantic representations of verbal forms. The first mode of acquisition, where utterance-external context information binds to spoken forms, is a necessary antecedent to a mode that utilizes

utterance-internal information. In other words, acquiring the meaning of novel verbal items through linguistic derivations, such as verbal descriptions, requires the prior acquisition of concrete terms that are grounded in sensory experience. This viewpoint is evidenced by the general concreteness of vocabulary in early development (see the aforecited references). It is also supported, although quite indirectly, by reports that subjective ratings of concreteness and imageability are better predictors of children's performance on lexical tasks than ratings of (linguistic) context availability (Schwanenflugel & Akin, 1994; cf. Della Rosa et al., 2010). Recent studies suggest that emotionally valenced forms can be amongst the first "abstract" lexemes learned by children (Ponari et al., 2017). Also, Kosslyn (1980) reported that children's reliance on concreteness or imageability can dominate the comprehension of verbal material up to about ten years of age. To illustrate the primacy of concrete semantics using the example of Alaskan children, one can presume that forms like look, jump, deer, and rabbit would have to be grounded in experience before verbal descriptions can serve to constitute a semantic concept for a kangaroo (e.g., a kangaroo jumps like a rabbit; a kangaroo looks like a deer). Even then, the accuracy of a newly acquired expression may only be assessed by the speaker if it functions to communicate meaning to listeners who access a similar concept. In sum, the constitution of semantic representations of verbal forms rests on: (1) memory of attended sensory experiences in the speech context that binds to verbal expression; (2) distributional information in verbal strings that can activate semantic attributes that regularly co-occur; and (3) a process that monitors effects in communication. Overall, these operations can be seen to reflect a general mechanism of "coincidence detection" as could be modeled by Predictive Coding Theory (Kok & de Lange, 2015), in which semantic representations of verbal forms can be grounded in the memory of coinciding sensory events, as well as co-occurring forms in utterances, and communicative effects. For instance, a verbal description can activate semantic schemas of regularly co-occurring forms in verbal formulas that can then serve to infer the meaning of a novel item X as in saying X looks like a deer, X jumps like a rabbit. The more predictable the items are within a formula or semantic schema, the easier it is to infer and learn the meaning of a novel verbal item. Some of these mechanisms rest on implicit memory and "statistical learning." But the point of this discussion is that semantic properties of verbal expressions link to utterance-external and utterance-internal context information that comes with language use. In this light, attempting to define invariant semantic attributes, by controlling for "nonessential" context effects in lexical tasks, presents a questionable enterprise in that context information can be wholly constitutive of the semantic representations of verbal forms. Indeed, any number of context variables relating to language use can act as primes in
lexico-semantic tasks. In reviewing several reports of these effects across different tasks, Meteyard and Vigliocco (2018) saw the limits of the approach: These studies demonstrate that different aspects of meaning become available, salient, or accessed depending on the demands of the task. It could be argued that all of these variations take place for non-essential parts of meaning, but this begs the questions of what, then, is necessary or essential? This is an old question, most famously answered by Wittgenstein (1958) – the meanings of words come from the way they are used. Trying to provide absolute definitions, find semantic components (“simple constituents”), or comprehensively detail how words are related is impossible. (pp. 18–19)

In contrast to the lexico-semantic approach, the concept of modes of acquisition suggests that context information underpins semantic representations through three mechanisms. The first involves selective attention: speakers focus on varying aspects of sensory episodes in the speech context, which, over time, constitute semantic memories that bind to verbal forms. The second mechanism implies a processing of co-occurring verbal elements in utterances, which can activate sequential attributes of formulas or semantic schemas that support the learning of semantic representations for novel forms. Finally, the semantic attributes of forms can also entail a memory of situational outcomes of verbal expressions that, as noted earlier, can involve reinforcement learning. The following sections clarify these mechanisms, beginning with the role of attention processes in integrating sensory information with structures of spoken language. A determining question in delineating the neural coding of semantic representations is how this sensory information is formatted by utterance structure.
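As an illustration of the second, utterance-internal mechanism, the sketch below shows one simple way distributional learning is often modeled: the semantic vector of a novel form is approximated from the known forms that co-occur with it in slot-and-frame formulas (X looks like a deer, X jumps like a rabbit). The feature vectors and formulas are invented for the illustration; this is not a model proposed in the studies cited above.

    # A toy sketch of distributional ("coincidence detection") learning: the meaning of a
    # novel form X is approximated from known forms that co-occur with it in formulas.
    # Vectors and formulas are invented for illustration only.
    import numpy as np

    # Hypothetical grounded feature vectors for known forms
    # (dimensions might stand for features such as animate, hops, four-legged, has-tail).
    known = {
        "rabbit": np.array([1.0, 1.0, 0.0, 0.5]),
        "deer":   np.array([1.0, 0.0, 1.0, 0.5]),
        "tail":   np.array([0.0, 0.0, 0.0, 1.0]),
    }

    # Utterance-internal context: formulas in which the novel item X has been heard.
    formulas = [
        ("X", "jumps", "like", "a", "rabbit"),
        ("X", "looks", "like", "a", "deer"),
        ("X", "has", "a", "large", "tail"),
    ]

    # Estimate X's vector as the average of the known forms it co-occurs with.
    co_occurring = [known[w] for formula in formulas for w in formula if w in known]
    x_estimate = np.mean(co_occurring, axis=0)

    print("estimated feature vector for the novel form X:", np.round(x_estimate, 2))

The point of the sketch is simply that the more predictable the slot-mates are, the more constrained the estimate for the novel form becomes, in line with the statistical-learning account outlined above.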

10.4 The Partitioning of Semantic Memory and Its Formatting in Spoken Languages

It is important to note that two major issues confront an embodied account of lexico-semantics when it comes to linking sensory experiences to verbal forms that vary across languages. These problems are occasionally recognized in the literature, but their implications have not been the object of critical reviews. The first issue relates to the fact that languages can use varying sets of verbal forms to designate an object in a sensory context or, as one might say, languages “partition” meaning differently. The second issue is that languages also vary in how they segment meaning along the time axis of speech such that semantic processing does not necessarily follow units like words or lexemes as reflected in text. The variable partitioning of meaning across languages is well acknowledged: speakers of different languages use varying sets of terms and have varying concepts even when naming the same objects or events. For instance,

Meteyard and Vigliocco (2018, p. 9) mention that English has two words or lexical forms for leg and foot while Japanese has only one: ashi. Such language-relative partitioning of semantics also extends to abstract terms (e.g., English has one term process where French has two: processus and procédé). These variations carry implications with respect to attempts to define invariant or core concepts in verbal expressions that would apply to all speakers. As one author put it: "The problem is that concepts are not available a priori: different languages partition the same scenes [or sensory experiences] in radically different ways, so part of the learner's task is to determine which aspects of a scene are lexicalized in his or her language" (Dabrowska, 2004, p. 220). On the other hand, although terms and concepts vary across languages, the nature of semantic representations or the way in which these representations are constituted in memory can be universal. The prevalent view in the neuroscience literature on lexico-semantics is that sensory information that grounds a semantic memory of verbal forms entails distributed processes, extending, by some estimates, across 58 percent of the surface of the brain (Skipper, 2015; also Huth, Nishimoto, Vu et al., 2012). Within the embodied cognition approach, the premise is that there are cortical hubs or convergence zones that, in processing lexical items, intrinsically bring together multimodal information (Martin, 2016; McNorgan, Reid, & McRae, 2011; Patterson, Nestor, & Rogers, 2007; Reilly et al., 2016; Simmons & Barsalou, 2003). As a summary example, Kemmerer (2015b) used Figure 10.1 to illustrate the range of experiences that could potentially serve to ground the meaning of the lexeme banana. The figure suggests a distribution of action, visual, and tactile elements that is reminiscent of the cortical layout of different categories of words as presented by Pulvermüller (2013, and also Patterson et al., 2007). From this viewpoint, it may be reasoned that, even though languages partition meaning through varying sets of verbal forms, any form designating an object or event can be grounded in a memory of sensory experiences. Accepting this, hypothesized convergence zones of the brain may bind multimodal information when processing units like banana (or forms that include bound elements like a banana, bananas, and so on). Such hubs, or other hypotheses of coincidence detection, may account for a speaker's ability to assign various experiential attributes of other verbal forms when describing a lexical unit (e.g., bananas are yellow, long, sweet, etc.). Still, not all speakers focus on the same attributes or have the same concepts when naming objects and events, nor do all speakers segment these experiences in accordance with word-like units as found in dictionaries of European languages. For instance, in Mohawk, banana and peach are approximately designated by the verbal forms tekakonhwhará:ron and teiotahià:kton, which native speakers translate as "it is bent" and "it has fur" (Karihwénhawe Lazore, 1993).

[Figure 10.1 appears here: a diagram with the lexeme BANANA at the center, linked to tactile, manipulation, spatiomotor, visual, auditory, and action-oriented elements, along with attributes such as size, motion, color, smell, taste, and shape.]

Figure 10.1 An illustration of the embodied approach in which the semantics of the lexeme banana is seen to be grounded in perception and action systems (from Kemmerer, 2015b; adapted from Thompson-Schill et al., 2006, and Allport, 1985, with permission).

This does not undermine the grounding hypothesis. However, it does illustrate that the meanings of verbal expressions vary extensively across languages and reflect selective attention to different aspects of sensory experience that can relate to speakers' life environments. It may be that, historically, speakers of Mohawk were struck by the shape and texture of imported fruit that appeared bent and furry compared to local fruit like berries. But the more essential point is that the forms used in Mohawk are not lexemes. These are holophrases that cannot be divided into smaller stand-alone units (other examples are provided in the following). Thus, while the meanings of verbal expressions can be grounded in a memory of experiences relating to perceptual, action, and emotional systems, these systems do not as such "format" or segment meaning along the time axis of verbal expression. More critically, in languages where stand-alone forms can extend to utterances, the grounding of verbal expressions would imply activations of several perceptual systems over varying extents of speech where all items appear as bound forms. The latter segmentation issue is overlooked in research on lexico-semantics and brings into view the limits of an embodied cognition approach based on words, as represented in writing. In this approach, the meaning of verbal expressions is taken to be grounded in a memory of sensory experiences. Yet it is obvious that sensory experiences do not inherently have the format of
verbal sequences. Experiences involve concurrent multimodal information. Verbal expression, on the other hand, imposes a linear format on semantic memory and concepts. To use a previous example, when one sees someone jumping or feels pain, one does not first experience an individual or a body part and then an action or pain as when uttering Paul+jumps, and My arm+hurts. This serial arrangement of semantic concepts is not the product of convergence zones or any sensory system of the brain. It is attributable to the modality of expression, to the fact that spoken language involves articulated sounds that deploy over time, such that semantic concepts associated with verbal forms are expressed in sequential order (which can vary across languages). Also, given that the sounds of spoken language are ephemeral, processing order information requires immediate memory that operates over a limited sensory window, implying a "focus of attention" and chunking (Cowan, 2000), as discussed in Part III. Thus, there is a need to refer to a processing frame or a mechanism of sensory chunking when relating semantic memory to motor speech. In describing the approach of embodied cognition, Matheson and Barsalou (2018) write: What are the building blocks of cognition? From a grounded cognition perspective, the building blocks of cognition develop in the modalities and the motor system, constrained biologically by the neural systems that have evolved to interface with the environment (Barsalou, 1999; Barsalou, Simmons, Barbey, & Wilson, 2003; Meyer & Damasio, 2003). (p. 6)

In the case of spoken language, the brain is interacting with the environment via structures of articulated sounds. Sensory experiences in semantic memory come to associate with these structures of motor speech, which present sequences of syllable-like units grouped in blocks or chunks. However, the "blocks" reflect biological constraints on actions and the processing of action sequences, not culture-specific concepts like words. In short, the above embodied account offers a biologically plausible approach to understanding how meanings of verbal forms in any language can be grounded in a selective attention to aspects of sensory experience. But it does not account for the formatting of meaning in spoken language precisely because this formatting is not attributable to cognitive operations on semantic concepts and instead reflects inherent constraints on the motor-sensory modality of expression.

10.4.1 Words Are Not Biologically Grounded Units: Why Sensory Chunking Is Necessary

In research on lexico-semantics, it should be a concern that results gathered on tasks using words or lexemes do not generalize across languages, essentially because such units reflect conventions of European writing. The arbitrariness
of these concepts should be clear from the fact that there is no operational definition of these units except by reference to marks like spaces in writing. In fact, early alphabetic script was undivided (appearing in the form of scriptio continua; Saenger, 1997). Spaces are reputed to have been invented by Anglo-Saxon monks in the eighth century, and their use remained inconsistent for more than a century (Parkes, 1992). The relevance of this for research on the nature of semantic representation should not be overlooked. Currently, it remains unclear whether tasks that use units relating to conventions of writing can capture biological mechanisms of semantic processing that apply across languages. One is immediately made aware of the problem of analyzing "meaning" through words or lexemes when one views languages that do not have such forms. For instance, for users of European-style alphabets, stand-alone forms in so-called polysynthetic languages (named as such because they lack word-like units in writing) may appear as "sentences" or "phrases." This is the case of Inuktitut where verbal units are composed of up to nine or ten "affixes." To illustrate these forms, Lowe (1981, p. 46) offered the following literal translations of phrases that include the base form expressing the idea of "house."

iglutsiaq: "a lovely house"
igluvigaq: "a house made of snow"
igluvigatsiaruluk: "a lovely little house made of snow"
igluliuriaqaqtusi: "you have to build a house"
igluliuqtutit: "as you were constructing a house"
igluvigatsiarulungmiittugut: "we are in a lovely little house made of snow"
igluvigatsiarulungmunngauqatiginiaqtagit: "I'll go with you to the lovely little house made of snow"

Lowe cautioned readers that expressions such as these are not analogous to "compounds" in languages such as German, where one finds forms like Der Autobahnraststättenbesitzer ("the owner of a rest area along a highway"). Such concatenations can be viewed as containing the "lexemes" Auto, Bahn, Rast, Stätte, Besitzer. But in the examples from Inuktitut, as in other indigenous languages of the Americas (like the preceding examples from Mohawk), added elements including the base element iglu- do not function as quasi stand-alone forms. It is the entire sequence that stands alone. In considering such cases, some authors have asked whether it is possible to replace the "troublesome (orthographic) word" with another unit of analysis and have concluded that there is no viable alternative (Allwood, Hendrikse, & Ahlsén, 2010). Yet polysynthetic languages present a dilemma only if one conceptualizes meaning in utterances as segmented by words, as in text and dictionaries. In fact, words as conventional units of writing do not as such provide a working principle of utterance segmentation (no more than letters provide a principle for segmenting
sound features of speech). Sensory experiences can underlie semantic representations, but it is only when these representations bind to sequences of articulated sounds that a necessary segmentation applies, and this segmentation can be common to all languages. Because speech sounds unfold over time, their processing requires a buffering of motor-sequence information over sensory chunks of signal. As documented in Chapter 8, these sensory frames present a domain-general principle that is reflected in the entrainment of neural oscillations to grouping marks in speech. But such sensory chunking operates specifically on blocks of articulated sounds and does not reflect culture-specific divisions of words. Lowe (1981) entitled his essay on Inuktitut Analyse linguistique et ethnocentrisme precisely because any attempt to analyze verbal forms and meaning by reference to conventional units like words entails a troublesome centrism. More specifically, the observer who is attempting to comprehend Inuktitut and who is used to alphabetic writing must come to conceptualize that speakers of this language do not use units as represented in text and dictionaries. These speakers do not name things like "a house" without designating events, or objects experienced with a house, who is in it, how it is made (etc.). Thus, analyzing the semantics of these examples by breaking them down into word-like "roots," such as iglu-, is problematic in that speakers of the language never produce these forms in isolation, and children do not hear "bare roots" (Mithun, 1989). It is like breaking down parts of words in English to create rasp-, mul-, cran-. They may be recognizable, but they are meaningless if they are not included in forms: raspberry, mulberry, cranberry – or more usually larger chunks when speaking to someone. The "affixing" scheme of Inuktitut presents such a case, except that there are no elements like berry, so that meaningful units can comprise bound elements extending to an utterance. As emphasized earlier, these examples carry broad implications for embodied accounts and lexico-semantics in that the mechanisms involved in the segmentation of utterances are not, and never were, defined by the use of orthographic units like letters and words to describe spoken language. It is also in considering the way all speakers segment speech that the processing of meaning may not be very different across languages. Utterance comprehension is not reducible to the activation of semantic representations for words as represented in dictionaries, and implies, in all languages, a sensory chunking of articulated sounds that unfold over time. It is this chunking, not writing concepts, that shapes the forms of semantic activations. Research on lexico-semantics does not address the question of how semantic and episodic memory are activated in the processing of articulated sounds in speech acts. But the way in which memory is activated in context and in terms of blocks of articulated sounds is central to an understanding of the nature of semantic representations and speech comprehension.

On the preceding idea of blocks of sounds, readers may have noted the complexity of the examples from Mohawk and Inuktitut, and they may wonder how such sequences are learned. Studies on the acquisition of polysynthetic languages have led some analysts to recognize that the problem is one of chunking: "Faced with a continuous stream of speech, the child must first solve the 'initial extraction problem,' learning to recognize and remember recurring chunks of speech" (Mithun, 1989, p. 286). Once chunks are extracted, several reports suggest that the child progresses from simple to complex forms in a way that is similar to the slot-and-frame formulas and semantic schemas described by Ambridge (2017) and Ambridge and Lieven (2015; discussed in Section 3.2.1). In the case of polysynthetic languages, slots can correspond to emerging categories of bound elements and even some stand-alone forms (i.e., some slots are reserved for proper nouns; Fortescue, 1984; see also Crago, 1990; Crago & Allen, 1997; Mithun, 1989). Sensory chunking of articulated sequences can thus constitute a cross-language process that undergirds the rise of formulas and semantic schemas that are not reflected in orthographic conventions of linguistic analysis. Also, the multimodal sensory experiences that come to be associated with verbal expression through language use are not limited to the type of "core" semantic concepts that are studied with lexico-semantic tasks. They include situational episodic information of speech communication, as illustrated in the following.
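The "initial extraction problem" mentioned in the Mithun quotation is commonly modeled in the statistical-learning literature by tracking transitional probabilities between adjacent syllables and positing chunk boundaries where those probabilities dip. The sketch below applies that heuristic to an invented stream of nonsense syllables; it is offered only as an illustration of chunk extraction, not as the specific mechanism proposed by the authors cited here.

    # A toy sketch of chunk extraction from a continuous syllable stream using transitional
    # probabilities (a common statistical-learning heuristic). The stream of nonsense
    # syllables is invented for illustration.
    from collections import Counter

    # Three recurring "chunks" presented in a continuous stream.
    order = ["golabu", "tupiro", "bidaku", "golabu", "bidaku", "tupiro",
             "golabu", "tupiro", "bidaku", "tupiro", "golabu", "bidaku"]
    syllables = [w[i:i + 2] for w in order for i in range(0, len(w), 2)]

    # Count transitions between adjacent syllables.
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])

    def transitional_probability(s1, s2):
        # Estimate P(s2 | s1) from the stream.
        return pair_counts[(s1, s2)] / first_counts[s1]

    # Posit a chunk boundary wherever the transitional probability dips below a threshold.
    threshold = 0.75
    chunks, current = [], [syllables[0]]
    for s1, s2 in zip(syllables, syllables[1:]):
        if transitional_probability(s1, s2) < threshold:
            chunks.append("".join(current))
            current = []
        current.append(s2)
    chunks.append("".join(current))

    print(chunks)  # recovers recurring chunks such as 'golabu', 'tupiro', 'bidaku'

Within-chunk transitions are highly predictable while between-chunk transitions are not, so the dips in predictability mark plausible chunk edges; nothing in the procedure presupposes word-like units of writing.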

10.4.2 On Representations of Verbalized Forms in Memory: Activating Episodes of Speech Acts

It was noted that experiments involving action priming with body motions can activate content-related words. For instance, in lexical decision tasks involving recognition memory, faster responses are obtained on lexemes like cup and key when participants are asked to respond with a movement toward or away from their bodies, or on words like kick and pick when the participants respond using a foot or a finger (e.g., Dalla Volta et al., 2009; Mollo et al., 2016; Pulvermüller, 1999; Shebani & Pulvermüller, 2013). However, it is curious to note that, within the literature on embodied semantics, repeated demonstrations of action priming with body motions have not been extended to speech motions. Yet verbal forms are represented in long-term memory, not only with their semantic features, but also with their articulatory features. Consequently, effects of action priming on recognition memory of verbal forms should extend to actions of verbalization. An experiment by Lafleur and Boucher (2015) examined these particular priming effects. Although the study focused on the effects of feedback on recognition memory of verbal items, it involved a priming of the motion attributes of lexemes and showed that such priming activates memory of situational information associated with acts of speech.


In one of the two experiments, participants rehearsed visually presented lexemes in four production conditions. To observe the specific effects of action priming on memory of produced words, auditory feedback was blocked by delivering a loud white noise through headphones that the participants wore throughout the rehearsal phase of the experiment. Different sets of common lexemes were randomly assigned to the four production conditions. In these four conditions, participants were asked to rehearse different sets of lexemes, either by reading some items silently (“imagine saying the word”), by producing lip-synching motions, by saying items out loud, or by saying items out loud while looking at a listener in the test room. The participants then performed a distraction task, after which they were asked to identify, on a list, which lexemes had been produced out loud. The results, displayed in Figure 10.2, showed a gradation of motion-priming effects on memory of the verbalized forms as compared to unrehearsed items. In particular, the priming of phonatory and/or articulatory motions, as when saying an item out loud or with silent articulatory motions, had a stronger effect on memory of verbalized items than when items were silently read. But recall was best when items had been spoken while addressing a listener. It is important to note that the preceding experiment was repeated with a second group of participants using non-words rehearsed in the same four conditions. Compared to the lexical items, recall of having produced the non-words did not show gradation effects of different motion priming, and recall was relatively flat with non-significant differences across conditions. In other words, rehearsal in the first experiment specifically activated motion features linked to lexical forms in long-term memory. Moreover, the condition where participants rehearsed items while talking to someone activated not only the motion features of the lexical items in long-term storage, but also information on the situation in which these motions were performed. In this sense, if processing the meaning of verbal forms rests on an activation of experiences in semantic memory (e.g., Barsalou, 2008, 2009; Barsalou et al., 2008), then the preceding results indicate that the activations include memory of first-person experiences of talking to someone. This has a number of implications with respect to embodied accounts of semantic processing. Verbal items are not represented in the brain solely in terms of experiences relating to word meanings. They are also represented in terms of situated actions of talking, and the aforecited observations of priming effects of articulatory-phonatory actions on the recall of verbal forms imply that experiential information in memory binds to motion features of verbal items and is therefore constrained by structures and processes of motor speech. Yet theories of embodied cognition deal with semantic representations and do not consider how constraints on motions “of the body,” in relation to verbal expression, impose a format on semantic representations.

[Figure 10.2 near here. Vertical axis: proportion of items recalled as “produced aloud” (0 to 0.8); horizontal axis: rehearsal condition (aloud–listener, aloud–alone, lip-synch, imagine saying, not rehearsed).]

Figure 10.2 Effects of rehearsing lexical items in different production conditions on participants’ recall of having produced the items (n = 20; adapted from Lafleur & Boucher, 2015).
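To make the design of this analysis concrete, the following sketch shows how per-condition recall proportions might be tabulated and compared across participants. The condition names follow the figure, but the numbers, item counts, and the paired test are hypothetical illustrations, not the published analysis or data of Lafleur and Boucher (2015).

```python
# Hypothetical sketch of a recall analysis: proportions of items judged "produced aloud"
# per rehearsal condition, averaged over participants. All values are invented for illustration.
import numpy as np
from scipy.stats import ttest_rel

conditions = ["aloud-listener", "aloud-alone", "lip-synch", "imagine saying", "not rehearsed"]
n_participants, n_items = 20, 24

rng = np.random.default_rng(0)
# recalled[p, c, i] = 1 if participant p judged item i of condition c as "produced aloud"
base_rates = [0.75, 0.65, 0.55, 0.40, 0.15]   # assumed gradation, for illustration only
recalled = np.stack([rng.random((n_participants, n_items)) < r for r in base_rates], axis=1)

proportions = recalled.mean(axis=2)            # per participant, per condition
for c, name in enumerate(conditions):
    print(f"{name:>15}: mean recall = {proportions[:, c].mean():.2f}")

# Example paired comparison of the two "aloud" conditions across participants
t, p = ttest_rel(proportions[:, 0], proportions[:, 1])
print(f"aloud-listener vs aloud-alone: t = {t:.2f}, p = {p:.3f}")
```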

10.5 The Nature of Semantic Representations: On the Neural Coding of Context Information in Action Blocks of Speech

The aforecited priming experiment suggests that, whatever semantic memories are, they are represented along with motor features of speech, and, consequently, inherent constraints on motor processes will shape the representations in one way or another. This viewpoint on the formatting effect of spoken language differs from the one that prevails in lexico-semantic research. In this area, the debated question of “format” does not refer to constraints on the modality of expression. Instead, the central issue is whether semantics is grounded in sensory experiences or whether it reflects abstract concepts, a problem originally formulated by Caramazza et al. (1990). Reviewers of this debate note that, despite conflicting interpretations, there is a consensus on the cortical location of perceptual, action, and emotional systems that are activated when processing semantic properties of lexemes (Kemmerer, 2015a; Martin, 2016; Meteyard & Vigliocco, 2018). That said, it is essential to note that the program of work in lexico-semantics focuses on certain types of context effects while excluding others. Specifically, context information of sensory experiences that link to the literal meaning of stand-alone lexemes is included in the program. But multimodal context information that accompanies speech and habitual language use, which almost
never entails isolated lexemes, is excluded or not considered. In both cases, nonetheless, sensory context underlies semantic representations, and this is most clearly seen in language development where early vocabulary generally serves to designate entities and events in the speech environment (see Section 10.3). In other words, the semantics of children’s vocabulary predominantly reflects the binding of multimodal context information with verbal structures, and this essentially underpins the nature of representations in semantic memory. In reviewing the ongoing debate in lexico-semantics, Martin (2016) remarks that activity in specific brain regions, when processing lexemes, does not help define the nature (or “format”) of semantic representations, or the adequacy of cognitive descriptions of meaning. In his view, all that is known “is that at the biological level of description, mental representations are in the format of the neural code” (p. 8). He concludes by saying that “what is missing from this debate is agreed-upon procedures for determining the format of a representation” (p. 8). For some readers, this debate may seem paradoxical in that research involving lexico-semantic tasks systematically minimizes the contextual effects of language use, which all but obscures the fact that context information underlies semantic representations. On the format question, the debate overlooks the point that semantic representations of verbal forms do not have the format of sensory experiences. It is worth repeating that, inasmuch as semantic representations bind to verbal expression, they are formatted in terms of sequences of articulated sounds. Furthermore, because speech involves fleeting signals, speech processing requires a buffering of sound sequences in terms of sensory chunks where dictionary-like forms such as lexemes do not stand out as isolated units, but generally appear with bound forms and in blocks constituting formulas (as discussed in Section 10.4.1). As for an agreed-upon procedure that could determine the neural coding of semantic representations, if one accepts that the sensory context is constitutive of semantic representations, then a procedure is available in terms of analyses of cross-frequency coupling in neural oscillations. These analyses essentially capture the integration of multimodal sensory information with speech structure. An example was provided earlier in reference to Schroeder et al. (2008), who observed a cross-frequency phase-amplitude coupling during audio-visual stimuli (illustrated in Section 6.4.2). Another illustration is provided in Figure 10.3, which presents the analysis of a frontal electrocorticographic response by Canolty, Edwards, Dalal et al. (2006; also in Canolty & Knight, 2010). The figure shows a phase-amplitude coupling in tasks where participants heard tones and brief speech sounds while viewing landscape photographs. The authors report a phase-amplitude coupling between theta and high-frequency gamma waves relating to visual stimuli, suggesting an integration of information of a visual scene with sound patterns.
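For readers who wish to see what such a measure involves computationally, the following is a minimal sketch of a mean-vector-length estimate of phase–amplitude coupling in the spirit of Canolty et al. (2006); the filter settings, frequency bands, and synthetic signal are assumptions for illustration, not the authors’ pipeline.

```python
# Minimal sketch of a phase-amplitude coupling (modulation index) estimate.
# Bands, filter order, and the synthetic test signal are assumptions for illustration.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def modulation_index(lfp, fs, phase_band=(4, 7), amp_band=(80, 150)):
    """Mean-vector-length coupling between low-frequency phase and high-gamma amplitude."""
    phase = np.angle(hilbert(bandpass(lfp, *phase_band, fs)))   # low-frequency phase
    amp = np.abs(hilbert(bandpass(lfp, *amp_band, fs)))         # high-gamma envelope
    return np.abs(np.mean(amp * np.exp(1j * phase)))

fs = 1000.0
t = np.arange(0, 10, 1 / fs)
# Synthetic signal: gamma bursts riding on theta peaks, plus noise (illustration only)
theta = np.sin(2 * np.pi * 5 * t)
gamma = (1 + theta) * np.sin(2 * np.pi * 100 * t)
lfp = theta + 0.3 * gamma + 0.1 * np.random.default_rng(0).standard_normal(t.size)
print(f"modulation index: {modulation_index(lfp, fs):.4f}")
```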


Canolty and Knight did not detail the sound and speech stimuli that entrained the low-frequency oscillations, as is often the case in studies of phase coupling. This tendency to omit the specifics of acoustic stimuli has created a degree of confusion in the literature on the role of neural oscillations in speech processing (see, for instance, the range of assumed units and prosodic patterns that are seen to entrain delta waves, criticized in Boucher et al., 2019). However, a recent report by Mégevand et al. (2018) has demonstrated a cross-frequency coupling for audio-visual speech stimuli in the auditory cortex (but not the visual cortex), based on the entrainment of delta-band oscillations. Importantly, inter-trial phase coherence appeared for delta waves, which could be linked to quasi-periodic chunks in the speech stimuli, as in Boucher et al. (2019). Although Mégevand et al. (2018) did not specify the acoustic properties of their stimuli, the examples they provided in reference to an earlier report (Golumbic, Ding, Bickel et al., 2013) suggest that chunk patterns were presented. Specifically, the study involved speech stimuli transcribed as My best friend, Jonathan, has a pet parrot who can speak. He can say his own name and call out, ‘Hello, come on in’ . . . When sounded out, these stimuli have quasi-periodic groups (roughly, My best friend,/ Jonathan,/ has/ a pet parrot/ who can speak./ He can say/ his own name/ and call out/ ‘Hello/ come on in’/ . . .). It will be recalled

[Figure 10.3 near here. Comodulogram: frequency for phase (Hz) on the horizontal axis, frequency for amplitude (Hz) on the vertical axis, with the color scale indicating the modulation index.]

Figure 10.3 Phase-amplitude coupling between theta-band oscillations (5–6 Hz on the horizontal axis) and gamma oscillations (>75 Hz on the vertical axis) during presentations of speech sounds and photos of landscapes (from Canolty et al., 2006, with permission).


from Chapter 8 that controlled temporal groups with signature marks of sensory chunking entrain delta-band oscillations. Moreover, this delta entrainment operates on chunked sequences of articulated sounds and not on chunked tones. Taken together, these reports suggest that a coupling of cross-modal information can occur in terms of a low-frequency neural entrainment. In summary, the binding of multimodal context information in memory to speech structures can be constitutive of semantic representations of verbal forms. As for the binding mechanism, this is likely to entail a cross-frequency coupling of neural oscillations that can entrain to structures of motor-sensory speech. Although this principle requires further delineation with respect to its neural implementation, there is evidence that the coupling of oscillations to sensory stimuli underlies the formation and retrieval of semantic or episodic memories associated with pictures and lexemes (Osipova, Takashima, Oostenveld et al., 2006; Staudigl & Hanslmayr, 2013; see also the studies reviewed by Watrous, Fell, Ekstrom et al., 2015). As discussed further on, such evidence does not currently extend to processes of speech comprehension. Nonetheless, the coupling principle, and methods such as those of Canolty and Knight, can serve to understand how multimodal attributes in semantic memory link to chunks of articulated sounds that constitute, in effect, “meaningful blocks of action.” The fit between endogenous low-frequency oscillations and domain-general chunking can help systematize a body of behavioral evidence ranging from grouping effects on verbal recall, which some relate to the focus of attention (Cowan, 2000, 2005), to selective effects of neural oscillations of varying frequencies on the encoding and retrieval of episodic or semantic memories (e.g., Buzsáki & Moser, 2013; and the research discussed subsequently). Yet, even if this coupling mechanism can define the nature of semantic representations, it does not serve to understand context-related speech meaning. In normal speaker–listener interactions, multiple situational indices can activate varying details of experiences in episodic and/or semantic memory. This dynamic activation does not relate to issues of how semantic representations come to be constituted in cortical systems or how semantic memory binds to sensory chunks of speech. Instead, it relates to the question of how attributes in semantic or episodic memory are selectively activated moment-by-moment when processing utterances in speech contexts, which is the subject of the next chapter.
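As an illustration of how the delta-band entrainment and inter-trial phase coherence mentioned above are commonly quantified, the sketch below computes inter-trial phase coherence over a set of trials; the band limits, sampling rate, and synthetic trials are assumptions, not the analyses of the cited studies.

```python
# Sketch of inter-trial phase coherence (ITC) in the delta band, the kind of measure
# used to assess entrainment to chunked speech; parameters and data are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def itc(trials, fs, band=(1.0, 3.0), order=3):
    """Inter-trial phase coherence: length of the mean unit phase vector across trials."""
    sos = butter(order, band, btype="band", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, trials, axis=1)                 # (n_trials, n_samples)
    phases = np.angle(hilbert(filtered, axis=1))
    return np.abs(np.mean(np.exp(1j * phases), axis=0))         # (n_samples,)

fs = 500.0
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic trials: a 2 Hz component phase-locked to chunk onsets, plus noise
trials = np.array([np.cos(2 * np.pi * 2 * t) + rng.standard_normal(t.size) for _ in range(40)])
coherence = itc(trials, fs)
print(f"mean delta-band ITC: {coherence.mean():.2f}")  # values near 1 indicate strong phase locking
```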

11 Processes of Utterance Interpretation: For a Neuropragmatics

11.1 The Issue of the Selective Activation of Semantic Representations in Speech Contexts

The previous sections suggest that the integration of the sensory experiences underlying semantic and episodic memory of verbal forms can entail a cross-frequency coupling in neural oscillations and a neural entrainment to chunked sequences of articulated sounds. Some investigators may contend that this coupling principle might simply reflect the effects of “coincidence detection,” where a redundant exposure to sequences of sounds and accompanying sensory experiences creates cortico–cortical couplings, as modeled in the Predictive Coding Theory (e.g., Pulvermüller, 2013). On the other hand, Hebbian learning and predictive coding would not account for the chunking of speech acts, or how interpretations of utterances arise “on the fly,” sometimes as a result of memories of singular episodic events. There are countless examples of this in daily speech interactions. For instance, an individual enters a workspace and greets a co-worker by saying How’s the knee? In an embodied account, the meaning of lexical items is said to be grounded in perceptual and action systems such that hearing knee would activate a fixed property coded in cortical networks, “simulating,” as it were, a sensory experience (e.g., Barsalou, 1999). Yet such activation hardly serves to understand the utterance. Clearly, the meaning of How’s the knee cannot be reduced to the lexeme knee. But nor can one comprehend the greeting out of context. In this case, the greeting makes sense if one refers to a shared experience between the talkers (e.g., before their encounter, the individuals were performing an activity and one fell, hurting his or her knee). It is this shared momentary experience or the activation of a memory of an episode that serves to interpret the greeting. Such experience, though, involves a single sensory event that has been encoded for its salience and valence. This has little to do with Hebbian learning or probabilistic associations of a verbal form and incidents involving a body part. Consequently, predictive coding schemes that adjust the
weighting of cortico–cortical couplings through repeated experiences may not explain instances where utterance-external context information selectively activates elements of episodic memory in interpreting speech. But there is also the utterance itself (or utterance-internal context). Theories of predictive coding do not as such serve to specify activations of semantic representations over successive parts of utterances, or over any extent of speech. Yet defining the “parts” on which activations operate is essential to a processing of meaning given that speech unfolds over time. Fundamentally, utterance interpretation implies sequential activations that operate moment-by-moment in terms of some chunking of verbal input. This was illustrated previously (in Section 6.4.3) by reference to “sentence disambiguation” tests, as reviewed by Chenery et al. (2008). These authors describe several studies showing that individuals with Parkinson’s disease perform poorly on tasks that require a contextual disambiguation of items (and tasks of probabilistic learning). But disambiguation of verbal input implies an activation of semantic representations on a chunk-by-chunk basis. It will be recalled that Chenery et al. refer to examples such as The shell was fired toward the tank and note that there are three ambiguous parts in the sentence, which implies that contextually appropriate meanings have to be sequentially activated (and contextually inappropriate meanings suppressed) in terms of successive parts. Although this example refers to text, the same principle holds quite generally for speech comprehension. Common utterances, even overlearned formulaic greetings, contain elements that, by themselves, are ambiguous and require successive activations in chunks that can appear as formulas (e.g., How’s/the knee, How’s/your brother, How’s/it goin’, How’s/that report/comin’ along, etc.). There is evidence of such bracketing of speech in subcortical regions (as illustrated in the following). But the point is that the activation of semantic representations in speech comprehension entails an on-line processing of utterance-internal and -external context information implying a sensory chunking of input that deploys over time. It should also be kept in mind that both chunking and context-based selection have been related to subcortical functions, notably cortical-basal ganglia interactions, which will be discussed. This is not to say that cortico–cortical couplings are not involved. However, it is perhaps essential to distinguish the mechanisms that encode semantic representations in cortical perceptual and action systems from those that bear on context-based activations or the retrieval of representations in interpreting speech. The latter processes of activation and retrieval have not been the focus of lexico-semantic research. Thus, models that have developed by reference to the semantics of words and the embodiment of meaning in cortical systems offer few insights into context-based speech comprehension, as Skipper (2015) has noted:


In current models, language processing occurs in a few networks but there are likely more networks that are more distributed than proposed. . . . For instance, it is now fairly uncontroversial that word processing is spread out in a way that follows the organization of sensory and motor brain systems (Martin et al., 1995). For example, action words referring to face, arm or leg movements differentially activate motor regions used to move those effectors (Pulvermüller, 2005). By averaging over multiple types of words not in the same word class (like action verbs), these distributed networks are being averaged out. Thus, what we probably see with neuroimaging is not the “semantic network” (Binder et al., 2009) but, rather, connectivity hubs in a distributed system (Turken et al., 2011) involving as many networks as we have words and their meanings (not to mention associated words). Furthermore, these results imply that the networks supporting language are not as fixed or static as (implicitly) suggested by contemporary models. Similarly, if language comprehension requires context and the context of language comprehension is always changing, then the brain networks supporting language comprehension must also be changing. In summary, we need a model that is more consistent with general models proposing a more complex and dynamic network organization of the brain (Bressler and Menon, 2010; Bullmore and Sporns, 2009). (Skipper, 2015, p. 108)

The difficulty is that studies of semantic processes have largely involved controlled verbal stimuli that minimize utterance-external and utterance-internal context information. As noted earlier, much of this research focuses on test protocols where words are presented or solicited as isolated units. Consequently, mechanisms of context-based processing are relatively unknown. To cite Skipper again: “How then does the brain use context? I suggest that we have very little idea because most cognitive neuroscience research has been done with isolated levels and units of linguistic analysis without much context” (2015, p. 104). On the other hand, there is a body of clinical investigations attesting to the effects of subcortical lesions and diseases on the context-based processing of verbal items. Of interest is work by Murdoch and Whelan (2009; see also Crosson, 2013; Crosson & Haaland, 2003; Murdoch, 2010a, 2010b), which provides a compendium of clinical observations in which unconventional test batteries focus on semantic manipulations of words and sentences. To be clear on the relevance of this work, the clinical results do not as such relate to the issue of how semantic attributes link to utterance-external context or how the attributes are embodied in cortical systems, but they critically show a subcortical participation in the processing of verbal elements in given verbal contexts that require semantic manipulations.

11.1.1 Context-Based Semantics: Clinical Observations Using Unconventional Test Batteries

In their review of research on language disorders associated with subcortical pathologies, Murdoch and Whelan (2009) note the difficulty in delineating
the role of the subcortical brain using traditional aphasia batteries. According to the authors, “general measures of language function typically utilized in the assessment of subcortical aphasia may have lacked the requisite sensitivity to detect more subtle, high-level linguistic deficits” (p. 97). In one series of studies, Murdoch and Whelan set out to describe the postoperative effects that a bilateral pallidotomy (involving a surgical lesioning of the posteroventral part of the internal globus pallidus) and a unilateral thalamotomy (a lesioning of the ventral intermediate nuclei) have on language functions. Their compendium of reports also extends to nonsurgical groups of individuals with cerebellar lesions and Parkinson’s disease. It is important to mention that, as described by Murdoch and Whelan, ablations in the basal ganglia and thalamocortical circuitry, used to treat dystonia and motor-related symptoms of Parkinson’s disease, aim to reduce any excessive inhibitory output of the globus pallidus-substantia nigra complex. Such interventions reduce the inhibition of thalamus-mediated excitatory signals to the cortex, especially the frontal regions. However, this thalamic disinhibition can have several unwanted effects in terms of language performance. These include semantic-lexical selection deficits, speech-initiation difficulties, and combined language comprehension and production deficits that arise from what Murdoch and Whelan call a disturbed thalamic “gating” mechanism and random excitation of inhibitory cortical interneurons. In other words, the random inhibitions impact activations of both motions and semantic representations in language comprehension and production. In examining the effects on language, the novelty of the studies performed by Murdoch and Whelan relates to the use of assessment inventories serving to evaluate “high level linguistics” in contrast to conventional tests such as the Boston Naming Test, the Boston Diagnostic Aphasia Examination, the Token Test (etc.). Table 11.1 lists the assessment batteries and the subtests in question. From the entries in the table, it can be seen that most (if not all) of the measures said to reflect high-level language have to do with the ability to manipulate the semantic attributes of words and sentences within given verbal contexts. Using these types of tests, individuals who had undergone neurosurgery were evaluated at different postoperative intervals. Despite the variability observed across speakers, it was reported that pallidotomy and thalamotomy were consistently associated with a greater vulnerability on high-level language processes compared to general language abilities as assessed on conventional batteries. This was also the case for groups with cerebellar lesions and Parkinson’s disease. The reader who is interested in the performance of clinical groups on different test items is referred to the reports cited in Murdoch and Whelan (2009), who also reviewed models of subcortical functions. Some essential findings are as follows.


Table 11.1 Test batteries suggested by Murdoch and Whelan (2009) to observe the effects of pallidotomy, thalamotomy and pathologies of the cerebellum on “high level linguistics”

Test of Language Competence-Expanded edition (TLC-E) (Wiig & Secord, 1989)
Subtests:
1. Ambiguous sentences, e.g. providing two essential meanings for ambiguous sentences (e.g. Right then and there the man drew a gun).
2. Listening comprehension: making inferences, e.g. utilizing causal relationships or chains in short paragraphs to make logical inferences.
3. Oral expression: recreating sentences, e.g. formulating grammatically complete sentences utilizing key semantic elements within defined contexts (e.g. defined context At the ice-cream store; key semantic elements = some, and, get).
4. Figurative language, e.g. interpreting metaphorical expressions (e.g. There is rough sailing ahead for us) and correlating structurally related metaphors (e.g. We will be facing a hard road) according to shared meanings.
5. Remembering word pairs, e.g. recalling paired word associates.

The Word Test-Revised (TWT-R) (Huisingh et al., 1990)
Subtests:
6. Associations, e.g. identifying semantically unrelated words within a group of four spoken words (e.g. knee, shoulder, bracelet, ankle) and providing an explanation for the selected word in relation to the category of semantically related words (e.g. The rest are parts of the body).
7. Synonym generation, e.g. generation of synonyms for verbally presented stimuli (e.g. afraid = scared).
8. Semantic absurdities, e.g. identifying and repairing semantic incongruities (e.g. My grandfather is the youngest person in my family = My grandfather is the oldest person in my family).
9. Antonym generation, e.g. generating antonyms for verbally presented stimuli (e.g. alive = dead).
10. Formulating definitions, e.g. identifying and describing critical semantic features of specified words (e.g. house = person + lives).
11. Multiple definitions, e.g. provision of two distinct meanings for a series of spoken homophonic words (e.g. down = position/feathers/feeling).

Conjunctions and transitions subtest of the Test of Word Knowledge (TOWK) (Wiig & Secord, 1992), e.g. evaluation of logical relationships between clauses and sentences (e.g. It is too cold to play outside now. We will play outside (until/when/where/while) it gets warmer?).

Wiig-Semel Test of Linguistic Concepts (WSTLQ) (Wiig and Semel, 1974), e.g. comprehension of complex linguistic structures (e.g. John was hit by Eric. Was John hit?).

In individuals who had undergone pallidotomy, discernible changes were largely restricted to subtests of high-level language. According to the authors, the results suggested that the globus pallidus is involved in the focusing mechanism that “potentially underpins the frontal lobe-mediated lexical-semantic manipulation and selection” (Murdoch & Whelan, 2009, p. 124). The results on specified subtests supported the role of the globus pallidus, “in particular,
a role in manipulating multiple competing lexical elements in both the context of verbal expression (Wallesh and Pagano, 1988) and comprehension (Nadeau and Crosson, 1997)” (p. 125). In further studies, comparisons, using non-lesioned speakers as a reference, were performed on groups that had undergone bilateral pallidotomy, unilateral thalamotomy, and groups with cerebellar lesions of vascular origin. Again, obvious differences appeared for the clinical groups on high-level verbal tasks. However, compared to pallidotomy, cerebellar lesions were associated with a deficient performance on a broad range of subtests, and thalamotomy was associated with deficits on an even broader variety of subtests. This was taken to suggest that the thalamus exerted a “superordinate” influence on semantic manipulations, with the cerebellum “influencing the overall efficiency with which other subcortical brain regions, excluding the thalamus, mediate complex lexico-semantic operations” (p. 127). It was also reported that there were improvements and declines on high-level semantics associated with changing hypokinetic and hyperkinetic components of motor behaviors. The authors expressed several reservations in their interpretations relating to the potential effects of neuroplasticity and the consequences of local lesioning on neighboring anatomical structures. Nonetheless, the types of assessments used in these reports offer a unique set of findings in comparison to observations derived from conventional batteries (and see Cook, Murdoch, Cahill et al., 2004; Crosson, 1992, 2013; Crosson & Haaland, 2003; Murdoch, 2010b). They also suggest, in contrast to studies that emphasize cortical lexico-semantics, that understanding the role of subcortical regions requires testing material that can evaluate the semantic processing of items in verbal contexts. It can be argued that clinical observations culled from assessment batteries offer no obvious link to spoken language or brain mechanisms. It can also be remarked that the batteries in Table 11.1 can require participants to produce items, and this entails cortico–thalamic interactions that extend beyond the frontal lobe to motor-related systems, which was not the object of the aforementioned reports (nor did the reports pay much attention to how a reduced inhibition of thalamic excitatory signals could underlie deficits of both motor control and semantic selection). A clarification of these interactions is important in that utterance comprehension requires an on-line processing of context information in relation to structures of motor speech.

11.2 On Context-Based Speech Comprehension: Selective Activation of Semantic Representations On-Line

It is worth repeating that there is a paucity of research on the semantic processing of utterances. Clinical studies, such as those reviewed by Murdoch and Whelan (2009), and neuroimaging research on lexico-semantics do not address the overarching issue of context-based interpretations
of speech. Clearly, spoken-language comprehension cannot be reduced to lexemes and activations of cortical systems that embody the meaning of lexemes (cf. Barsalou, 1999, 2015). Other neuropragmatic mechanisms are involved. Moreover, given that the neural processing of utterance structure does not align with words as represented in text and dictionaries, a review of research that assumes these units would not help clarify mechanisms of speech comprehension. Avoiding such assumptions, the following presents a limited discussion of findings relating to mechanisms that can support a semantic processing of speech, including a brief recapitulation of essential points from the previous chapters. The aim is to provide a perspective on two basic issues facing research on speech meaning. These relate to (1) how sensory information underlying semantic representations couples to action blocks in utterances, and (2) how context-based activations of this information in semantic and episodic memory support speech comprehension. Perhaps the essential starting point in addressing the preceding questions is sensory chunking, a process that, historically, has been associated with the basal ganglia (as discussed in Section 6.4.3). Utterances unfold over time; they involve sequences of articulated sounds and, because of limitations on immediate memory and attention, sequences are processed and learned in chunks. This “bracketing of actions” (Graybiel & Grafton, 2015) presents characteristic marks in speech and oral recall (see Section 8.2.1). As some authors have suggested, chunked sequences in spoken language can appear as clusters of elements or verbal formulas that can underlie semantic schemas (Chapter 3 and Section 6.1). In this sense, sensory chunking can be constitutive of meaningful blocks of speech action. Neurophysiological correlates of the chunking process have been identified (Chapter 8). As illustrated earlier, low-frequency neural oscillations entrain to syllable-like energy pulses and groupings of these pulses that, as some researchers contend, provide sensory frames for speech processing. In particular, oscillations in the range of delta or low theta frequencies entrain to signature marks of chunks in speech and not to similar marks in presented tones. Thus, there is a specific relationship between sensory chunking and the processing of speech-motor sequences. As for how blocks or chunks of articulated sounds become meaningful, it is known that “concrete” expressions are the first to emerge in vocabulary development (see Section 10.3). Before indirect strategies can be applied to verbally derive meaning for novel expressions (via explicit descriptions, analogies, co-occurring forms in utterance contexts, etc.), children acquire vocabulary where verbal items generally serve to designate entities that are experienced in the context of speech, indicating that meaning first develops by a coupling of multimodal sensory experiences to structures of articulated sounds. It should be added that semantic memory of learned verbal items not only reflects passive experiences of subject-external entities, but also
incorporates episodes of speech acts, a memory of first-person experiences in producing sounds and their effects in communicating with others (discussed in Section 10.4.2). According to some classic definitions, episodic memory is reflected in the ability to recall first-person experiences, and semantic memory arises when repeated episodes are encoded, eventually becoming “context-independent” (Tulving, 1972; Tulving & Thomson, 1973). The terms apply to the development of meaningful speech acts: first-person episodic memory of experiences that accompany speech encodes semantic representations that associate with chunks of articulated sounds. As for how multimodal experiences bind to these blocks, one mechanism is cross-frequency coupling in oscillatory neural activity (Section 10.7). Reports show an integration of multimodal (audio-visual) information via phase-amplitude couplings in the auditory cortex, and research suggests that the entrainment of low-frequency oscillations can serve to bind this information to structures of motor speech. It is the view here that this binding entails a neural entrainment to chunks of action presenting periodicities in the range of delta or low theta frequencies (a hypothesis that is further substantiated in the following). However, current evidence supporting this coupling of multisensory information to speech structure is limited. Moreover, reports of neural entrainment to speech most often refer to heard structural patterns and oscillations in the auditory cortex. This does not address the issue of how semantic memory, formatted in terms of action chunks, is contextually and sequentially activated in comprehending utterances. Also, the focus on speech perception rather than production does not help clarify the links between semantic and motor-related processes, such as action chunking, which implicate subcortical functions. Indeed, viewing speech on the action side may be more revealing of these links. This is evident when considering that neural oscillations, which entrain to structures of motor-speech, are an emergent property of thalamocortical interactions (Steriade et al., 1990) and that these interactions differ markedly for action and auditory systems.

11.2.1 Thalamocortical Interactions and the Integrating Role of the Motor Thalamus

Giraud and Poeppel (2012) showed that neural entrainment to structural aspects of heard utterances is reflected in periodic volleys of activity in neurons of Layers II, III, and IV of the auditory cortex. That this activity couples with activity in cortices related to motor speech is well attested and is assumed in conventional models (see Section 6.2). But ascending sensory input that projects to the primary and secondary auditory cortices arrives from the auditory thalamus (the medial geniculate nucleus; Jones, 2007). Input of motor speech, on the other hand, arrives from the motor thalamus (essentially from ventral nuclei). Compared to the cortical projections of the
auditory thalamus, input from the motor thalamus reaches cortical Layers I and III-VI and projects widely across the prefrontal, premotor, and motor areas of the neocortex (Bosch-Bouju, Hyland, & Parr-Brownlie, 2013; Fang, Stepniewska, & Kaas, 2006; Nakamura, Sharott, & Magill, 2012). Research has shown that the motor thalamus also receives major afferent input from prominent motor-related regions, including the basal ganglia and the cerebellum. Using the nomenclature of Bosch-Bouju et al. (2013), zones of the motor thalamus receiving input from the basal ganglia (ventro-lateral anterior and ventro-medial territories) largely interconnect with prefrontal and premotor cortices. Cerebellar receiving zones (ventro-lateral posterior areas) mainly interconnect with the primary motor cortex (Bosch-Bouju et al., 2013; Nakamura et al., 2012; Sommer, 2003; for varying terminology, see Krack, Dostrovsky, Ilinsky et al., 2002). How this information is integrated via thalamocortical interactions is central to understanding the role of subcortical systems in action shaping and selection, which can extend to action sequences of speech. In fact, the synchronization of all cortical oscillations at various frequencies is dependent on reciprocal connections between the thalamus and the cortex, and many investigators view this synchronization as underlying a range of cognitive processes (e.g., Buzsáki, 2006; Rodriguez, George, Lachaux et al., 1999; Varela, Lachaux, Rodriguez et al., 2001). But it is important to note the specific pathways of these reciprocal connections. In reviewing research findings, Jones (2001, 2007) emphasizes that the majority of corticothalamic fibers arise from neurons in cortical Layer VI and return to the relay nucleus of the thalamus in which they lie. For example, Layer VI of the primary visual cortex returns corticothalamic fibers to the visual thalamus (the lateral geniculate nucleus). By comparison, fibers from neurons in Layer V can project to other thalamic nuclei, such as the pulvinar (as a secondary pathway of the visual cortex) or the anterior pulvinar and intralaminar nuclei (for the primary somatosensory area). For Jones (2001), the area-specific cortical projections of the thalamic relay nuclei suggest that they are unlikely to contribute to synchronizing activity across pools of cortical neurons. Consequently, investigators tend to refer to cortico–cortical pathways and nonspecific nuclei, such as the intralaminar nuclei, to account for synchronization. However, as the author remarks, a large proportion of intralaminar cells project to the striatum (Jones, 2001), which is significant given the role of the striatum in the valence coding of actions via the dopamine-regulating limbic system (Namburi et al., 2015). In light of these observations, the motor thalamus has a particular role in assimilating multiple inputs from the basal ganglia and the cerebellum while interacting with motor and association cortices. Input from the basal ganglia is assumed to relate to the motivational context and the valence coding of actions for best outcomes. This entails a memory of sequential events (such as actions
that frequently occur in succession, or actions and their subsequent effects; Jin et al., 2014; Mattox et al., 2006; Shohamy et al., 2008). Input from the cerebellum to the motor thalamus provides proprioceptive information from the spinal cord and serves to monitor and optimize actions (D’Angelo, 2018; and see Section 6.4.3). At the same time, the motor thalamus receives forward models from prefrontal, premotor, and motor cortices. This integrated information in the thalamus is then “looped back” to the prefrontal, premotor, and motor regions so as to update models in context-efficient ways. For some, this looping entails cortical Layers V and VI (Bosch-Bouju et al., 2013). The preceding interconnections are highly relevant in understanding action coding, shaping, and context-based selection as it pertains to speech interpretation. As mentioned, zones of the motor thalamus that receive input from the basal ganglia mainly interconnect with prefrontal and premotor cortices. The valence coding of actions in the striatum implies functional links between the frontal association cortex, the dopamine-regulating limbic system, and the basal ganglia where memory (or types of memory) is involved. Clinical evidence has long supported this interpretation. In particular, it is known that Parkinson’s disease is associated with dysfunctions of the prefrontal cortex (Goldman-Rakic, 1994; Rogers, Sahakian, Hodges et al., 1998). A prevalent view is that a dopamine-based valence coding in the striatum, interacting with the prefrontal cortex, serves an “executive function,” and valence coding that applies to feedforward action models in relation to their outcomes operates via the hippocampus (Cools, Barker, Sahakian et al., 2001; Goldman-Rakic, 1994). It is widely recognized that the medial prefrontal cortex and the hippocampal formation in particular have a preeminent role in rapidly encoding and activating memories – and this may well extend to semantic memory in processing speech meaning (e.g., Depue, 2012; Dudai, 1989; Goldman-Rakic, 1994; Guise & Shapiro, 2017; Preston & Eichenbaum, 2013). But it is useful to note that, even though the hippocampus directly innervates the medial prefrontal cortex, studies using animal models have revealed that there are no return projections from the prefrontal cortex to the hippocampus, and only moderate projections involving parahippocampal and entorhinal cortices (Vertes, Hoover, Szigeti-Buck et al., 2007; Vertes, Linley, & Hoover, 2015). In fact, the links are largely indirect and mediated by the “limbic thalamus” or, more specifically, the nucleus reuniens, a portion of the ventral midline thalamus (Griffin, 2015; Vertes et al., 2015). The finding that there are indirect projections to the hippocampal formation via the nucleus reuniens bears pivotal implications. In particular, oscillations that arise from thalamocortical interactions underlie information flow across the thalamus and hippocampal formation at characteristic frequencies. With respect to the processing of speech meaning, the aforementioned finding suggests a perspective on the processes of a context-based tracking of actions mediated by the thalamus and
implicating joint functions of the striatum, the medial prefrontal cortex, and the hippocampus. The process is further discussed in the following.
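As a compact recap of the connectivity just described, the following illustrative data structure summarizes the main afferents to the motor-thalamus territories and their principal cortical targets (nomenclature after Bosch-Bouju et al., 2013, as summarized in this subsection); it is a simplification for orientation, not an anatomical specification.

```python
# Simplified summary of the connectivity described in Section 11.2.1: major afferents to
# motor-thalamus territories and their main cortical targets. Illustrative only.
MOTOR_THALAMUS = {
    "ventro-lateral anterior / ventro-medial": {
        "main_afferent": "basal ganglia",
        "main_cortical_targets": ["prefrontal cortex", "premotor cortex"],
        "assumed_role": "motivational context and valence coding of action sequences",
    },
    "ventro-lateral posterior": {
        "main_afferent": "cerebellum",
        "main_cortical_targets": ["primary motor cortex"],
        "assumed_role": "proprioceptive monitoring and optimization of ongoing actions",
    },
}

for territory, info in MOTOR_THALAMUS.items():
    targets = ", ".join(info["main_cortical_targets"])
    print(f"{territory}: {info['main_afferent']} -> {targets}")
```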

11.2.2 The Semantics of Utterances: The Analogy of Action Selection in Spatial Navigation

The evidence that the striatum operates jointly with the hippocampus largely draws from research using animal models and tasks involving maze running that reflect a paradigm of goal-oriented spatial navigation. A body of work, reviewed in part by Mizumori, Puryear, and Martig (2009), shows context-based activity in the striatum in preparation for goal-directed motions. It has also been established that lesioning of the striatum affects performance on tasks involving maze running (Devan & White, 1999; Eschenko & Mizumori, 2007; Penner & Mizumori, 2012; Schultz & Romo, 1992; see, however, Berke, Breck, & Eichenbaum, 2009 and cf. van der Meer, Johnson, Schmitzer-Torbert et al., 2010). In short, despite some disagreement in the literature, the evidence is that both the striatum and the hippocampus participate in the encoding and retrieval of types of memory as observed in experience-based spatial navigation in animals (on the different types of memory, see van der Meer et al., 2010; on the view that the hippocampus is involved in encoding, retrieval, but not consolidation of memory as it occurs in the neocortex, see Ekstrom, Spiers, Bohbot et al., 2018, ch. 9). Although the aforementioned research does not readily extend to human communication, it is worth noting that processes of spatial navigation fulfill several functions that are required in the semantic processing of speech on-line, at least on a general level. For instance, both adaptive navigation and speech comprehension entail a processing of multimodal context information in activating relevant representations in memory. Both require a rapid updating of representations in a changing environment (as with rapidly unfolding speech signals) en route to a goal that involves motor-sensory sequences; and the goal or intent of actions bears on some valued outcome. Several authors have also remarked on the similarities between experience-based navigation and semantic processes, as pointed out by Eichenbaum (2017). Thus, O’Keefe and Nadel (1978) expressed the belief that, while the hippocampus in rodents is dedicated to mapping physical space, it evolved in humans as a function that maps “semantic space.” Similarly, Buzsáki and Moser (2013) suggest that, whereas the hippocampus and associated cortical areas originally evolved as a dedicated system of spatial navigation, this system was co-opted in human evolution to support a broader role in mapping memories. These authors argue that the form of representations that first served to map out routes provided the basis for the evolution of an ability to remember the flow of events in episodic memory. They also suggest that the type of representations that elaborate mental maps or
spatial models evolved into an ability to represent structured semantics or schemas. There are, of course, essential limitations on relating the preceding speculations to semantic processes of speech. For one thing, it is unclear how specialized cell types in the hippocampus-entorhinal circuit, which subserve spatial navigation, such as “place,” “grid,” and “border” cells, contribute to a semantic interpretation of speech (for historical reviews of research on cell types, see Moser, Rowland, & Moser, 2015, and on the hippocampus, Eichenbaum, Amaral, Buffalo et al., 2016). Still, place cells have processing attributes that are principally suited to a rapid context-based activation and updating of semantic representations, as would be required in speech comprehension. For instance, as documented by Moser et al. (2015), spiking activity in place cells arises when an animal enters a particular space (thus the term “place field” in designating firing patterns that occur in response to spatial features). In fact, there is an important debate on the types of features underlying activity patterns since it has been shown that place cells respond to non-spatial features such as odors, tactile input, and timing features (see references in Moser et al., 2015, and Mallory & Giocomo, 2018). In maze tasks, when an animal is sequentially placed in different environments, place-cell activity carries over from one environment to the next. Spatial learning observed in terms of pools of cells can take minutes of navigational exploration before activity stabilizes, indicating the effect of context-based experience in forming representations of spatial features (e.g., Monaco, Rao, Roth et al., 2014). But this plasticity occurs rapidly. Some place cells show firing patterns that immediately stabilize as soon as an animal is placed in a new context, although these representations can develop further with experience (Leutgeb, Leutgeb, Barnes et al., 2005; Leutgeb, Leutgeb, Treves et al., 2005). Thus, representations of spaces appear to draw from pre-learned sets of context-independent representations and are updated in terms of context-dependent plasticity (e.g., Dragoi & Tonegawa, 2013). These findings have been interpreted as suggesting that, in spatial navigation, the hippocampus or the hippocampus-entorhinal circuit may support episodic and semantic memory (e.g., Buzsáki & Moser, 2013) where context-based experiences in episodic memories can form and activate context-independent semantic memory. Although these findings relating to spatial navigation appear far removed from semantic and episodic memory in humans and human communication, it is important to note that some reports have demonstrated that the hippocampus links to a semantic processing of utterances. In particular, clinical reports indicate that individuals with lesions to the hippocampus present behavioral deficits in the context-based processing of speech and utterance comprehension (Duff & Brown-Schmidt, 2012, 2017; MacKay, Stewart, & Burke, 1998), although the same might be said of lesions to other structures. In referring to these reports, Piai, Anderson, Lin et al. (2016)
proposed a novel paradigm by which to observe the role of the hippocampus in the on-line semantic processing of utterances. The paradigm involved an utterance-completion task where the final items were left out and visual prompts were used to cue the missing items. In this test, then, both unfolding utterance-internal context and some utterance-external information (pictures of objects) can activate semantic representations leading to the production of a missing lexeme. Importantly, different sets of utterances were presented that variably constrained the selection of the final item. In illustrating this condition, the authors provide the example of a trial utterance that constrained the choice of a final lexeme: She locked the door with the . . . (picture: key); and a trial utterance where the choice of a final item was less constrained: She walked in here with the . . . (picture: key). This condition was used to observe the effects of speech contexts where stronger or weaker semantic associations develop as utterance meaning unfolds over time. However, the authors did not describe their acoustic stimuli in terms of acoustic structures and marks that could potentially entrain neural oscillations. Activity in the hippocampus was monitored using bipolar electrodes and the authors reported greater power increases in a rather wide “theta” band (2–10 Hz) during the constrained utterances as compared to unconstrained contexts. On the other hand, the contexts did not influence naming times for the missing items (but then some view predictions of missing or forthcoming items as not being essential to an understanding of semantic processes; Huettig & Mani, 2016). The authors interpreted the data as suggesting that “theta-power increases are best explained by the active, ongoing relational processing of the incoming words to stored semantic knowledge” (Piai et al., 2016, p. 4). The experimental paradigm and spectral analysis of LFPs used by Piai et al. (2016) offer a rare glimpse into the on-line semantic processing of utterances in relation to the hippocampus. The analyses focused on increases in theta-band power, but did not include any measure of cross-frequency coupling in relation to gamma-band activity (or activity in other bands). Had the paradigm included visual context along with the utterances, one can only speculate on a potential phase coupling with high-frequency gamma oscillations (as in Canolty et al., 2006; Canolty & Knight, 2010; and, on modality-specific coding of context information, see Section 10.7). Many interpret the “hippocampal theta” as supporting semantic or episodic memory in relation to processes of the prefrontal cortex, and several studies have reported phase relationships between action potentials in prefrontal regions and theta oscillations in the hippocampus (cf. the critical reviews of Hasselmo, 2005; Watrous et al., 2015). For instance, Lisman and Jensen and colleagues suggest a cortical multi-item buffering function for the hippocampus where interacting theta and gamma oscillations provide a hierarchical
organization of item representation (Jensen, Kaiser, & Lachaux, 2007; Jensen & Lisman, 1996, 1998; Jensen & Tesche, 2002; Lisman, 2010). However, there are important reservations on Lisman and Jensen’s latter view. Oscillations arise through thalamocortical interactions and, in the case of the hippocampus, as noted earlier, there is evidence that the links to prefrontal areas are indirectly mediated by the nucleus reuniens. Moreover, according to some reports, oscillations of the nucleus reuniens, which support a synchronization between the prefrontal cortex and the hippocampus, extend to the delta band (Roy, Svensson, Mazeh et al., 2017). This has been controversial in that authors have remarked that the delta-band oscillations may result from the confounding effects of respiratory rhythms in rodents while others argue that low-frequency oscillations nonetheless synchronize activity at a cortical level (Bagur & Benchenane, 2018; Kocsis, Pittman-Polletta, & Roy, 2018). The critical role of this mediating nucleus can be adduced from a recent investigation by Viena, Linley, and Vertes (2018) showing that a pharmacological inactivation of the nucleus reuniens impairs spatial memory in rodents (and for a lesion study, see Hembrook & Mair, 2011). As for oscillatory activity in the hippocampal formation, research using animal models has described inter-region synchronization, phase resets, and cross-frequency phase couplings in relation to oscillations in the theta range, although reports show two functional frequencies at the edges of the theta band along with differences between rodents and primates on low 3 Hz oscillations (Watrous, Lee, Izadi et al., 2013). With respect to research involving humans, evidence for a “hippocampal theta” is not one-sided. Reports based on observations of LFPs often indicate stronger power increases in oscillations of 3 Hz or lower on memory tasks compared to high-end theta oscillations, and only a subset of reports involve memory of verbal material (Axmacher, Henseler, Jensen et al., 2010; Babiloni, Vecchio, Mirabella et al., 2009; Fell & Axmacher, 2011; Lega, Jacobs, & Kahana, 2012; Lin, Rugg, Das et al., 2017; Schack & Weiss, 2005; Watrous et al., 2013). In these latter studies of verbal memory, recall of word lists and Sternberg tasks are the preferred methods. As one example, in an extensive investigation by Lega et al. (2012), participants were required to memorize and then recall lists of lexemes by producing as many items as could be remembered. A spectral analysis of the LFPs in the hippocampus was applied for two consecutive epochs representing the memorization and the delayed-recall parts of the task. The results revealed, as in other reports, two distinct power increases in oscillations at about 3 Hz and 8 Hz. While power increases at both frequencies exhibited phase synchrony with oscillations in the medial temporal cortex, only the increases at 3 Hz aligned with successful memory encoding of verbal lists.
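For orientation, the sketch below shows one generic way band power in LFP epochs can be estimated and compared between conditions, in the spirit of the spectral analyses cited above; the sampling rate, bands, epoching, and synthetic data are assumptions, not the published methods of Piai et al. (2016) or Lega et al. (2012).

```python
# Sketch of band-power estimation for LFP epochs and a comparison between two conditions
# (e.g., constrained vs. unconstrained contexts, or ~3 Hz vs. ~8 Hz bands). Values are illustrative.
import numpy as np
from scipy.signal import welch

def band_power(epochs, fs, band):
    """Mean power within a frequency band, per epoch (epochs: n_epochs x n_samples)."""
    freqs, psd = welch(epochs, fs=fs, nperseg=min(epochs.shape[1], 1024), axis=1)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[:, mask].mean(axis=1)

fs = 500.0
rng = np.random.default_rng(2)
t = np.arange(0, 2, 1 / fs)
# Hypothetical epochs: "constrained" trials carry a stronger ~3 Hz component than "unconstrained"
constrained = np.array([1.0 * np.sin(2 * np.pi * 3 * t) + rng.standard_normal(t.size) for _ in range(30)])
unconstrained = np.array([0.3 * np.sin(2 * np.pi * 3 * t) + rng.standard_normal(t.size) for _ in range(30)])

low_band = (2, 4)   # low "theta"/delta edge, around 3 Hz
print(f"constrained  : {band_power(constrained, fs, low_band).mean():.3f}")
print(f"unconstrained: {band_power(unconstrained, fs, low_band).mean():.3f}")
```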


Reports such as the preceding help clarify that verbal memory entails low-frequency oscillations in the hippocampus. Yet it is inherently difficult to relate the results to speech processing, as in many investigations of verbal memory. One prevalent problem is that test material and descriptions of test material make it unclear how neural oscillations link to actual motor-sensory attributes of the stimuli. In fact, the practice that consists of correlating theta power with successful recall of words overlooks entirely the possibility that oscillations can entrain to speech structure. In examining the literature, it is as if presentations of verbal stimuli are seen to engage a memory of “words” without reference to sensory and action properties of verbalization. Contrary to this viewpoint, it has been established that neural oscillations selectively entrain to structures of motor speech. Acknowledging this is perhaps an essential step toward understanding the role of the hippocampus in the semantic processing of utterances, not by reference to words or text, but in terms of structured patterns of speech entailing motor-related functions that include those of subcortical systems.

11.2.3 Subcortical Mechanisms of Buffering and Context-Based Semantic Processing

It was mentioned that there is a consensus in the neuroscience literature on lexico-semantics that sensory information serving to ground semantic representations is widely distributed across cortical systems. The evidence reviewed in the previous sections, broadly interpreted, suggests that the hippocampal formation is central to the on-line processing of these representations, although few reports deal with the semantic processing of actual utterances. One emerging view is that the hippocampus, operating jointly with the prefrontal cortex, serves to rapidly activate and update semantic representations in memory as utterances unfold, and this is seen to support an incremental interpretation of verbal stimuli (see references to Duff & Brown-Schmidt, 2012, 2017; Piai et al., 2016). However, the hippocampus does not process whole utterances all at once, and what exactly is being activated and updated, or the particular blocks on which the hippocampus operates, remains undefined. Moreover, how semantic representations are selected in context-appropriate fashion is not addressed by research on the hippocampus. Both questions are obviously central to the development of a working account of the mechanisms of speech comprehension, and both appear to entail subcortical functions.

On the first question, it should be clear that utterances are not processed in terms of conventional units such as words or lexemes. Given that speech signals are ephemeral and require a memory of sound sequences, one has to presume that portions of utterances are buffered on-line in terms of a specifiable sensory frame that conforms to constraints on serial memory and the focus of attention. Such a frame exists, as repeatedly mentioned in this book: a domain-general sensory chunking applies in processing and learning action sequences, including action sequences of speech. It was mentioned that this chunking has been linked to the striatum, seen as the locus of the regulation and formation of "preformed language segments" (Bohsali & Crosson, 2016; and Section 6.4.3). Yet few observations using LFPs confirm this thesis and the buffering function of chunking. Furthermore, some research suggests that another structure of the basal ganglia, the subthalamic nucleus, can act as an internal timer serving to initiate and terminate preformed action sequences (Beurrier, Garcia, Bioulac et al., 2002).

The relevance of the subthalamic nucleus can be understood by recalling that the motor thalamus receives inputs from the basal ganglia and the cerebellum, and directs these inputs to separate cortical regions (Section 11.2.1). Input from the basal ganglia mainly projects to the prefrontal and premotor areas, while input from the cerebellum is principally directed to the motor cortex. The basal ganglia include the striatum, the globus pallidus (external and internal), the substantia nigra, and the subthalamic nucleus. In their circuits with the cortex and the thalamus, the basal ganglia support the selection and sequencing of motions (Alexander & Crutcher, 1990; Cousins & Grossman, 2017; Temel, Blokland, Steinbusch et al., 2005), although some have questioned whether this extends to the selection and sequencing of semantic items (Bohsali & Crosson, 2016; Crosson, Bejamin, & Levy, 2007). Briefly described, five circuits have been identified, each comprising distinct pathways designated as direct, indirect, and hyperdirect. Both the direct and indirect paths project to the striatum, whereas the hyperdirect pathway comprises sparse fibers that bypass the striatum and project to the subthalamic nucleus (DeLong & Wichmann, 2007; Jahanshahi, Obeso, Rothwell et al., 2015; Nambu, Tokuno, & Takada, 2002). According to Nambu et al. (2002), when a movement or sequence of movements is about to be initiated, a corollary signal via the hyperdirect (cortico–subthalamo–pallidal) pathway first inhibits large target areas of the thalamus and cerebral cortex that relate to a selected motor sequence and competing sequences. Then, another corollary signal, through the direct (cortico–striato–pallidal) pathway, disinhibits their target areas and releases only the selected motor program. Finally, a third corollary signal, possibly via the indirect pathway (following, in the authors' terms, a cortico–striato–external pallido–subthalamo–internal pallidal route), inhibits target areas extensively. Through these sequential processes, only the selected motor program for an action block is initiated, executed, and terminated with a selected timing, while other competing programs are canceled (a schematic sketch of this gating sequence is given at the end of this section). It is in the timing of these blocks that the subthalamic nucleus appears to play a role that extends to the processing of utterances.


Figure 11.1 Analyses of LFPs in the subthalamic nucleus by Watson and Montgomery (2006, with permission). (A) Rows represent depth-related rasters of LFPs over time for a trial. (B) Histograms representing the sum of the activity across rasters at each time point. Note that line-up points at time 0 mark voice onsets first at the start of the utterance, then at Y (the start of lady), and then at L (the start of leans low).

In a unique study, Watson and Montgomery (2006) recorded the LFPs in motor-sensory regions of the subthalamic nucleus while participants produced the utterance The lonely lady leans low and seven unstructured syllables (la-la-la- . . .). The sequences contained repeated alveolar consonants in order to minimize electro-mechanical interference in the microelectrode recordings. Figure 11.1 provides an example of the analyses of LFPs recorded during the utterance. Watson and Montgomery reported that the histograms of the LFPs showed a buildup of activity before the production of utterances and rapid decreases at the voice-onset points of groups. However, this occurred only for the utterances and not for syllable repetitions. On syllable repetitions, there were no rapid decreases in activity as in Figure 11.1. The buildup that specifically occurred for the utterances was interpreted as an indication that the basal ganglia were involved in aspects of motor planning (cf. Graybiel, 1997; Temel et al., 2005). In fact, the differential responses on repetitive syllables and utterances also suggest that a buffering of activity via the subthalamic nucleus operates specifically when motor programs involve sequential order information, which is the case for utterances but not redundant syllables. Importantly, the decreases in activity occurred at points reflecting blocks of action. While the authors interpreted the timing of decreases as reflecting "syntax," the changes did not align with word units or phrases but appeared to occur in terms of temporal chunks, as in The lonely/ lady/ leans low, where "/" marks the points of decreases in activity corresponding to a block of sequential motions.

Although the preceding interpretations may be premature and further research is still required to define the link with chunking functions of the striatum (as suggested in the work of Graybiel and others), the observations by Watson and Montgomery (2006) minimally indicate that input programs from the cortico–basal ganglia circuits to the motor thalamus are formatted in blocks of action reflecting temporal chunks of speech. It may also be surmised that the basal ganglia, working jointly with the thalamus and the cortex, and considering their major projections to the frontal cortex, contribute to the selection of chunks of speech action and the semantic representations that bind to these chunks (compare, e.g., Cousins & Grossman, 2017, and Bohsali & Crosson, 2016).

As for input from the cerebellum to the motor thalamus, there is evidence that, as with the basal ganglia, the cerebellum can also support the selection of verbal items. As discussed by Moberget and Ivry (2016; see also Sokolov, Miall, & Ivry, 2017), research has shown that applying TMS to areas near the cerebellum can sporadically disrupt performance on various lexico-semantic tasks. For instance, Argyropoulos (2011) showed that lateral cerebellar TMS selectively affects associative priming of lexemes based on sequential probabilities (e.g., pigeon-HOLE), but does not affect priming based on semantic features (e.g., penny-COIN). Another study by Lesage, Morgan, Olson et al. (2012) used a task where presented sentences variably predicted a following picture item. Repetitive TMS slowed gaze responses, specifically on predictable picture items. However, it remains difficult to circumscribe the roles of the cerebellum and basal ganglia in the semantic processing of utterances. Despite the evidence using TMS, case reports (discussed in Section 6.4.3) reveal that spoken language can develop in individuals born without a cerebellum (although with considerable developmental delays). On the other hand, a body of clinical findings makes it clear that cerebellar dysfunctions associate with a general "dysmetria" in several behaviors extending to the semantic processing of verbal expression. At a motor level, this dysmetria appears as an inability to precisely gauge the sensory effects of motion and as a tendency to undershoot or overshoot motion amplitudes and velocities (or a difficulty in producing fine gradations in motions, as illustrated in Figure 7.10; and see Ivry & Diener, 1991). A general dysmetria is also reflected in an inability to appropriately gauge the subtle meanings of verbal expressions and the intensity of emotional responses (Guell et al., 2015; Koziol et al., 2014; Murdoch & Whelan, 2009; Schmahmann, 2004). On the role of the basal ganglia, there is substantial clinical evidence that these nuclei support the selection of motor and verbal responses, which is perhaps epitomized by Tourette's syndrome, where dysfunctional inhibitory functions of the ganglia underlie impulsive tics – including verbal tics – both of which reflect errorful selections of motions or semantic items that are inappropriate or out of context (see also Houk, Bastianen, Fansler et al., 2007). These analogous effects on both semantic and motor aspects support the preceding view of the role of basal ganglia circuits in selecting actions, including actions that bind with semantic attributes. In fact, the brain may not distinguish between general actions and actions used to modulate air in communication (or "speech acts"; Koziol et al., 2014; Koziol et al., 2012).

Models of subcortical functions are continuously being updated with findings of new connectivity between neurological systems. For instance, it has been revealed that the cerebellum links directly to the basal ganglia (Bostan & Strick, 2018), which complicates a delimitation of the distinct roles of these systems. Recent research using tractographic techniques has also revealed bidirectional connections between the human cerebellum and the hippocampus (Arrigo, Mormina, Anastasi et al., 2014), in addition to well-attested connections via the thalamus (Yu & Krook-Magnuson, 2015). This connectivity suggests a network that can subserve action shaping, action selection, and reinforcement learning of context-related effects of actions (as also suggested by Butcher et al., 2017; McDougle et al., 2016; Sokolov et al., 2017). In line with such findings, clinical reports show that individuals with benign cerebellar tumors present decreased abilities in both verbal working memory and episodic memory, two functions traditionally associated with the hippocampal formation and neighboring structures (Chen & Desmond, 2005; Marvel & Desmond, 2010; Shiroma, Nishimura, Nagamine et al., 2016).

Yet it is essential to bear in mind that lesions in the basal ganglia or the cerebellum do not lead to the type of deficits in semantic selection and "word-finding" problems that are manifest in thalamic aphasia (Bohsali & Crosson, 2016). This is at once revealing of the integration or "super-ordinate function" of the thalamus, and the motor thalamus in particular (Murdoch & Whelan, 2009). Evidence of the role of the motor thalamus in language processing comes largely from studies of isolated infarcts and thalamotomy. Much of this evidence broadly confirms that lesions in the thalamus associate with lexical-semantic deficits and aphasic-like disorders, while the ability to repeat verbal items is only minimally impaired (Crosson, 2013). However, observations of thalamic aphasia are often limited to conventional assessment batteries and lexico-semantic tasks, which offer little information on the semantic processing of speech.
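
To recapitulate the gating sequence attributed to the hyperdirect, direct, and indirect pathways (Nambu et al., 2002), the following toy sketch renders the three corollary signals as successive steps that first inhibit all competing motor programs, then release only the selected one, and then inhibit extensively again. The program names and the discrete steps are hypothetical; this is a didactic illustration, not a physiological model.

```python
# A toy, purely illustrative rendering of the hyperdirect/direct/indirect
# gating sequence attributed to basal ganglia pathways (after Nambu et al.,
# 2002). Program names, steps, and the selection rule are hypothetical.
competing_programs = ["chunk_A", "chunk_B", "chunk_C"]
selected = "chunk_B"                      # e.g., the next temporal chunk of speech

def released(step, program):
    """Is a program's thalamocortical target released (disinhibited) at this step?"""
    if step == "hyperdirect":             # signal 1: broadly inhibit all targets
        return False
    if step == "direct":                  # signal 2: disinhibit only the selected program
        return program == selected
    if step == "indirect":                # signal 3: inhibit target areas extensively
        return False
    raise ValueError(f"unknown step: {step}")

for step in ("hyperdirect", "direct", "indirect"):
    active = [p for p in competing_programs if released(step, p)]
    print(f"{step:>11}: released -> {active if active else 'none'}")
```

The point of the sketch is only that selection and termination are carried by the ordering of signals over blocks of action, not by word-like units.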

Epilogue

One can surmise from the preceding chapters that, in research on the meaning of spoken language, there is the dominant assumption that semantic attributes link to notions of words. There is no working definition of these units, no consistent marks serving to divide them in speech, and they are only consistently recognized by people who have learned a culture-specific orthographic code. While some may see the failure to isolate word units in speech as irrelevant, holding that words simply reflect "thoughts," it is difficult to even imagine how words could be represented without syllables, without some reference to articulate language. Yet even in studies of "embodied" semantics, there is the idea that sensory experiences ground meaning in terms of discarnate word-like units given to all speakers. The fact that not all languages have words and that such concepts link to orthographic conventions brings into view the problem of whether research can continue to examine the biological processes of spoken language using cultural constructs of writing.

In acknowledging this basic problem, investigators face the need to define structures of verbal expression and units of semantic processing that appear across spoken languages. The preceding discussion of findings offers a perspective on these structures and how they emerge from circumscribable physiological processes. These include syllable-size cycles, sensory chunks, and breath units of speech. As developed in this book, these structures can underlie the rise of symbolic language, and they can serve to understand invariant temporal features, the domain-general process of chunking that underpins the formation of formulas and semantic schemas, the complexification of utterances that accompanies the growth of speech breathing, and the sensory frames to which semantic memory binds. But more generally, because these structures relate to observable attributes of signals, they can address the prevailing problem of the ontological incommensurability between instrumental observations and conventional language analysis.

In the present monograph, this ontological problem was traced back to the tradition, instituted in the nineteenth century, of viewing spoken language through script. It can be argued that analysts during this period had little choice but to use writing. Nonetheless, the choice of an alphabet system in describing spoken language had major consequences, as documented in this book. In conceptualizing language with this writing code, units like letters and words, along with categories drawn from Latin grammar, took on a theoretical status. Letters became phonemes, words became syntactic constituents of phrases and sentences, categories at one point became substantive universals of a "language organ," and so forth. In viewing language through these concepts, there seemed to be a mental ability to constitute and combine given units much as in a grammar of writing. As this ability appeared wholly detached from mechanisms of speech, or so it seemed from examining transcripts, some language theorists in the nineteenth century concluded that it reflected a faculty located in cortical regions.

Such claims were not uncommon at the time, but this too had major consequences. In commenting on the history of neuroscience, Parvizi (2009) noted that the tendency to focus on the cortex draws from the cultural climate of the Victorian era, which is reflected in the works of Spencer (1820–1903), Hughlings-Jackson (1835–1911), and especially Sherrington (1857–1952), whose observations of decerebrated cats bolstered the idea that the cortex governs "instinctive" subcortical functions (Sherrington, 1909). According to Parvizi (2009), cortico-centric views still influence cognitive neuroscience, as seen in a tendency to overlook the reciprocal links between cortical and subcortical systems and in the failure to acknowledge that there is no top-down hierarchy but only an interlinked network of loops (pp. 356–357). But cortico-centric assumptions were cited by early language theorists as support for a language faculty, and this mentalist view was dominant in sectors of language study when models in the 1950s and 1960s adopted computer analogies. At that point, the notion of a faculty took the form of an inborn competence to manipulate strings of symbols based on conventional language analysis that, again, suggested no apparent links to motor speech.

The common thread linking these developments is a concept of language erected on the tradition of analyzing utterances as sentences on a page, a method that does not capture the properties of articulate language. Historically, formal theories based on this method overlooked constraints on the speech medium that inherently shape structures of spoken-language processing. Experimentalists who adopted the formal theories tended to discount basic observations invalidating notions of letter- and word-like units at the center of these theories. In hindsight, it is difficult to explain this oversight other than by reference to the influence of a nineteenth-century concept of language located in a hierarchical brain. Such a concept may have contributed to the belief that linguistic assumptions have some theoretic primacy over instrumental observations. Yet, as emphasized in this book, one finds no justification in the literature supporting the idea that empirical observations need to be guided by writing-induced concepts of linguistic theory and analysis. Stetson (1928/1951) was perhaps the first to demonstrate the inadequacy of these concepts and to warn the research community: "The series of characters which we read and write as representing an articulate language have given us a mistaken notion of the units which we utter and which we hear" (p. 137). Several decades later, however, observations of speech structure, including records of neural responses to these structures, can still be overlooked when they do not match orthographic constructs. Indeed, some authors maintain that researchers have no choice but to refer to linguistic concepts in developing an understanding of the processes of spoken language, which is not the case.

To clarify this last point, the fundamental problem with writing-induced concepts of language analyses was outlined in the Introduction in terms of their ontological incommensurability with instrumental records of spoken language at different levels of observation. The present monograph addresses this basic problem by providing a perspective on the processing of speech acts where context information has a fundamental role in constituting a semantic memory of verbal expressions. At the base of this perspective is the thesis that semantic and episodic memory binds to observable structures of motor speech and not to writing-induced concepts of linguistic description. In this approach, the identification of structures that are critical to the understanding of speech processes is data-driven, and not driven by orthographic conventions of language analysis. These two approaches differ, epistemologically, with respect to the role of instrumental records in defining structures of spoken language. The difference can be summarized by illustrating how one may interpret observations such as those displayed in Figure E.1. At the bottom of the figure (C), there is an example of an acoustic waveform for a speech stimulus along with the overlaid mean intensity curve of fifty stimuli with similar patterns. At the top of the figure (A) is the averaged EEG response of eighteen participants as they listened to the stimuli. In the middle (B) is an IPA transcript for the example stimulus (and all fifty stimuli had similar classes of items and syntax). This representation is meant to reflect assumptions of conventional language analysis where the spaces denote word divisions. Such analysis can proceed with other assumptions depending on one's choice of language theory. For instance, one can analyze the letters as consonants and vowels, the space-divided words in terms of the categories of noun, verb, preposition (etc.), and further infer hierarchical embeddings of letters and words, or probabilistic chains of these "constituents," and so forth. All these analyses rest on the identification of putative units and categories, which presupposes that the observer knows an orthographic code. But it should also be recognized that transcripts as such do not represent structures of articulated sounds. They do not represent syllables, rhythms, tempo, and other such attributes of heard speech (as captured in C) but rather assumptions about what has been termed "interim" units of language processing and analysis.


Figure E.1 An illustration of the relationship between neural responses, assumed “interim” units of language analysis, and speech stimuli. (A) Averaged EEG responses at Cz of 18 listeners during presentations of 50 utterances. (C) Mean energy contour of presented utterances along with an example of the waveform of a trial. (B) IPA transcription of the example trial. Note that similar responses were obtained with presented strings of nonsense syllables bearing the same rhythms as those of the utterance stimuli (original data from Gilbert et al., 2015b).

The problem of linking observations of signals to these interim units has a long history dating back to the first instruments used to observe speech. As documented in this book, early instrumentalists who used devices like the kymograph could view a waveform such as the one in C. Many reported that the data were not structured as in B. Current techniques used to record brain activity provide waveforms such as the one in A, and such responses to speech have been available at least since the 80s (Beres, 2017). In examining these responses, one immediate conclusion is that they are not structured as in B: there are no patterns corresponding to letter-like units, and rises do not match syntactic constituents of language analysis. (In fact, response patterns as in A can be obtained with nonsense syllables that have no syntactic units.) Discounting both sets of records, A and C, as being irrelevant in view of other evidence, for example that of transcribed speech errors, which suggest the existence of letter- or word-like entities, brings forth several epistemological questions. For instance, on what basis or by what criterion can researchers decide which types of data are relevant to spoken-language processing? The belief that observations need to be driven by assumed units such as those of conventional language analyses is one answer. Another answer is driven largely by observations of structural attributes in C and of similar patterns in A and C. For example, the complex waveform in A shows that, beyond early auditory responses, large deflections match temporal delays marking groups in C, and minute changes within some groups suggest responses to syllable-size cycles in energy. In fact, research in the last decade has established that a spectral analysis of neural responses across trials using Fourier or wavelet transforms and filtering operations can reveal a frequency-specific alignment of neural waves to syllable cycles and temporal groups. Investigators have interpreted this entrainment as defining sensory frames of utterance processing. But this does not conform to notions of interim units, as represented in B.

The question that arises is whether researchers need to devise "processing models" that convert C to B and A to B to accommodate interim units of conventional language analysis, or whether a view on the alignment of A to C is not better suited to the development of an understanding of how utterance structures are processed. Currently, the latter approach of matching structures of speech and neural responses is rarely applied without referring to notions of letters and words. Such assumptions carry a perennial paradox of how communication systems can arise on the basis of units that are not in signals. Researchers who seek a compromise on some conversion process that maintains these assumptions at some "higher level" face basic questions of how a mechanism that converts signals (as in C) into letter- or word-like packets (as in B) came about, or why such a conversion was needed. Pursuing this viewpoint by claiming an innate propensity to recognize letter- and word-like entities as in European-style orthographic systems also faces repeated complaints of cultural centrism and writing-related bias. There is a need to address these ongoing complaints and problems. Hopefully, the perspective on the study of speech processes presented in this book can assist critical consumers of research in devising proposals that address the writing bias so as to move forward in developing a language science.
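
As an illustration of the data-driven route just described, the following sketch derives the slow energy envelope of a synthetic waveform (in the spirit of panel C of Figure E.1) and marks syllable-size cycles and candidate group boundaries from its peaks and dips, without positing letter- or word-like units. The waveform, filter settings, and peak-detection parameters are assumptions chosen for demonstration; this is not the analysis of Gilbert et al. (2015b).

```python
# A minimal, assumption-laden sketch of a "data-driven" pass over a speech-like
# waveform: derive the slow energy envelope and mark syllable-size cycles and
# temporal-group boundaries from its peaks and dips. The waveform is synthetic;
# all rates and thresholds are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 8000
t = np.arange(0, 3, 1 / fs)
# Synthetic "utterance": noise carrier amplitude-modulated at a syllable-like
# rate (~5 Hz), with deeper dips at group boundaries (~1.5 Hz).
rng = np.random.default_rng(2)
carrier = rng.standard_normal(t.size)
am = (0.6 + 0.4 * np.sin(2 * np.pi * 5 * t)) * (0.5 + 0.5 * np.sin(2 * np.pi * 1.5 * t))
wave = carrier * am

# Energy envelope: rectify, then low-pass below ~10 Hz.
b, a = butter(2, 10, btype="low", fs=fs)
envelope = filtfilt(b, a, np.abs(wave))

peaks, _ = find_peaks(envelope, distance=int(0.12 * fs))   # syllable-size cycles
dips, _ = find_peaks(-envelope, distance=int(0.4 * fs))    # candidate group edges
print("syllable-cycle peaks (s):", np.round(peaks / fs, 2))
print("group-boundary dips  (s):", np.round(dips / fs, 2))
```

Such envelope-derived markers are the kind of observable structure to which, on the view developed here, neural responses like those in panel A can be related directly, without an intervening orthographic representation.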

References

Abbs, J. H. (1973). Some mechanical properties of lower lip movement during speech production. Phonetica, 28, 65–75. doi:10.1159/000259446 Abbs, J. H. (ed.) (1996). Mechanisms of speech motor execution and control. St. Louis, MS: Mosby. Abbs, J. H., & Eilenberg, G. R. (1976). Peripheral mechanisms of speech motor control. In N. J. Lass (ed.), Contemporary issues in experimental phonetics (pp. 139–168). New York, NY: Academic Press. Abercrombie, D. (1967). Elements of general phonetics. Edinburgh, UK: Edinburgh University Press. Ackermann, H., Hage, S. R., & Ziegler, W. (2014). Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective. Behavioral and Brain Sciences, 37, 529–546. doi:10.1017/S0140525X13003099 Adesnik, H., & Naka, A. (2018). Cracking the function of layers in the sensory cortex. Neuron, 100, 1028–1043. doi:10.1016/j.neuron.2018.10.032 Agostoni, E., & Hyatt, R. E. (1986). Static behavior of the respiratory system. In A. P. Fishman, P. T. Macklem, J. Mead, & S. R. Geiger (eds.), Handbook of physiology: The respiratory system (Vol. 3, section 3, pp. 113–130). Bethesda, MD: American Physiological Society. Aitchison, J. (2000). The seeds of speech: Language origin and evolution. Cambridge, UK: Cambridge University Press. Akhtar, N., & Tomasello, M. (2000). The social nature of words and word learning. In R. M. Golinkoff, K. Hirsh-Pasek, L. Bloom, & L. B. Smith (eds.), Becoming a word learner: A debate on lexical acquisition (pp. 115–135). New York, NY: Oxford University Press. Akmajian, A., Demers, R. A., Farmer, A. K., et al. (2010). Linguistics: An introduction to language and communication (6th ed.). Cambridge, MA: MIT Press. Alexander, G. E., & Crutcher, M. D. (1990). Functional architecture of basal ganglia circuits: Neural substrates of parallel processing. Trends in Neurosciences, 13, 266–271. doi:10.1016/0166-2236(90)90107-L Allen, C. (1996/2009). Teleological notions in biology. In E. N. Zalta (ed.), The Stanford encyclopedia of philosophy. Stanford, CA: Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/win2009/entries/teleologybiology/ Allwood, J., Hendrikse, A. P., & Ahlsén, E. (2010). Words and alternative basic units for linguistic analysis. In P. J. Henrichsen (ed.), Linguistic theory and raw sound (pp. 9–26). Copenhagen, DK: Samfundslitteratur. 244


Ambridge, B. (2017). Syntactic categories in child language acquisition. In H. Cohen & C. Lefebvre (eds.), Handbook of categorization in cognitive science (pp. 567–580). San Diego, CA: Elsevier. Ambridge, B., & Lieven, E. V. M. (2011). Child language acquisition: Contrasting theoretical approaches. Cambridge, UK: Cambridge University Press. Ambridge, B., & Lieven, E. V. M. (2015). A constructivist account of child language acquisition. In B. Macwhinney & W. O’Grady (eds.), The handbook of language emergence (pp. 478–503). Hoboken, NJ: Wiley. Amerman, J. D., Daniloff, R., & Moll, K. L. (1970). Lip and jaw coarticulation for the phoneme/æ/. Journal of Speech, Language, and Hearing Research, 13, 147–161. Anderson, S. R. (1981). Why phonology isn’t “natural.” Linguistic Inquiry, 12, 493–539. Anderson, S. R. (1985). Phonology in the twentieth century. Chicago, IL: Chicago University Press. Anderson, S. R., & Lightfoot, D. W. (2002). The language organ: Linguistics as cognitive physiology. Cambridge UK: Cambridge University Press. Arbib, M. A. (2005). From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, 105–124. doi:10.1017/S0140525X05000038 Arbib, M. A. (2011). From mirror neurons to complex imitation in the evolution of language and tool use. Annual Review of Anthropology, 40, 257–273. doi:10.1146/ annurev-anthro-081309-145722 Arbib, M. A. (2012). How the brain got language: The mirror system hypothesis. Oxford, UK: Oxford University Press. Arbib, M. A. (2015). Language evolution. In B. MacWhinney & W. O’Grady (eds.), The handbook of language emergence (pp. 600–623). Hoboken, NJ: Wiley. Arbib, M. A., Liebal, K., & Pika, S. (2008). Primate vocalization, gesture, and the evolution of human language. Current Anthropology, 49, 1053–1076. doi:10.1086/ 593015 Archibald, L. M., & Gathercole, S. E. (2007). Nonword repetition and serial recall: Equivalent measures of verbal short-term memory? Applied Psycholinguistics, 28, 587–606. doi:10.1017/S0142716407070324 Argyropoulos, G. P. (2011). Cerebellar theta-burst stimulation selectively enhances lexical associative priming. The Cerebellum, 10, 540–550. doi:10.1007/s12311011-0269-y Arlman-Rupp, A. J. L., van Niekerk de Haan, D., & van de Sandt-Koenderman, M. (1976). Brown’s early stages: Some evidence from Dutch. Journal of Child Language, 3, 267–274. doi:10.1017/S0305000900001483 Aronoff, M. (1992). Segmentalism in linguistics: The alphabetic basis of phonological theory. In P. Downing, S. D. Lima, & M. Noonan (eds.), The linguistics of literacy (pp. 71–82). Philadelphia, PA: John Benjamins. Arrigo, A., Mormina, E., Anastasi, G. P., et al. (2014). Constrained spherical deconvolution analysis of the limbic network in human, with emphasis on a direct cerebello-limbic pathway. Frontiers in Human Neuroscience, 8. doi:10.3389/ fnhum.2014.00987 Arrigoni, F., Romaniello, R., Nordio, A., et al. (2015). Learning to live without the cerebellum. NeuroReport, 26, 809–813. doi:10.1097/WNR.0000000000000428


Aslin, R. N. (1993). Segmentation of fluent speech into words: Learning models and the role of maternal input. In B. Boysson-Bardies, S. Schonen, P. W. Jusczyk, P. McNeilage, & J. Morton (eds.), Developmental neurocognition: Speech and face processing in the first year of life (Vol. 69, pp. 305–315). Boston, MA: Kluwer Academic. Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of conditional probability statistics by 8-month-old infants. Psychological Science, 9, 321–324. doi:10.1111/1467-9280.00063 Auroux, S. (ed.) (2000). Histoire des idées linguistiques (Tome 3). Paris, FR: Mardaga. Axmacher, N., Henseler, M. M., Jensen, O., et al. (2010). Cross-frequency coupling supports multi-item working memory in the human hippocampus. Proceedings of the National Academy of Sciences, 107, 3228–3233. doi:10.1073/pnas.0911531107 Aziz-Zadeh, L., Koski, L., Zaidel, E., et al. (2006). Lateralization of the human mirror neuron system. The Journal of Neuroscience, 26, 2964–2970. doi:10.1523/jneurosci.2921-05.2006 Babiloni, C., Vecchio, F., Mirabella, G., et al. (2009). Hippocampal, amygdala, and neocortical synchronization of theta rhythms is related to an immediate recall during rey auditory verbal learning test. Human Brain Mapping, 30, 2077–2089. doi:10.1002/hbm.20648 Bagur, S., & Benchenane, K. (2018). Taming the oscillatory zoo in the hippocampus and neo-cortex: A review of the commentary of Lockmann and Tort on Roy et al. Brain Structure and Function, 223, 5–9. doi:10.1007/s00429-017-1569-x Bak, T. H., O’Donovan, D. G., Xuereb, J. H., et al. (2001). Selective impairment of verb processing associated with pathological changes in Brodmann areas 44 and 45 in the motor neurone disease–dementia–aphasia syndrome. Brain, 124, 103–120. doi:10.1093/brain/124.1.103 Baken, R. J., & Orlikoff, R. F. (2000). Clinical measurement of speech and voice. San Diego, CA: Singular. Baker, M. C. (2009). Language universals: Abstract but not mythological. Behavioral and Brain Sciences, 32, 448–449. doi:10.1017/S0140525X09990604 Bally, C. (1950). Linguistique générale et linguistique française. Berne, CH: Francke. Barlow, S. M., & Andreatta, R. D. (1999). Handbook of clinical speech physiology. San Diego, CA: Singular. Baroni, A. (2011). Alphabetic vs. non-alphabetic writing: Linguistic fit and natural tendencies. Italian Journal of Linguistics/Rivista di Linguistica, 23, 127–159. Barsalou, L. W. (1999). Perceptions of perceptual symbols. Behavioral and Brain Sciences, 22, 637–660. doi:10.1017/s0140525x99532147 Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645. doi:10.1146/annurev.psych.59.103006.093639 Barsalou, L. W. (2009). Simulation, situated conceptualization, and prediction. Philosophical Transactions of the Royal Society B: Biological Sciences, 364, 1281–1289. doi:10.1098/rstb.2008.0319 Barsalou, L. W. (2015). Situated conceptualization: Theory and applications. In Y. Coello & M. H. Fischer (eds.), Perceptual and emotional embodiment (pp. 19–45). London, UK: Routledge. Barsalou, L. W., Santos, A., Simmons, W. K., et al. (2008). Language and simulation in conceptual processing. In M. De Vega, A. M. Glenberg, & A. Graesser (eds.),


Symbols, embodiment, and meaning (pp. 245–283). Oxford, UK: Oxford University Press. Barutchu, A., Crewther, D. P., & Crewther, S. G. (2009). The race that precedes coactivation: Development of multisensory facilitation in children. Developmental Science, 12, 464–473. doi:10.1111/j.1467-7687.2008.00782.x Bates, E., Chen, S., Li, P., et al. (1993). Where is the boundary between compounds and phrases in Chinese? A reply to Shou et al. Brain and Language, 45, 94–107. doi:10.1006/brln.1993.1036 Bates, E., Chen, S., Tzeng, O., et al. (1991). The noun-verb problem in Chinese aphasia. Brain and Language, 41, 203–233. doi:10.1016/0093-934x(91)90153-r Bates, E., & Goodman, J. C. (2001). On the inseparability of grammar and the lexicon: Evidence from acquisition. In M. Tomasello & E. Bates (eds.), Language development: The essential readings (pp. 134–162). Malden, MA: Blackwell. Bathnagar, S. C. (2012). Neuroscience for the study of communicative disorders. Philadelphia, PA: Wolters Kluwer; Lippincott Williams & Wilkins. Bear, M. F., Connors, B. W., & Paradiso, A. P. (2016). Neuroscience: Exploring the brain (4th ed.). Philadelphia, PA: Lippincott. Beaugrande, R., & Dressler, W. (1981). Introduction to text linguistics. London, UK: Longman. Beck, D. (1999). Words and prosodic phrasing in Lushootseed narrative. In T. A. Hall & U. Kleinhentz (eds.), Amsterdam studies in the theory and history of linguistic science (Series 4, pp. 23–46). Amstedam, NL: John Benjamins. Beckner, C., Ellis, N. C., Blythe, R., et al. (2009). Language is a complex adaptive system: Position paper. Language Learning, 59, 1–26. doi:10.1111/j.14679922.2009.00533.x Behme, C., & Deacon, S. H. (2008). Language learning in infancy: Does the empirical evidence support a domain specific language acquisition device? Philosophical Psychology, 21, 641–671. doi:10.1080/09515080802412321 Behrens, H. (2006). The input–output relationship in first language acquisition. Language and Cognitive Proceses, 21, 2–24. doi:10.1080/01690960400001721 Belin, P. (2006). Voice processing in human and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 361, 2091–2107. doi:10.1098/rstb.2006.1933 Bellugi, U., Poizner, H., & Klima, E. S. (1989). Language, modality and the brain. Trends in Neurosciences, 12, 380–388. doi:10.1016/0166-2236(89)90076-3 Bencini, G. M. L. (2017). Speech errors as a window on language and thought: A cognitive science perspective. Altre Modernità. doi:10.13130/2035-7680/8316 Beres, A. M. (2017). Time is of the essence: A review of electroencephalography (EEG) and event-related brain potentials (ERPs) in language research. Applied Psychophysiology and Biofeedback, 42, 247–255. doi:10.1007/s10484-0179371-3 Berg, T. (2006). A structural account of phonological paraphasias. Brain and Language, 96, 331–356. doi:10.1016/j.bandl.2006.01.005 Berke, J. D., Breck, J. T., & Eichenbaum, H. (2009). Striatal versus hippocampal representations during win-stay maze performance. Journal of Neurophysiology, 101, 1575–1587. doi:10.1152/jn.91106.2008


Bertelson, P., & De Gelder, B. (1991). The emergence of phonological awareness: Comparative approaches. In I. G. Mattingly & M. Studdert-Kennedy (eds.), Modularity and the motor theory of speech perception (pp. 393–412). Hillsdale, NJ: Lawrence Erlbaum. Bertelson, P., Gelder, B., Tfouni, L. V., et al. (1989). Metaphonological abilities of adult illiterates: New evidence of heterogeneity. European Journal of Cognitive Psychology, 1, 239–250. doi:10.1080/09541448908403083 Berthoud-Papandropoulou, I. (1978). An experimental study of children’s ideas about language. In A. Sinclair, R. J. Jarvella, & W. J. M. Levelt (eds.), The child’s conception of language (pp. 55–64). Berlin, DE: Springer. Berwick, R. C., & Chomsky, N. (2015). Why only us: Language and evolution. Cambridge, MA: MIT Press. Berwick, R. C., Chomsky, N., & Piattelli-Palmarini, M. (2013). Poverty of the stimulus stands: Why recent challenges fail. In M. Piattelli-Palmarini & R. C. Berwick (eds.), Rich languages from poor inputs (pp. 19–42). Oxford, UK: Oxford University Press. Best, C. T. (1993). Emergence of language-specific constraints in perception of nonnative speech: A window on early phonological development. In B. d. BoyssonBardies, S. d. Schonen, P. W. Jusczyk, P. McNeilage, & J. Morton (eds.), Developmental neurocognition: Speech and face processing in the first year of life (Vol. 69, pp. 289–304). Boston, MA: Kluwer Academic. Best, C. C., & McRoberts, G. W. (2003). Infant perception of non-native consonant contrasts that adults assimilate in different ways. Language and Speech, 46, 183–216. doi:10.1177/00238309030460020701 Beurrier, C., Garcia, L., Bioulac, B., et al. (2002). Subthalamic nucleus: A clock inside basal ganglia? Thalamus & Related Systems, 2, 1–8. doi:10.1016/S1472-9288(02) 00033-X Binder, J. R., & Desai, R. H. (2011). The neurobiology of semantic memory. TRENDS in Cognitive Sciences, 15, 527–536. doi:10.1016/j.tics.2011.10.001 Binder, J. R., Desai, R. H., Graves, W. W., et al. (2009). Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cerebral Cortex, 19, 2767–2796. doi:10.1093/cercor/bhp055 Bishop, D. V. M. (1983). Linguistic impairment after left hemidecortication for infantile hemiplegia? A reappraisal. Quarterly Journal of Experimental Psychology, 35A, 199–207. doi:10.1080/14640748308402125 Blake, J., Quartaro, G., & Onorati, S. (1993). Evaluating quantitative measures of grammatical complexity in spontaneous speech samples. Journal of Child Language, 20, 139–152. doi:10.1017/S0305000900009168 Blakemore, S.-J., Frith, C. D., & Wolpert, D. M. (2001). The cerebellum is involved in predicting the sensory consequences of action. NeuroReport, 12, 1879–1884. doi:10.1097/00001756-200107030-00023 Bloomfield, L. (1933). Language. New York, NY: Holt, Rinehart & Wiston. Bohsali, A., & Crosson, B. (2016). The basal ganglia and language: A tale of two loops. In J.-J. Soghomonian (ed.), The basal ganglia (pp. 217–242). Switzerland, CH: Springer. Bolhuis, J., J., Tattersal, I., Chomsky, N., et al. (2012). How could language have evolved? PLoS Biol, 12, e1001934. doi:10.1371/journal.pbio.1001934


Boliek, C. A., Hixon, T. J., Watson, P. J., et al. (2009). Refinement of speech breathing in healthy 4- to 6-year-old children. Journal of Speech, Language, and Hearing Research, 52, 990–1007. doi:10.1044/1092-4388(2009/07-0214) Boomer, D. S., & Laver, J. D. M. (1968). Slips of the tongue. British Journal of Disorders of Communication, 3, 2–12. doi:10.3109/13682826809011435 Borges, A. F. T., Giraud, A.-L., Mansvelder, H. D., et al. (2017). Scale-free amplitude modulation of neuronal oscillations tracks comprehension of accelerated speech. The Journal of Neuroscience, 38, 710–722. doi:10.1523/jneurosci.1515-17.2017 Bosch-Bouju, C., Hyland, B., & Parr-Brownlie, L. (2013). Motor thalamus integration of cortical, cerebellar and basal ganglia information: Implications for normal and parkinsonian conditions. Frontiers in Computational Neuroscience, 7. doi:10.3389/ fncom.2013.00163 Bostan, A. C., & Strick, P. L. (2018). The basal ganglia and the cerebellum: Nodes in an integrated network. Nature Reviews Neuroscience, 19, 338–350. doi:10.1038/ s41583-018-0002-7 Botha, R. P. (1979a). External evidence in the validation of mentalistic theories: A Chomskyan paradox. Lingua, 48, 299–328. doi:10.1016/0024-3841(79) 90055-X Botha, R. P. (1979b). Methodological bases of a progressive mentalism. Stellenbosch Papers in Linguistics, 3, 1–115. doi:10.5774/3-0-121 Bouchard, D. (2015). Brain readiness and the nature of language. Frontiers in Psychology, 6. doi:10.3389/fpsyg.2015.01376 Boucher, V. J. (1994). Alphabet-related biases in psycholinguistic enquiries: Considerations for direct theories of speech production and perception. Journal of Phonetics, 22, 1–18. doi:0.1016/S0095-4470(19)30264-5 Boucher, V. J. (2002). Timing relations in speech and the identification of voice-onset times: A stable perceptual boundary for voicing categories across speaking rates. Perception & Psychophysics, 64, 121–130. doi:10.3758/BF03194561 Boucher, V. J. (2006). On the function of stress rhythms in speech: Evidence of a link with grouping effects on serial memory. Language and Speech, 49, 495–519. doi:10.1177/00238309060490040301 Boucher, V. J. (2008). Intrinsic factors of cyclical motion in speech articulators: Reappraising postulates of serial-ordering in motor-control theories. Journal of Phonetics, 36, 295–307. doi:10.1016/j.wocn.2007.06.002 Boucher, V. J., & Ayad, T. (2010). Physiological attributes of vocal fatigue and their acoustic effects: A synthesis of findings for a criterion-based prevention of acquired voice disorders. Journal of Voice, 24, 324–336. doi:10.1016/j.jvoice.2008.10.001 Boucher, V. J., Gilbert, A. C., & Jemel, B. (2019). The role of low-frequency neural oscillations in speech processing: Revisiting delta entrainment. Journal of Cognitive Neuroscience, 31, 1205–1215. doi:10.1162/jocn_a_01410 %M 30990387 Boucher, V. J., Gilbert, A. C., & Rossier-Bisaillon, A. (2018). The structural effects of modality on the rise of symbolic language: A rebuttal of evolutionary accounts and a laboratory demonstration. Frontiers in Psychology, 9. doi:10.3389/ fpsyg.2018.02300 Boucher, V. J., & Lalonde, B. (2015). Effects of the growth of breath capacities on mean length utterances: How maturing production processes influence indices of language development. Journal of Phonetics, 52, 58–69. doi:10.1016/j.wocn.2015.04.005


Boucher, V. J., & Lamontagne, M. (2001). Effects of speaking rate on the control of vocal fold vibration: Clinical implications of active and passive aspects of devoicing. Journal of Speech, Language, and Hearing Research, 44, 1005–1014. doi:10.1044/ 1092-4388(2001/079) Boulenger, V., Hauk, O., & Pulvermüller, F. (2008). Grasping ideas with the motor system: Semantic somatotopy in idiom comprehension. Cerebral Cortex, 19, 1905–1914. doi:10.1093/cercor/bhn217 Boulenger, V., Mechtouff, L., Thobois, S., et al. (2008). Word processing in Parkinson’s disease is impaired for action verbs but not for concrete nouns. Neuropsychologia, 46, 743–756. doi:10.1016/j.neuropsychologia.2007.10.007 Bowerman, M. (1990). Mapping thematic roles onto syntactic functions: Are children helped by innate linking rules? Linguistics, 28, 1253–1290. doi:10.1515/ ling.1990.28.6.1253 Boyd, L. A., Edwards, J. D., Siengsukon, C. S., et al. (2009). Motor sequence chunking is impaired by basal ganglia stroke. Neurobiology of Learning and Memory, 92, 35–44. doi:10.1016/j.nlm.2009.02.009 Bradley, L., & Bryant, P. (1985). Rhyme and reason in reading and spelling. Ann Arbour, MI: University of Michigan Press. Branigan, H. P., & Pickering, M. J. (2016). An experimental approach to linguistic representation. Behavioral and Brain Sciences, 40, e282. doi:10.1017/ S0140525X16002028 Brauer, J., Anwander, A., & Friederici, A. D. (2011). Neuroanatomical prerequisites for language functions in the maturing brain. Cerebral Cortex, 21, 459–466. doi:10.1093/cercor/bhq108 Bresnan, J. (1982). The mental representation of grammatical reactions. Cambridge, MA: MIT Press. Broca, P. (1861). Remarques sur le siège de la faculté du langage articulé, suivies d’une observation d’aphémie (perte de la parole). Bulletin de la Société Anatomique, 6, 330–357. Browman, C. P. (1994). Lip aperture and consonant releases. In P. A. Keating (ed.), Phonological structure and phonetic form: Papers in laboratory phonology III (pp. 331–353). Cambridge, UK: Cambridge University Press. Browman, C. P., & Goldstein, L. (1984). Dynamic modeling of phonetic structure. Haskins Laboratories Report on Speech Research, SR-79/80, 1–17. Browman, C. P., & Goldstein, L. (1985). Dynamic modeling of phonetic structure. In V. A. Fromkin (ed.), Phonetic linguistics. Essays in honor of Peter Ladefoged (pp. 35–53). Orlando, FA: Academic Press. Browman, C. P., & Goldstein, L. (1990). Representation and reality: Physical systems and phonological structure. Journal of Phonetics, 18, 411–424. Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49, 155–180. doi:10.1159/000261913 Browman, C. P., & Goldstein, L. (1993). Dynamics and articulatory phonology. Haskins Laboratories Status Report on Speech Research, SR-113, 51–62. Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press. Bryant, P., & Goswami, U. (2016). Phonological skills and learning to read. London, UK: Routledge.


Bulloch, M. J., Boysen, S. T., & Furlong, E. E. (2008). Visual attention and its relation to knowledge states in chimpanzees, Pan troglodytes. Animal Behaviour, 76, 1147–1155. doi:10.1016/j.anbehav.2008.01.033 Butcher, P. A., Ivry, R. B., Kuo, S.-H., et al. (2017). The cerebellum does more than sensory prediction error-based learning in sensorimotor adaptation tasks. Journal of Neurophysiology, 118, 1622–1636. doi:10.1152/jn.00451.2017 Buzsaki, G. (2006). Rhythms of the brain. New York, NY: Oxford University Press. Buzsáki, G., & Moser, E. I. (2013). Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nature Neuroscience, 16, 130. doi:10.1038/ nn.3304 Bybee, J. (2006). From usage to grammar: The mind’s response to repetition. Language, 82, 711–733. Bybee, J. (2010). Language, usage, and cognition. Cambridge, UK: Cambridge University Press. Bybee, J. (2011). Usage-based theory and grammaticalization. Oxford Handbooks Online. doi:10.1093/oxfordhb/9780199586783.013.0006 Bybee, J., & Beckner, C. (2009). A usage-based account of constituency and reanalysis. Language Learning, 59, 27–46. doi:10.1111/j.1467-9922.2009.00534.x Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review, 22, 381–410. doi:10.1515/tlir.2005.22.2-4.381 Byrd, D., Kaun, A., Narayanan, S., et al. (2000). Phrasal signatures in articulation. In M. B. Broe & J. B. Pierrehumbert (eds.), Papers in Laboratory Phonology V (pp. 70–87). Cambridge, UK: Cambridge University Press. Byrd, D., & Saltzman, E. (1998). Intragestural dynamics of multiple prosodic boundaries. Journal of Phonetics, 26, 173–199. doi:10.1006/jpho.1998.0071 Byrd, D., & Saltzman, E. (2003). The elastic phase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31, 149–180. doi:10.1016/ S0095-4470(02)00085-2 Cabrera, J. C. M. (2008). The written language bias in linguistic typology. Cuadernos de Lingüística del Instituto Universitario Investigación Ortega y Gasset, 15, 117–137. Cai, S., Ghosh, S. S., Guenther, F. H., et al. (2011). Focal manipulations of formant trajectories reveal a role of auditory feedback in the online control of both within-syllable and between-syllable speech timing. Journal of Neuroscience, 31, 16483–16490. doi:10.1523/JNEUROSCI.3653-11.2011 Calderone, D. J., Lakatos, P., Butler, P. D., et al. (2014). Entrainment of neural oscillations as a modifiable substrate of attention. TRENDS in Cognitive Sciences, 18, 300–309. doi:10.1016/j.tics.2014.02.005 Caligiore, D., Pezzulo, G., Baldassarre, G., et al. (2017). Consensus paper: Towards a systems-level view of cerebellar function: The interplay between cerebellum, basal ganglia, and cortex. The Cerebellum, 16, 203–229. doi:10.1007/s12311-0160763-3 Call, J., & Tomasello, M. (2008). Does the chimpanzee have a theory of mind? 30 years later. TRENDS in Cognitive Sciences, 12, 187–192. doi:10.1016/j.tics.2008.02.010 Cameron-Faulkner, T., Lieven, E., & Tomasello, M. (2003). A construction based analysis of child directed speech. Cognitive Science, 27, 843–873. doi:10.1207/ s15516709cog2706_2


Canolty, R. T., Edwards, E., Dalal, S. S., et al. (2006). High gamma power is phase-locked to theta oscillations in human neocortex. Science, 313, 1626–1628. doi:10.1126/science.1128115 Canolty, R. T., & Knight, R. T. (2010). The functional role of cross-frequency coupling. TRENDS in Cognitive Sciences, 14, 506–515. doi:10.1016/j.tics.2010.09.001 Capute, A. J., Palmer, F. B., Shapiro, B. K., et al. (1986). Clinical linguistic and auditory milestone scale: Prediction of cognition in infancy. Developmental Medicine and Child Neurology, 28, 762–771. doi:10.1111/j.1469-8749.1986.tb03930.x Caramazza, A., & Hillis, A. E. (1991). Lexical organization of nouns and verbs in the brain. Nature, 349, 788. doi:10.1038/349788a0 Caramazza, A., Hillis, A. E., Rapp, B. C., et al. (1990). The multiple semantics hypothesis: Multiple confusions? Cognitive Neuropsychology, 7, 161–189. doi:10.1080/02643299008253441 Carcagno, S., & Plack, C. J. (2017). Short-term learning and memory: Training and perceptual learning. In N. Kraus, S. R. Anderson, T. White-Schwoch, R. R. Fay, & R. N. Popper (eds.), The frequency-following response: A window into human communication (pp. 75–100). Cham, CH: Springer & ASA Press. Cäsar, C., Zuberbühler, K., Young, R. J., et al. (2013). Titi monkey call sequences vary with predator location and type. Biology Letters, 9, 1–5. doi:10.1098/rsbl.2013.0535 Catford, J. C. (1977). Fundamental problems in phonetics. Edinburgh, UK: Edinburgh University Press. Chabrol, F. P., Arenz, A., Wiechert, M. T., et al. (2015). Synaptic diversity enables temporal coding of coincident multisensory inputs in single neurons. Nature Neuroscience, 18, 718. doi:10.1038/nn.3974 Chafe, W. (1979). The flow of thought and the flow of language. In T. Givón (ed.), Discourse and syntax. Syntax and semantics 12 (pp. 159–181). New York, NY: Academic Press. Chan, A., McAllister, L., & Wilson, L. (1998). An investigation of the MLU–age relationship and predictors of MLU in 2- and 3-year-old Australian children. Asia Pacific Journal of Speech, Language and Hearing, 3, 97–108. doi:10.1179/ 136132898805577241 Chan, M. E., & Elliott, J. M. (2011). Cross-linguistic differences in digit memory span. Australian Psychologist, 46, 25–30. doi:10.1111/j.1742-9544.2010.00007.x Chandrasekaran, B., & Kraus, N. (2010). The scalp-recorded brainstem response to speech: Neural origins and plasticity. Psychophysiology, 47, 236–246. doi:10.1111/ j.1469-8986.2009.00928.x Charness, N., Park, D. C., & Sabel, B. A. (2001). Communication, technology and aging: Opportunities and challenges for the future. New York, NY: Springer. Chen, S. H. A., & Desmond, J. E. (2005). Cerebrocerebellar networks during articulatory rehearsal and verbal working memory tasks. NeuroImage, 24, 332–338. doi:10.1016/j.neuroimage.2004.08.032 Chenery, H. J., Angwin, A. J., & Copland, D. A. (2008). The basal ganglia circuits, dopamine, and ambiguous word processing: A neurobiological account of priming studies in Parkinson’s disease. Journal of the International Neuropsychological Society, 14, 351–364. doi:10.1017/S1355617708080491 Cheney, D. L., & Seyfarth, R. M. (1990). How monkeys see the world. Chicago, IL: University of Chicago Press.


Cholin, J. (2008). The mental syllabary in speech production: An integration of different approaches and domains. Aphasiology, 22, 1127–1141. doi:10.1080/ 02687030701820352 Cholin, J., Levelt, W. J. M., & Schiller, N. O. (2006). Effects of syllable frequency in speech production. Cognition, 99, 205–235. doi:10.1016/j.cognition.2005.01.009 Chomsky, N. (1957). Syntactic structures. The Hague, NL: Mouton. Chomsky, N. (1959). A review of B. F. Skinner’s Verbal Behavior. Language, Speech and Hearing Services in School, 35, 26–58. Chomsky, N. (1961). On the notion ‘rule of grammar’. In R. Jakobson (ed.), Proceedings of the Symposia in Applied Mathematics XII: Structure of language and its mathematical aspects (pp. 6–24). Providence, RI: American Mathematical Society. Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press. Chomsky, N. (1967). The formal nature of language. Appendix to E. H. Lenneberg, Biological foundations of language (pp. 397–442). New York, NY: John Wiley & Sons. Chomsky, N. (1972). Language and mind. New York, NY: Harcourt, Brace, Jovanovich. Chomsky, N. (1975a). The logical structure of linguistic theory. New York, NY: Plenum. Chomsky, N. (1975b). Reflections on language. New York, NY: Pantheon Books. Chomsky, N. (1980a). On cognitive structures and their development: A reply to Piaget. In M. Piattelli-Palmarini (ed.), Language and learning: The debate between Jean Piaget and Noam Chomsky (pp. 35–54). Cambridge, MA: Harvard University Press. Chomsky, N. (1980b). Rules and representations. New York, NY: Columbia University Press. Chomsky, N. (1986). Knowledge of language: Its nature, origin and use. New York, NY: Praeger. Chomsky, N. (1993). Lectures on government and binding: The Pisa lectures. Berlin, DE: Mouton de Gruyter. Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press. Chomsky, N. (2000). New horizons in the study of language and mind. Cambridge, UK: Cambridge University Press. Chomsky, N. (2002). On nature and language. Cambridge, UK: Cambridge University Press. Chomsky, N. (2006). Language and mind (3rd ed.). Cambridge, UK: Cambridge University Press. Chomsky, N. (2012). The science of language: Interviews with James McGilvray. Cambridge, UK: Cambridge University Press. Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York, NY: Harper and Row. Christiansen, M. H., & Chater, N. (2008). Language as shaped by the brain. Behavioral and Brain Sciences, 31, 489–509. doi:10.1017/S0140525X08004998 Christiansen, M. H., & Kirby, S. (2003). Language evolution: Consensus and controversies. TRENDS in Cognitive Sciences, 7, 300–307. doi:10.1016/s13646613(03)00136-0 Christophe, A., & Mehler, J. (2001). Perception of prosodic boundary correlates by newborn infants. Infancy, 2, 385–394. doi:10.1207/S15327078IN0203_6


Christophe, A., Millotte, S., Bernal, S., et al. (2008). Bootstrapping lexical and syntactic acquisition. Language and Speech, 51, 61–75. doi:10.1177/00238309080510010501 Christophe, A., Peperkamp, S., Pallier, C., et al. (2004). Phonological phrase boundaries constrain lexical access I. Adult data. Journal of Memory and Language, 51, 523–547. doi:10.1016/j.jml.2004.07.001 Clark, B. (2012). Syntactic theory and the evolution of syntax. Biolinguistics, 7, 169–197. Clements, G. N., & Keyser, S. J. (1983). CV phonology: A generative theory of the syllable. Linguistic Inquiry Monographs, 9, 1–191. Cogan, G. B., & Poeppel, D. (2011). A mutual information analysis of neural coding of speech by low-frequency MEG phase information. Journal of Neurophysiology, 106, 554–563. doi:10.1152/jn.00075.2011 Conant, S. (1987). The relationship between age and MLU in young children: A second look at Klee and Fitzgerald’s data. Journal of Child Language, 14, 169–173. doi:10.1017/S0305000900012794 Cook, M., Murdoch, B. E., Cahill, L., et al. (2004). Higher-level language deficits resulting from left primary cerebellar lesions. Aphasiology, 18, 771–784. doi:10.1080/02687030444000291 Cools, R., Barker, R. A., Sahakian, B. J., et al. (2001). Enhanced or impaired cognitive function in Parkinson’s disease as a function of dopaminergic medication and task demands. Cerebral Cortex, 11, 1136–1143. doi:10.1093/cercor/11.12.1136 Corballis, M. C. (1992). On the evolution of language and generativity. Cognition, 44, 197–126. doi:10.1016/0010-0277(92)90001-X Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton NJ: Princeton University Press. Corballis, M. C. (2003). From mouth to hand: gesture, speech, and the evolution of right-handedness. Behavioral and Brain Sciences, 26, 199–208. doi:10.1017/ S0140525X03000062 Corballis, M. C. (2009). The evolution of language. Annals of the New York Academy of Sciences, 1156, 19–43. doi:10.1111/j.1749-6632.2009.04423.x Corballis, M. C. (2010). Mirror neurons and the evolution of language. Brain and Language, 112, 25–35. doi:10.1016/j.bandl.2009.02.002 Coulmas, F. (1989). The writing systems of the world. Oxford, UK: Blackwell. Coulmas, F. (1996). The Blackwell encyclopedia of writing systems. Oxford, UK: Wiley-Blackwell. Coulmas, F. (2003). Writing systems: An introduction to their linguistic analysis. Cambridge, UK: Cambridge University Press. Cousins, K. A. Q., & Grossman, M. (2017). Evidence of semantic processing impairments in behavioural variant frontotemporal dementia and Parkinson’s disease. Current opinion in neurology, 30, 617–622. doi:10.1097/WCO.0000000000 000498 Cowan, N. (2000). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185. doi:10.1017/ S0140525X01003922 Cowan, N. (2005). Working memory capacity. New York, NY: Psychology Press. Cowie, F. (1999). What’s within? Nativism reconsidered. Oxford, UK: Oxford University Press.

References

255

Crago, M. B. (1990). Cultural context in communicative interaction of Inuit children (PhD). McGill University, Montreal, QC. Crago, M. B., & Allen, S. (1997). Linguistic and cultural aspects of simplicity and complexity in Inuktitut child directed speech. In E. Hughes, M. Hughes, & A. Greenhil (eds.), Proceedings of the 21st Annual Boston University Conference on Language Development (pp. 91–102). Somerville, MA: Cascadilla Press. Crain, S., & Pietroski, P. M. (2001). Nature, nurture and universal grammar. Linguistics and Philosophy, 24, 139–186. doi:10.1023/A:1005694100138 Crelin, E. S. (1987). The human vocal tract: Anatomy, function, development and evolution. New York, NY: Vantage Press. Crepaldi, D., Berlingeri, M., Cattinelli, I., et al. (2013). Clustering the lexicon in the brain: A meta-analysis of the neurofunctional evidence on noun and verb processing. Frontiers in Human Neuroscience, 7. doi:10.3389/fnhum.2013.00303 Crepaldi, D., Berlingeri, M., Paulesu, E., et al. (2011). A place for nouns and a place for verbs? A critical review of neurocognitive data on grammatical-class effects. Brain and Language, 116, 33–49. doi:10.1016/j.bandl.2010.09.005 Crockford, C., & Boesch, C. (2003). Context-specific calls in wild chimpanzees, Pan troglodytes verus: Analysis of barks. Animal Behaviour, 66, 115–125. doi:10.1006/ anbe.2003.2166 Crompton, A. (1981). Syllables and segments in speech production. Linguistics, 19, 663–716. Crosson, B. (1992). Subcortical functions in language and memory. New York, NY: Guilford. Crosson, B. (2013). Thalamic mechanisms in language: A reconsideration based on recent findings and concepts. Brain and Language, 126, 73–88. doi:10.1016/j. bandl.2012.06.011 Crosson, B., Bejamin, M., & Levy, I. (2007). Role of the basal ganglia in language and semantics: Supporting cast. In J. Hart & M. A. Kraut (eds.), Neural basis of semantic memory. (pp. 219–243). New York, NY: Cambridge University Press. Crosson, B., & Haaland, K. Y. (2003). Subcortical functions in cognition: Toward a consensus. Journal of the International Neuropsychological Society, 9, 1027–1030. doi:10.1017/S1355617703970068 Crystal, D. (1974). Review of the book A First Language: The Early Stages. Journal of Child Language, 1, 289–307. Cummins, F. (2012). Oscillators and syllables: A cautionary note. Frontiers in Psychology, 3. doi:10.3389/fpsyg.2012.00364 Curtiss, S., & de Bode, S. (2003). How normal is grammatical development in the right hemisphere following hemispherectomy? The root infinitive stage and beyond. Brain and Language, 86, 193–206. doi:10.1016/S0093-934X(02)00528-X Cutler, A. (1980). La leçon des lapsus. La Recherche, 11, 686–692. Cutler, A. (2012). Native listening: Language experience and the recognition of spoken words. Cambridge, MA: MIT Press. D’Angelo, E. (2018). Physiology of the cerebellum. In M. Manto & T. A. G. M. Huisman (eds.), Handbook of clinical neurology (Vol. 154, pp. 85–108). Amsterdam, NL: Elsevier. D’Ausilio, A., Maffongelli, L., Bartoli, E., et al. (2014). Listening to speech recruits specific tongue motor synergies as revealed by transcranial magnetic stimulation and

256

References

tissue-Doppler ultrasound imaging. Philosophical Transactions of the Royal Society B: Biological Sciences, 369, 1–10. doi:10.1098/rstb.2013.0418 Dabrowska, E. (2004). Language, mind and brain: Some psychological and neurological constraints on theories of grammar. Edinburgh, UK: Edinburgh University Press. Dalla Volta, R., Gianelli, C., Campione, G. C., et al. (2009). Action word understanding and overt motor behavior. Experimental Brain Research, 196, 403–412. doi:10.1007/ s00221-009-1864-8 Daniels, P. T. (2017). Writing systems. In M. Aronoff (ed.), The handbook of linguistics (2nd ed., pp. 75–94). Malden, MA: John Wiley. Daniels, P. T., & Bright, W. (eds.). (1996). The world’s writing systems. New York, NY: Oxford University Press. Daniels, P. T., & Share, D. L. (2018). Writing system variation and its consequences for reading and dyslexia. Scientific Studies of Reading, 22, 101–116. doi:10.1080/ 10888438.2017.1379082 Daniloff, R., & Moll, K. L. (1968). Coarticulation of lip rounding. Journal of Speech, Language, and Hearing Research, 11, 707–721. doi:10.1044/jshr.1104.707 Deacon, T. W. (1997). The symbolic species. New York, NY: W.W. Norton. Deacon, T. W. (2011). Incomplete nature: How mind emerged from matter. New York, NY: W.W. Norton. Deacon, T. W. (2012). Beyond the symbolic species. In T. Schilhab, F. Stjernfelt, & T. W. Deacon (eds.), The symbolic species evolved (pp. 9–38). Dordrecht, NL: Springer. Decety, J., & Grèzes, J. (2006). The power of simulation: Imagining one’s own and other’s behavior. Brain Research, 1079, 4–14. doi:10.1016/j.brainres.2005.12.115 DeHaene, S. (2009). Reading in the brain. NewYork, NY: Penguin. Delattre, P. (1966). A comparison of syllable length conditioning among languages. International Review of Applied Linguistics, 4, 183–198. Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93, 283–321. doi:10.1037/0033-295X.93.3.283 Della Rosa, P. A., Catricalà, E., Vigliocco, G., et al. (2010). Beyond the abstract– concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 Italian words. Behavior Research Methods, 42, 1042–1048. doi:10.3758/ brm.42.4.1042 DeLong, M. R., & Wichmann, T. (2007). Circuits and circuit disorders of the basal ganglia. JAMA Neurology, 64, 20–24. doi:10.1001/archneur.64.1.20 Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134, 9–21. doi:10.1016/j.jneumeth.2003.10.009 de Meo, R., Murray, M. M., Clarke, S., et al. (2015). Top-down control and early multisensory processes: chicken vs. egg. Frontiers in Integrative Neuroscience, 9. doi:10.3389/fnint.2015.00017 Depue, B. E. (2012). A neuroanatomical model of prefrontal inhibitory modulation of memory retrieval. Neuroscience & Biobehavioral Reviews, 36, 1382–1399. doi:doi. org/10.1016/j.neubiorev.2012.02.012 Derwing, B. L. (1980). Against autonomous linguistics. In T. Perry (ed.), Evidence and argumentation in linguistics (pp. 163–189). Berlin, DE: de Gruyter.

References

257

Derwing, B. L. (1992). Orthographic aspects of linguistic competence. In P. Downing, S. D. Lima, & M. Noonan (eds.), The linguistics of literacy (pp. 193–210). Philadelphia, PA: John Benjamins. de Santos Loureiro, C., Braga, L. W., do Nascimento Souza, L., et al. (2004). Degree of illiteracy and phonological and metaphonological skills in unschooled adults. Brain and Language, 89, 499–502. doi:1016/j.bandl.2003.12.008 Desmond, K. J., Allen, P. D., Demizio, D. L., et al. (1997). Redefining end of test (EOT) criteria for pulmonary function testing in children. American Journal of Respiratory and Critical Care Medecine, 156, 542–545. doi:10.1164/ajrccm.156.2.9610116 DeThorne, L. S., Johnson, B. W., & Loeb, J. W. (2005). A closer look at MLU: What does it really measure? Clinical Linguistics & Phonetics, 19, 635–648. doi:10.1080/ 02699200410001716165 Devan, B. D., & White, N. M. (1999). Parallel information processing in the dorsal striatum: Relation to hippocampal function. The Journal of Neuroscience, 19, 2789–2798. doi:10.1523/jneurosci.19-07-02789.1999 Di Sciullo, A.-M., & Williams, E. (1987). On the definition of word. Cambridge, MA: MIT Press. Dickman, H., Fletke, K., & Redfern, R. E. (2016). Prolonged unassisted survival in an infant with anencephaly. BMJ Case Reports, 2016, bcr2016215986. doi:10.1136/bcr2016-215986 Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179. doi:10.1146/annurev.psych.55.090902.142028 Ding, N., Melloni, L., Zhang, H., et al. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19, 158–164. doi:10.1038/nn.4186 Ding, N., & Simon, J. Z. (2014). Cortical entrainment to continuous speech: functional roles and interpretations. Frontiers in Human Neuroscience, 8. doi:10.3389/ fnhum.2014.00311 Dixon, R. M. W., & Aikhenvald, A. Y. (2002). Word: A cross-linguistic typology. Cambridge, UK: Cambridge University Press. Doelling, K. B., Arnal, L. H., Ghitza, O., et al. (2014). Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage, 85 Pt 2, 761–768. doi:10.1016/j.neuroimage.2013.06.035 Donald, M. (1997). Preconditions for the evolution of protolanguages. In M. C. Corballis & S. E. G. Lea (eds.), The descent of mind: Psychological perspectives on hominid evolution (pp. 138–154). Oxford, UK: Oxford University Press. Donegan, P. J. (2015). The emergence of phonological representation. In B. MacWhinney & W. O’Grady (eds.), The handbook of language emergence (pp. 33–52). Chichester, UK: John Wiley. Dove, G. (2015). How to go beyond the body: An introduction. Frontiers in Psychology, 6. doi:10.3389/fpsyg.2015.00660 Doya, K. (2000). Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10, 732–739. doi:10.1016/S09594388(00)00153-7 Dragoi, G., & Tonegawa, S. (2013). Distinct preplay of multiple novel spatial experiences in the rat. Proceedings of the National Academy of Sciences, 110, 9100–9105. doi:10.1073/pnas.1306031110

258

References

Draper, M. H., Ladefoged, P., & Whitteridge, D. (1959). Respiratory muscles in speech. Journal of Speech & Hearing Research, 2, 16–27. doi:10.1044/jshr.0201.16 Du, Y., Huang, Q., Wu, X., et al. (2009). Binaural unmasking of frequency-following responses in rat amygdala. Journal of Neurophysiology, 101, 1647–1659. doi:10.1152/jn.91055.2008 Du, Y., Kong, L., Wang, Q., et al. (2011). Auditory frequency-following response: A neurophysiological measure for studying the “cocktail-party problem.” Neuroscience & Biobehavioral Reviews, 35, 2046–2057. doi:10.1016/j.neubiorev.2011.05.008 Du, Y., Ma, T., Wang, Q., et al. (2009). Two crossed axonal projections contribute to binaural unmasking of frequency-following responses in rat inferior colliculus. European Journal of Neuroscience, 30, 1779–1789. doi:10.1111/j.14609568.2009.06947.x Dubois, J., Dehaene-Lambertz, G., Perrin, M., et al. (2008). Asynchrony of the early maturation of white matter bundles in healthy infants: Quantitative landmarks revealed noninvasively by diffusion tensor imaging. Human Brain Mapping, 29, 14–27. doi:10.1002/hbm.20363 Dudai, Y. (1989). The neurobiology of memory: Concepts, findings, trends. New York, NY: Oxford University Press. Duff, M. C., & Brown-Schmidt, S. (2012). The hippocampus and the flexible use and processing of language. Frontiers in Human Neuroscience, 6. doi:10.3389/ fnhum.2012.00069 Duff, M. C., & Brown-Schmidt, S. (2017). Hippocampal contributions to language use and processing. In D. E. Hannula & M. C. Duff (eds.), The Hippocampus from cells to systems: Structure, connectivity, and functional contributions to memory and flexible cognition (pp. 503–536). Cham, CH: Springer. Duranti, A. I., & Goodwin, C. (eds.). (1992). Rethinking context: Language as an interactive phenomenon. Cambridge, UK: Cambridge University Press. Ebner, T. J., & Pasalar, S. (2008). Cerebellum predicts the future motor state. Cerebellum, 7, 583–588. doi:10.1007/s12311-008-0059-3 Ehri, L. C. (1975). Word consciousness in readers and prereaders. Journal of Educational Psychology, 67, 204–212. doi:10.1037/h0076942 Ehri, L. C. (1998). Grapheme-phoneme knowledge is essential for learning to read words in English. In J. L. Metsala & L. C. Ehri (eds.), Word recognition in beginning literacy (pp. 3–40). Mahwah, NJ: Lawrence Erlbaum. Ehri, L. C. (2014). Orthographic mapping in the acquisition of sight word reading, spelling memory, and vocabulary learning. Scientific Studies of Reading, 18, 5–21. doi:10.1080/10888438.2013.819356 Ehri, L. C., & Wilce, L. (1980). The influence of orthography on readers’ conceptualization of the phonemic structure of words. Applied Psycholinguistics, 1, 371–385. doi:10.1017/S0142716400009802 Ehri, L. C., Wilce, L. S., & Taylor, B. B. (1987). Children’s categorization of short vowels in words and the influence of spellings. Merrill-Palmer Quarterly, 33, 393–421. Eichenbaum, H. (2017). The role of the hippocampus in navigation is memory. Journal of Neurophysiology, 117, 1785–1796. doi:10.1152/jn.00005.2017 Eichenbaum, H., Amaral, D. G., Buffalo, E. A., et al. (2016). Hippocampus at 25. Hippocampus, 26, 1238–1249. doi:10.1002/hipo.22616

References

259

Eichenbaum, H., & Cohen, N. J. (2001). From conditioning to conscious recollection: Memory systems of the brain. New York, NY: Oxford University Press. Ekmekci, F. O. (1982). Language development of a Turkish child: A speech analysis in terms of length and complexity. Journal of Human Sciences, 1, 103–112. Ekstrom, A. D., & Ranganath, C. (2018). Space, time, and episodic memory: The hippocampus is all over the cognitive map. Hippocampus, 28, 680–687. doi:10.1002/hipo.22750 Ekstrom, A. D., Spiers, H. J., Bohbot, V. D., et al. (2018). Human spatial navigation. Princeton, NJ: Princeton University Press. Elbeheri, G., Everatt, J., Reid, G., et al. (2006). Dyslexia assessment in Arabic. Journal of Research in Special Educational Needs, 6, 143–152. doi:10.1111/j.14713802.2006.00072.x Ellis, A. W. (1980). Errors in speech and short-term memory: The effects of phonemic similarity and syllable position. Journal of Verbal Learning and Verbal Behavior, 19, 624–634. doi:10.1016/S0022-5371(80)90672-6 Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24, 143–188. doi:10.1017/S0272263102002024 Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211. doi:10.1207/s15516709cog1402_1 Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99. doi:10.1016/0010-0277(93)90058-4 Elman, J. L., Bates, E. A., Johnson, M. H., et al. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press. Embick, D., & Poeppel, D. (2015). Towards a computational(ist) neurobiology of language: Correlational, integrated and explanatory neurolinguistics. Language, Cognition and Neuroscience, 30, 357–366. doi:10.1080/23273798.2014.980750 Emmorey, K. (2005). Sign languages are problematic for a gestural origins theory of language evolution. Behavioral and Brain Sciences, 28, 130–131. doi:10.1017/ S0140525X05270036 Endress, A. D., & Hauser, M. D. (2010). Word segmentation with universal prosodic cues. Cognitive Psychology, 61, 177–199. doi:10.1016/j.cogpsych.2010.05.001 Endress, A. D., & Mehler, J. (2009). The surprising power of statistical learning: When fragment knowledge leads to false memories of unheard words. Journal of Memory and Language, 60, 351–367. doi:10.1016/j.jml.2008.10.003 Erickson, L. C., & Thiessen, E. D. (2015). Statistical learning of language: Theory, validity, and predictions of a statistical learning account of language acquisition. Developmental Review, 37, 66–108. doi:10.1016/j.dr.2015.05.002 Eschenko, O., & Mizumori, S. J. Y. (2007). Memory influences on hippocampal and striatal neural codes: Effects of a shift between task rules. Neurobiology of Learning and Memory, 87, 495–509. doi:10.1016/j.nlm.2006.09.008 Esteve-Gibert, N., & Prieto, P. (2018). Early development of the prosody–meaning interface. In P. Prieto & N. Esteve-Gibert (eds.), The development of prosody in first language acquisition (pp. 227–246). Amsterdam, NL: John Benjamins. Etkin, A., Egner, T., Peraza, D. M., et al. (2006). Resolving emotional conflict: A role for the rostral anterior cingulate cortex in modulating activity in the amygdala. Neuron, 51, 871–882. doi:10.1016/j.neuron.2006.07.029

260

References

Evans, N., & Levinson, S. C. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32, 429–448. doi:10.1017/S0140525X0999094X Everatt, J., Smythe, I., Ocampo, D., et al. (2004). Issues in the assessment of literacyrelated difficulties across language backgrounds: A cross-linguistic comparison. Journal of Research in Reading, 27, 141–151. doi:10.1111/j.1467-9817.2004 .00222.x Ezeizabarrena, M.-J., & Garcia Fernandez, I. (2018). Length of utterance, in morphemes or in words? MLU3-w, a reliable measure of language development in early Basque. Frontiers in Psychology, 8. doi:10.3389/fpsyg.2017.02265 Faber, A. (1990). Phonemic segmentation as epiphenomenon: Evidence from the history of alphabetic writing. Haskins Laboratories Report on Speech Research, SR-101/102, 1–13. Fadiga, L., Craighero, L., Buccino, G., et al. (2002). Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience, 15, 399–402. doi:10.1046/j.0953-816x.2001.01874.x Fagan, M. K. (2009). Mean length of utterance before words and grammar: Longitudinal trends and developmental implications of infant vocalizations. Journal of Child Language, 36, 495–527. doi:10.1017/s0305000908009070 Fang, P. C., Stepniewska, I., & Kaas, J. H. (2006). The thalamic connections of motor, premotor, and prefrontal areas of cortex in a prosimian primate (Otolemur garnetti). Neuroscience, 143, 987–1020. doi:10.1016/j.neuroscience.2006.08.053 Fant, G. (1960). Acoustic theory of speech production. The Hague, NL: Mouton. Feldman, A. G. (1966). Functional tuning of the nervous system with control of movement or maintenance of a steady posture-II. Controllable parameters of the muscles. Biophysics, 11, 565–578. Feldman, A. G. (1986). Once more on the equilibrium-point hypothesis for motor control. Journal of Motor Behavior, 18, 17–54. doi:10.1080/00222895.1986.10735369 Fell, J., & Axmacher, N. (2011). The role of phase synchronization in memory processes. Nature Reviews Neuroscience, 12, 105–118. doi:10.1038/nrn2979 Ferrari, P. F., Rozzi, S., & Fogassi, L. (2005). Mirror neurons responding to observation of actions made with tools in monkey ventral premotor cortex. Journal of Cognitive Neuroscience, 17, 212–226. doi:10.1162/0898929053124910 Ferreira, F. (2005). Psycholinguistics, formal grammars, and cognitive science. The Linguistic Review, 22, 365. doi:10.1515/tlir.2005.22.2-4.365 Ferry, A. L., Hespos, S. J., & Waxman, S. R. (2010). Categorization in 3- and 4-monthold infants: An advantage of words over tones. Child Development, 81, 472–479. doi:10.1111/j.1467-8624.2009.01408.x Finestack, L. H., Payesteh, B., Disher, J., et al. (2014). Reporting child language sampling procedures. Journal of Speech, Language, and Hearing Research, 57, 2274–2279. doi:10.1044/2014_JSLHR-L-14-0093 Fischer-Jørgensen, E. (1975). Trends in phonological theory: A historical introduction. Copenhagen, DK: Akademisk Forlag. Fisher, C., & Tokura, H. (1996). Prosody in speech to infants: Direct and indirect acoustic cues to syntactic structure. In J. L. Morgan & K. Demuth (eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition (pp. 343–363). Mahwah, NJ: Lawrence Erlbaum.

References

261

Fitch, W. T. (2000). The evolution of speech: A comparative review. TRENDS in Cognitive Sciences, 4, 258–267. doi:10.1016/S1364-6613(00)01494-7 Fitch, W. T. (2005). Production of vocalizations in mammals. In K. Brown (ed.), Encyclopedia of language and linguistics (pp. 115–121). New York, NY: Elsevier. Fitch, W. T. (2010). The evolution of language. Cambridge, UK: Cambridge University Press. Fitch, W. T., Hauser, M. D., & Chomsky, N. (2005). The evolution of the language faculty: clarifications and implications. Cognition, 97, 179–210. doi:10.1016/j. cognition.2005.02.005 Fitch, W. T., & Reby, D. (2001). The descended larynx is not uniquely human. Proceedings of the Royal Society of London B: Biological Sciences, 268, 1669–1675. doi:10.1098/rspb.2001.1704 Fletcher-Flinn, C. M., Thompson, G. B., Yamada, M., et al. (2011). The acquisition of phoneme awareness in children learning the hiragana syllabary. Reading and Writing, 24, 623–633. doi:10.1007/s11145-010-9257-8 Fodor, J. A., Bever, T. G., & Garrett, M. F. (1974). The psychology of language: An introduction to psycholinguistics and generative grammar. New York, NY: McGrawHill. Fortescue, M. (1984). Learning to speak Greenlandic: A case study of a two-year-old’s morphology in a polysynthetic language. First Language, 5, 101–112. doi:10.1177/ 014272378400501402 Fowler, C. A. (1985). Current perspectives on language and speech production: A critical overview. In R. G. Daniloff (ed.), Speech science: Recent advances (pp. 193–278). San Diego, CA: College-Hill Press. Fowler, C. A. (2010). The reality of phonological forms: A reply to Port. Language Sciences, 32, 56–59. doi:10.1016/j.langsci.2009.10.015 Fowler, C. A., Shankweiler, D. P., & Studdert-Kennedy, M. (2016). “Perception of the speech code” revisited: Speech is alphabetic after all. Psychological Review, 123, 125–150. doi:10.1037/rev0000013 Fox, B., & Routh, D. K. (1975). Analyzing spoken language into words, syllables, and phonomes: A developmental study. Journal of Psycholinguistic Research, 4, 331–342. Frank, M. C., Everett, D. L., Fedorenko, E., et al. (2008). Number as a cognitive technology: Evidence from Pirahã language. Cognition, 108, 819–824. doi:10.1016/j.cognition.2008.04.007 Fries, C. C. (1952). The structure of English: An introduction to the construction of English sentences. New York, NY: Harcourt Brace. Frisch, S. A., & Wright, R. (2002). The phonetics of phonological speech errors: An acoustic analysis of slips of the tongue. Journal of Phonetics, 30, 139–162. doi:10.1006/jpho.2002.0176 Fromkin, V. A. (1966a). Neuro-muscular specification of linguistic units. Language and Speech, 9, 170–199. doi:10.1177/002383096600900304 Fromkin, V. A. (1966b). Some requirements for a model of performance. UCLA Working Papers in Phonetics, 4, 19–39. Fromkin, V. A. (1970). Tips of the slung – or – to err is human. UCLA Working Papers in Phonetics, 14, 40–79.

262

References

Fromkin, V. A. (1973). The non-anomalous nature of anomalous utterances. In V. A. Fromkin (ed.), Speech errors as linguistic evidence (pp. 215–269). The Hague, NL: Mouton. Fromkin, V. A., Rodman, R., & Hyams, N. (2013). An introduction to language (10th ed.). Boston, MA: Wadsworth. Fulkerson, A. L., & Waxman, S. R. (2007). Words (but not tones) facilitate object categorization: Evidence from 6- and 12-month-olds. Cognition, 105, 218–228. doi:10.1016/j.cognition.2006.09.005 Galbraith, G. C., Arbagey, P. W., Branski, R., et al. (1995). Intelligible speech encoded in the human brain stem frequency-following response. NeuroReport, 6, 2363–2367. doi :10.1097/00001756-199511270-00021 Galbraith, G. C., Bhuta, S. M., Choate, A. K., et al. (1998). Brain stem frequency-following response to dichotic vowels during attention. NeuroReport, 9, 1889–1893. doi:10.1097/00001756-199806010-00041 Galbraith, G. C., Olfman, D. M., & Huffman, T. M. (2003). Selective attention affects human brain stem frequency-following response. NeuroReport, 14, 735–738. doi:10.1097/00001756-200304150-00015 Gallese, V., & Lakoff, G. (2005). The brain’s concepts: The role of the sensory-motor system in conceptual knowledge. Cognitive Neuropsychology, 22, 455–479. doi:10.1080/02643290442000310 Gallese, V., & Sinigaglia, C. (2011). What is so special about embodied simulation? TRENDS in Cognitive Sciences, 15, 512–519. doi:10.1016/j.tics.2011.09.003 Gambell, T., & Yang, C. (2005). Mechanisms and constraints in word segmentation. Manuscript, Yale University, 31. Gardner, B. T., & Gardner, R. A. (1985). Signs of Intelligence in cross-fostered chimpanzees. Philosophical Transactions of the Royal Society B: Biological Sciences, 308, 159–176. doi:10.1098/rstb.1985.0017 Gardner, H. (1985). The mind’s new science: A history of the cognitive revolution. New York, NY: Basic Books. Gardner, R. A., & Gardner, B. T. (1969). Teaching sign language to a chimpanzee. Science, 165, 664–672. Gasser, M. (2004). The origins of arbitrariness in language. In K. D. Forbus, D. Gentner, & T. Regier (eds.), Proceedings of the 26th annual meeting of the Cognitive Science Society (pp. 434–439). Mahwah, NJ: Lawrence Erlbaum. Gathercole, S. E., & Baddeley, A. D. (1989). Evaluation of the role of phonological STM in the development of vocabulary in children: A longitudinal study. Journal of Memory and Language, 28, 200–213. doi:10.1016/0749-596X(89) 90044-2 Gelb, I. J. (1963). A study of writing. Chicago, IL: University of Chicago Press. Gentilucci, M., & Corballis, M. C. (2006). From manual gesture to speech: A gradual transition. Neuroscience and Biobehavioral Reviews, 30, 949–960. doi:10.1016/j. neubiorev.2006.02.004 Gerken, L. (1994). Young children’s representation of prosodic phonology: Evidence from English-speakers’ weak syllable productions. Journal of Memory and Language, 33, 19–38. doi:10.1006/jmla.1994.1002 Gerken, L. (1996a). Prosodic structure in young children’s language production. Language, 72, 683–712. doi:10.2307/416099

References

263

Gerken, L. (1996b). Prosody’s role in language acquisition and adult parsing. Journal of Psycholinguistic Research, 25, 345–356. doi:10.1007/BF01708577 Gerken, L., Jusczyk, P. W., & Mandel, D. R. (1994). When prosody fails to cue syntactic structure: 9-month-olds’ sensitivity to phonological versus syntactic phrases. Cognition, 51, 237–265. doi:10.1016/0010-0277(94)90055-8 Geschwind, N. (1970). The organization of language and the brain. Science, 170, 940–944. doi:10.1126/science.170.3961.940 Geudens, A. (2006). Phonological awareness and learning to read a first language: Controversies and new perspectives. LOT Occasional Series, 6, 25–43. doi:10.1.1.624.1580 Geudens, A., & Sandra, D. (2003). Beyond implicit phonological knowledge: No support for an onset–rime structure in children’s explicit phonological awareness. Journal of Memory and Language, 49, 157–182. doi:10.1016/S0749-596X(03) 00036-6 Ghitza, O. (2017). Acoustic-driven delta rhythms as prosodic markers. Language, Cognition and Neuroscience, 32, 545–561. doi:10.1080/23273798.2016.1232419 Gil, D. (2002). Escaping eurocentrism: Fieldwork as a process of unlearning. In P. Newman & M. Ratliff (eds.), Linguistic fieldwork (pp. 102–132). Cambridge, UK: Cambridge University Press. Gilbert, A. C., & Boucher, V. J. (2007). What do listeners attend to in hearing prosodic structures? Investigating the human speech-parser using short-term recall. In 8th Annual conference of the International Speech Communication Association. InterSpeech-2007 (pp. 430–433). Antwerp, BE: ISCA. Gilbert, A. C., Boucher, V. J., & Jemel, B. (2014). Perceptual chunking and its effect on memory in speech processing: ERP and behavioral evidence. Frontiers in Psychology, 5. doi:10.3389/fpsyg.2014.00220 Gilbert, A. C., Boucher, V. J., & Jemel, B. (2015a). Individual differences in working memory and their effects on speech processing. In Proceedings of the 18th International Congress of Phonetic Sciences (Vol. Paper no 0772, pp. 1–4). Glasgow, UK: International Phonetic Association. doi:10.1016/0749-596X(86) 90018-5 Gilbert, A. C., Boucher, V. J., & Jemel, B. (2015b). The perceptual chunking of speech: A demonstration using ERPs. Brain Research, 1603, 101–113. doi:10.1016/j. brainres.2015.01.032 Gillon, G. T. (ed.) (2018). Phonological awareness: From research to practice (2nd ed.). New York, NY: Guilford. Gilman, S., Carr, D., & Hollenberg, J. (1976). Kinematic effects of deafferentation and cerebellar ablation. Brain: A Journal of Neurology, 99, 311–330. doi:10.1093/brain/ 99.2.311 Ginzburg, J., & Poesio, M. (2016). Grammar is a system that characterizes talk in interaction. Frontiers in Psychology, 7. doi:10.3389/fpsyg.2016.01938 Giraud, A.-L., Kleinschmidt, A., Poeppel, D., et al. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron, 56, 1127–1134. doi:10.1016/j.neuron.2007.09.038 Giraud, A.-L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15, 511–517. doi:10.1038/nn.3063

264

References

Glenberg, A. M., & Gallese, V. (2012). Action-based language: A theory of language acquisition, comprehension, and production. Cortex, 48, 905–922. doi:10.1016/j. cortex.2011.04.010 Glickstein, M. (1994). Cerebellar agenesis. Brain, 117, 1209–1212. doi:10.1093/brain/ 117.5.1209 Gogate, L. J., Bahrick, L. E., & Watson, J. D. (2000). A study of multimodal motherese: The role of temporal synchrony between verbal labels and gestures. Child Development, 71, 878–894. doi:10.1111/1467-8624.00197 Goldinger, S. D. (1996). Words and voices: Episodic traces in spoken word identification and recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1166–1183. doi:10.1037/0278-7393.22.5.1166 Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251–279. doi:10.1037/0033-295X.105.2.251 Goldinger, S. D. (2007). A complementary-systems approach to abstract and episodic speech perception. In Proceedings of the 16th International Congress of Phonetic Sciences (pp. 49–54). Saarbrücken, DE: ICPhS. Goldinger, S. D., & Azuma, T. (2003). Puzzle-solving science: The quixotic quest for units in speech perception. Journal of Phonetics, 31, 305–320. doi:10.1016/S00954470(03)00030-5 Goldinger, S. D., Papesh, M. H., Barnhart, A. S., et al. (2016). The poverty of embodied cognition. Psychonomic Bulletin & Review, 23, 959–978. doi:10.3758/s13423-0150860-1 Goldman-Rakic, P. S. (1994). The issue of memory in the study of prefrontal function. In A. M. Thierry, J. Glowinski, P. S. Goldman-Rakic, & Y. Christen (eds.), Motor and cognitive functions of the prefrontal cortex. Research and perspectives in neurosciences (pp. 112–121). Berlin, DE: Springer. Goldstein, L., Nam, H., Saltzman, E., et al. (2009). Coupled oscillator planning model of speech timing and syllable structure. In G. Fant, H. Fujisaki, & S. J. (eds.), Frontiers in phonetics and speech science (pp. 239–250). Beijing, CH: The Commercial Press. Goldstein, L., & Pouplier, M. (2014). The temporal organization of speech. In M. Goldrick, V. S. Ferreira, & M. Miozzo (eds.), The Oxford handbook of language production (pp. 210–229). Oxfod, UK: Oxford University Press. Goldstein, L., Pouplier, M., Chen, L., et al. (2007). Dynamic action units slip in speech production errors. Cognition, 103, 386–412. doi:10.1016/j. cognition.2006.05.010 Golumbic, E. M. Z., Ding, N., Bickel, S., et al. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party.” Neuron, 77, 980–991. doi:10.1016/j.neuron.2012.12.037 Goody, J. (1977). The domestication of the savage mind. Cambridge, UK: Cambridge University Press. Goswami, U. (2002). In the beginning was the rhyme? A reflection on Hulme, Hatcher, Nation, Brown, Adams, and Stuart (2002). Journal of Experimental Child Psychology, 82, 47–57. doi:10.1006/jecp.2002.2673 Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. TRENDS in Cognitive Sciences, 15, 3–10. doi:10.1016/j.tics.2010.10.001 Graybiel, A. M. (1997). The basal ganglia and cognitive pattern generators. Schizophrenia Bulletin, 23, 459–469. doi:10.1093/schbul/23.3.459

References

265

Graybiel, A. M. (1998). The basal ganglia and chunking of action repertoires. Neurobiology of Learning and Memory, 70, 119–136. doi:10.1006/nlme.1998 .3843 Graybiel, A. M. (2000). The basal ganglia. Current Biology, 10, R509-R511. doi:10.1016/S0960-9822(00)00593-5 Graybiel, A. M. (2008). Habits, rituals, and the evaluative brain. Annual Review of Neuroscience, 31, 359–387. doi:10.1146/annurev.neuro.29.051605.112851 Graybiel, A. M., & Grafton, S. T. (2015). The striatum: Where skills and habits meet. Cold Spring Harbor perspectives in biology, 7, a021691. doi:10.1101/cshperspect. a021691 Griffin, A. L. (2015). Role of the thalamic nucleus reuniens in mediating interactions between the hippocampus and medial prefrontal cortex during spatial working memory. Frontiers in systems neuroscience, 9. doi:10.3389/ fnsys.2015.00029 Grimaldi, M. (2012). Toward a neural theory of language: Old issues and new perspectives. Journal of Neurolinguistics, 25, 304–327. doi:10.1016/j. jneuroling.2011.12.002 Grimaldi, M. (2017). From brain noise to syntactic structures: A formal proposal within the oscillatory rhythms perspective. bioRxiv, 171702. doi:10.1101/171702 Gross, J., Hoogenboom, N., Thut, G., et al. (2013). Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biology, 11, e1001752. doi:10.1371/journal.pbio.1001752 Guell, X., Hoche, F., & Schmahmann, J. D. (2015). Metalinguistic deficits in patients with cerebellar dysfunction: Empirical support for the dysmetria of thought theory. The Cerebellum, 14, 50–58. doi:10.1007/s12311-014-0630-z Guenther, F. H. (1994). A neural network model of speech acquisition and motor equivalent speech production. Biological Cybernetics, 72, 43–53. doi:10.1007/ bf00206237 Guenther, F. H. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review, 102, 594–621. doi:10.1037/0033-295X.102.3.594 Guenther, F. H. (2016). Neural control of speech. Cambridge, MA: MIT Press. Guenther, F. H., Ghosh, S. S., & Tourville, J. A. (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language, 96, 280–301. doi:10.1016/j.bandl.2005.06.001 Guenther, F. H., Hampson, M., & Johnson, D. (1998). A theoretical investigation of reference frames for the planning of speech movements. Psychological Review, 105, 611–633. doi:10.1037/0033-295X.105.4.611-633 Guenther, F. H., & Vladusich, T. (2012). A neural theory of speech acquisition and production. Journal of Neurolinguistics, 25, 408–422. doi:10.1016/j.jneuroling.2009 .08.006 Guise, K. G., & Shapiro, M. L. (2017). Medial prefrontal cortex reduces memory interference by modifying hippocampal encoding. Neuron, 94, 183–192.e188. doi:10.1016/j.neuron.2017.03.011 Gutman, A., Dautriche, I., Crabbé, B., et al. (2015). Bootstrapping the syntactic bootstrapper: Probabilistic labeling of prosodic phrases. Language Acquisition, 22, 285–309. doi:10.1080/10489223.2014.971956

266

References

Hagoort, P., & van Berkum, J. (2007). Beyond the sentence given. Philosophical Transactions of the Royal Society B: Biological Sciences, 362, 801–811. doi:10.1098/rstb.2007.2089 Halle, M., & Vergnaud, J.-R. (1980). Three dimensional phonology. Journal of Linguistic Research, 1, 83–105. Halliday, M. A. K. (1985). Spoken and written language. Oxford, UK: Oxford University Press. Hanhong, L. I., & Fang, A. C. (2011). Word frequency of the CHILDES corpus: Another perspective of child language features. ICAME Journal, 35, 95–116. doi:10.1.1.364.9909 Hannas, W. C. (2003). The writing on the wall: How asian orthography curbs creativity. Philadelphia, PA: University of Philadelphia Press. Hardcastle, W. J., Gibbon, F. E., & Jones, W. (1991). Visual display of tongue-palate contact: Electropalatography in the assessment and remediation of speech disorders. International Journal of Language & Communication Disorders, 26, 41–74. doi:10.3109/13682829109011992 Harris, R. (1980). The language-makers. Ithaca, NY: Cornell University Press. Harris, R. (1986). The origin of writing. London, UK: Duckworth. Harris, R. (1990). On redefining linguistics. In H. G. Davis & T. J. Taylor (eds.), Redefining linguistics (pp. 18–52). London, UK: Routledge. Harris, R. (1998). Introduction to integrational linguistics. Oxford, UK: Pergamon. Harris, R. (2000). Rethinking writing. London, UK: Continuum. Harris, R. (2002). The role of the language myth in the western cultural tradition. In R. Harris (ed.), The Language Myth in Western Culture (pp. 1–24). Richmond, UK: Curzon Press. Harris, Z. S. (1946). From morpheme to utterance. Language, 22, 161–183. Harris, Z. S. (1951). Methods in structural linguistics. Chicago IL: Chicago University Press. Hashikawa, T. (1983). The inferior colliculopontine neurons of the cat in relation to other collicular descending neurons. Journal of Comparative Neurology, 219, 241–249. doi:10.1002/cne.902190209 Haspelmath, M. (2011). The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica, 45, 31–80. doi:10.1515/flin.2011.002 Hasselmo, M. E. (2005). What is the function of hippocampal theta rhythm? – Linking behavioral data to phasic properties of field potential and unit recording data. Hippocampus, 15, 936–949. doi:10.1002/hipo.20116 Hauk, O. (2015). Representing mental representations – Neuroscientific and computational approaches to study language processing in the brain. Language, Cognition and Neuroscience, 30, 355–356. doi:10.1080/23273798.2014.995680 Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41, 301–307. doi:10.1016/ S0896-6273(03)00838-9 Hauk, O., & Tschentscher, N. (2013). The body of evidence: What can neuroscience tell us about embodied semantics? Frontiers in Psychology. doi:10.3389/fpsyg.2013.00050 Hauser, M. D. (2016). Challenges to the what, when, and why? Biolinguistics, 10, 1–5. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579. doi:10.1126/ science.298.5598.1569

References

267

Hauser, M. D., Newport, E. L., & Aslin, R. N. (2001). Segmentation of the speech stream in a non-human primate: Statistical learning in cotton-top tamarins. Cognition, 78, B53-B64. doi:10.1016/S0010-0277(00)00132-3 Hauser, M. D., Yang, C., Berwick, R. C., et al. (2014). The mystery of language evolution. Frontiers in Psychology, 5. doi:10.3389/fpsyg.2014.00401 Hayes, B. (1980). A metrical theory of stress rules. Phd dissertation MIT. Bloomington, IN: Indiana University Linguistics Club. Hayes, B. (1984). The phonology of rhythm in English. Linguistic Inquiry, 15, 33–74. Hembrook, J. R., & Mair, R. G. (2011). Lesions of reuniens and rhomboid thalamic nuclei impair radial maze win-shift performance. Hippocampus, 21, 815–826. doi:10.1002/hipo.20797 Henderson, A., Goldman-Eisler, F., & Skarbek, A. (1965). Temporal patterns of cognitive activity and breath control in speech. Language and Speech, 8, 236–242. doi:10.1177/002383096500800405 Henry, M. J., Herrmann, B., & Obleser, J. (2014). Entrained neural oscillations in multiple frequency bands comodulate behavior. Proceedings of the National Academy of Sciences, 111, 14935–14940. doi:10.1073/pnas.1408741111 Hepper, P. G., & Shahidullah, B. S. (1994). Development of fetal hearing. Archives of Disease in Childhood: Fetal and Neonatal Edition, 71, F81–F87. doi:10.1136/fn.71.2.F81 Hewes, G. W. (1996). A history of the study of language origins and the gestural primacy hypothesis. In A. Lock & C. Peters (eds.), Handbook of human symbolic evolution (pp. 571–595). Oxford, UK: Oxford University Press. Heyes, C. M. (1993). Imitation, culture and cognition. Animal Behaviour, 46, 999–1010. doi:10.1006/anbe.1993.1281 Hickey, T. (1991). Mean length of utterance and the acquisition of Irish. Journal of Child Language, 18, 553–569. doi:10.1017/s0305000900011247 Hickok, G. (2014). The architecture of speech production and the role of the phoneme in speech processing. Language, Cognition and Neuroscience, 29, 2–20. doi:10.1080/ 01690965.2013.834370 Hickok, G., Houde, J. F., & Rong, F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron, 69, 407–422. doi:10.1016/j.neuron.2011.01.019 Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402. doi:10.1038/nrn2113 Hill, A. V. (1925). Length of muscle, and the heat and tension developed in an isometric contraction. Journal of Physiology, 60, 237–263. doi:10.1113/jphysiol.1925. sp002242 Hillock-Dunn, A., Grantham, D. W., & Wallace, M. T. (2016). The temporal binding window for audiovisual speech: Children are like little adults. Neuropsychologia, 88, 74–82. doi:10.1016/j.neuropsychologia.2016.02.017 Himelstein, L. (2011, July 5). Unlocking dyslexia in Japanese. Wall Street Journal. www.wsj.com/articles/SB10001424052702303763404576416273856397078 Hinterleitner, F. (2017). Speech synthesis. In S. Möller, A. Küpper, & A. Raake (eds.), Quality of synthetic apeech: Perceptual dimensions, influencing factors, and instrumental assessment (pp. 5–18). Singapore, CN: Springer. Hintzman, D. L. (1986). “Schema abstraction” in a multiple-trace memory model. Psychological Review, 93, 411–428. doi:10.1037/0033-295X.93.4.411

268

References

Hirano, M., Kurita, S., & Nakashima, T. (1983). Growth, development, and aging of human vocal folds. In D. M. Bless & J. H. Abbs (eds.), Vocal fold physiology: Contemporary research and clinical issues (pp. 22–43). San Diego, CA: College-Hill Press. Hirose, H., Sawashima, M., & Yoshioka, H. (1980). Laryngeal control for initiation of utterances: A simultaneous observation of glottal configuration and laryngeal EMG. Annual Bulletin Research Institute of Logopedics and Phoniatrics, 14, 113–123. Hirose, H., Sawashima, M., & Yoshioka, H. (1983). Laryngeal adjustment for initiation of utterances: A simultaneous EMG and fiberscopic study. In D. M. Bless & J. H. Abbs (eds.), Vocal fold physiology: Contemporary research and clinical issues (pp. 253–263). San Diego CA: College-Hill Press. Hixon, T. J., Goldman, M. D., & Mead, J. (1973). Kinematics of the chest wall during speech production: Volume displacements of the rib cage, abdomen, and lung. Journal of Speech and Hearing Research, 16, 78–115. doi:10.1044/jshr.1601.78 Hobaiter, C. L., & Byrne, R. W. (2012). Gesture use in consortship: Wild chimpanzees’ use of gesture for an “evolutionarily urgent” purpose. In S. Pika & K. Liebal (eds.), Developments in primate gesture research (pp. 129–146). Amsterdam, NL: John Benjamins. Hockett, C. F. (1944). Review of Nida 1944. Language, 20, 252–255. Hockett, C. F. (1958). A Course in modern linguistics. New York, NY: Macmillan. Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 88–111. Hoit, J. D., & Hixon, T. J. (1987). Age and speech breathing. Journal of Speech and Hearing Research, 30, 351–366. doi:10.1044/jshr.3003.351 Hoit, J. D., Hixon, T. J., Watson, P. J., et al. (1990). Speech breathing in children and adolescents. Journal of Speech, Language, and Hearing Research, 33, 51–69. doi:10.1044/jshr.3301.51 Hoosain, R. (1991). Psycholinguistic implications for linguistic relativity: A case study of Chinese. Hillsdale, NJ: Lawrence Earlbaum. Hoosain, R. (1992). Psychological reality of the word in Chinese. In C. Hsuan-Chih & J. L. T. Ovid (eds.), Advances in Psychology (Vol. 90, pp. 111–130). Oxford, UK: North-Holland. Hopper, P. J. (1998). Emergent grammar. In M. Tomasello (ed.), The new psychology of language: Cognitive and functional approaches (pp. 155–175). Mahwah, NJ: Lawrence Erlbaum. Houde, J., & Nagarajan, S. (2011). Speech production as state feedback control. Frontiers in Human Neuroscience, 5. doi:10.3389/fnhum.2011.00082 Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213–1216. doi:10.1126/science.279.5354.1213 Houk, J. C., Bastianen, C., Fansler, D., et al. (2007). Action selection and refinement in subcortical loops through basal ganglia and cerebellum. Philosophical Transactions of the Royal Society B: Biological Sciences, 362, 1573–1583. doi:10.1098/ rstb.2007.2063 Howard, M. F., & Poeppel, D. (2010). Discrimination of speech stimuli based on neuronal response phase patterns depends on acoustics but not comprehension. Journal of Neurophysiology, 104, 2500–2511. doi:10.1152/jn.00251.2010 Huettig, F., & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Language, Cognition and Neuroscience, 31, 19–31. doi:10.1080/ 23273798.2015.1072223

References

269

Huth, A. G., de Heer, W. A., Griffiths, T. L., et al. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532. doi:10.1038/ nature17637 Huth, A. G., Nishimoto, S., Vu, A. T., et al. (2012). A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, 76, 1210–1224. doi:10.1016/j.neuron.2012.10.014 Huysmans, E., de Jong, J., Festen, J. M., et al. (2017). Morphosyntactic correctness of written language production in adults with moderate to severe congenital hearing loss. Journal of Communication Disorders, 68, 35–49. doi:10.1016/j. jcomdis.2017.06.005 Imai, M., & Kita, S. (2014). The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 369, 1–13. doi:10.1098/rstb.2013.0298 Ingram, J. C. L. (2007). Neurolinguistics. An introduction to spoken language processing and its disorders. Cambridge, UK: Cambridge University Press. Ishikawa, T., Shimuta, M., & Häusser, M. (2015). Multimodal sensory integration in single cerebellar granule cells in vivo. eLife, 4, e12916. doi:10.7554/eLife.12916 Ito, M. (2008). Control of mental activities by internal models in the cerebellum. Nature Reviews Neuroscience, 9, 304–313. doi:10.1038/nrn2332 Ito, T., Tiede, M., & Ostry, D. J. (2009). Somatosensory function in speech perception. Proceedings of the National Academy of Sciences, 106, 1245–1248. doi:10.1073/ pnas.0810063106 Iverson, J. M., Hall, A. J., Nickel, L., et al. (2007). The relationship between reduplicated babble onset and laterality biases in infant rhythmic arm movements. Brain and Language, 101, 198–207. doi:10.1016/j.bandl.2006.11.004 Ivry, R. (1993). Cerebellar involvement in the explicit representation of temporal information. In P. Tallal, A. M. Galaburda, R. R. Llinás, & C. Von Euler (eds.), Temporal information processing in the nervous system. Special reference to dyslexia and dysphasia (Vol. 682, pp. 214–230). New York, NY: New York Academy of Sciences. Ivry, R., & Diener, H. C. (1991). Impaired velocity perception in patients with lesions of the cerebellum. Journal of Cognitive Neuroscience, 3, 355–366. doi:10.1162/ jocn.1991.3.4.355 %M 23967816 Ivry, R. B., Spencer, R. M., Zelaznik, H. N., et al. (2002). The cerebellum and event timing. Annals of the New York Academy of Sciences, 978, 302–317. doi:10.1111/ j.1749-6632.2002.tb07576.x Iwatsubo, T., Kuzuhara, S., Kanemitsu, A., et al. (1990). Corticofugal projections to the motor nuclei of the brainstem and spinal cord in humans. Neurology, 40, 309–309. doi:10.1212/WNL.40.2.309 Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. New York, NY: Oxford University Press. Jackendoff, R. (2007a). Linguistics in cognitive science: The state of the art. The Linguistic Review, 24, 347–401. doi:10.1515/TLR.2007.014 Jackendoff, R. (2007b). A parallel architecture perspective on language processing. Brain Research, 1146, 2–22. doi:10.1016/j.brainres.2006.08.111 Jackendoff, R. (2009). Language, consciousness, culture: Essays on mental structure. Cambridge, MA: MIT Press.

270

References

Jackendoff, R. (2017). In defense of theory. Cognitive Science, 41, 185–212. doi:10.1111/cogs.12324 Jahanshahi, M., Obeso, I., Rothwell, J. C., et al. (2015). A fronto–striato–subthalamic– pallidal network for goal-directed and habitual inhibition. Nature Reviews Neuroscience, 16, 719–732. doi:10.1038/nrn4038 Jalilevand, N., & Ebrahimipour, M. (2014). Three measures often used in language samples analysis. Journal of Child Language Acquisition and Development, 2, 1–12. Jeng, F.-C. (2017). Infant and childhood development: Intersections between development and language experience. In N. Kraus, S. R. Anderson, T. White-Schwoch, R. R. Fay, & R. N. Popper (eds.), The frequency-following response: A window into human communication (pp. 17–43). Cham, CH: Springer & ASA Press. Jeng, F.-C., Chung, H.-K., Lin, C.-D., et al. (2011). Exponential modeling of human frequency-following responses to voice pitch. International journal of audiology, 50, 582–593. doi:10.3109/14992027.2011.582164 Jensen, O., Kaiser, J., & Lachaux, J.-P. (2007). Human gamma-frequency oscillations associated with attention and memory. Trends in Neurosciences, 30, 317–324. doi:10.1016/j.tins.2007.05.001 Jensen, O., & Lisman, J. E. (1996). Novel lists of 7 +/– 2 known items can be reliably stored in an oscillatory short-term memory network: Interaction with long-term memory. Learning and Memory, 3, 257–263. doi:10.1101/lm.3.2-3.257 Jensen, O., & Lisman, J. E. (1998). An oscillatory short-term memory buffer model can account for data on the Sternberg task. The Journal of Neuroscience, 18, 10688–10699. doi:10.1523/JNEUROSCI.18-24-10688.1998 Jensen, O., & Tesche, C. D. (2002). Frontal theta activity in humans increases with memory load in a working memory task. European Journal of Neuroscience, 15, 1395–1399. doi:10.1046/j.1460-9568.2002.01975.x Jin, X., Tecuapetla, F., & Costa, R. M. (2014). Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nature Neuroscience, 17. doi:10.1038/nn.3632 Johansson, B. B. (2006). Cultural and linguistic influence on brain organization for language and possible consequences for dyslexia: A review. Annals of Dyslexia, 56, 13–50. doi:10.1007/s11881-006-0002-6 Johnson, E. K., & Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: When speech cues count more than statistics. Journal of Memory and Language, 44, 548–567. doi:10.1006/jmla.2000.2755 Jones, D. (1929). Definition of a phoneme. Le Maître Phonétique, 3, 43–44. Jones, E. G. (2001). The thalamic matrix and thalamocortical synchrony. Trends in Neurosciences, 24, 595–601. doi:10.1016/S0166-2236(00)01922-6 Jones, E. G. (2007). The thalamus. Cambridge, UK: Cambridge University Press. Jürgens, U. (1976). Projections from the cortical larynx area in the squirrel monkey. Experimental Brain Research, 25, 401–411. doi:10.1007/bf00241730 Jürgens, U. (1992). On the neurobiology of vocal communication. In H. Papousek, U. Jürgens, & M. Papousek (eds.), Nonverbal vocal communication: Comparative and developmental approaches (pp. 31–42). Cambridge, UK: Cambridge University Press. Jürgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235–258. doi:10.1016/S0149-7634(01)00068-9

References

271

Jusczyk, P. W. (1986). Toward a model of the development of speech perception. In J. S. Perkell & D. H. Klatt (eds.), Invariance and variability in speech process (pp. 1–35). Hillsdale, NJ: Lawrence Erlbaum. Jusczyk, P. W. (1993). How word recognition may evolve from infant speech perception capacities. In G. Altmann & R. Shillcock (eds.), Cognitive models of speech processing (pp. 27–55). Hove, UK: Lawrence Erlbaum. Jusczyk, P. W. (1996). Developmental speech perception. In N. J. Lass (ed.), Principles of experimental phonetics (pp. 328–357). St. Louis, MO: Mosby. Jusczyk, P. W. (2001). Bootstrapping from the signal: Some further directions. In J. Weissenborn & B. Höhle (eds.), Approaches to bootstrapping: Phonological, lexical, syntactic and neurophysiological aspects of early language acquisition (Vol. 1, pp. 3–23). Philadelphia, PA: John Benjamins. Jusczyk, P. W., & Aslin, R. N. (1995). Infants0 detection of the sound patterns of words in fluent speech. Cognitive Psychology, 29, 1–23. doi:10.1006/cogp.1995.1010 Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159–207. doi:10.1006/cogp.1999.0716 Kandhadai, P., & Sproat, R. (2010). Impact of spatial ordering of graphemes in alphasyllabic scripts on phonemic awareness in Indic languages. Writing Systems Research, 2, 105–116. doi:10.1093/wsr/wsq009 Karihwénhawe Lazore, D. (1993). The Mohawk language standardization project. https://kanienkeha.net/the-mohawk-language-standardisation-project/ Katz, J. J. (1980). Language and other abstract objects. Totowan, NJ: Rowman & Littlefield. Katz, J. J. (1996). The unfinished Chomskyan revolution. Mind & Language, 11, 270–294. doi:10.1111/j.1468-0017.1996.tb00047.x Katz, J. J., & Postal, P. M. (1991). Realism vs conceptualism in linguistics. Linguistics and Philosophy, 14, 515–554 Kazanina, N., Bowers, J. S., & Idsardi, W. J. (2018). Phonemes: Lexical access and beyond. Psychonomic Bulletin & Review, 25, 560–585. doi:10.3758/s13423-0171362-0 Kelso, J. A. S. (1997). Relative timing in brain and behavior: Some observations about the generalized motor program and self-organized coordination dynamics. Human Movement Science, 16, 453–460. doi:10.1016/S01679457(96)00044-9 Kelso, J. A. S., Saltzman, E. L., & Tuller, B. (1986). The dynamical perspective on speech production: Data and theory. Journal of Phonetics, 14, 29–59. doi:10.1016/ S0095-4470(19)30608-4 Kemmerer, D. (2015a). Are the motor features of verb meanings represented in the precentral motor cortices? Yes, but within the context of a flexible, multilevel architecture for conceptual knowledge. Psychonomic Bulletin & Review, 22, 1068–1075. doi:10.3758/s13423-016-1031-8 Kemmerer, D. (2015b). Cognitive neuroscience of language. New York, NY: Psychology Press. Kent, R. D. (1984). Psychobiology of speech development: Coemergence of language and a movement system. American Journal of Physiology, 246, R888–R894. doi:10.1152/ajpregu.1984.246.6.R888

272

References

Ross, E. D. (2010). Cerebral localization of functions and the neurology of language: Fact versus fiction or is it something else? The Neuroscientist, 16, 222–243. doi:10.1177/1073858409349899 Rousselot, J.-P. (1897). Principes de phonétique expérimentale. Paris, FR: H. Welter. Roy, A. C., Craighero, L., Fabbri-Destro, M., et al. (2008). Phonological and lexical motor facilitation during speech listening: A transcranial magnetic stimulation study. Journal of Physiology-Paris, 102, 101–105. doi:10.1016/j. jphysparis.2008.03.006 Roy, A. C., Svensson, F.-P., Mazeh, A., et al. (2017). Prefrontal-hippocampal coupling by theta rhythm and by 2–5 Hz oscillation in the delta band: The role of the nucleus reuniens of the thalamus. Brain Structure and Function, 222, 2819–2830. doi:10.1007/s00429-017-1374-6 Roy, M., Shohamy, D., & Wager, T. D. (2012). Ventromedial prefrontal-subcortical systems and the generation of affective meaning. TRENDS in Cognitive Sciences, 16, 147–156. doi:10.1016/j.tics.2012.01.005 Rubin, P., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. The Journal of the Acoustical Society of America, 70, 321–328. Rueschemeyer, S.-A., & Gaskell, M. G. (eds.). (2018). The Oxford handbook of psycholinguistics Oxford, UK: Oxford University Press. Rueschemeyer, S.-A., Pfeiffer, C., & Bekkering, H. (2010). Body schematics: On the role of the body schema in embodied lexical–semantic representations. Neuropsychologia, 48, 774–781. doi:10.1016/j.neuropsychologia.2009.09.019 Russell, N. K., & Stathopoulos, E. (1988). Lung volume changes in children and adults during speech production. Journal of Speech and Hearing Research, 31, 146–155. doi:10.1044/jshr.3102.146 Saenger, P. (1997). Space between words: The origins of silent reading. Standford, CA: Stanford University Press. Saffran, J. R. (2001a). The use of predictive dependencies in language learning. Journal of Memory and Language, 44, 493–515. doi:10.1006/jmla.2000.2759 Saffran, J. R. (2001b). Words in a sea of sounds: The output of infant statistical learning. Cognition, 81, 149–169. doi:10.1016/S0010-0277(01)00132-9 Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. doi:10.1126/science.274.5294.1926 Saffran, J. R., Johnson, E. K., Aslin, R. N., et al. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70, 27–52. doi:10.1016/S00100277(98)00075-4 Saffran, J. R., Werker, J. F., & Werner, L. A. (2007). The infant’s auditory world: Hearing, speech, and the beginnings of language. In R. Siegler & D. Kuhn (eds.), Handbook of child psychology (pp. 58–108). New York, NY: Wiley. Saito, A., & Inoue, T. (2017). The frame constraint on experimentally elicited speech errors in Japanese. Journal of Psycholinguistic Research, 46, 583–596. doi:10.1007/ s10936-016-9454-y Sakata, S., & Harris, K. D. (2009). Laminar structure of spontaneous and sensory-evoked population activity in auditory cortex. Neuron, 64, 404–418. doi:10.1016/j.neuron.2009.09.020 Salomon, R. G. (1996). Brahmi and Kharoshthi. In P. T. Daniels (ed.), The world’s writing systems (pp. 373–383). Oxford, UK: Oxford University Press.

Salomon, R. G. (2000). Typological observations on the Indic script group and its relationship to other alphasyllabaries. Studies in the Linguistic Sciences, 30, 87–103. Saltzman, E., Löfqvist, A., & Mitra, S. (2000). “Glue” and “clocks”: Intergestural cohesion and global timing. In M. B. Broe & J. B. Pierrehumbert (eds.), Papers in laboratory phonology V: Acquisition and the lexicon (pp. 88–101). Cambridge, UK: Cambridge University Press. Saltzman, E. L., & Kelso, J. A. S. (1987). Skilled actions: A task-dynamic approach. Psychological Review, 94, 84–106. doi:10.1037/0033-295X.94.1.84 Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333–382. doi:10.1207/s15326969 eco0104_2 Sampson, G. (1985). Writing systems. Stanford, CA: Stanford University Press. Sampson, G. (2002). Exploring the richness of the stimulus. The Linguistic Review, 19, 73–104. doi:10.1515/tlir.19.1-2.73 Sandler, W. (2012). Dedicated gestures and the emergence of sign language. Gesture, 12, 265–307. doi:10.1075/gest.12.3.01san Sandler, W. (2013). Vive la différence: Sign language and spoken language in language evolution. Language and Cognition, 5, 189–203. doi:10.1515/langcog2013-0013 Sandler, W. (2017). The challenge of sign language phonology. Annual Review of Linguistics, 7, 43–63. doi:10.1146/annurev-linguistics-011516-034122 Sandler, W., Aronoff, M., Padden, C., et al. (2014). Language emergence: Al-Sayyid Bedouin sign language. In N. J. Enfield, P. Kockelman, & J. Sidnell (eds.), The Cambridge handbook of linguistic anthropology (pp. 246–278). Cambridge, UK: Cambridge University Press. Sanguineti, V., Laboissière, R., & Ostry, D. J. (1998). A dynamic biomechanical model for neural control of speech production. Journal of the Acoustical Society of America, 103, 1615–1627. doi:10.1121/1.421296 Sapir, E. (1921). Language. New York, NY: Harcourt, Brace. Sapir, E. (1933). The psychological reality of phonemes. Journal de Psychologie Normale et Pathologique, 30, 247–265. Sasaki, S., Isa, T., Pettersson, L.-G., et al. (2004). Dexterous finger movements in primate without monosynaptic corticomotoneuronal excitation. Journal of Neurophysiology, 92, 3142–3147. doi:10.1152/jn.00342.2004 Sato, M., Schwartz, J.-L., & Perrier, P. (2014). Phonemic auditory and somatosensory goals in speech production. Language, Cognition and Neuroscience, 29, 41–43. doi:10.1080/01690965.2013.849811 Saussure, F. d. (1916/1966). Course in general linguistics. New York, NY: McGrawHill. Savage-Rumbaugh, E. S., McDonald, K., Sevcik, R. A., et al. (1986). Spontaneous symbol acquisition and communicative use by pygmy chimpanzees (Pan paniscus). Journal of Experimental Psychology: General, 115, 211–235. doi:10.1037/00963445.115.3.211 Savage-Rumbaugh, E. S., Murphy, J., Sevcik, R. A., et al. (1993). Language comprehension in ape and child. Monographs of the Society for Research in Child Development, 58, 1–252. doi:10.2307/1166068

Savage-Rumbaugh, E. S., & Rumbaugh, D. M. (1978). Symbolization, language, and chimpanzees: A theoretical reevaluation based on initial language acquisition processes in four young Pan troglodytes. Brain and Language, 6, 265–300. doi:10.1016/ 0093-934X(78)90063-9 Savage-Rumbaugh, E. S., Shanker, G. S., & Taylor, J. T. (2001). Apes, language, and the human mind. New York, NY: Oxford University Press. Scarborough, H., Wyckoff, J., & Davidson, R. (1986). A reconsideration of the relation between age and mean utterance length. Journal of Speech and Hearing Research, 29, 394–399. doi:10.1044/jshr.2903.394 Scarborough, H. S., Rescorla, L., Tager-Flusberg, H., et al. (1991). The relation of utterance length to grammatical complexity in normal and language-disorders groups. Applied Psycholinguistics, 12, 23–45. Schack, B., & Weiss, S. (2005). Quantification of phase synchronization phenomena and their importance for verbal memory processes. Biological Cybernetics, 92, 275–287. doi:10.1007/s00422-005-0555-1 Schmahmann, J. D. (2004). Disorders of the cerebellum: Ataxia, dysmetria of thought, and the cerebellar cognitive affective syndrome. The Journal of Neuropsychiatry and Clinical Neurosciences, 16, 367–378. doi:10.1176/jnp.16.3.367 Schmidt, R. A. (1988). Motor and action perspectives on motor behaviour. In O. G. Meijer & K. Roth (eds.), Complex movement behaviour: “The” motor-action controversy (Vol. 50, pp. 3–44). Amsterdam, NL: Elsevier. Schmidt, R. A., & Lee, T. (2014). Motor learning and performance: From principles to application. Champaign, IL: Human Kinetics. Scholes, R. J. (1995). On the orthographic basis of morphology. In R. J. Scholes (ed.), Literacy and language analysis (pp. 73–95). Hillsdale, NJ: Lawrence Erlbaum. Scholes, R. J. (1998). The case against phonemic awareness. Journal of Research in Reading, 21, 177–188. doi:10.1111/1467-9817.00054 Schreuder, R., & Bon, W. H. J. (1989). Phonemic analysis: Effects of word properties. Journal of Research in Reading, 12, 59–78. doi:10.1111/j.1467-9817.1989.tb00303.x Schroeder, C. E., & Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences, 32, 9–18. doi:10.1016/j. tins.2008.09.012 Schroeder, C. E., Lakatos, P., Kajikawa, Y., et al. (2008). Neuronal oscillations and visual amplification of speech. TRENDS in Cognitive Sciences, 12, 106–113. doi:10.1016/j.tics.2008.01.002 Schultz, W., & Romo, R. (1992). Role of primate basal ganglia and frontal cortex in the internal generation of movements. Experimental Brain Research, 91, 363–384. doi:10.1007/bf00227834 Schwanenflugel, P. J., & Akin, C. E. (1994). Developmental trends in lexical decisions for abstract and concrete words. Reading Research Quarterly, 29, 251–264. doi:10.2307/747876 Schwanenflugel, P. J., Stahl, S. A., & McFalls, E. L. (1997). Partial word knowledge and vocabulary growth during reading comprehension. Journal of Literacy Research, 29, 531–553. doi:10.1080/10862969709547973 Schwartze, M., & Kotz, S. A. (2016). Contributions of cerebellar event-based temporal processing and preparatory function to speech perception. Brain and Language, 161, 28–32. doi:10.1016/j.bandl.2015.08.005

Sciote, J. J., Morris, T. J., Horton, M. J., et al. (2002). Unloaded shortening velocity and myosin heavy chain variations in human laryngeal muscle fibers. Annals of Otology, Rhinology & Laryngology, 111, 120–127. doi:10.1177/ 000348940211100203 Scott, A. D., Wylezinska, M., Birch, M. J., et al. (2014). Speech MRI: Morphology and function. Physica Medica, 30, 604–618. doi:10.1016/j.ejmp.2014.05.001 Scripture, E. W. (1902). Elements of experimental phonetics. New York, NY: Charles Scribner. Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. Cambridge, UK: Cambridge University Press. Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3, 417–424. Seikel, J. A., King, D. W., & Drumwright, D. G. (1997). Anatomy and physiology for speech and language. San Diego, CA: Singular. Selkirk, E. O. (1981). On the nature of phonological representation. In T. Myers, J. Laver, & J. Anderson (eds.), The cognitive representation of speech (pp. 379–388). Amsterdam, NL: North-Holland. Selkirk, E. O. (1984). Phonology and syntax: The relation between sound and structure. Cambridge, MA: MIT Press. Selkirk, E. O. (1986). On derived domains in sentence phonology. Phonology Yearbook, 3, 371–405. doi:10.1017/S0952675700000695 Selkirk, E. O. (1996). The prosodic structure of function words. In J. L. Morgan & K. Demuth (eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition (pp. 187–214). Mahwah, NJ: Lawrence Erlbaum. Selkirk, E. O. (2000). The interaction of constraints on prosodic phrasing. In M. Horne (ed.), Prosody, theory and experiment: Studies presented to Gösta Bruce (pp. 231–262). Dordrecht, NL: Kluwer Academic. Selkirk, E. O. (2011). The syntax–phonology interface. In J. Goldsmith, J. Riggle, & A. C. L. Yu (eds.), The handbook of phonological theory (pp. 435–531). Malden, MA: Wiley-Blackwell. Sereno, M. I. (2014). Origin of symbol-using systems: Speech, but not sign, without the semantic urge. Philosophical Transactions of the Royal Society B: Biological Sciences, 369, 20130303. doi:10.1098/rstb.2013.0303 Seyfarth, R. M., Cheney, D. L., & Marler, P. (1980). Vervet monkey alarm calls: Semantic communication in a free-ranging primate. Animal Behaviour, 28, 1070–1094. doi:10.1016/S0003-3472(80)80097-2 Shaffer, L. H. (1982). Rhythm and timing in skill. Psychological Review, 89, 109–122. doi:10.1037/0033-295X.89.2.109 Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: University of Illinois Press. Share, D. L. (2014). Alphabetism in reading science. Frontiers in Psychology, 5. doi:10.3389/fpsyg.2014.00752 Share, D. L., & Daniels, P. T. (2016). Aksharas, alphasyllabaries, abugidas, alphabets, and orthographic depth: Reflections on Rimzhim, Katz and Fowler (2014). Writing Systems Research, 8, 17–31. doi:10.1080/17586801.2015.1016395 Shattuck-Hufnagel, S. (1979). Speech errors as evidence for a serial-order mechanism in sentence production. In W. E. Cooper & C. Walker, T. (eds.), Sentence processing:

Psycholinguistic studies presented to Merrill Garrett (pp. 295–342). Hillsdale, NJ: Lawrence Erlbaum. Shattuck-Hufnagel, S. (1983). Sublexical units and suprasegmental structure in speech production planning. In P. F. MacNeilage (ed.), The production of speech (pp. 109–136). New York, NY: Springer. Shattuck-Hufnagel, S. (1992). The role of word structure in segmental serial ordering. Cognition, 42, 213–259. doi:10.1016/0010-0277(92)90044-I Shebani, Z., & Pulvermüller, F. (2013). Moving the hands and feet specifically impairs working memory for arm- and leg-related action words. Cortex, 49, 222–231. doi:10.1016/j.cortex.2011.10.005 Sherrington, C. S. (1909). A mammalian spinal preparation. The Journal of Physiology, 38, 375–383. doi:10.1113/jphysiol.1909.sp001311 Sherzer, J. (1982). Play languages, with a note on ritual languages. In L. Obler & L. Menn, Exceptional language and linguistics (pp. 175–199). New York, NY: Academic Press. Shewmon, D. A. (1988). Anencephaly: Selected medical aspects. Hastings Center Report, 18, 11–19. doi:10.2307/3562217 Shiller, D. M., Sato, M., Gracco, V. L., et al. (2009). Perceptual recalibration of speech sounds following speech motor learning. The Journal of the Acoustical Society of America, 125, 1103–1113. doi:10.1121/1.3058638 Shiroma, A., Nishimura, M., Nagamine, H., et al. (2016). Cerebellar contribution to pattern separation of human hippocampal memory circuits. The Cerebellum, 15, 645–662. doi:10.1007/s12311-015-0726-0 Shohamy, D., Myers, C. E., Kalanithi, J., et al. (2008). Basal ganglia and dopamine contributions to probabilistic category learning. Neuroscience & Biobehavioral Reviews, 32, 219–236. doi:10.1016/j.neubiorev.2007.07.008 Shukla, M., Nespor, M., & Mehler, J. (2007). An interaction between prosody and statistics in the segmentation of fluent speech. Cognitive Psychology, 54, 1–32. doi:10.1016/j.cogpsych.2006.04.002 Simmons, W. K., & Barsalou, L. W. (2003). The similarity-in-topography principle: Reconciling theories of conceptual deficits. Cognitive Neuropsychology, 20, 451–486. doi:10.1080/02643290342000032 Simonyan, K. (2014). The laryngeal motor cortex: Its organization and connectivity. Current Opinion in Neurobiology, 28, 15–21. doi:10.1016/j.conb.2014.05.006 Simonyan, K., & Horwitz, B. (2011). Laryngeal motor cortex and control of speech in humans. The Neuroscientist, 17, 197–208. doi:10.1177/1073858410386727 Siri, S., Tettamanti, M., Cappa, S. F., et al. (2008). The neural substrate of naming events: effects of processing demands but not of grammatical class. Cerebral Cortex, 18, 171–177. doi:10.1093/cercor/bhm043 Skinner, B. F. (1957). Verbal behavior. New York, NY: Appleton-Century-Crofts. Skipper, J. I. (2015). The NOLB model: A model of the natural organization of language and the brain. In R. M. Willems (ed.), Cognitive neuroscience of natural language use (pp. 101–134). Cambridge, UK: Cambridge University Press. Skipper, J. I., Devlin, J. T., & Lametti, D. R. (2017). The hearing ear is always found close to the speaking tongue: Review of the role of the motor system in speech perception. Brain and Language, 164, 77–105. doi:10.1016/j. bandl.2016.10.004

Skoe, E., & Kraus, N. (2013). Musical training heightens auditory brainstem function during sensitive periods in development. Frontiers in Psychology, 4. doi:10.3389/ fpsyg.2013.00622 Smith, A., & Zelaznik, H. N. (2004). Development of functional synergies for speech motor coordination in childhood and adolescence. Developmental Psychobiology, 45, 22–33. doi:10.1002/dev.20009 Smith, B. L. (1978). Temporal aspects of English speech production: A developmental perspective. Journal of Phonetics, 6, 37–67. doi:10.1016/S0095-4470(19)31084-8 Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215. doi:10.1037/ 0278-7393.6.2.174 Soderstrom, M., Seidl, A., Nelson Kemler, D. G., et al. (2003). The prosodic bootstrapping of phrases: Evidence from prelinguistic infants. Journal of Memory and Language, 49, 249–267. doi:10.1016/S0749-596X(03)00024-X Sokolov, A. A., Miall, R. C., & Ivry, R. B. (2017). The cerebellum: Adaptive prediction for movement and cognition. TRENDS in Cognitive Sciences, 21, 313–332. doi:10.1016/j.tics.2017.02.005 Sommer, M. A. (2003). The role of the thalamus in motor control. Current Opinion in Neurobiology, 13, 663–670. doi:10.1016/j.conb.2003.10.014 Spagnoletti, C., Morais, J., Alegria, J., et al. (1989). Metaphonological abilities of Japanese children. Reading and Writing, 1, 221–244. doi:10.1007/BF00377644 Speed, L. J., Vinson, D. P., & Vigliocco, G. (2015). Representing meaning. In E. Dabrowska & D. Divjak (eds.), Handbook of cognitive linguistics (pp. 190–211). London, UK: Mouton. Sproat, R. (2006). Brahmi-derived scripts, script layout, and phonological awareness. Written Language and Literacy, 9, 45–66. doi:10.1075/wll.9.1.05spr Squire, L. R., & Zola, S. M. (1996). Structure and function of declarative and nondeclarative memory systems. Proceedings of the National Academy of Sciences, 93, 13515–13522. doi:10.1073/pnas.93.24.13515 Stahl, S. A., & Murray, B. A. (1994). Defining phonological awareness and its relationship to early reading. Journal of Educational Psychology, 86, 221. doi:10.1037/00220663.86.2.221 Stål, P., Marklund, S., Thornell, L.-E., et al. (2003). Fibre composition of human intrinsic tongue muscles. Cells Tissues Organs, 173, 147–161. doi:10.1159/ 000069470 Staudigl, T., & Hanslmayr, S. (2013). Theta oscillations at encoding mediate the context-dependent nature of human episodic memory. Current Biology, 23, 1101–1106. doi:10.1016/j.cub.2013.04.074 Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press. Stein, B. E., & Rowland, B. A. (2011). Organization and plasticity in multisensory integration: Early and late experience affects its governing principles. Progress in Brain Research, 191, 145–163. doi:10.1016/B978-0-444-53752-2.00007-2 Steinberg, J. C. (1934). Application of sound measuring instruments to the study of phonetic sounds. Journal of the Acoustical Society of America, 6, 16–24. doi:10.1121/ 1.1915684

Steinhauer, K. (2003). Electrophysiological correlates of prosody and punctuation. Brain and Language, 86, 142–164. doi:10.1016/S0093-934X(02)00542-4 Steinhauer, K., & Friederici, A. D. (2001). Prosodic boundaries, comma rules, and brain responses: The closure positive shift in ERP’s as a universal marker for prosodic phrasing in listeners and readers. Journal of Psycholinguistic Research, 30, 267–295. doi:10.1023/A:1010443001646 Stemberger, J. P. (1983). Speech errors and theoretical phonology: A review. Bloomingtion, IN: Indiana University Linguistics Club. Stemberger, J. P. (1989). Speech errors in early child language production. Journal of Memory and Language, 28, 164–188. doi:10.1016/0749-596X(89)90042-9 Steriade, M., Gloor, P., Llinás, R. R., et al. (1990). Basic mechanisms of cerebral rhythmic activities. Electroencephalography and Clinical Neurophysiology, 76, 481–508. doi:10.1016/0013-4694(90)90001-Z Stetson, R. H. (1928/1951). Motor phonetics: A study of speech movements in action (2nd ed.). Amsterdam, NL: North-Holland. Stigler, J. W., Lee, S.-Y., & Stevenson, H. W. (1986). Digit memory in Chinese and English: Evidence for a temporally limited store. Cognition, 23, 1–20. doi:10.1016/ 0010-0277(86)90051-X Stjernfelt, F. (2012). The evolution of semiotic self-control. In T. Schilhab, F. Stjernfelt, & T. W. Deacon (eds.), The symbolic species evolved (pp. 39–63). Dordrecht, NL: Springer. Straka, G. (1965). Album phonétique. Québec, QC: Presses de l’Université Laval. Studdert-Kennedy, M. (1987). The phoneme as a perceptuomotor structure. Haskins Laboratories Report on Speech Research, SR-91, 45–57. Studdert-Kennedy, M. (1998). Introduction: The emergence of phonology. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (eds.), Approaches to the evolution of language: Social and cognitive bases (pp. 169–176). Cambridge, UK: Cambridge University Press. Studdert-Kennedy, M. (2000). Imitation and the emergence of segments. Phonetica, 57, 275–283. doi:10.1159/000028480 Studdert-Kennedy, M. (2005). How did language go discrete? In M. Tallerman & H. H. Pascal (eds.), Language origins: Perspectives on evolution (pp. 48–67). Oxford, UK: Oxford University Press. Sussman, H. M. (1984). A neuronal model for syllable representation. Brain and Language, 22, 167–177. doi:10.1016/0093-934X(84)90087-7 Sutton, D., Larson, C., Taylor, E. M., et al. (1973). Vocalization in rhesus monkeys: Conditionability. Brain Research, 52, 225–231. doi:10.1016/0006-8993(73) 90660-4 Sweet, H. (1877). A handbook of phonetics: Including a popular exposition of the principles of spelling reform. Oxford, UK: Clarendon Press. Taub, S. F. (2001). Language from the body: Iconicity and metaphor in American Sign Language. Cambridge, UK: Cambrige University Press. Tavakoli, M., Jalilevand, N., Kamali, M., et al. (2015). Language sampling for children with and without cochlear implant: MLU, NDW, and NTW. International Journal of Pediatric Otorhinolaryngology, 79, 2191–2195. doi:10.1016/j.ijporl.2015.10.001 Taylor, I. (1988). Psychology of literacy: East and west. In D. de Kerckhove & C. J. Lumsden (eds.), The alphabet and the brain: The lateralization of writing (pp. 202–233). Berlin, DE: Springer.

Taylor, J. A., & Ivry, R. B. (2014). Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning. In N. Ramnani (ed.), Progress in brain research (Vol. 210, pp. 217–253). Amsterdam, NL: Elsevier. Taylor, J. R. (2007). Cognitive linguistics and autonomous linguistics. In D. Geeraerts, H. Cuyckens, & J. R. Taylor (eds.), The Oxford handbook of cognitive linguistics (pp. 566–588). Oxford, UK: Oxford University Press. Taylor, T. (1997). Theorizing language analysis, normativity, rhetoric, history. Amsterdam, NL: Pergamon. Temel, Y., Blokland, A., Steinbusch, H. W. M., et al. (2005). The functional role of the subthalamic nucleus in cognitive and limbic circuits. Progress in Neurobiology, 76, 393–413. doi:10.1016/j.pneurobio.2005.09.005 Ten Oever, S., Hausfeld, L., Correia, J. M., et al. (2016). A 7T fMRI study investigating the influence of oscillatory phase on syllable representations. NeuroImage, 141, 1–9. doi:10.1016/j.neuroimage.2016.07.011 Ten Oever, S., & Sack, A. T. (2015). Oscillatory phase shapes syllable perception. Proceedings of the National Academy of Sciences, 112, 15833–15837. doi:10.1073/ pnas.1517519112 Terrace, H. S. (2001). Chunking and serially organized behavior in pigeons,monkeys and humans. In R. G. Cook (ed.), Avian visual cognition. [Online]. doi:pigeon.psy. tufts.edu/avc/terrace/ Thiessen, E. D., & Saffran, J. R. (2003). When cues collide: Use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology, 39, 706–716. doi:10.1037/0012-1649.39.4.706 Thom, S. A., Hoit, J. D., Hixon, T. J., et al. (2006). Velopharyngeal function during vocalization in infants. Cleft Palate–Craniofacial Journal, 43, 539–546. doi:10.1597/05-113 Thompson, R. L., Vinson, D. P., Woll, B., et al. (2012). The road to language learning is iconic: Evidence from British sign language. Psychological Science, 23, 1443–1448. doi:10.1177/0956797612459763 Thomson, J. M., Fryer, B., Maltby, J., et al. (2006). Auditory and motor rhythm awareness in adults with dyslexia. Journal of Research in Reading, 29, 334–348. doi:10.1111/j.1467-9817.2006.00312.x Tierney, A., & Kraus, N. (2014). Auditory-motor entrainment and phonological skills: Precise auditory timing hypothesis (PATH). Frontiers in Human Neuroscience, 8. doi:10.3389/fnhum.2014.00949 Timmann, D., Dimitrova, A., Hein-Kropp, C., et al. (2003). Cerebellar agenesis: Clinical, neuropsychological and MR findings. Neurocase, 9, 402–413. doi:10.1076/neur.9.5.402.16555 Tomasello, M. (1990). Cultural transmission in the tool use and communicatory signaling of chimpanzees? In S. T. Parker & K. R. Gibson (eds.), Language and intelligence in monkeys and apes: Comparative developmental perspectives (pp. 274–311.). New York, NY: Cambridge University Press. Tomasello, M. (1992). First verbs: A case study of early grammatical development. Cambridge, UK: Cambridge University Press. Tomasello, M. (1996). Do apes ape? In C. M. Heyes & B. G. Galef (eds.), Social learning in animals: The roots of culture (pp. 319–346). San Diego, CA: Academic Press.

Tomasello, M. (2000a). Do young children have adult syntactic competence? Cognition, 74, 209–253. doi:10.1016/S0010-0277(99)00069-4 Tomasello, M. (2000b). The item-based nature of children’s early syntactic development. TRENDS in Cognitive Sciences, 4, 156–163. doi:10.1016/S1364-6613(00) 01462-5 Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press. Tomasello, M., & Akhtar, N. (1995). Two-year-olds use pragmatic cues to differentiate reference to objects and actions. Cognitive Development, 10, 201–224. doi:10.1016/ 0885-2014(95)90009-8 Tomasello, M., & Bates, E. (eds.). (2001). Language development: The essential readings. Malden, MA: Blackwell. Tomasello, M., & Zuberbühler, K. (2002). Primate vocal and gestural communication. In M. Bekoff, C. Allen, & G. M. Burghardt (eds.), The cognitive animal: Empirical and theoretical perspectives on animal cognition (pp. 293–229). Cambridge, MA: MIT Press. Topalidou, M., Kase, D., Boraud, T., et al. (2016). Dissociation of reinforcement and Hebbian learning induces covert acquisition of value in the basal ganglia. bioRxiv. doi:10.1101/060236 Toscano, J. C., & McMurray, B. (2015). The time-course of speaking rate compensation: Effects of sentential rate and vowel length on voicing judgments. Language, Cognition and Neuroscience, 30, 529–543. doi:10.1080/23273798.2014.946427 Treiman, R., & Kessler, B. (1995). In defense of an onset-rime syllable structure for English. Language and Speech, 38, 127–142. doi:10.1177/002383099503800201 Tremblay, P.-L., Bedard, M.-A., Langlois, D., et al. (2010). Movement chunking during sequence learning is a dopamine-dependant process: A study conducted in Parkinson’s disease. Experimental Brain Research, 205, 375–385. doi:10.1007/ s00221-010-2372-6 Trubetzkoy, N. S. (1939/1971). Principles of phonology. Los Angeles, CA: University of California Press. Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (eds.), Organization of memory (pp. 381–403). New York, NY: Academic. Press. Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352–373. doi:10.1037/h0020071 Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433–460. doi:10.1007/978-1-4020-6710-5_3 Twaddell, W. F. (1935). On defining the phoneme. Baltimore, MD: Waverly Press. Tyler, L. K., Bright, P., Fletcher, P., et al. (2004). Neural processing of nouns and verbs: The role of inflectional morphology. Neuropsychologia, 42, 512–523. doi:10.1016/j. neuropsychologia.2003.10.001 Uppstad, P. H., & Tønnessen, F. E. (2010). The status of the concept of “phoneme” in psycholinguistics. Journal of Psycholinguistic Research, 39, 429–442. doi:10.1007/ s10936-010-9149-8 Vaissière, J. (1983). Language-independent prosodic features. In A. Cutler & D. R. Ladd (eds.), Prosody: Models and measurements (pp. 53–66). New York, NY: Springer.

Vale, G. L., Davis, S. J., Lambeth, S. P., et al. (2017). Acquisition of a socially learned tool use sequence in chimpanzees: Implications for cumulative culture. Evolution and Human Behavior, 38, 635–644. doi:10.1016/j.evolhumbehav.2017.04.007 Vale, G. L., Flynn, E. G., Pender, L., et al. (2016). Robust retention and transfer of tool construction techniques in chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 130, 24–35. doi:10.1037/a0040000 van der Meer, M. A. A., Johnson, A., Schmitzer-Torbert, N. C., et al. (2010). Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron, 67, 25–32. doi:10.1016/j. neuron.2010.06.023 Varela, F. J., Lachaux, J.-P., Rodriguez, E., et al. (2001). The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience, 2, 229–239. doi:10.1038/35067550 Varghese, L., Bharadwaj, H. M., & Shinn-Cunningham, B. G. (2015). Evidence against attentional state modulating scalp-recorded auditory brainstem steady-state responses. Brain Research, 1626, 146–164. doi:10.1016/j.brainres.2015.06.038 Veldhuis, D., & Kurvers, J. (2012). Offline segmentation and online language processing units: The influence of literacy. Written Language & Literacy, 15, 165–184. doi:10.1075/wll.15.2.03vel Vertes, R. P., Hoover, W. B., Szigeti-Buck, K., et al. (2007). Nucleus reuniens of the midline thalamus: Link between the medial prefrontal cortex and the hippocampus. Brain Research Bulletin, 71, 601–609. doi:10.1016/j.brainresbull.2006.12.002 Vertes, R. P., Linley, S. B., & Hoover, W. B. (2015). Limbic circuitry of the midline thalamus. Neuroscience & Biobehavioral Reviews, 54, 89–107. doi:10.1016/j. neubiorev.2015.01.014 Viena, T. D., Linley, S. B., & Vertes, R. P. (2018). Inactivation of nucleus reuniens impairs spatial working memory and behavioral flexibility in the rat. Hippocampus, 28, 297–311. doi:10.1002/hipo.22831 Vierordt, K., & Ludwig, C. (1855). Beiträge zu der Lehre von den Atembewegungen,. Archive für Physiologie. Heilkunde, 14, 253–271. Vigliocco, G., Kousta, S.-T., Della Rosa, P. A., et al. (2013). The neural representation of abstract words: The role of emotion. Cerebral Cortex, 24, 1767–1777. doi:10.1093/ cercor/bht025 Vigliocco, G., Vinson, D. P., Arciuli, J., et al. (2008). The role of grammatical class on word recognition. Brain and Language, 105, 175–184. doi:10.1016/j. bandl.2007.10.003 Vigliocco, G., Vinson, D. P., Druks, J., et al. (2011). Nouns and verbs in the brain: A review of behavioural, electrophysiological, neuropsychological and imaging studies. Neuroscience & Biobehavioral Reviews, 35, 407–426. doi:10.1016/j. neubiorev.2010.04.007 Vousden, J. I., Brown, G. D. A., & Harley, T. A. (2000). Serial control of phonology in speech production: A hierarchical model. Cognitive Psychology, 41, 101–175. doi:10.1006/cogp.2000.0739 Wade-Woolley, L. (1999). First language influences on second language word reading: All roads lead to Rome. Language Learning, 49, 447–471. doi:10.1111/00238333.00096

Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (1999). Comprehensive test of phonological processing: CTOPP. Austin, TX: Pro-ed. Walker, P., Bremner, J. G., Mason, U., et al. (2010). Preverbal infants’ sensitivity to synaesthetic cross-modality correspondences. Psychological Science, 21, 21–25. doi:10.1177/0956797609354734 Wallace, M. T., & Stevenson, R. A. (2014). The construct of the multisensory temporal binding window and its dysregulation in developmental disabilities. Neuropsychologia, 64, 105–123. doi:10.1016/j.neuropsychologia.2014.08.005 Wang, Y.-T., Green, J. R., Nip, I. S. B., et al. (2010). Breath group analysis for reading and spontaneous speech in healthy adults. Folia Phoniatrica et Logopaedica, 62, 297–302. doi:10.1159/000316976 Warren, R. M. (2008). Auditory perception: An analysis and synthesis (3rd ed.). Cambridge, UK: Cambridge University Press. Warrington, E. K. (1975). The selective impairment of semantic memory. Quarterly Journal of Experimental Psychology, 27, 635–657. doi:10.1080/ 14640747508400525 Wasow, T., & Arnold, J. (2005). Intuitions in linguistic argumentation. Lingua, 115, 1481–1496. doi:10.1016/j.lingua.2004.07.001 Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41, 989–994. doi:10.1016/S0028-3932(02)00316-0 Watrous, A. J., Fell, J., Ekstrom, A. D., et al. (2015). More than spikes: Common oscillatory mechanisms for content-specific neural representations during perception and memory. Current Opinion in Neurobiology, 31, 33–39. doi:10.1016/j. conb.2014.07.024 Watrous, A. J., Lee, D. J., Izadi, A., et al. (2013). A comparative study of human and rat hippocampal low-frequency oscillations during spatial navigation. Hippocampus, 23, 656–661. doi:10.1002/hipo.22124 Watson, P. J., & Montgomery, E. B. (2006). The relationship of neuronal activity within the sensori-motor region of the subthalamic nucleus to speech. Brain and Language, 97, 233–240. doi:10.1016/j.bandl.2005.11.004 Wauters, L. N., Tellings, A. E. J. M., Van Bon, W. H. J., et al. (2003). Mode of acquisition of word meanings: The viability of a theoretical construct. Applied Psycholinguistics, 24, 385–406. doi:10.1017/S0142716403000201 Wechsler, D. (1997). WAIS-III: Wechsler adult intelligence scale: Administration and scoring manual. San Antonio, TX: Harcourt Brace. Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49–63. doi:10.1016/S0163-6383(84)80022-3 Werker, J. F., & Yeung, H. H. (2005). Infant speech perception bootstraps word learning. TRENDS in Cognitive Sciences, 9, 519–527. doi:10.1016/j.tics.2005.09.003 Whalen, D. H. (1990). Coarticulation is largely planned. Journal of Phonetics, 18, 3–35. doi:10.1016/S0095-4470(19)30356-0 Whiten, A., McGuigan, N., Marshall-Pescini, S., et al. (2009). Emulation, imitation, over-imitation and the scope of culture for child and chimpanzee. Philosophical

Transactions of the Royal Society B: Biological Sciences, 364, 2417–2428. doi:10.1098/rstb.2009.0069 Willems, R. M. (ed.) (2015). Cognitive neuroscience of natural language use. Cambridge, UK: Cambridge University Press. Williams, P. L., Beer, R. D., & Gasser, M. (2008). Evolving referential communication in embodied dynamical agents. ALIFE, 702–709. Wilson, S. M., Saygin, A. P., Sereno, M. I., et al. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7, 701–702. doi:10.1038/nn1263 Winkworth, A. L., Davis, P. J., Adams, R. D., et al. (1995). Breathing patterns during spontaneous speech. Journal of Speech, Language, and Hearing Research, 38, 124–144. doi:10.1044/jshr.3801.124 Wirth, F. P., & O’Leary, J. L. (1974). Locomotor behavior of decerebellated arboreal mammals: Monkey and raccoon. Journal of Comparative Neurology, 157, 53–85. doi:10.1002/cne.901570106 Wirth, S., Yanike, M., Frank, L. M., et al. (2003). Single neurons in the monkey hippocampus and learning of new associations. Science, 300, 1578–1581. doi:10.1126/science.1084324 Wolf, G., & Love, N. (1997). Linguistics inside out: Roy Harris and his critics. Amsterdam, NL: John Benjamins. Wolpert, D. M., Miall, R. C., & Kawato, M. (1998). Internal models in the cerebellum. TRENDS in Cognitive Sciences, 2, 338–347. doi:10.1016/S13646613(98)01221-2 Wray, A., & Perkins, M. R. (2000). The functions of formulaic language: An integrated model. Language and Communication, 20, 1–28. doi:10.1016/S0271-5309(99) 00015-4 Wydell, T. N., & Kondo, T. (2003). Phonological deficit and the reliance on orthographic approximation for reading: A follow-up study on an English-Japanese bilingual with monolingual dyslexia. Journal of Research in Reading, 26, 33–48. doi:10.1111/14679817.261004 Yakusheva, T. A., Blazquez, P. M., Chen, A., et al. (2013). Spatiotemporal properties of optic flow and vestibular tuning in the cerebellar nodulus and uvula. Journal of Neuroscience, 33, 15145–15160. doi:10.1523/JNEUROSCI.2118-13.2013 Yang, C. D. (2004). Universal grammar, statistics or both? TRENDS in Cognitive Sciences, 8, 451–456. doi:10.1016/j.tics.2004.08.006 Yee, E., Jones, M. N., & McRae, K. (2018). Semantic memory. In J. T. Wixted & S. Thompson-Schill (eds.), Stevens’ handbook of experimental psychology and cognitive neuroscience (Vol. 3, pp. 319–356). New York, NY: Wiley. Yu, C., & Smith, L. B. (2013). Joint attention without gaze following: Human infants and their parents coordinate visual attention to objects through eye-hand coordination. PLoS One, 8, e79659. doi:10.1371/journal. pone.0079659 Yu, W., & Krook-Magnuson, E. (2015). Cognitive collaborations: Bidirectional functional connectivity between the cerebellum and the hippocampus. Frontiers in systems neuroscience, 9. doi:/10.3389/fnsys.2015.00177

Zhou, X., Ostrin, R. K., & Tyler, L. K. (1993). The noun-verb problem and Chinese aphasia: Comments on Bates et al. 1991. Brain and Language, 45, 86–93. doi:10.1006/brln.1993.1035 Zingeser, L. B., & Berndt, R. S. (1988). Grammatical class and context effects in a case of pure anomia: Implications for models of language production. Cognitive Neuropsychology, 5, 473–516. doi:10.1080/02643298808253270 Zoefel, B., & VanRullen, R. (2016). EEG oscillations entrain their phase to high-level features of speech sound. NeuroImage, 124, 16–23. doi:10.1016/j. neuroimage.2015.08.054

Index

abstract vs concrete lexemes, 67, 68, 204, 206 and vocabulary development, 207, 226 definition of, 206 abugidas, 86, 87, 94 action priming, 203. See also lexico-semantic tasks and priming protocols; context effects on semantic processes alphasyllabaries, 85 Al-Sayyid Bedouin Sign Language, 41, 42 Ambridge & Lieven, 66 American Sign Language, 35 amusia, 133 amygdala, 129, 204. See limbic system anencephaly, 19 aphasia, 21, 22, 63, 67, 113, 156, 223 thalamic aphasia, 22, 203, 223, 238 apraxia, verbal, 80 Arbib, 39 arbitrary (symbolic) signs, 13, 23, 45, 195 as human-specific, 34, 38, 43, 50 association memory in symbol learning, 35 development of, 50 vs iconic signs, 34, 44 articulatory synthesizers, in motor-control models, 147, 149, 151, 152, 153, 154, 156, 157 artificial languages, 64 associative chain models (linguistic theory), 53 ataxia, cerebellar lesions, 138, 223, 225, 238. See cerebellum auditory brainstem, 129–131 auditory cortex, 134, 135, 173, 202, 218, 227 babble, emergence of, 46, 52, 170, 187, 188 Baroni, 112 basal ganglia, 43, 126, 137, 140, 224, 228, 229, 235 and action selection, 221, 223, 235 and semantic processes, 22, 140, 221, 223, 224, 230 and valence coding, 126, 228. See also limbic system

role in chunking, 124, 140, 141, 195, 226, 235, 237 Bedouin Hijaze, 99 behaviorism, 53 bonobo, 35 bootstrapping hypothesis, 60, 61, 66, 69 prosodic bootstrapping, 63, 169 bound forms, the problem of, 60, 62, 63, 106, 122, 169, 190, 200, 210, 213, 214, 217. See segmentation of meaning across languages Braille alphabet, 94, 97 Branigan & Pickering, 75 Broca’s area, 22, 23, 31, 39, 40, 202 Browman & Goldstein, 84 Bybee, 65 Calderone, 134 Canolty, 217 Cayuga, 4 centrism in language analysis, 1, 14, 18, 31, 97, 110, 111–113, 213 cerebellar agenesis, 139, 237 cerebellum, 126, 128, 132, 138–140, 225, 228, 235 and semantic processes, 22, 223, 237 cerebro-acoustic coherence, measures of, 172, 174 vs inter-trial phase coherence, 179, 181 Chandrasekaran & Kraus, 129 Chenery, 140 child-directed speech, 64, 106 chimpanzee, 35, 36, 43 Chinese, 1, 4, 47, 105, 113, 155, 175 Chinese room paradox, 122 Chomsky, 54, 55, 56 chunking, action chunks, 101, 177, 195, 213, 214 and focus of attention, 177, 181, 211 and semantic processes, 141, 211, 221, 235 as meaningful blocks of action, 219, 226

chunking, action chunks (cont.) internal compensatory timing, 101, 103, 168, 177 signature marks, 177, 180 vs phonological words, phrases, 168 clitics, 169. See bound forms co-articulation, 29, 65, 77–79, 162, 166 compound forms, 212. See bound forms consonantaries, 85, 86, 87, 99, 104 context effects on semantic processes, 15, 17, 120, 140, 202 and activation of episodic memory, 202, 215, 219, 230 and activation of semantic memory, 202, 204, 207, 208, 221, 230 and the Dual-Stream model, 202 and vocabulary development, 206. See modes of acquisition definition of, 201 context-independent semantic memory, 201, 207, 209, 214, 227, 231 core semantic concepts, assumed, 201. See context-independent semantic memory cortico-centrism, 21, 202, 240 Coulmas, 108 coupling of sensory information to utterance structure, 9, 211, 219, 226. See also motorsensory coupling; entrainment of neural oscillations cross-frequency coupling of neural oscillations, 135, 217, 219. See also entrainment of neural oscillations phase-amplitude coupling, 137, 218, 227 Crystal, 184 Czech, 107 Dabrowska, 121 Deacon, 40 decerabration, in animals, 240 decerebellation, in animals, 138 decoupling of respiration and locomotion, 51 decoupling of the nasopharynx, 46, 51, 188. See also babble DeHaene, 88 delta oscillations, 134, 172, 174–179, 180, 218, 226, 227, 233 descent of the larynx, 46. See also decoupling of the nasopharynx dictation systems, 89 diphone, 89 disembodied semantics, 203–204 distinctive feature, problems in defining the. See also phoneme and graded control of motor speech

as distinctions in literal meaning of lexemes, 26, 27, 33 restricted to transcribed speech, 13, 17, 27, 30, 33, 128 subjectively established, 26, 27 DIVA model, 152–155 Dixon & Aikhenwald, 112 Donald, 43 Dual-Stream model, 202 dyslexia, 105, 112, 113 Ehri & Wilce, 95 embodied cognitivism, 204, 205, 209, 211, 215 embodied semantics, 118, 204, 210, 211, 215, 220, 222, 239 emotional valence and abstract lexemes, 207 emotional valence of lexemes, measures of, 204 entrainment of neural oscillations, 9, 89, 124, 132, 133, 134, 135, 176, 219, 227 and windows of sensory processing, 140, 173, 179 to chunks, 140, 179, 181, 220, 227 to syllable-like cycles, 144, 172, 174, 180 EP hypothesis, 149, 152, 156 episodic and semantic memory, 9, 118, 120, 204, 206, 219, 227, 231 epistemology of language study, 13, 56, 75, 89, 241 Evans & Levinson, 111 excision of Broca’s area, 22 executive function, 224, 229 Faber, 85 feedback, in motor-sensory coupling, 124, 125, 126, 127, 133, 153, 214 feedforward control, 153, 154 Ferreira, 57 formulas and semantic schemas, 4, 60, 62, 67, 121, 190, 193, 195, 207, 208, 231, 239 and chunking, 9, 61, 64, 65, 106, 121, 170, 214, 221, 226 forward (control) models, 126, 138, 229. See also feedforward control Fowler, 85, 99, 103, 104 French, 63, 82, 190, 209 Frequency Following Response, 129 Fromkin, 78 frontal cortex, 36, 223, 229 and semantic processes, 140, 224 gamma oscillations, 134, 172, 173, 217, 232 generative phonology, 79, 169, 170 German, 175, 212 Goswami, 94





motor-sensory coupling, 19, 124–128. See also coupling of sensory information to utterance structure motor-speech development. See speech articulation, maturational factors and speech breathing, maturational factors Motor-Theory of Speech Perception, 125 Mowrey & Mackay, 81 multimodal sensory integration, 117, 118, 120, 128, 132, 137, 227. See also crossfrequency coupling of neural oscillations in sensory cortices, 135 in the basal ganglia, 137 in the cerebellum, 137, 138–139 in the colliculi, 129, 137 Murdoch, 22 Murdoch & Whelan, 222, 223, 224, 225 muscle fibers (types), contractile properties, 160 negative aperture targets in speech-motor control, 151, 167 neural network simulations, 45, 64, 152 neural oscillations. See entrainment of neural oscillations Newmeyer, 16 Olson, 17, 105 ontological incommensurability problem, 2, 5, 6, 7, 8, 15, 18, 33, 56, 194, 239 definition of the, 2, 5 pallidotomy, 223, 224, 225. See also basal ganglia paraphasias, 80, 81, 156 Parisse, 112 Parity Condition, 124, 125, 141, 195, 201 Parkinson’s disease, 140, 141, 204, 221, 223, 229. See also basal ganglia partitioning of meaning across languages, 208, 209 Penfield & Roberts, 22 Pergnier, 107 Perrier, 148 perseveration and speech errors, 82 Persian, 107 phase-amplitude coupling. See cross-frequency coupling of neural oscillations phoneme awareness, 87, 89–92 and phonological awareness, 90, 92 as innate, 85, 86, 97, 100 as motivated by graphic signs, 57, 86, 92, 94, 95, 97 vs awareness of syllables, 91, 94 phoneme, problems in defining the

as “not in the physical world,” 6 as convenient fictions, 27 as letters in an alphabet, 54 as mental images, 27 as not perceptually localized in a segment, 89 as one letter per phoneme, 27, 95 as one physical speech sound, 91 phonological feature. See distinctive feature phonological phrase, 169, 170. See also phrase phonological word, 168, 169, 170. See also word phonotactic regularities, 64, 121 phrase, problems in defining the, 1, 4, 7, 64, 169 as a phonological unit, 169 linked to knowledge of writing, 176 related to orthographic script, 54 Piai, 232 Poeppel, 144 Poeppel & Embick, 2, 3 polysynthetic, synthetic, analytic, classes of spoken languages, 112, 212 Port, 108 Postal, 56 Pouplier & Hardcastle, 81 Poverty of Stimulus argument, 58, 66 Prague School of phonology, 26, 27 Predictive Coding Theory, 127, 207, 220, 221 prefrontal cortex, 140, 229, 235 and semantic processes, 204, 230, 232, 233, 234 primacy of linguistic analysis, 8, 30, 79, 240 as epistemologically unjustified, 8, 30, 76, 97 procedural learning in non-human primates, 43 prosodic constituents in generative phonology, 169. See also phonological word; phonological phrase psychoacoustic scales, 31 psycholinguistic evidence, 79, 156, 166, 168 and transcripts, 57, 76, 80, 81 recursion, 36 reinforcement learning, 126, 132, 208, 238 roots and affixes, 213. See bound forms Sandler, 42 Saussure, 19, 21, 23 Schroeder, 135, 172 Schwartze & Kotz, 139 scriptism, 7, 9, 14, 17–18, 108, 194, 240 segmentation of meaning across languages, 120, 208, 209, 210, 211, 213 selective attention, 124, 125, 128, 129, 131, 134, 139, 208 Selkirk, 169 semantic memory impairment, 205





vocabulary development (cont.) and MLU, 188, 191, 193. See also speechbreathing, structural effects on vocabulary development in deaf children, 43 vocalization, in human infants vs non-human primates, 38, 45 vocalization, reactive, 45 vowel systems, 31 Watson & Montgomery, 236 Wechsler digit-span test, 164 Wernicke’s area, 22, 202 word stress. See lexical stress

word, problems in defining the, 1, 3, 4, 7, 60, 105–107, 169, 200 and polysynthetic languages, 4, 63, 113, 212 as ’not present in signals’, 6 as a phonological unit, 169 in children’s production and perception, 61, 62 linked to knowledge of writing, 57, 63, 106, 176 origin of word divisions in text, 63, 212 related to an inborn ability to detect stress, 63 related to orthographic script, 54 writing systems. See abugidas; alphasyllabaries; consonantaries; Greek alphabet; logographic systems; syllabaries